1
Vemuri K, Kumar S, Chen L, Verzi MP. Dynamic RNA polymerase II occupancy drives differentiation of the intestine under the direction of HNF4. Cell Rep 2024; 43:114242. [PMID: 38768033 PMCID: PMC11264335 DOI: 10.1016/j.celrep.2024.114242] [Received: 11/08/2023] [Revised: 04/03/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024]
Abstract
Terminal differentiation requires massive restructuring of the transcriptome. During intestinal differentiation, the expression patterns of nearly 4,000 genes are altered as cells transition from progenitor cells in crypts to differentiated cells in villi. We identify dynamic occupancy of RNA polymerase II (Pol II) to gene promoters as the primary driver of transcriptomic shifts during intestinal differentiation in vivo. Changes in enhancer-promoter looping interactions accompany dynamic Pol II occupancy and are dependent upon HNF4, a pro-differentiation transcription factor. Using genetic loss-of-function, chromatin immunoprecipitation sequencing (ChIP-seq), and immunoprecipitation (IP) mass spectrometry, we demonstrate that HNF4 collaborates with chromatin remodelers and loop-stabilizing proteins and facilitates Pol II occupancy at hundreds of genes pivotal to differentiation. We also explore alternate mechanisms that drive differentiation gene expression and find that pause-release of Pol II and post-transcriptional mRNA stability regulate smaller subsets of differentially expressed genes. These studies provide insights into the mechanisms of differentiation in renewing adult tissue.
Affiliation(s)
- Kiranmayi Vemuri
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Sneha Kumar
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Lei Chen
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing 210096, China
- Michael P Verzi
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08903, USA; Rutgers Center for Lipid Research, New Jersey Institute for Food, Nutrition & Health, Rutgers University, New Brunswick, NJ 08901, USA; NIEHS Center for Environmental Exposures and Disease (CEED), Rutgers EOHSI, Piscataway, NJ 08854, USA.
2
Koopmans F. GOAT: efficient and robust identification of gene set enrichment. Commun Biol 2024; 7:744. [PMID: 38898151 PMCID: PMC11187187 DOI: 10.1038/s42003-024-06454-5] [Received: 02/07/2024] [Accepted: 06/14/2024] [Indexed: 06/21/2024]
Abstract
Gene set enrichment analysis is foundational to the interpretation of high throughput biology. Identifying enriched Gene Ontology (GO) terms or disease-associated gene sets within a list of gene effect sizes that represent experimental outcomes is an everyday task in life science that crucially depends on robust and sensitive statistical tools. We here present GOAT, a parameter-free algorithm for gene set enrichment analysis of preranked gene lists. The algorithm can precompute null distributions from standardized gene scores, enabling enrichment testing of the GO database in one second. Validations using synthetic data show that estimated gene set p-values are well calibrated under the null hypothesis and invariant to gene list length and gene set size. Application to various real-world proteomics and gene expression studies demonstrates that GOAT identifies more significant GO terms as compared to current methods. GOAT is freely available as an R package and user-friendly online tool for gene set enrichment analyses that includes interactive data visualizations: https://ftwkoopmans.github.io/goat .
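To illustrate why precomputing null distributions makes enrichment testing fast, here is a minimal Python sketch of the general idea for a preranked list: gene scores are rank-transformed, and for each gene set size a null distribution of mean scores is sampled once and then reused for every set of that size. This is not GOAT's algorithm; the rank transform, the number of draws, the one-sided empirical p-value, and the function names `rank_scores`, `precompute_nulls`, and `set_pvalue` are all assumptions made for illustration.

```python
import random
from bisect import bisect_left
from statistics import mean


def rank_scores(effect_sizes):
    """Replace raw effect sizes with ranks scaled to (0, 1]; larger effect -> larger score.
    Ties are broken by input order (a simplifying assumption)."""
    order = sorted(range(len(effect_sizes)), key=lambda i: effect_sizes[i])
    scores = [0.0] * len(effect_sizes)
    for rank, idx in enumerate(order, start=1):
        scores[idx] = rank / len(effect_sizes)
    return scores


def precompute_nulls(scores, set_sizes, n_draws=2000, seed=0):
    """For each distinct set size, sample random gene sets once and keep their sorted mean scores."""
    rng = random.Random(seed)
    nulls = {}
    for size in set(set_sizes):
        draws = [mean(rng.sample(scores, size)) for _ in range(n_draws)]
        nulls[size] = sorted(draws)
    return nulls


def set_pvalue(set_scores, nulls):
    """One-sided empirical p-value: share of null means >= the observed mean (with +1 smoothing)."""
    observed = mean(set_scores)
    null = nulls[len(set_scores)]
    n_at_least = len(null) - bisect_left(null, observed)
    return (n_at_least + 1) / (len(null) + 1)


if __name__ == "__main__":
    scores = rank_scores([float(i) for i in range(200)])  # toy effect sizes
    nulls = precompute_nulls(scores, [20])
    print(set_pvalue(scores[-20:], nulls))   # 20 genes with the largest effects
    print(set_pvalue(scores[::10], nulls))   # 20 genes spread across the ranking
```

Because the null depends only on set size, each distribution is built once and looked up by binary search, so the cost of testing many gene sets is dominated by computing their observed means.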
Affiliation(s)
- Frank Koopmans
- Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University, 1081 HV, Amsterdam, The Netherlands.
3
Candia J, Ferrucci L. Assessment of Gene Set Enrichment Analysis using curated RNA-seq-based benchmarks. PLoS One 2024; 19:e0302696. [PMID: 38753612 PMCID: PMC11098418 DOI: 10.1371/journal.pone.0302696] [Received: 10/31/2023] [Accepted: 04/09/2024] [Indexed: 05/18/2024]
Abstract
Pathway enrichment analysis is a ubiquitous computational biology method to interpret a list of genes (typically derived from the association of large-scale omics data with phenotypes of interest) in terms of higher-level, predefined gene sets that share biological function, chromosomal location, or other common features. Among many tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out as one of the pioneering and most widely used methods. Although originally developed for microarray data, GSEA is nowadays extensively utilized for RNA-seq data analysis. Here, we quantitatively assessed the performance of a variety of GSEA modalities and provide guidance in the practical use of GSEA in RNA-seq experiments. We leveraged harmonized RNA-seq datasets available from The Cancer Genome Atlas (TCGA) in combination with large, curated pathway collections from the Molecular Signatures Database to obtain cancer-type-specific target pathway lists across multiple cancer types. We carried out a detailed analysis of GSEA performance using both gene-set and phenotype permutations combined with four different choices for the Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we conclude that the classic/unweighted gene-set permutation approach offered comparable or better sensitivity-vs-specificity tradeoffs across cancer types compared with other, more complex and computationally intensive permutation methods. Finally, we analyzed other large cohorts for thyroid cancer and hepatocellular carcinoma. We utilized a new consensus metric, the Enrichment Evidence Score (EES), which showed a remarkable agreement between pathways identified in TCGA and those from other sources, despite differences in cancer etiology. This finding suggests an EES-based strategy to identify a core set of pathways that may be complemented by an expanded set of pathways for downstream exploratory analysis. 
This work fills the existing gap in current guidelines and benchmarks for the use of GSEA with RNA-seq data and provides a framework to enable detailed benchmarking of other RNA-seq-based pathway analysis tools.
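The running-sum enrichment score behind GSEA, and the gene-set permutation scheme the benchmark found competitive, can be sketched as below. This is a minimal illustration, not the GSEA software: it assumes a preranked list sorted by decreasing statistic and the standard running-sum statistic with weight exponent `p` (p = 0 is the classic/unweighted statistic, p = 1 the weighted one). The p-value is a simplified one-sided, direction-matched empirical estimate without GSEA's normalization step, and the function names are hypothetical.

```python
import random


def enrichment_score(ranked_stats, in_set, p=0.0):
    """Running-sum enrichment score.

    ranked_stats: gene-level statistics sorted in decreasing order.
    in_set: booleans of the same length, True where the gene is in the set.
    p: weight exponent; p = 0 gives the classic (unweighted) statistic.
    Returns the signed maximum deviation of the running sum from zero.
    """
    n = len(ranked_stats)
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    hit_weights = [abs(s) ** p if h else 0.0 for s, h in zip(ranked_stats, in_set)]
    hit_total = sum(hit_weights)
    if hit_total == 0:
        raise ValueError("all set members have zero weight")
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, in_set):
        running += w / hit_total if h else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_pvalue(ranked_stats, in_set, n_perm=1000, p=0.0, seed=0):
    """Compare the observed score with scores of random gene sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_stats, in_set, p)
    n, k = len(ranked_stats), sum(in_set)
    extreme = 0
    for _ in range(n_perm):
        members = set(rng.sample(range(n), k))
        es = enrichment_score(ranked_stats, [i in members for i in range(n)], p)
        # Count random sets at least as extreme, in the same direction as the observed score.
        if (observed >= 0 and es >= observed) or (observed < 0 and es <= observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    stats = [float(x) for x in range(100, 0, -1)]
    top10 = [i < 10 for i in range(100)]
    print(gene_set_permutation_pvalue(stats, top10, n_perm=200))
```

Gene-set permutation only reshuffles set membership along a fixed ranking, which is why it is much cheaper than phenotype permutation, which must recompute the ranking for every permutation.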
Affiliation(s)
- Julián Candia
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
- Luigi Ferrucci
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
4
Mahzarnia A, Lutz MW, Badea A. A Continuous Extension of Gene Set Enrichment Analysis Using the Likelihood Ratio Test Statistics Identifies Vascular Endothelial Growth Factor as a Candidate Pathway for Alzheimer's Disease via ITGA5. J Alzheimers Dis 2024; 97:635-648. [PMID: 38160360 PMCID: PMC10836573 DOI: 10.3233/jad-230934] [Accepted: 11/01/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND Alzheimer's disease (AD) involves brain neuropathologies such as amyloid plaque and hyperphosphorylated tau tangles and is accompanied by cognitive decline. Identifying the biological mechanisms underlying disease onset and progression based on quantifiable phenotypes will help understand disease etiology and devise therapies. OBJECTIVE Our objective was to identify molecular pathways associated with hallmark AD biomarkers and cognitive status, accounting for variables such as age, sex, education, and APOE genotype. METHODS We introduce a pathway-based statistical approach, extending the gene set likelihood ratio test to continuous phenotypes. We first analyzed independently each of the three phenotypes (amyloid-β, tau, cognition) using continuous gene set likelihood ratio tests to account for covariates, including age, sex, education, and APOE genotype. The analysis involved 634 subjects with data available for all three phenotypes, allowing for the identification of common pathways. RESULTS We identified 14 pathways significantly associated with amyloid-β; 5 associated with tau; and 174 associated with cognition, which showed a larger number of pathways compared to biomarkers. A single pathway, vascular endothelial growth factor receptor binding (VEGF-RB), exhibited associations with all three phenotypes. Mediation analysis showed that among the VEGF-RB family genes, ITGA5 mediates the relationship between cognitive scores and pathological biomarkers. CONCLUSIONS We presented a new statistical approach linking continuous phenotypes, gene expression across pathways, and covariates like sex, age, and education. Our results reinforced VEGF RB2's role in AD cognition and demonstrated ITGA5's significant role in mediating the AD pathology-cognition connection.
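A likelihood ratio test with covariates compares two nested models: one with the covariates only, and one that also includes the term of interest. The sketch below shows that building block under explicit assumptions: Gaussian linear models fitted by ordinary least squares, one added predictor (here a hypothetical gene-set summary score), and the statistic 2(ℓ_full − ℓ_reduced) = n·log(RSS_reduced / RSS_full) referred to a chi-square distribution with 1 degree of freedom. It does not reproduce the authors' gene set formulation; the function names and toy data are hypothetical.

```python
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution for the coefficients.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def lrt_one_term(covariates, extra, y):
    """LRT for adding one predictor to a model with an intercept plus covariates.
    Returns (statistic, p-value) using the chi-square(1) reference distribution."""
    reduced = [[1.0] + list(c) for c in covariates]
    full = [row + [e] for row, e in zip(reduced, extra)]
    stat = len(y) * math.log(ols_rss(reduced, y) / ols_rss(full, y))
    pvalue = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-square(1) survival function
    return stat, pvalue


if __name__ == "__main__":
    rng = random.Random(1)
    age = [rng.uniform(60, 90) for _ in range(60)]
    sex = [float(i % 2) for i in range(60)]
    score = [rng.gauss(0, 1) for _ in range(60)]  # hypothetical gene-set summary score
    y = [2 + 0.1 * a + 0.5 * s + 1.5 * g + rng.gauss(0, 1) for a, s, g in zip(age, sex, score)]
    print(lrt_one_term(list(zip(age, sex)), score, y))
```

For a chi-square variable with 1 degree of freedom, P(X > x) = erfc(√(x/2)), which keeps the sketch free of external dependencies; adding several terms at once would require the general chi-square survival function.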
Affiliation(s)
- Ali Mahzarnia
- Department of Radiology, Duke University School of Medicine, Durham, NC, USA
- Michael W. Lutz
- Department of Neurology, Duke University School of Medicine, Durham, NC, USA
- Alexandra Badea
- Department of Radiology, Duke University School of Medicine, Durham, NC, USA
- Department of Neurology, Duke University School of Medicine, Durham, NC, USA
- Biomedical Engineering, Duke University, Durham, NC, USA
- Brain Imaging and Analysis Center, Duke University School of Medicine, Durham, NC, USA
5
Vemuri K, Kumar S, Chen L, Verzi MP. Dynamic RNA Polymerase II Recruitment Drives Differentiation of the Intestine under the direction of HNF4. bioRxiv 2023:2023.11.08.566322. [PMID: 37986803 PMCID: PMC10659318 DOI: 10.1101/2023.11.08.566322] [Indexed: 11/22/2023]
Abstract
Terminal differentiation requires a massive restructuring of the transcriptome. During intestinal differentiation, the expression patterns of nearly 4000 genes are altered as cells transition from progenitor cells in crypts to differentiated cells in villi. We identified dynamic recruitment of RNA Polymerase II (Pol II) to gene promoters as the primary driver of transcriptomic shifts during intestinal differentiation in vivo. Changes in enhancer-promoter looping interactions accompany dynamic Pol II recruitment and are dependent upon HNF4, a pro-differentiation transcription factor. Using genetic loss-of-function, ChIP-seq and IP mass spectrometry, we demonstrate that HNF4 collaborates with chromatin remodelers and loop-stabilizing proteins and facilitates Pol II recruitment at hundreds of genes pivotal to differentiation. We also explore alternate mechanisms which drive differentiation gene expression and find pause-release of Pol II and post-transcriptional mRNA stability regulate smaller subsets of differentially expressed genes. These studies provide insights into the mechanisms of differentiation in a renewing adult tissue.
Affiliation(s)
- Kiranmayi Vemuri
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Sneha Kumar
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Lei Chen
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing 210096, China
- Michael P. Verzi
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08903, USA
- Rutgers Center for Lipid Research, New Jersey Institute for Food, Nutrition & Health, Rutgers University, New Brunswick, NJ 08901, USA
- NIEHS Center for Environmental Exposures and Disease (CEED), Rutgers EOHSI, Piscataway, NJ 08854, USA
- Lead Contact
6
Chen L, Qiu X, Dupre A, Pellon-Cardenas O, Fan X, Xu X, Rout P, Walton KD, Burclaff J, Zhang R, Fang W, Ofer R, Logerfo A, Vemuri K, Bandyopadhyay S, Wang J, Barbet G, Wang Y, Gao N, Perekatt AO, Hu W, Magness ST, Spence JR, Verzi MP. TGFB1 induces fetal reprogramming and enhances intestinal regeneration. Cell Stem Cell 2023; 30:1520-1537.e8. [PMID: 37865088 PMCID: PMC10841757 DOI: 10.1016/j.stem.2023.09.015] [Received: 01/11/2023] [Revised: 07/03/2023] [Accepted: 09/28/2023] [Indexed: 10/23/2023]
Abstract
The gut epithelium has a remarkable ability to recover from damage. We employed a combination of high-throughput sequencing approaches, mouse genetics, and murine and human organoids and identified a role for TGFB signaling during intestinal regeneration following injury. At 2 days following irradiation (IR)-induced damage of intestinal crypts, a surge in TGFB1 expression is mediated by monocyte/macrophage cells at the location of damage. The depletion of macrophages or genetic disruption of TGFB signaling significantly impaired the regenerative response. Intestinal regeneration is characterized by the induction of a fetal-like transcriptional signature during repair. In organoid culture, TGFB1 treatment was necessary and sufficient to induce the fetal-like/regenerative state. Mesenchymal cells were also responsive to TGFB1 and enhanced the regenerative response. Mechanistically, pro-regenerative factors, YAP/TEAD and SOX9, are activated in the epithelium exposed to TGFB1. Finally, pre-treatment with TGFB1 enhanced the ability of primary epithelial cultures to engraft into damaged murine colon, suggesting promise for cellular therapy.
Affiliation(s)
- Lei Chen
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing 210096, China.
- Xia Qiu
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Abigail Dupre
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Oscar Pellon-Cardenas
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Xiaojiao Fan
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing 210096, China
- Xiaoting Xu
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing 210096, China
- Prateeksha Rout
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Katherine D Walton
- Department of Internal Medicine, Division of Gastroenterology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Joseph Burclaff
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, and North Carolina State University, Chapel Hill, NC 27695, USA; Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
- Ruolan Zhang
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing 210096, China
- Wenxin Fang
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing 210096, China
- Rachel Ofer
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Alexandra Logerfo
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Kiranmayi Vemuri
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Sheila Bandyopadhyay
- Department of Biological Sciences, Rutgers University-Newark, Newark, NJ 07102, USA
- Jianming Wang
- Department of Radiation Oncology, Rutgers Cancer Institute of New Jersey, Rutgers University-New Brunswick, New Brunswick, NJ 08903, USA
- Gaetan Barbet
- Child Health Institute of New Jersey, Rutgers University-New Brunswick, New Brunswick, NJ 08901, USA
- Yan Wang
- Center for Translation Medicine Research and Development, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Nan Gao
- Department of Biological Sciences, Rutgers University-Newark, Newark, NJ 07102, USA
- Ansu O Perekatt
- Department of Chemistry and Chemical Biology, Stevens Institute of Technology, Hoboken, NJ 07030, USA
- Wenwei Hu
- Department of Radiation Oncology, Rutgers Cancer Institute of New Jersey, Rutgers University-New Brunswick, New Brunswick, NJ 08903, USA
- Scott T Magness
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, and North Carolina State University, Chapel Hill, NC 27695, USA; Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
- Jason R Spence
- Department of Internal Medicine, Division of Gastroenterology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Biomedical Engineering, University of Michigan College of Engineering, Ann Arbor, MI 48109, USA
- Michael P Verzi
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, Rutgers University-New Brunswick, New Brunswick, NJ 08903, USA; Rutgers Center for Lipid Research, New Jersey Institute for Food, Nutrition, and Health, Rutgers University-New Brunswick, New Brunswick, NJ 08901, USA; NIEHS Center for Environmental Exposures and Disease (CEED), Rutgers EOHSI, Piscataway, NJ 08854, USA.
7
McGovern KC, Nixon MP, Silverman JD. Addressing erroneous scale assumptions in microbe and gene set enrichment analysis. PLoS Comput Biol 2023; 19:e1011659. [PMID: 37983251 PMCID: PMC10695402 DOI: 10.1371/journal.pcbi.1011659] [Received: 05/23/2023] [Revised: 12/04/2023] [Accepted: 11/04/2023] [Indexed: 11/22/2023]
Abstract
By applying Differential Set Analysis (DSA) to sequence count data, researchers can determine whether groups of microbes or genes are differentially enriched. Yet sequence count data suffer from a scale limitation: these data lack information about the scale (i.e., size) of the biological system under study, leading some authors to call these data compositional (i.e., proportional). In this article, we show that commonly used DSA methods that rely on normalization make strong, implicit assumptions about the unmeasured system scale. We show that even small errors in these scale assumptions can lead to positive predictive values as low as 9%. To address this problem, we take three novel approaches. First, we introduce a sensitivity analysis framework to identify when modeling results are robust to such errors and when they are suspect. Unlike standard benchmarking studies, this framework does not require ground-truth knowledge and can therefore be applied to both simulated and real data. Second, we introduce a statistical test that provably controls Type-I error at a nominal rate despite errors in scale assumptions. Finally, we discuss how the impact of scale limitations depends on a researcher's scientific goals and provide tools that researchers can use to evaluate whether their goals are at risk from erroneous scale assumptions. Overall, the goal of this article is to catalyze future research into the impact of scale limitations in analyses of sequence count data; to illustrate that scale limitations can lead to inferential errors in practice; yet to also show that rigorous and reproducible scale reliant inference is possible if done carefully.
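A toy calculation makes the implicit scale assumption concrete. If the true total size of the system were known, the log fold change of a feature's absolute abundance would equal the log fold change of its proportion plus the log fold change of the total scale; normalizing to proportions alone silently sets that second term to zero. The sketch below assumes hypothetical counts and a hypothetical `scale_ratio` (total in condition B divided by total in condition A); it illustrates the arithmetic only and is not the paper's sensitivity analysis or test.

```python
import math


def log2_fold_changes(counts_a, counts_b, scale_ratio=1.0):
    """Per-feature log2 fold changes of absolute abundance (condition B vs A).

    counts_a, counts_b: sequencing counts for the same features in two samples.
    scale_ratio: assumed ratio of total system scale, total_B / total_A.
    Total-sum normalization alone corresponds to scale_ratio = 1.0.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    return [
        math.log2((b / total_b) / (a / total_a)) + math.log2(scale_ratio)
        for a, b in zip(counts_a, counts_b)
    ]


if __name__ == "__main__":
    # Hypothetical truth: absolute loads go from [100, 100, 800] to [100, 100, 200],
    # so the total shrinks to 0.4x; both samples are sequenced to the same depth.
    a = [1000, 1000, 8000]
    b = [2500, 2500, 5000]
    print(log2_fold_changes(a, b))                   # assumes no change in scale
    print(log2_fold_changes(a, b, scale_ratio=0.4))  # uses the true scale change
```

Under the default assumption the first two features appear to increase about 2.5-fold even though their absolute abundance is unchanged; with the correct scale ratio their fold change is zero and only the third feature decreases (four-fold). Sweeping `scale_ratio` over a plausible range is a simple way to see whether a conclusion survives scale uncertainty, in the spirit of the sensitivity analysis described above.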
Affiliation(s)
- Kyle C. McGovern
- Program in Bioinformatics and Genomics, Pennsylvania State University, State College, Pennsylvania, United States of America
- Michelle Pistner Nixon
- College of Information Sciences and Technology, Pennsylvania State University, State College, Pennsylvania, United States of America
- Justin D. Silverman
- Program in Bioinformatics and Genomics, Pennsylvania State University, State College, Pennsylvania, United States of America
- College of Information Sciences and Technology, Pennsylvania State University, State College, Pennsylvania, United States of America
- Departments of Medicine and Statistics, Pennsylvania State University, State College, Pennsylvania, United States of America
- Institute for Computational and Data Science, Pennsylvania State University, State College, Pennsylvania, United States of America
8
Zhu S, Liu N, Gong H, Liu F, Yan G. Identification of biomarkers and sex differences in the placenta of fetal growth restriction. J Obstet Gynaecol Res 2023; 49:2324-2336. [PMID: 37553225 DOI: 10.1111/jog.15735] [Received: 12/05/2022] [Accepted: 06/20/2023] [Indexed: 08/10/2023]
Abstract
AIM Fetal growth restriction (FGR) can lead to short-term and long-term impairments in the fetus. The placenta functions as an exchanger for substance transport, playing a critical role in fetal growth. However, the mechanism from the placental standpoint is still not fully understood. In this study, we aimed to investigate the pathophysiological mechanisms in the placenta that mediate the development of FGR and its sex differences. METHODS We analyzed the gene expression profiles of GSE100415, containing specific normotensive FGR placental samples, and GSE114691, with canonical samples, using three different methods: differentially expressed gene analysis, weighted gene co-expression network analysis, and gene set enrichment analysis. Enrichment analysis was performed using Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The key process was then validated in pregnant Wistar rats subcutaneously administered dexamethasone (0.2 mg/kg/d) or saline from gestation Day 9 to 21. RESULTS Our results revealed little difference between normal and normotensive FGR placental samples but confirmed the sex difference. Further analyses of the canonical samples identified the occurrence of vascular dysfunction, which was validated by measuring the vascular lumen area, showing that the vascular lumen area in the FGR group was larger than in the control group. We also discovered 17 significantly expressed genes from the involved eigengenes. CONCLUSION Our study provides an important theoretical and experimental basis to reevaluate the development of FGR from the placental standpoint and suggests a series of biomarkers for future clinical use.
Affiliation(s)
- Sha Zhu
- Department of Obstetrics and Gynecology, Hubei Provincial Hospital of Integrated Chinese and Western Medicine, Wuhan, Hubei, China
- Niying Liu
- Department of Obstetrics and Gynecology, Hubei Provincial Hospital of Integrated Chinese and Western Medicine, Wuhan, Hubei, China
- Hongjun Gong
- Department of Obstetrics and Gynecology, Hubei Province Dongxihu District Maternal and Child Health Care Hospital, Wuhan, Hubei, China
- Fulin Liu
- Sichuan Provincial Key Laboratory for Human Disease Gene Study, Center for Medical Genetics, Department of Laboratory Medicine, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, University of Electronic Science and Technology, Chengdu, Sichuan, China
- Research Unit for Blindness Prevention, Chinese Academy of Medical Sciences (2019RU026), Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, Sichuan, China
- Université Paris Cité, Paris, France
- Ge Yan
- Department of Obstetrics and Gynecology, Hubei Province Dongxihu District Maternal and Child Health Care Hospital, Wuhan, Hubei, China
9
Mahzarnia A, Lutz MW, Badea A. A Continuous Extension of Gene Set Enrichment Analysis using the Likelihood Ratio Test Statistics Identifies VEGF as a Candidate Pathway for Alzheimer's disease. bioRxiv 2023:2023.08.22.554319. [PMID: 37662249 PMCID: PMC10473614 DOI: 10.1101/2023.08.22.554319] [Indexed: 09/05/2023]
Abstract
Background Alzheimer's disease involves brain pathologies such as amyloid plaque depositions and hyperphosphorylated tau tangles and is accompanied by cognitive decline. Identifying the biological mechanisms underlying disease onset and progression based on quantifiable phenotypes will help understand the disease etiology and devise therapies. Objective Our objective was to identify molecular pathways associated with AD biomarkers (Amyloid-β and tau) and cognitive status (MMSE) accounting for variables such as age, sex, education, and APOE genotype. Methods We introduce a novel pathway-based statistical approach, extending the gene set likelihood ratio test to continuous phenotypes. We first analyzed independently each of the three phenotypes (Amyloid-β, tau, cognition), using continuous gene set likelihood ratio tests to account for covariates, including age, sex, education, and APOE genotype. The analysis involved a large sample size with data available for all three phenotypes, allowing for the identification of common pathways. Results We identified 14 pathways significantly associated with Amyloid-β, 5 associated with tau, and 174 associated with MMSE. Surprisingly, the MMSE outcome showed a larger number of significant pathways compared to biomarkers. A single pathway, vascular endothelial growth factor receptor binding (VEGF-RB), exhibited significant associations with all three phenotypes. Conclusions The study's findings highlight the importance of the VEGF signaling pathway in aging in AD. The complex interactions within the VEGF signaling family offer valuable insights for future therapeutic interventions.
10
Chen S, Zhou Z, Wang Y, Chen S, Jiang J. Machine learning-based identification of cuproptosis-related markers and immune infiltration in severe community-acquired pneumonia. Clin Respir J 2023; 17:618-628. [PMID: 37279744 PMCID: PMC10363779 DOI: 10.1111/crj.13633] [Received: 03/17/2023] [Revised: 04/24/2023] [Accepted: 05/06/2023] [Indexed: 06/08/2023]
Abstract
BACKGROUND Severe community-acquired pneumonia (SCAP) is one of the world's most common diseases and a major etiology of acute respiratory distress syndrome (ARDS). Cuproptosis is a novel form of regulated cell death that can occur in various diseases. METHODS Our study explored the degree of immune cell infiltration during the onset of severe CAP and identified potential biomarkers related to cuproptosis. The gene expression matrix was obtained from the GEO database (accession GSE196399). Three machine learning algorithms were applied: the least absolute shrinkage and selection operator (LASSO), the random forest, and the support vector machine-recursive feature elimination (SVM-RFE). Immune cell infiltration was quantified by single-sample gene set enrichment analysis (ssGSEA) scoring. A nomogram was constructed to verify the applicability of using cuproptosis-related genes to predict the onset of severe CAP and its deterioration toward ARDS. RESULTS Nine cuproptosis-related genes were differentially expressed between the severe CAP group and the control group: ATP7B, DBT, DLAT, DLD, FDX1, GCSH, LIAS, LIPT1, and SLC31A1. All 13 cuproptosis-related genes were involved in immune cell infiltration. A three-gene diagnostic model was constructed to predict the onset of severe CAP: GCSH, DLD, and LIPT1. CONCLUSION Our study confirmed the involvement of the newly discovered cuproptosis-related genes in the progression of SCAP.
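Single-sample GSEA (ssGSEA) scores one gene set in one sample. One common formulation ranks genes by expression within the sample, weights gene-set members by their rank raised to an exponent, and sums (rather than maximizes) the difference between the weighted cumulative distribution of set members and the cumulative distribution of non-members. The sketch below follows that formulation under stated assumptions: the exponent `alpha = 0.25` (a commonly used default), ranks from n for the most highly expressed gene down to 1, and no normalization across samples. It is not the pipeline used in this study, and `ssgsea_score` is a hypothetical name.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Single-sample enrichment score of one gene set.

    expression: dict mapping gene -> expression value in ONE sample.
    gene_set: iterable of gene names; genes absent from `expression` are ignored.
    alpha: exponent applied to rank weights of set members.
    """
    genes = sorted(expression, key=expression.get, reverse=True)  # highest expression first
    n = len(genes)
    members = set(gene_set) & set(expression)
    n_hit = len(members)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must cover a non-empty proper subset of measured genes")
    # Rank weight: the most highly expressed gene gets rank n, the lowest gets rank 1.
    weights = {g: (n - i) ** alpha for i, g in enumerate(genes)}
    hit_total = sum(weights[g] for g in members)
    miss_step = 1.0 / (n - n_hit)
    p_hit = p_miss = 0.0
    score = 0.0
    for g in genes:
        if g in members:
            p_hit += weights[g] / hit_total
        else:
            p_miss += miss_step
        # Integrate the difference over all positions instead of taking its maximum.
        score += p_hit - p_miss
    return score


if __name__ == "__main__":
    sample = {f"g{i}": float(10 - i) for i in range(10)}  # toy sample: g0 highest
    print(ssgsea_score(sample, ["g0", "g1"]), ssgsea_score(sample, ["g8", "g9"]))
```

A set concentrated among the most highly expressed genes in a sample receives a positive score and a set concentrated among the least expressed genes a negative one; computing the score for every sample and every immune cell signature yields the kind of infiltration matrix described above.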
Affiliation(s)
- Shuyang Chen
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Zheng Zhou
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Yajun Wang
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Shujing Chen
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Jinjun Jiang
- Shanghai Respiratory Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China
11
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet 2023; 39:308-319. [PMID: 36750393 DOI: 10.1016/j.tig.2023.01.003] [Received: 08/25/2022] [Revised: 11/24/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023]
Abstract
Pathway enrichment analysis is indispensable for interpreting omics datasets and generating hypotheses. However, the foundations of enrichment analysis remain elusive to many biologists. Here, we discuss best practices in interpreting different types of omics data using pathway enrichment analysis and highlight the importance of considering intrinsic features of various types of omics data. We further explain major components that influence the outcomes of a pathway enrichment analysis, including defining background sets and choosing reference annotation databases. To improve reproducibility, we describe how to standardize reporting methodological details in publications. This article aims to serve as a primer for biologists to leverage the wealth of omics resources and motivate bioinformatics tool developers to enhance the power of pathway enrichment analysis.
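The choice of background set, one of the components highlighted above, can change an over-representation result substantially. A minimal sketch using the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) shows the effect; the gene counts are hypothetical, and `hypergeom_enrichment_p` is an illustrative name rather than a function from any particular tool.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap), where X counts pathway genes among n_selected genes drawn
    without replacement from a background of n_background genes, n_pathway of which
    belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    hits = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / total  # exact big-integer ratio, rounded once to float


if __name__ == "__main__":
    # Hypothetical: 200 selected genes, 25 of them in a 500-gene pathway whose genes
    # are all expressed in the tissue studied.
    print(hypergeom_enrichment_p(20000, 500, 200, 25))  # genome-wide background
    print(hypergeom_enrichment_p(5000, 500, 200, 25))   # only genes expressed in the tissue
```

With a genome-wide background the expected overlap is 200 × 500 / 20,000 = 5, so 25 looks extremely significant; restricting the background to the 5,000 expressed genes raises the expected overlap to 20, and the same observation is no longer significant at conventional thresholds.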
Affiliation(s)
- Kangmei Zhao
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
- Seung Yon Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
12
Griffin AT, Vlahos LJ, Chiuzan C, Califano A. NaRnEA: An Information Theoretic Framework for Gene Set Analysis. Entropy (Basel) 2023; 25:e25030542. [PMID: 36981431 PMCID: PMC10048242 DOI: 10.3390/e25030542] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/08/2022] [Revised: 03/03/2023] [Accepted: 03/13/2023] [Indexed: 05/26/2023]
Abstract
Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein's transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves upon two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis crucially demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA.
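The central contrast in this abstract is between sample-shuffling empirical nulls and an analytical null. NaRnEA derives its null from the Principle of Maximum Entropy; the sketch below does not reproduce that. It only illustrates, with a much simpler and well-known analytical null (the exact mean and variance of a rank sum, as in the Wilcoxon rank-sum test without ties), how significance can be computed without any permutations. Gene names are hypothetical.

```python
import math


def rank_sum_z(ranked_genes, gene_set):
    """Analytical z-score for a gene set's rank sum, with no sample shuffling.

    `ranked_genes` is ordered from most up- to most down-regulated (rank 1 first).
    A positive z means the set sits nearer the top than expected by chance.
    """
    N = len(ranked_genes)
    ranks = [i + 1 for i, g in enumerate(ranked_genes) if g in gene_set]
    k = len(ranks)
    if k == 0 or k == N:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    # Exact null mean and variance of the sum of k ranks drawn without
    # replacement from 1..N (Wilcoxon rank-sum null, no ties).
    mean = k * (N + 1) / 2
    var = k * (N - k) * (N + 1) / 12
    return (mean - sum(ranks)) / math.sqrt(var)


genes = [f"g{i}" for i in range(20)]
print(rank_sum_z(genes, {"g0", "g1", "g2"}))
```

Because the null moments are closed-form, the cost is a single pass over the ranking, whereas an empirical null repeats the whole analysis for every shuffle.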
Affiliation(s)
- Aaron T. Griffin
- Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Lukas J. Vlahos
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Codruta Chiuzan
- Department of Biostatistics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Andrea Califano
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA
- JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
13
Chen L, Dupre A, Qiu X, Pellon-Cardenas O, Walton KD, Wang J, Perekatt AO, Hu W, Spence JR, Verzi MP. TGFB1 Induces Fetal Reprogramming and Enhances Intestinal Regeneration. bioRxiv 2023:2023.01.13.523825. [PMID: 36711781 PMCID: PMC9882197 DOI: 10.1101/2023.01.13.523825] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/15/2023]
Abstract
The adult gut epithelium has a remarkable ability to recover from damage. To develop cellular therapies aimed at restoring and/or replacing defective gastrointestinal tissue, it is important to understand the natural mechanisms of tissue regeneration. We employed a combination of high-throughput sequencing approaches, mouse genetic models, and murine and human organoid models, and identified a role for TGFB signaling during intestinal regeneration following injury. Two days after irradiation (IR)-induced damage to intestinal crypts, monocyte/macrophage cells at the site of damage mediate a surge in TGFB1 expression. Depletion of macrophages or genetic disruption of TGFB signaling significantly impaired the regenerative response following irradiation. Murine intestinal regeneration is also characterized by the induction of a fetal transcriptional signature during repair. In organoid culture, TGFB1 treatment was necessary and sufficient to induce a transcriptomic shift to the fetal-like/regenerative state. The regenerative response was enhanced by mesenchymal cells, which are also primed for regeneration by TGFB1. Mechanistically, integration of ATAC-seq, scRNA-seq, and ChIP-seq data suggests that a regenerative YAP-SOX9 transcriptional circuit is activated in epithelium exposed to TGFB1. Finally, pre-treatment with TGFB1 enhanced the ability of primary epithelial cultures to engraft into damaged murine colon, suggesting promise for applying the TGFB-induced regenerative circuit in cellular therapy.
Affiliation(s)
- Lei Chen
- School of Life Science and Technology, Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing, China
- Abigail Dupre
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Xia Qiu
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Oscar Pellon-Cardenas
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Katherine D. Walton
- Department of Internal Medicine, Gastroenterology, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI, USA
- Jianming Wang
- Department of Radiation Oncology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Ansu O. Perekatt
- Department of Chemistry and Chemical Biology, Stevens Institute of Technology, Hoboken, NJ, USA
- Wenwei Hu
- Department of Radiation Oncology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Jason R. Spence
- Department of Internal Medicine, Gastroenterology, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Biomedical Engineering, University of Michigan College of Engineering, Ann Arbor, MI, USA
- Michael P. Verzi
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Rutgers Center for Lipid Research, New Jersey Institute for Food, Nutrition & Health, Rutgers University, New Brunswick, NJ, USA
- Member of the NIEHS Center for Environmental Exposures and Disease (CEED), Rutgers EOHSI Piscataway, NJ, USA
- Lead Contact
14
Aberasturi DT, Piegorsch WW, Bedrick EJ, Lussier YA. Accounting for extra-binomial variability with differentially expressed genetic pathway data: a collaborative bioinformatic study. Stat (Int Stat Inst) 2023; 12:e518. [PMID: 37885703 PMCID: PMC10601968 DOI: 10.1002/sta4.518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 10/21/2022] [Indexed: 10/28/2023]
Abstract
We describe a collaborative project involving faculty and students in a university bioinformatics/biostatistics center. The project focuses on identifying differentially expressed gene sets ("pathways") in subjects expressing a disease state, medical intervention, or other distinguishable condition. The key feature of the endeavor is the data structure presented to the team: a single cohort of subjects with two samples taken from each subject, one for each of two differing conditions, without replication. This structure essentially yields a cohort of 2 × 2 contingency tables, where each table compares the differential gene state with the pathway condition. Recognizing that correlations both within and between pathway responses can disrupt standard 2 × 2 table analytics, we develop methods for analyzing this data structure in the presence of complicated intra-table correlations. These methods provide convenient approaches to the problem, using design effect adjustments from sample survey theory and manipulations of the summary 2 × 2 table counts. Monte Carlo simulations show that the methods perform extremely well, validating their use in practice. In the end, the collaborative connections among the team members led to solutions that none of us would have envisioned separately.
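The design-effect idea borrowed from survey theory can be sketched as follows: when observations are positively correlated, the naive Pearson chi-square for a 2 × 2 table is inflated, and dividing it by a design effect greater than 1 deflates it, in the spirit of the Rao-Scott first-order correction. This is a generic illustration, not the authors' estimator; the counts and the value `deff=2.0` are hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / den


def adjusted_chi2(a, b, c, d, deff):
    """Divide the chi-square statistic by a design effect (first-order correction)."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return pearson_chi2_2x2(a, b, c, d) / deff


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical counts: rows = gene differential (yes/no), columns = in pathway (yes/no).
raw = pearson_chi2_2x2(30, 10, 10, 30)
adj = adjusted_chi2(30, 10, 10, 30, deff=2.0)  # deff = 2.0 is an assumed value
print(raw, chi2_1df_pvalue(raw))
print(adj, chi2_1df_pvalue(adj))
```

The p-value helper uses the identity that a 1-df chi-square variable is a squared standard normal, so no statistics library is needed.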
Affiliation(s)
- Dillon T Aberasturi
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Walter W Piegorsch
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Department of Statistics, School of Public Health, University of Arizona, Tucson, AZ, USA
- Edward J Bedrick
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Department of Statistics, School of Public Health, University of Arizona, Tucson, AZ, USA
- Department of Medicine, School of Medicine, University of Arizona, Tucson, AZ, USA
- Yves A Lussier
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Department of Medicine, School of Medicine, University of Arizona, Tucson, AZ, USA
- Arizona Comprehensive Cancer Center, University of Arizona, Tucson, AZ, USA
15
Hypergraph geometry reflects higher-order dynamics in protein interaction networks. Sci Rep 2022; 12:20879. [PMID: 36463292 PMCID: PMC9719542 DOI: 10.1038/s41598-022-24584-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/18/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022] Open
Abstract
Protein interactions form a complex dynamic molecular system that shapes cell phenotype and function; in this regard, network analysis is a powerful tool for studying the dynamics of cellular processes. Current models of protein interaction networks are limited in that the standard graph model can only represent pairwise relationships. Higher-order interactions are well-characterized in biology, including protein complex formation and feedback or feedforward loops. These higher-order relationships are better represented by a hypergraph as a generalized network model. Here, we present an approach to analyzing dynamic gene expression data using a hypergraph model and quantify network heterogeneity via Forman-Ricci curvature. We observe, on a global level, increased network curvature in pluripotent stem cells and cancer cells. Further, we use local curvature to conduct pathway analysis in a melanoma dataset, finding increased curvature in several oncogenic pathways and decreased curvature in tumor suppressor pathways. We compare this approach to a graph-based model and a differential gene expression approach.
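The study quantifies network heterogeneity with Forman-Ricci curvature and compares its hypergraph model against a graph-based model. The hypergraph formula is not given in the abstract, so the sketch below covers only the simpler pairwise-graph case: for an unweighted graph without higher-dimensional faces, the Forman curvature of edge (u, v) reduces to 4 - deg(u) - deg(v). It assumes a simple graph (no duplicate edges or self-loops) and uses hypothetical protein names.

```python
from collections import defaultdict


def forman_curvature(edges):
    """Forman-Ricci curvature of each edge in an unweighted, undirected simple graph.

    With no higher-dimensional faces, the curvature of edge (u, v) is
    4 - deg(u) - deg(v), so edges touching hubs are strongly negative.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}


# Hypothetical interaction path A-B-C-D.
curv = forman_curvature([("A", "B"), ("B", "C"), ("C", "D")])
print(curv)
print(sum(curv.values()) / len(curv))  # mean edge curvature as a global summary
```

Averaging edge curvatures gives one global number per network, which is the kind of summary the abstract compares across cell states; local (per-pathway) curvature would average over a pathway's edges instead.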
16
Makrooni MA, O’Shea D, Geeleher P, Seoighe C. Random-effects meta-analysis of effect sizes as a unified framework for gene set analysis. PLoS Comput Biol 2022; 18:e1010278. [PMID: 36197939 PMCID: PMC9576052 DOI: 10.1371/journal.pcbi.1010278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 10/17/2022] [Accepted: 09/18/2022] [Indexed: 11/06/2022] Open
Abstract
Gene set analysis (GSA) remains a common step in genome-scale studies because it can reveal insights that are not apparent from results obtained for individual genes. Many different computational tools are applied for GSA, which may be sensitive to different types of signals; however, most methods implicitly test whether there are differences in the distribution of the effect of some experimental condition between genes in gene sets of interest. We have developed a unifying framework for GSA that first fits effect size distributions, and then tests for differences in these distributions between gene sets. These differences can be in the proportions of genes that are perturbed or in the sign or size of the effects. Inspired by statistical meta-analysis, we take into account the uncertainty in effect size estimates by reducing the influence of genes with greater uncertainty on the estimation of distribution parameters. We demonstrate, using simulation and by application to real data, that this approach provides significant gains in performance over existing methods. Furthermore, the statistical tests carried out are defined in terms of effect sizes, rather than the results of prior statistical tests measuring these changes, which leads to improved interpretability and greater robustness to variation in sample sizes.
The role of gene set analysis is to identify groups of genes that are perturbed in a genomics experiment. There are many tools available for this task and they do not all test for the same types of changes. Here we propose a new way to carry out gene set analysis that involves first working out the distribution of the group effect in the gene set and then comparing this distribution to the equivalent distribution in other genes. Tests performed by existing tools for gene set analysis can be related to different comparisons in these distributions of group effects. A unified framework for gene set analysis provides for more explicit null hypotheses against which to test sets of genes for different types of responses to the experimental conditions. These results are more interpretable, because the group effect distributions can be compared visually, providing an indication of how the experimental effect differs between the gene sets.
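The step of "reducing the influence of genes with greater uncertainty" is the core of inverse-variance weighting in meta-analysis. As a minimal sketch, the code below uses the textbook DerSimonian-Laird random-effects estimator; this is an assumed stand-in to illustrate the weighting idea, not the authors' distribution-fitting method, and the effect sizes and variances are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2 estimator.

    Returns (pooled_effect, standard_error, tau2). Inputs with larger
    sampling variance receive smaller weights.
    """
    k = len(effects)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]              # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)            # between-gene heterogeneity
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


# Hypothetical log fold changes and their sampling variances for genes in one set.
print(dersimonian_laird([0.0, 1.0, 2.0], [0.1, 0.1, 0.1]))
```

When the gene-level effects agree beyond what their sampling noise allows, `tau2` collapses to zero and the estimate reduces to the plain inverse-variance (fixed-effect) average.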
Affiliation(s)
- Mohammad A. Makrooni
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Dónal O’Shea
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Paul Geeleher
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Cathal Seoighe
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
17
Chen B, Zhang J, Wang T, Shao C, Miao L, Zhang S, Shang X. Investigating the evolution process of lung adenocarcinoma via random walk and dynamic network analysis. Front Genet 2022; 13:953801. [PMID: 36246662 PMCID: PMC9559577 DOI: 10.3389/fgene.2022.953801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 09/05/2022] [Indexed: 11/30/2022] Open
Abstract
Lung adenocarcinoma (LUAD) is a typical disease regarded as having multi-stage progression. However, many existing methods ignore the critical differences among these stages, which limits their effectiveness for discovering key biological molecules and biological functions as signals at each stage. In this study, we propose a method to discover how the relationships between biological molecules and biological functions evolve by investigating the multi-stage biological molecules of LUAD. The method uses a random walk algorithm and the Monte Carlo method to generate clusters as modules, which serve as subgraphs of the differential biological molecule network at each stage. Modules in adjacent stages are connected on the basis of their Jaccard coefficient. The online gene set enrichment analysis tool DAVID was used to obtain the biological functions corresponding to the individual important modules. The core evolution network was constructed by combining these two networks. Because all of these networks are dynamic, we also propose a strategy to visualize the dynamic information together in one network. Ultimately, 12 core modules and 11 core biological functions were identified through these evolutionary analyses. Among the core biological functions obtained, six are related to the disease; neutrophil chemotaxis is not directly associated with LUAD but can serve as a predictor; two may serve as predictive signals; and two need to be verified with more biological evidence. Compared with two alternative design methods, the proposed method performed more efficiently.
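Linking modules across adjacent stages by the Jaccard coefficient, |A ∩ B| / |A ∪ B|, can be sketched as below. This assumes modules are plain sets of gene symbols and uses a hypothetical cutoff `threshold=0.2`; the abstract does not state the cutoff the authors used, and module and gene names are made up.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two collections (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def link_modules(stage1, stage2, threshold=0.2):
    """Connect every module pair from adjacent stages whose Jaccard score meets the cutoff."""
    links = []
    for name1, genes1 in stage1.items():
        for name2, genes2 in stage2.items():
            j = jaccard(genes1, genes2)
            if j >= threshold:
                links.append((name1, name2, j))
    return links


# Hypothetical modules from two consecutive stages.
stage_a = {"m1": {"A", "B", "C"}, "m2": {"X", "Y"}}
stage_b = {"n1": {"B", "C", "D"}, "n2": {"Z"}}
print(link_modules(stage_a, stage_b))
```

The retained `(module, module, score)` triples are the edges of a stage-to-stage evolution graph; raising the threshold keeps only the most persistent modules.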
Affiliation(s)
- Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Jinlei Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Teng Wang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Ci Shao
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Lijun Miao
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Shengli Zhang
- School of Information Technology, Minzu Normal University of Xingyi, Xingyi, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Correspondence: Xuequn Shang
18
Abstract
Pathway enrichment analysis (PEA) is a computational biology method that identifies biological functions that are overrepresented in a group of genes relative to what would be expected by chance and ranks these functions by relevance. The relative abundance of genes pertinent to specific pathways is measured through statistical methods, and associated functional pathways are retrieved from online bioinformatics databases. Over the last decade, along with the spread of the internet, the greater availability of computational resources has made PEA software tools easy to access and use for bioinformatics practitioners worldwide. Although it became easier to use these tools, it also became easier to make mistakes that could generate inflated or misleading results, especially for beginners and inexperienced computational biologists. With this article, we propose nine quick tips to avoid common mistakes and to carry out a complete, sound, thorough PEA that can produce relevant and robust results. We describe our nine guidelines in a simple way, so that they can be understood and used by anyone, including students and beginners. Some tips explain what to do before starting a PEA, others suggest how to generate meaningful results correctly, and some final guidelines indicate useful steps to properly interpret PEA results. Our nine tips can help users perform better pathway enrichment analyses and eventually contribute to a better understanding of current biology.
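One common way PEA produces inflated results is testing hundreds of pathways without correcting for multiple comparisons. The abstract does not list its nine tips, so the following is only an assumed example of the kind of safeguard such guidelines typically cover: Benjamini-Hochberg false discovery rate adjustment of per-pathway p-values (the p-values are hypothetical).

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four pathways.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Pathways would then be reported at a chosen FDR level (for example q < 0.05) rather than by raw p-value.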
19
High-Depth Transcriptome Reveals Differences in Natural Haploid Ginkgo biloba L. Due to the Effect of Reduced Gene Dosage. Int J Mol Sci 2022; 23:ijms23168958. [PMID: 36012222 PMCID: PMC9409250 DOI: 10.3390/ijms23168958] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/02/2022] [Revised: 07/31/2022] [Accepted: 08/10/2022] [Indexed: 12/13/2022] Open
Abstract
The discovery of natural haploids of Ginkgo biloba L., a representative gymnosperm, has opened a new door for its research. Haploid germplasm has long interested researchers because of its special characteristics. However, the special features and mechanisms of haploid ginkgo remain unknown following this significant discovery. In this study, we conducted a common garden experiment on haploid and diploid ginkgo to explore differences in growth, physiology, and biochemistry between the two. Additionally, we established a high-depth transcriptome database for both to reveal their transcriptional differences. The results showed that haploid ginkgo exhibited weaker growth potential and lower photosynthetic and flavonoid accumulation capacity. Although up-regulated differentially expressed genes (DEGs) in haploid ginkgo accounted for 46.7% of all DEGs in the whole transcriptome data, gene set enrichment analysis (GSEA) showed that the gene sets of the photosynthesis, glycolysis/gluconeogenesis, and flavonoid biosynthesis pathways, which were significantly related to these differences, had a significant down-regulated expression trend. We further found that the major metabolic pathways in the haploid ginkgo transcriptional database were down-regulated compared with the diploid. This study reveals for the first time the phenotypic, growth, and physiological differences in haploid ginkgo and characterizes its transcriptional patterns based on high-depth transcriptomic data, laying the foundation for subsequent in-depth studies of haploid ginkgo.
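This abstract relies on GSEA detecting coordinated down-regulation of whole pathways even though many individual DEGs were up-regulated. A minimal sketch of GSEA's running-sum enrichment score (the weighted, Kolmogorov-Smirnov-like statistic, here with weight exponent p = 1) is shown below, assuming a pre-ranked list with hypothetical gene names and scores; permutation-based significance and score normalization are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA running-sum enrichment score for a pre-ranked list.

    `ranked_genes` and `scores` are sorted together by decreasing score.
    Hits step up in proportion to |score|**p; misses step down uniformly.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_hits == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must contain some, but not all, genes with nonzero scores")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


genes = [f"g{i}" for i in range(10)]
scores = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]
print(enrichment_score(genes, scores, {"g0", "g1"}))  # set at the top: positive
print(enrichment_score(genes, scores, {"g8", "g9"}))  # set at the bottom: negative
```

A negative score, as for the bottom set here, is what a "down-regulated expression trend" for a gene set looks like in GSEA output.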
20
Yen NTH, Park SM, Thu VTA, Phat NK, Cho YS, Yoon S, Shin JG, Kim DH, Oh JH, Long NP. Genome-wide gene expression analysis reveals molecular insights into the drug-induced toxicity of nephrotoxic agents. Life Sci 2022; 306:120801. [PMID: 35850247 DOI: 10.1016/j.lfs.2022.120801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 06/30/2022] [Accepted: 07/09/2022] [Indexed: 11/17/2022]
Abstract
Drug-induced nephrotoxicity is frequently reported. However, the mechanisms underlying nephrotoxic medications and their overlapping molecular events, which might have therapeutic value, are unclear. We performed a genome-wide analysis of gene expression and a gene set enrichment analysis to identify common and unique pathways associated with the toxicity of colistin, ifosfamide, indomethacin, and puromycin. Rats were randomly allocated to the treatment or control group. The treatment group received a toxic dose of each investigated drug once daily for 1 week. Differentially expressed genes were found in the drug-treated kidney and liver compared with the control, except for colistin in the liver. In the kidney, upregulated pathways were mainly related to cell death, cell cycle, protein synthesis, and immune response modulation. Cell cycle was upregulated by all drugs. Downregulated pathways were associated with carbon metabolism, amino acid metabolism, and fatty acid metabolism. Indomethacin, colistin, and puromycin shared the most altered pathways in the kidney. In the liver, ifosfamide and indomethacin strongly affected molecular processes. Our findings provide insight into the mechanisms underlying the renal and hepatic adverse effects of the four drugs. Further investigation should explore combination drug therapies that attenuate the toxic effects and maximize the effectiveness of nephrotoxic drugs.
Affiliation(s)
- Nguyen Thi Hai Yen
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea; Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 614-735, Republic of Korea
- Se-Myo Park
- Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Vo Thuy Anh Thu
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea; Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 614-735, Republic of Korea
- Nguyen Ky Phat
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea; Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 614-735, Republic of Korea
- Yong-Soon Cho
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea; Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 614-735, Republic of Korea
- Seokjoo Yoon
- Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Jae-Gook Shin
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea; Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 614-735, Republic of Korea
- Dong Hyun Kim
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea
- Jung-Hwa Oh
- Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Nguyen Phuoc Long
- Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea; Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 614-735, Republic of Korea
21
Saravanakumar K, Santosh SS, Ahamed MA, Sathiyaseelan A, Sultan G, Irfan N, Ali DM, Wang MH. Bioinformatics strategies for studying the molecular mechanisms of fungal extracellular vesicles with a focus on infection and immune responses. Brief Bioinform 2022; 23:6632620. [PMID: 35794708 DOI: 10.1093/bib/bbac250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 05/16/2022] [Accepted: 05/28/2022] [Indexed: 01/19/2023] Open
Abstract
Fungal extracellular vesicles (EVs) are released during pathogenesis, which in most cases takes the form of an opportunistic infection. EVs are immunocompetent with their host and have paved the way for new biomedical approaches to drug delivery and the treatment of complex diseases, including cancer. With advances in computing and processing, bioinformatics tools for evaluating the various parameters involved in fungal EVs have proliferated. In this review, we have compiled and explored bioinformatics tools for analyzing host-pathogen interaction, toxicity, omics, and pathogenesis, with an array of specific tools that have demonstrated the potential of EVs as vectors/carriers for therapeutic agents and as a potential theme for immunotherapy. We also discuss the generation of EVs and the pathways involved in their production, transport, pathogenic action, and immunological interactions in the host system. The incorporation of network pharmacology approaches is discussed with regard to fungal pathogens and their significance in drug discovery. To illustrate this overview, we present and demonstrate an in silico study model of human-Cryptococcus interactions.
Affiliation(s)
- Kandasamy Saravanakumar
- Department of Bio-Health convergence, Kangwon National University, Chuncheon 200-701, Republic of Korea
- MohamedAli Afaan Ahamed
- School of Life Sciences, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, Tamil Nadu 600048, India
- Anbazhagan Sathiyaseelan
- Department of Bio-Health convergence, Kangwon National University, Chuncheon 200-701, Republic of Korea
- Ghazala Sultan
- Department of Computer Science, Aligarh Muslim University, Aligarh, Uttar Pradesh, 202002, India
- Navabshan Irfan
- Crescent School of Pharmacy, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, 600048, India
- Davoodbasha Mubarak Ali
- School of Life Sciences, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, Tamil Nadu 600048, India
- Myeong-Hyeon Wang
- Department of Bio-Health convergence, Kangwon National University, Chuncheon 200-701, Republic of Korea
22
Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma. PLoS One 2022; 17:e0269570. [PMID: 35749395 PMCID: PMC9231717 DOI: 10.1371/journal.pone.0269570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 05/09/2022] [Indexed: 11/30/2022] Open
Abstract
Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to determine from model parameters why a deep learning model produces a given prediction. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. A graph convolutional network (GCN) model was implemented using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. On the test datasets, the GCN model achieved an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906. Pathways with high SHAP feature importance included pathways that were highly enriched in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
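The reported F1 score is consistent with the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). A quick arithmetic check:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.948, 0.868), 3))  # 0.906
```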
23
Mubeen S, Tom Kodamullil A, Hofmann-Apitius M, Domingo-Fernández D. On the influence of several factors on pathway enrichment analysis. Brief Bioinform 2022; 23:bbac143. [PMID: 35453140 PMCID: PMC9116215 DOI: 10.1093/bib/bbac143] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/17/2022] [Revised: 03/21/2022] [Accepted: 03/30/2022] [Indexed: 02/01/2023] Open
Abstract
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, which may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges that originate from the outlined factors.
Affiliation(s)
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Fraunhofer Center for Machine Learning, Germany
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Fraunhofer Center for Machine Learning, Germany
- Enveda Biosciences, Boulder, CO, 80301, USA
24
Wijesooriya K, Jadaan SA, Perera KL, Kaur T, Ziemann M. Urgent need for consistent standards in functional enrichment analysis. PLoS Comput Biol 2022; 18:e1009935. [PMID: 35263338 PMCID: PMC8936487 DOI: 10.1371/journal.pcbi.1009935] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 12/07/2021] [Revised: 03/21/2022] [Accepted: 02/18/2022] [Indexed: 11/25/2022] Open
Abstract
Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and that the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. To ascertain the frequency of these issues in the literature, we performed a screen of 186 open-access research articles describing functional enrichment results. We find that 95% of analyses using over-representation tests did not implement an appropriate background gene list or did not describe this in the methods. Failure to perform p-value correction for multiple tests was identified in 43% of analyses. Many studies lacked detail in the methods section about the tools and gene sets used. An extension of this survey showed that these problems are not associated with journal- or article-level bibliometrics. Using seven independent RNA-seq datasets, we show that misuse of enrichment tools alters results substantially. In conclusion, most published functional enrichment studies suffered from one or more major flaws, highlighting the need for stronger standards for enrichment analysis.
Functional enrichment analysis is a commonly used technique to identify trends in large-scale biological datasets. In biomedicine, functional enrichment analysis of gene expression data is frequently applied to identify disease and drug mechanisms. While enrichment tests were once primarily conducted with complicated computer scripts, web-based tools are becoming more widely used. Users can paste a list of genes into a website and receive enrichment results in a matter of seconds. Despite the popularity of these tools, there are concerns that statistical problems and incomplete reporting are compromising research quality. In this article, we conducted a systematic examination of published enrichment analyses and assessed whether (i) any statistical flaws were present and (ii) sufficient methodological detail was provided for the study to be replicated. We found that lack of methodological detail and errors in statistical analysis were widespread, which undermines the reliability and reproducibility of these research articles. A set of best practices is urgently needed to raise the quality of published work.
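The two statistical issues this survey highlights are an explicit background gene list and multiple-testing correction. Both can be illustrated with a minimal over-representation analysis (ORA). The sketch below is not taken from the article. It assumes a one-sided hypergeometric test and Benjamini-Hochberg FDR adjustment. The function names (`ora`, `benjamini_hochberg`, `hypergeom_sf`) and the toy gene identifiers are hypothetical. The key step is that the query list and every gene set are intersected with the background before testing.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M background genes, n in the set, N drawn)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def ora(query, gene_sets, background):
    """One-sided over-representation p-value for each gene set.

    The query and each gene set are restricted to the background, so genes
    that could never have been measured do not distort the test.
    """
    bg = set(background)
    q = set(query) & bg
    M, N = len(bg), len(q)
    pvals = {}
    for name, genes in gene_sets.items():
        s = set(genes) & bg
        k = len(q & s)
        pvals[name] = hypergeom_sf(k, M, len(s), N)
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, keyed like the input dict."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted, running_min = {}, 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


if __name__ == "__main__":
    background = [f"g{i}" for i in range(100)]    # hypothetical measured genes
    query = [f"g{i}" for i in range(10)]           # hypothetical "significant" genes
    gene_sets = {
        "setA": [f"g{i}" for i in range(20)],      # contains all 10 query genes
        "setB": [f"g{i}" for i in range(50, 70)],  # no overlap with the query
        "setC": ["x1", "x2"] + [f"g{i}" for i in range(5)],  # x1, x2 not measured
    }
    raw = ora(query, gene_sets, background)
    fdr = benjamini_hochberg(raw)
    for name in gene_sets:
        print(name, f"p={raw[name]:.3g}", f"FDR={fdr[name]:.3g}")
```

Passing the whole genome instead of the genes actually measured as `background` would inflate `M`. That makes overlaps look rarer than they are and biases p-values downward, which is the failure mode the survey reports.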
Affiliation(s)
- Kaumadi Wijesooriya
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
- Sameer A. Jadaan
- College of Health and Medical Technology, Middle Technical University, Baghdad, Iraq
- Kaushalya L. Perera
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
- Tanuveer Kaur
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
- Mark Ziemann
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
- * E-mail:
25
Woodward AA, Taylor DM, Goldmuntz E, Mitchell LE, Agopian A, Moore JH, Urbanowicz RJ. Gene-Interaction-Sensitive enrichment analysis in congenital heart disease. BioData Min 2022; 15:4. [PMID: 35151364 PMCID: PMC8841104 DOI: 10.1186/s13040-022-00287-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 01/17/2022] [Indexed: 11/24/2022] Open
Abstract
Background: Gene set enrichment analysis (GSEA) uses gene-level univariate associations to identify gene set-phenotype associations for hypothesis generation and interpretation. We propose that GSEA can be adapted to incorporate SNP- and gene-level interactions. To this end, gene scores are derived by Relief-based feature importance algorithms that efficiently detect both univariate and interaction effects (MultiSURF) or exclusively interaction effects (MultiSURF*). We compare these interaction-sensitive GSEA approaches to traditional χ² rankings in simulated genome-wide array data, and in a target and replication cohort of congenital heart disease patients with conotruncal defects (CTDs).
Results: In the simulation study and for both CTD datasets, both Relief-based approaches to GSEA captured more relevant and significant gene ontology terms than the univariate GSEA. Key terms and themes of interest include cell adhesion, migration, and signaling. A leading edge analysis highlighted semaphorins and their receptors, the Slit-Robo pathway, and other genes with roles in the secondary heart field and outflow tract development.
Conclusions: Our results indicate that interaction-sensitive approaches to enrichment analysis can improve upon traditional univariate GSEA. This approach replicated univariate findings and identified additional and more robust support for the role of the secondary heart field and cardiac neural crest cell migration in the development of CTDs.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-022-00287-w.
26
Hamdaoui Q, Zekri Y, Richard S, Aubert D, Guyot R, Markossian S, Gauthier K, Gaie-Levrel F, Bencsik A, Flamant F. Prenatal exposure to paraquat and nanoscaled TiO2 aerosols alters the gene expression of the developing brain. Chemosphere 2022; 287:132253. [PMID: 34543901 DOI: 10.1016/j.chemosphere.2021.132253] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/19/2021] [Revised: 09/03/2021] [Accepted: 09/13/2021] [Indexed: 06/13/2023]
Abstract
Nanopesticides are innovative pesticides involving engineered nanomaterials in their formulation to increase the efficiency of plant protection products while mitigating their environmental impact. Despite the predicted growth in nanopesticide use, no data are available on their inhalation toxicity or on potential cocktail effects between their components. In particular, neurodevelopmental toxicity caused by prenatal exposure might have long-lasting consequences. In the present study, we repeatedly exposed gestating mice in a whole-body exposure chamber to three aerosols, involving the paraquat herbicide, nanoscaled titanium dioxide particles (nTiO2), or a mixture of both. Particle number concentrations and total mass concentrations were monitored to enable metrological follow-up of the exposure sessions. Based on the aerosol characteristics, the alveolar deposited dose in mice was then estimated. RNA-seq was used to highlight dysregulation in the striatum of pups in response to the in utero exposure. Modifications in gene expression were identified at postnatal day 14, which might reflect neurodevelopmental alterations in this key brain area. The data suggest an alteration in mitochondrial function following paraquat exposure, which is reminiscent of the pathological process leading to Parkinson disease. Markers of different cell lineages were dysregulated, showing that the effects were not limited to dopaminergic neurons. Exposure to the nTiO2 aerosol modulated the regulation of cytokine and neurotransmitter pathways, perhaps reflecting minor neuroinflammation. No synergy was found between paraquat and nTiO2. Instead, the neurodevelopmental effects of the mixture were surprisingly weaker than those measured for each substance separately.
Affiliation(s)
- Quentin Hamdaoui
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France; Laboratoire National de Métrologie et D'essais (LNE), Paris, France
- Yanis Zekri
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France
- Sabine Richard
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France
- Denise Aubert
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France
- Romain Guyot
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France
- Suzy Markossian
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France
- Karine Gauthier
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France
- Anna Bencsik
- Université Claude Bernard Lyon 1, ANSES, Laboratoire de Lyon, France
- Frédéric Flamant
- IGFL, Functional Genomics of Thyroid Hormone Signaling Group, Lyon, France
27
Wong LM, Li WT, Shende N, Tsai JC, Ma J, Chakladar J, Gnanasekar A, Qu Y, Dereschuk K, Wang-Rodriguez J, Ongkeko WM. Analysis of the immune landscape in virus-induced cancers using a novel integrative mechanism discovery approach. Comput Struct Biotechnol J 2021; 19:6240-6254. [PMID: 34900135 PMCID: PMC8636736 DOI: 10.1016/j.csbj.2021.11.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Revised: 11/11/2021] [Accepted: 11/11/2021] [Indexed: 11/17/2022] Open
Abstract
Background: The mechanisms of carcinogenesis from viral infections are extraordinarily complex and not well understood. Traditional methods of analyzing RNA-sequencing data may not be sufficient for unraveling complicated interactions between viruses and host cells. Using RNA- and DNA-sequencing data from The Cancer Genome Atlas (TCGA), we aim to explore whether virus-induced tumors exhibit similar immune-associated (IA) dysregulations, using a new algorithm we developed that focuses on the most important biological mechanisms involved in virus-induced cancers. Differential expression, survival correlation, and clinical variable correlations were used to identify the most clinically relevant IA genes dysregulated in 5 virus-induced cancers (HPV-induced head and neck squamous cell carcinoma, HPV-induced cervical cancer, EBV-induced stomach cancer, HBV-induced liver cancer, and HCV-induced liver cancer), after which a mechanistic approach was adopted to identify pathways implicated in IA gene dysregulation.
Results: Our results revealed that IA dysregulations vary with the cancer type and the virus type, but cytokine signaling pathways are dysregulated in all virus-induced cancers. Furthermore, we found important similarities among all 5 virus-induced cancers in dysregulated clinically relevant oncogenic signatures and IA pathways. Finally, using our algorithm, we discovered potential mechanisms by which genomic alterations may induce IA gene dysregulations.
Conclusions: Our study offers a new approach to mechanism identification through integrating functional annotations and large-scale sequencing data, which may be invaluable to the discovery of new immunotherapy targets for virus-induced cancers.
Key Words
- Algorithm
- C2, Canonical pathway
- C6, Oncogenic signature
- C7, Immunological signature
- CA, Cancer-associated
- CESC, Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
- CNA, Copy number alteration
- Cervical squamous cell carcinoma and endocervical adenocarcinoma
- EBV, Epstein-Barr virus
- Epstein-Barr virus
- FDR, False discovery rate
- GSEA, Gene set enrichment analysis
- HBV, Hepatitis B virus
- HCV, Hepatitis C virus
- HNSCC, Head and Neck Squamous Cell Carcinoma
- HPV, Human papillomavirus
- Head and neck squamous cell carcinoma
- Hepatitis B
- Hepatitis C
- Human papillomavirus
- IA, Immune-associated
- LIHC, Liver Hepatocellular Carcinoma
- Liver hepatocellular carcinoma
- MSigDB, Molecular Signatures Database
- STAD, Stomach Adenocarcinoma
- Stomach adenocarcinoma
- TCGA
- TCGA, The Cancer Genome Atlas
Affiliation(s)
- Lindsay M. Wong
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Wei Tse Li
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Neil Shende
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Joseph C. Tsai
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Jiayan Ma
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Jaideep Chakladar
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Aditi Gnanasekar
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Yuanhao Qu
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Kypros Dereschuk
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Jessica Wang-Rodriguez
- Department of Pathology, University of California San Diego, La Jolla, CA 92093, USA
- Pathology Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Weg M. Ongkeko
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA, 92093, USA
- Research Service, VA San Diego Healthcare System, San Diego, CA 92161, USA
- Corresponding author at: Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of California, San Diego, La Jolla, CA 92093, USA.
28
Establishment and Validation of an MTORC1 Signaling-Related Gene Signature to Predict Overall Survival in Patients with Hepatocellular Carcinoma. Biomed Res Int 2021; 2021:6299472. [PMID: 34853791 PMCID: PMC8629633 DOI: 10.1155/2021/6299472] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2021] [Revised: 11/01/2021] [Accepted: 11/05/2021] [Indexed: 12/14/2022]
Abstract
Background: Accurate and effective prognostic biomarkers for patients with hepatocellular carcinoma (HCC) remain poorly defined. A network-based gene signature may serve as a valuable biomarker to improve the accuracy of risk discrimination in these patients.
Methods: The expression levels of cancer hallmarks were determined by Cox regression analysis. Various bioinformatic methods, such as GSEA, WGCNA, and LASSO, and statistical approaches were applied to generate an MTORC1 signaling-related gene signature (MSRS). Moreover, a decision tree and a nomogram were constructed to aid in quantifying the risk level of each HCC patient.
Results: Active MTORC1 signaling was found to be the most important predictor of overall survival in HCC patients in the training cohort. The MSRS was established and was able to stratify HCC patients with poor outcomes in two validation datasets. Analysis of patient MSRS levels and survival data suggested that the MSRS can be a valuable risk factor in the two validation datasets and in the integrated cohort. Finally, we constructed a decision tree that distinguished subclasses of high-risk patients and a nomogram that could accurately predict individual survival.
Conclusions: The present study may contribute to the improvement of current prognostic systems for patients with HCC.
29
Maleki F, Ovens K, McQuillan I, Kusalik AJ. Silver: Forging almost Gold Standard Datasets. Genes (Basel) 2021; 12:genes12101523. [PMID: 34680918 PMCID: PMC8535810 DOI: 10.3390/genes12101523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2021] [Revised: 09/19/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022] Open
Abstract
Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Evaluation studies have often reported contradictory recommendations about which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods would be valuable. Such a framework requires gene expression datasets where the enrichment status of gene sets is known a priori. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene-gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity.
Affiliation(s)
- Farhad Maleki
- Augmented Intelligence & Precision Health Laboratory, Institute of the McGill University Health Centre, McGill University, Montreal, QC H4A 3S5, Canada
- Correspondence:
- Katie Ovens
- Augmented Intelligence & Precision Health Laboratory, Institute of the McGill University Health Centre, McGill University, Montreal, QC H4A 3S5, Canada
- Ian McQuillan
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
- Anthony J. Kusalik
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
30
Huang G, Zhang H, Qu Y, Huang K, Gong X, Wei J, Du H. ARMT: An automatic RNA-seq data mining tool based on comprehensive and integrative analysis in cancer research. Comput Struct Biotechnol J 2021; 19:4426-4434. [PMID: 34471489 PMCID: PMC8379379 DOI: 10.1016/j.csbj.2021.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Revised: 07/19/2021] [Accepted: 08/06/2021] [Indexed: 11/02/2022] Open
Abstract
The comprehensive and integrative analysis of RNA-seq data, in different molecular layers from diverse samples, holds promise to address the full-scale complexity of biological systems. Recent advances in gene set variation analysis (GSVA) are providing exciting opportunities for revealing the specific biological processes of cancer samples. However, a tool that combines GSVA with analysis of different molecular characteristics, as well as the prognostic characteristics of cancer patients, is still urgently needed to reveal the biological processes of disease comprehensively. Here, we develop ARMT, an automatic tool for RNA-seq data analysis. ARMT is an efficient, integrative tool with a user-friendly interface for comprehensively analyzing the molecular characteristics related to a single gene or a gene set, based on transcriptomic and genomic data. It bridges genes and pathways to yield deeper information and further accelerate scientific findings. ARMT can be installed easily from https://github.com/Dulab2020/ARMT.
Affiliation(s)
- Guanda Huang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Haibo Zhang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yimo Qu
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Kaitang Huang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Xiaocheng Gong
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jinfen Wei
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Hongli Du
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
31
Joly JH, Lowry WE, Graham NA. Differential Gene Set Enrichment Analysis: a statistical approach to quantify the relative enrichment of two gene sets. Bioinformatics 2021; 36:5247-5254. [PMID: 32692836 DOI: 10.1093/bioinformatics/btaa658] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/07/2019] [Revised: 06/24/2020] [Accepted: 07/15/2020] [Indexed: 01/30/2023] Open
Abstract
Motivation: Gene Set Enrichment Analysis (GSEA) is an algorithm widely used to identify statistically enriched gene sets in transcriptomic data. However, GSEA cannot examine the enrichment of two gene sets or pathways relative to one another. Here we present Differential Gene Set Enrichment Analysis (DGSEA), an adaptation of GSEA that quantifies the relative enrichment of two gene sets.
Results: After validating the method using synthetic data, we demonstrate that DGSEA accurately captures the hypoxia-induced coordinated upregulation of glycolysis and downregulation of oxidative phosphorylation. We also show that DGSEA is more predictive than GSEA of the metabolic state of cancer cell lines, including lactate secretion and intracellular concentrations of lactate and AMP. Finally, we demonstrate the application of DGSEA to generate hypotheses about differential metabolic pathway activity in cellular senescence. Together, these data demonstrate that DGSEA is a novel tool to examine the relative enrichment of gene sets in transcriptomic data.
Availability and implementation: DGSEA software and tutorials are available at https://jamesjoly.github.io/DGSEA/.
Supplementary information: Supplementary data are available at Bioinformatics online.
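The abstract describes DGSEA only as an adaptation of GSEA, so DGSEA's own relative-enrichment statistic is not reproduced here. For orientation, the sketch below shows the standard weighted running-sum enrichment score that GSEA computes for a single gene set, the quantity GSEA-based methods build on. The function name `enrichment_score` and the toy ranked list are hypothetical. Permutation-based significance testing is omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted GSEA running-sum enrichment score for one gene set.

    ranked   -- list of (gene, score) pairs sorted by score, descending
    gene_set -- iterable of gene identifiers
    p        -- weight exponent (p = 0 gives the unweighted KS-like walk)
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    is_hit = [gene in members for gene, _ in ranked]
    n_miss = len(ranked) - sum(is_hit)
    hit_weight = sum(abs(s) ** p for (_, s), h in zip(ranked, is_hit) if h)
    if n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must hit the list with non-zero weight and not cover it")
    running, es = 0.0, 0.0
    for (_, s), h in zip(ranked, is_hit):
        # Hits step up in proportion to |score|**p; misses step down uniformly.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


if __name__ == "__main__":
    ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0)]
    print(enrichment_score(ranked, {"a", "b"}))  # set concentrated at the top
    print(enrichment_score(ranked, {"d", "e"}))  # set concentrated at the bottom
```

A set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score. Comparing two such scores by plain subtraction is only a naive heuristic and should not be read as the DGSEA statistic.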
Affiliation(s)
- James H Joly
- Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, CA 90089, USA
- William E Lowry
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Broad Center for Regenerative Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Division of Dermatology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nicholas A Graham
- Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, CA 90089, USA
- Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90089, USA
32
Bryan J, Mandan A, Kamat G, Gottschalk WK, Badea A, Adams KJ, Thompson JW, Colton CA, Mukherjee S, Lutz MW. Likelihood ratio statistics for gene set enrichment in Alzheimer's disease pathways. Alzheimers Dement 2021; 17:561-573. [PMID: 33480182 PMCID: PMC8044005 DOI: 10.1002/alz.12223] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
Introduction: The study of Alzheimer's disease (AD) has revealed biological pathways with implications for disease neuropathology and pathophysiology. These pathway-level effects may also be mediated by individual characteristics or covariates such as age or sex. Evaluation of AD biological pathways in the context of interactions with these covariates is critical to the understanding of AD as well as the development of model systems used to study the disease.
Methods: Gene set enrichment methods are powerful tools used to interpret gene-level statistics at the level of biological pathways. We introduce a method for quantifying gene set enrichment using likelihood ratio-derived test statistics (gsLRT), which accounts for sample covariates such as age and sex. We then use our method to test for age and sex interactions with protein expression levels in AD and to compare the pathway results between human and mouse species.
Results: Our method, based on nested logistic regressions, is competitive with the existing standard for gene set testing in the context of linear models and complex experimental design. The gene sets we identify as having a significant association with AD (both with and without additional covariate interactions) are validated by previous studies. Differences between gsLRT results on mouse and human datasets are observed.
Discussion: Characterizing biological pathways involved in AD builds on the important work involving single gene drivers. Our gene set enrichment method finds pathways that are significantly related to AD while accounting for covariates that may be relevant to disease development. The method highlights commonalities and differences between human AD and mouse models, which may inform the development of higher fidelity models for the study of AD.
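The abstract does not give the gsLRT formula. For reference, the generic likelihood ratio statistic for comparing a reduced model nested within a full model is shown below. Here the two models are assumed to be the nested logistic regressions the abstract mentions, and $\ell$ denotes a maximized log-likelihood. Under the null hypothesis and standard regularity conditions, the statistic is asymptotically $\chi^2$ distributed, with degrees of freedom equal to the number of extra parameters in the full model. How gsLRT aggregates such statistics across the genes of a set is not reproduced here.

$$
\Lambda = -2\log\frac{L(\hat{\theta}_{\text{reduced}})}{L(\hat{\theta}_{\text{full}})} = 2\left[\ell(\hat{\theta}_{\text{full}}) - \ell(\hat{\theta}_{\text{reduced}})\right] \;\overset{a}{\sim}\; \chi^2_{d}, \qquad d = \dim\theta_{\text{full}} - \dim\theta_{\text{reduced}}
$$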
Affiliation(s)
- Jordan Bryan
- Department of Statistical Science, Duke University, Durham, NC 27708, USA
- Arpita Mandan
- Department of Statistical Science, Duke University, Durham, NC 27708, USA
- Gauri Kamat
- Department of Statistical Science, Duke University, Durham, NC 27708, USA
- Alexandra Badea
- Department of Neurology, Duke University, Durham, NC 27708, USA
- Kendra J. Adams
- Department of Neurology, Duke University, Durham, NC 27708, USA
- Carol A. Colton
- Department of Neurology, Duke University, Durham, NC 27708, USA
- Sayan Mukherjee
- Department of Statistical Science, Duke University, Durham, NC 27708, USA
- Departments of Mathematics, Computer Science, and Biostatistics & Bioinformatics, Duke University, Durham, NC 27708, USA
- Michael W. Lutz
- Department of Neurology, Duke University, Durham, NC 27708, USA
33
Chen L, Cao W, Aita R, Aldea D, Flores J, Gao N, Bonder EM, Ellison CE, Verzi MP. Three-dimensional interactions between enhancers and promoters during intestinal differentiation depend upon HNF4. Cell Rep 2021; 34:108679. [PMID: 33503426 PMCID: PMC7899294 DOI: 10.1016/j.celrep.2020.108679] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/13/2020] [Revised: 10/23/2020] [Accepted: 12/30/2020] [Indexed: 12/20/2022] Open
Abstract
Cells in renewing tissues exhibit dramatic transcriptional changes as they differentiate. The contribution of chromatin looping to tissue renewal is incompletely understood. Enhancer-promoter interactions could be relatively stable as cells transition from progenitor to differentiated states; alternatively, chromatin looping could be as dynamic as the gene expression from their loci. The intestinal epithelium is the most rapidly renewing mammalian tissue. Proliferative cells in crypts of Lieberkühn sustain a stream of differentiated cells that are continually shed into the lumen. We apply chromosome conformation capture combined with chromatin immunoprecipitation (HiChIP) and sequencing to measure enhancer-promoter interactions in progenitor and differentiated cells of the intestinal epithelium. Despite dynamic gene regulation across the differentiation axis, we find that enhancer-promoter interactions are relatively stable. Functionally, we find HNF4 transcription factors are required for chromatin looping at target genes. Depletion of HNF4 disrupts local chromatin looping, histone modifications, and target gene expression. This study provides insights into transcriptional regulatory mechanisms governing homeostasis in renewing tissues.
Chen et al. provide a survey of enhancer-promoter 3D looping in the intestinal epithelium by HiChIP, in vivo. They find that enhancer-promoter interactions are highly dependent upon the key intestinal transcription factor HNF4. Their findings provide insights into transcriptional regulatory mechanisms governing homeostasis in renewing tissues.
Affiliation(s)
- Lei Chen
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08903, USA
- Weihuan Cao
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Rohit Aita
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Dennis Aldea
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Juan Flores
- Department of Biological Sciences, Rutgers University, Newark, NJ 07102, USA
- Nan Gao
- Department of Biological Sciences, Rutgers University, Newark, NJ 07102, USA
- Edward M Bonder
- Department of Biological Sciences, Rutgers University, Newark, NJ 07102, USA
- Christopher E Ellison
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Michael P Verzi
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08903, USA; Rutgers Center for Lipid Research, New Jersey Institute for Food, Nutrition & Health, Rutgers University, New Brunswick, NJ 08901, USA
34
Application of Transcriptional Gene Modules to Analysis of Caenorhabditis elegans' Gene Expression Data. G3-GENES GENOMES GENETICS 2020; 10:3623-3638. [PMID: 32759329 PMCID: PMC7534440 DOI: 10.1534/g3.120.401270] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/24/2022]
Abstract
Identification of co-expressed sets of genes (gene modules) is used widely for grouping functionally related genes during transcriptomic data analysis. An organism-wide atlas of high-quality gene modules would provide a powerful tool for unbiased detection of biological signals from gene expression data. Here, using a method based on independent component analysis we call DEXICA, we have defined and optimized 209 modules that broadly represent transcriptional wiring of the key experimental organism C. elegans. These modules represent responses to changes in the environment (e.g., starvation, exposure to xenobiotics), genes regulated by transcription factors (e.g., ATFS-1, DAF-16), genes specific to tissues (e.g., neurons, muscle), genes that change during development, and other complex transcriptional responses to genetic, environmental and temporal perturbations. Interrogation of these modules reveals processes that are activated in long-lived mutants in cases where traditional analyses of differentially expressed genes fail to do so. Additionally, we show that modules can inform the strength of the association between a gene and an annotation (e.g., GO term). Analysis of “module-weighted annotations” improves on several aspects of traditional annotation-enrichment tests and can aid in functional interpretation of poorly annotated genes. We provide an online interactive resource with tutorials at http://genemodules.org/, in which users can find detailed information on each module, check genes for module-weighted annotations, and use both of these to analyze their own gene expression data (generated using any platform) or gene sets of interest.
35
Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene Set Analysis: Challenges, Opportunities, and Future Research. Front Genet 2020; 11:654. [PMID: 32695141 PMCID: PMC7339292 DOI: 10.3389/fgene.2020.00654] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 02/01/2020] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Gene set analysis methods are widely used to provide insight into high-throughput gene expression data. There are many gene set analysis methods available. These methods rely on various assumptions and have different requirements, strengths and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions for each class, and provide directions for future research in developing and evaluating gene set analysis methods.
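Maleki et al. classify methods rather than prescribe one. As a concrete reference point for the simplest and most familiar class, over-representation analysis (ORA), the sketch below computes the one-sided hypergeometric p-value that ORA tools commonly report: the probability of seeing at least the observed overlap between a list of differentially expressed genes and a gene set, given a background of measured genes. The function name `ora_pvalue` and the toy counts are hypothetical illustrations, not taken from the review.

```python
from math import comb


def ora_pvalue(n_universe: int, n_set: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: genes measured (the background)
    n_set:      background genes annotated to the gene set
    n_selected: genes called differentially expressed
    n_overlap:  selected genes that are also in the gene set
    """
    total = comb(n_universe, n_selected)
    upper = min(n_set, n_selected)
    # Sum the probability mass of every overlap at least as large as observed;
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total  # exact big-integer division, rounded once to float


# Toy example: 20 background genes, a 5-gene set, 4 DE genes, 3 of them in the set.
print(ora_pvalue(20, 5, 4, 3))
```

Because this test treats genes as independent draws, it is one of the approaches whose assumptions such reviews scrutinize; it is shown here only as the baseline that many other methods are compared against.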
36
Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. ENTROPY 2020; 22:e22040427. [PMID: 33286201 PMCID: PMC7516904 DOI: 10.3390/e22040427] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/24/2020] [Revised: 03/18/2020] [Accepted: 04/03/2020] [Indexed: 12/22/2022]
Abstract
Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
37
Pfeil J, Sanders LM, Anastopoulos I, Lyle AG, Weinstein AS, Xue Y, Blair A, Beale HC, Lee A, Leung SG, Dinh PT, Shah AT, Breese MR, Devine WP, Bjork I, Salama SR, Sweet-Cordero EA, Haussler D, Vaske OM. Hydra: A mixture modeling framework for subtyping pediatric cancer cohorts using multimodal gene expression signatures. PLoS Comput Biol 2020; 16:e1007753. [PMID: 32275708 PMCID: PMC7176284 DOI: 10.1371/journal.pcbi.1007753] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2019] [Revised: 04/22/2020] [Accepted: 02/28/2020] [Indexed: 01/21/2023]
Abstract
Precision oncology has primarily relied on coding mutations as biomarkers of response to therapies. While transcriptome analysis can provide valuable information, incorporation into workflows has been difficult. For example, the relative rather than absolute gene expression level needs to be considered, requiring differential expression analysis across samples. However, expression programs related to the cell-of-origin and tumor microenvironment effects confound the search for cancer-specific expression changes. To address these challenges, we developed an unsupervised clustering approach for discovering differential pathway expression within cancer cohorts using gene expression measurements. The hydra approach uses a Dirichlet process mixture model to automatically detect multimodally distributed genes and expression signatures without the need for matched normal tissue. We demonstrate that the hydra approach is more sensitive than widely-used gene set enrichment approaches for detecting multimodal expression signatures. Application of the hydra analysis framework to small blue round cell tumors (including rhabdomyosarcoma, synovial sarcoma, neuroblastoma, Ewing sarcoma, and osteosarcoma) identified expression signatures associated with changes in the tumor microenvironment. The hydra approach also identified an association between ATRX deletions and elevated immune marker expression in high-risk neuroblastoma. Notably, hydra analysis of all small blue round cell tumors revealed similar subtypes, characterized by changes to infiltrating immune and stromal expression signatures.
Affiliation(s)
- Jacob Pfeil
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Lauren M. Sanders
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Ioannis Anastopoulos
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- A. Geoffrey Lyle
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Alana S. Weinstein
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Yuanqing Xue
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Andrew Blair
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Holly C. Beale
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Alex Lee
- Department of Pediatrics, Division of Hematology and Oncology, University of California, San Francisco, San Francisco, California, United States of America
- Stanley G. Leung
- Department of Pediatrics, Division of Hematology and Oncology, University of California, San Francisco, San Francisco, California, United States of America
- Phuong T. Dinh
- Department of Pediatrics, Division of Hematology and Oncology, University of California, San Francisco, San Francisco, California, United States of America
- Avanthi Tayi Shah
- Department of Pediatrics, Division of Hematology and Oncology, University of California, San Francisco, San Francisco, California, United States of America
- Marcus R. Breese
- Department of Pediatrics, Division of Hematology and Oncology, University of California, San Francisco, San Francisco, California, United States of America
- W. Patrick Devine
- Department of Anatomic Pathology, University of California, San Francisco, San Francisco, California, United States of America
- Isabel Bjork
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Sofie R. Salama
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- E. Alejandro Sweet-Cordero
- Department of Pediatrics, Division of Hematology and Oncology, University of California, San Francisco, San Francisco, California, United States of America
- David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Olena Morozova Vaske
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, California, United States of America
38
Yuan K, Feng Y, Wang H, Zhao L, Wang W, Wang T, Feng Y, Huang G, Xu A. FGL2 is positively correlated with enhanced antitumor responses mediated by T cells in lung adenocarcinoma. PeerJ 2020; 8:e8654. [PMID: 32206449 PMCID: PMC7075367 DOI: 10.7717/peerj.8654] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/09/2019] [Accepted: 01/28/2020] [Indexed: 12/17/2022]
Abstract
Lung cancer is the most common malignant tumor, accounting for 25% of cancer-related deaths and 14% of new cancers worldwide. Lung adenocarcinoma is the most common type of pulmonary cancer. Although there have been some improvements in the traditional therapy of lung cancer, the outcome and prognosis of patients remain poor. Lung cancer is the leading cause of cancer-related deaths worldwide, with 1.8 million new cases being diagnosed each year. Precision medicine based on genetic alterations is considered a new strategy of lung cancer treatment that requires highly specific biomarkers for precision diagnosis and treatment. Fibrinogen-like protein 2 (FGL2) plays important roles in both innate and adaptive immunity. However, the diagnostic value of FGL2 in lung cancer is largely unknown. In this study, we systematically investigated the expression profile and potential functions of FGL2 in lung adenocarcinoma. We used the TCGA and Oncomine datasets to compare the FGL2 expression levels between lung adenocarcinoma and adjacent normal tissues. We utilized the GEPIA, PrognoScan and Kaplan-Meier plotter databases to analyze the relationship between FGL2 expression and the survival of lung adenocarcinoma patients. Then, we investigated the potential roles of FGL2 in lung adenocarcinoma with the TIMER database and functional enrichment analyses. We found that FGL2 expression was significantly lower in lung adenocarcinoma tissue compared with adjacent normal tissue. A high expression level of FGL2 was correlated with better prognostic outcomes of lung adenocarcinoma patients, including overall survival and progression-free survival. FGL2 was positively correlated with the infiltration of immune cells, including dendritic cells, CD8+ T cells, macrophages, B cells, and CD4+ T cells, in lung adenocarcinoma. 
Functional enrichment analyses also showed that a high expression level of FGL2 was positively correlated with enhanced T cell activities, especially CD8+ T cell activation. Thus, we propose that high FGL2 expression, which is positively associated with enhanced antitumor activities mediated by T cells, is a beneficial marker for lung adenocarcinoma treatment outcomes.
Affiliation(s)
- Kai Yuan
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Division of Interdisciplinary Medicine and Biotechnology, Department of Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, MA, United States of America
- Yanyan Feng
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Hesong Wang
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Lu Zhao
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Wei Wang
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Ting Wang
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Yuyin Feng
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Guangrui Huang
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- Anlong Xu
- School of Life Sciences, Beijing University of Chinese Medicine, Beijing, Beijing, China
- State Key Laboratory of Biocontrol, Department of Biochemistry, School of Life Sciences, Sun Yat-Sen (Zhongshan) University, Guangzhou, Guangdong, China
39
Chang HC, Chu CP, Lin SJ, Hsiao CK. Network hub-node prioritization of gene regulation with intra-network association. BMC Bioinformatics 2020; 21:101. [PMID: 32164570 PMCID: PMC7069025 DOI: 10.1186/s12859-020-3444-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/15/2019] [Accepted: 03/06/2020] [Indexed: 11/10/2022]
Abstract
Background To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be inappropriate since hub nodes are common connection points and therefore may interact with other nodes more often than non-hub nodes do. Such dependence among gene nodes can be conjectured based on the topology of the pathway network or the correlation between them. Results Here we develop a pathway activity score incorporating the marginal (local) effects of gene nodes as well as intra-network affinity measures. This score summarizes the expression levels in a gene-set/pathway for each sample, with weights on local and network information, respectively. The score is next used to examine the impact of each node through a leave-one-out evaluation. To illustrate the procedure, two cancer studies, one involving RNA-Seq from breast cancer patients with high-grade ductal carcinoma in situ and one microarray expression data from ovarian cancer patients, are used to assess the performance of the procedure, and to compare with existing methods, both ones that do and do not take into consideration correlation and network information. The hub nodes identified by the proposed procedure in the two cancer studies are known influential genes; some have been included in standard treatments and some are currently considered in clinical trials for target therapy. The results from simulation studies show that when marginal effects are mild or weak, the proposed procedure can still identify causal nodes, whereas methods relying only on marginal effect size cannot. Conclusions The NetworkHub procedure proposed in this research can effectively utilize the network information in combination with local effects derived from marker values, and provide a useful and complementary list of recommendations for prioritizing causal hubs.
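The abstract describes scoring a pathway per sample and then measuring each node's influence by leaving it out. The sketch below illustrates only that leave-one-out step. It deliberately swaps in a much simpler, hypothetical score (the unweighted mean of per-gene z-scores, summarized as the case/control difference in mean score) in place of NetworkHub's weighted combination of local effects and intra-network affinity, whose exact form is not given here. All function names are illustrative.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene across samples; a constant gene maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def pathway_scores(expr, genes):
    """Hypothetical per-sample score: unweighted mean z-score over `genes`."""
    z = {g: zscore(expr[g]) for g in genes}
    n_samples = len(next(iter(expr.values())))
    return [mean(z[g][j] for g in genes) for j in range(n_samples)]


def separation(scores, labels):
    """Mean score of cases (label 1) minus mean score of controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    ctrls = [s for s, y in zip(scores, labels) if y == 0]
    return mean(cases) - mean(ctrls)


def leave_one_out_impact(expr, labels):
    """Drop in case/control separation when each gene is left out, largest first."""
    genes = list(expr)
    full = separation(pathway_scores(expr, genes), labels)
    impact = {}
    for g in genes:
        rest = [h for h in genes if h != g]
        impact[g] = full - separation(pathway_scores(expr, rest), labels)
    return dict(sorted(impact.items(), key=lambda kv: kv[1], reverse=True))


# Toy pathway: gene A tracks the phenotype, B is flat, C is unrelated noise.
expr = {"A": [5, 5, 1, 1], "B": [2, 2, 2, 2], "C": [1, 3, 1, 3]}
labels = [1, 1, 0, 0]
print(leave_one_out_impact(expr, labels))
```

A gene whose removal shrinks the case/control separation receives a positive impact. Removing an uninformative gene can slightly increase separation by reducing dilution, so its impact comes out negative; this illustrates why marginal summaries alone can be misleading, which is the motivation the abstract gives for adding network information.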
Affiliation(s)
- Hung-Ching Chang
- Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Chiao-Pei Chu
- Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Shu-Ju Lin
- Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Chuhsing Kate Hsiao
- Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan; Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, 10055, Taiwan
40
Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C, Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for benchmarking gene set enrichment analysis. Brief Bioinform 2020; 22:545-556. [PMID: 32026945 PMCID: PMC7820859 DOI: 10.1093/bib/bbz158] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/12/2019] [Revised: 10/11/2019] [Accepted: 11/09/2019] [Indexed: 12/22/2022]
Abstract
MOTIVATION Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. AVAILABILITY http://bioconductor.org/packages/GSEABenchmarkeR. CONTACT ludwig.geistlinger@sph.cuny.edu.
Affiliation(s)
- Ludwig Geistlinger
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY 10027, USA
- Gergely Csaba
- Institute for Implementation Science and Population Health, City University of New York, New York, NY 10027, USA
- Mara Santarelli
- Institute for Bioinformatics, Ludwig-Maximilians-Universität München, 80333 Munich, Germany
- Marcel Ramos
- Roswell Park Cancer Institute, Buffalo, NY 14203, USA
- Lucas Schiffer
- Graduate School of Arts and Sciences, Boston University, Boston, MA 02215, USA
- Nitesh Turaga
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
- Charity Law
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- Sean Davis
- Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY 10027, USA
41
Lauria A, Peirone S, Giudice MD, Priante F, Rajan P, Caselle M, Oliviero S, Cereda M. Identification of altered biological processes in heterogeneous RNA-sequencing data by discretization of expression profiles. Nucleic Acids Res 2020; 48:1730-1747. [PMID: 31889184 PMCID: PMC7038995 DOI: 10.1093/nar/gkz1208] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/16/2019] [Revised: 12/05/2019] [Accepted: 12/17/2019] [Indexed: 12/31/2022]
Abstract
Heterogeneity is a fundamental feature of complex phenotypes. So far, genomic screenings have profiled thousands of samples providing insights into the transcriptome of the cell. However, disentangling the heterogeneity of these transcriptomic Big Data to identify defective biological processes remains challenging. Here we present GSECA, a method exploiting the bimodal behavior of RNA-sequencing gene expression profiles to identify altered gene sets in heterogeneous patient cohorts. Using simulated and experimental RNA-sequencing data sets, we show that GSECA provides higher performance than other available algorithms in detecting truly altered biological processes in large cohorts. Applied to 5941 samples from 14 different cancer types, GSECA correctly identified the alteration of the PI3K/AKT signaling pathway driven by the somatic loss of PTEN and verified the emerging role of PTEN in modulating immune-related processes. In particular, we showed that, in prostate cancer, PTEN loss appears to establish an immunosuppressive tumor microenvironment through the activation of STAT3, and low PTEN expression levels have a detrimental impact on patient disease-free survival. GSECA is available at https://github.com/matteocereda/GSECA.
Affiliation(s)
- Andrea Lauria
- Department of Life Science and System Biology, Università degli Studi di Torino, via Accademia Albertina 13, 10123 Turin, Italy
- IIGM - Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Serena Peirone
- IIGM - Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Department of Physics and INFN, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
- Marco Del Giudice
- IIGM - Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Candiolo Cancer Institute, FPO - IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Francesca Priante
- IIGM - Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Candiolo Cancer Institute, FPO - IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Prabhakar Rajan
- Centre for Cell and Molecular Biology, Barts Cancer Institute, Cancer Research UK Barts Centre, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- The Alan Turing Institute, British Library, 96 Euston Road, London, NW1 2DB, UK
- Michele Caselle
- Department of Physics and INFN, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
- Salvatore Oliviero
- Department of Life Science and System Biology, Università degli Studi di Torino, via Accademia Albertina 13, 10123 Turin, Italy
- IIGM - Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Matteo Cereda
- IIGM - Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
- Candiolo Cancer Institute, FPO - IRCCS, Str. Prov.le 142, km 3.95, Candiolo (TO) 10060, Italy
42
Park HW, Weiss ST. Understanding the Molecular Mechanisms of Asthma through Transcriptomics. ALLERGY, ASTHMA & IMMUNOLOGY RESEARCH 2020; 12:399-411. [PMID: 32141255 PMCID: PMC7061151 DOI: 10.4168/aair.2020.12.3.399] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/08/2019] [Revised: 01/01/2020] [Accepted: 01/11/2020] [Indexed: 12/18/2022]
Abstract
The transcriptome represents the complete set of RNA transcripts that are produced by the genome under a specific circumstance or in a specific cell. High-throughput methods, including microarray and bulk RNA sequencing, as well as recent advances in biostatistics based on machine learning approaches, provide a quick and effective way of identifying novel genes and pathways related to asthma, which is a heterogeneous disease with diverse pathophysiological mechanisms. In this manuscript, we briefly review how to analyze transcriptome data and then provide a summary of recent transcriptome studies focusing on asthma pathogenesis and asthma drug responses. Studies reviewed here are classified into 2 classes based on the tissues utilized: blood and airway cells.
Affiliation(s)
- Heung Woo Park
- The Channing Division of Network Medicine, Department of Medicine, Brigham & Women's Hospital and Harvard Medical School, Boston, MA, USA; Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea
- Scott T Weiss
- The Channing Division of Network Medicine, Department of Medicine, Brigham & Women's Hospital and Harvard Medical School, Boston, MA, USA; Partners Center for Personalized Medicine, Partners Health Care, Boston, MA, USA
43
Zyla J, Marczyk M, Domaszewska T, Kaufmann SHE, Polanska J, Weiner J. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics 2019; 35:5146-5154. [PMID: 31165139 PMCID: PMC6954644 DOI: 10.1093/bioinformatics/btz447] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 12/10/2018] [Revised: 05/08/2019] [Accepted: 06/10/2019] [Indexed: 01/12/2023]
Abstract
MOTIVATION Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies. RESULTS We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility. AVAILABILITY AND IMPLEMENTATION tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO, KEGGandMetacoreDzPathwaysGEO R package and GEO repository. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
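The abstract describes CERNO only as "based on modified Fisher P-value integration". As we understand the tmod documentation, the statistic treats each gene-set member's normalized rank r/N in a sorted gene list as a uniform p-value and combines them Fisher-style, F = -2 Σ ln(r_i/N), which under the null follows a chi-squared distribution with 2k degrees of freedom for a set of k genes. The sketch below is a stdlib-only reading of that description, not the tmod implementation; the function name `cerno_test` is hypothetical. For even degrees of freedom the chi-squared tail has the closed form exp(-x/2) Σ_{j<k} (x/2)^j / j!, evaluated here in log space to avoid overflow for large sets.

```python
from math import exp, lgamma, log


def cerno_test(ranks, n_total):
    """Fisher-style CERNO statistic and p-value for one gene set (a sketch).

    ranks:   1-based positions (1 <= r <= n_total) of the gene-set members
             in a list of n_total genes sorted from most to least interesting
    Returns (statistic, p_value), with p from chi-squared on 2k df.
    """
    if not ranks:
        raise ValueError("gene set must contain at least one ranked gene")
    k = len(ranks)
    stat = -2.0 * sum(log(r / n_total) for r in ranks)
    half = stat / 2.0
    if half <= 0.0:  # every member ranked last: no evidence of enrichment
        return stat, 1.0
    # log of each term (x/2)^j / j! of the even-df chi-squared survival sum
    log_terms = [j * log(half) - lgamma(j + 1) for j in range(k)]
    m = max(log_terms)
    log_sum = m + log(sum(exp(t - m) for t in log_terms))  # log-sum-exp
    return stat, min(1.0, exp(log_sum - half))


# A set concentrated at the top of a 100-gene ranking vs. one at the bottom.
print(cerno_test([1, 2, 3], 100))
print(cerno_test([98, 99, 100], 100))
```

Because only ranks enter the statistic, a sketch like this is indifferent to which ranking metric produced the ordering, which is consistent with the robustness to ranking metrics that the abstract reports.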
Affiliation(s)
- Joanna Zyla
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
- Michal Marczyk
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- Yale School of Medicine, Yale Cancer Center, New Haven, CT 06510, USA
- Teresa Domaszewska
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
- Stefan H E Kaufmann
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
- Joanna Polanska
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- January Weiner
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
44
Mandelboum S, Manber Z, Elroy-Stein O, Elkon R. Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias. PLoS Biol 2019; 17:e3000481. [PMID: 31714939 PMCID: PMC6850523 DOI: 10.1371/journal.pbio.3000481] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/30/2019] [Accepted: 10/08/2019] [Indexed: 11/19/2022]
Abstract
Data normalization is a critical step in RNA sequencing (RNA-seq) analysis, aiming to remove systematic effects from the data to ensure that technical biases have minimal impact on the results. Analyzing numerous RNA-seq datasets, we detected a prevalent sample-specific length effect that leads to a strong association between gene length and fold-change estimates between samples. This stochastic sample-specific effect is not corrected by common normalization methods, including reads per kilobase of transcript length per million reads (RPKM), Trimmed Mean of M values (TMM), relative log expression (RLE), and quantile and upper-quartile normalization. Importantly, we demonstrate that this bias causes recurrent false positive calls by gene-set enrichment analysis (GSEA) methods, thereby leading to frequent functional misinterpretation of the data. Gene sets characterized by markedly short genes (e.g., ribosomal protein genes) or long genes (e.g., extracellular matrix genes) are particularly prone to such false calls. This sample-specific length bias is effectively removed by the conditional quantile normalization (cqn) and EDASeq methods, which allow the integration of gene length as a sample-specific covariate. Consequently, using these normalization methods led to substantial reduction in GSEA false results while retaining true ones. In addition, we found that application of gene-set tests that take into account gene–gene correlations attenuates false positive rates caused by the length bias, but statistical power is reduced as well. Our results advocate the inspection and correction of sample-specific length biases as default steps in RNA-seq analysis pipelines and reiterate the need to account for intergene correlations when performing gene-set enrichment tests to lessen false interpretation of transcriptomic data. 
Analysis of numerous RNA-seq datasets reveals a recurrent sample-specific length bias that causes frequent false positive calls by gene-set enrichment analyses, leading to functional misinterpretation of the data. Its removal requires methods that allow the integration of gene length as a sample-specific covariate.
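The abstract recommends inspecting sample-specific length bias but does not prescribe a diagnostic. Below is a minimal sketch of one possible check. It assumes two samples of raw counts and known gene lengths. It computes log2 fold changes after counts-per-million scaling (with an assumed pseudocount of 1 CPM), then the Spearman correlation between gene length and fold change. A strong correlation hints at a length effect that a single per-sample scale factor cannot remove. Correcting that effect calls for a covariate-aware method such as cqn or EDASeq, which this sketch does not implement. All names are hypothetical.

```python
import math


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def length_fold_change_correlation(counts_a, counts_b, lengths, pseudocount=1.0):
    """Spearman correlation between gene length and log2(B / A) on the CPM scale."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    lfc = [
        math.log2((b / total_b * 1e6 + pseudocount) / (a / total_a * 1e6 + pseudocount))
        for a, b in zip(counts_a, counts_b)
    ]
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(rank(lengths), rank(lfc))
```

Running this check for every pair of samples, or for each sample against a reference, would show whether the effect is sample-specific in the sense the paper describes.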
Affiliation(s)
- Shir Mandelboum
- School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Zohar Manber
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Orna Elroy-Stein
- School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Ran Elkon
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
45
Maleki F, Ovens K, McQuillan I, Kusalik AJ. Size matters: how sample size affects the reproducibility and specificity of gene set analysis. Hum Genomics 2019; 13:42. [PMID: 31639047 PMCID: PMC6805317 DOI: 10.1186/s40246-019-0226-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene set analysis is a well-established approach for interpretation of data from high-throughput gene expression studies. Achieving reproducible results is an essential requirement in such studies. One factor of a gene expression experiment that can affect reproducibility is the choice of sample size. However, choosing an appropriate sample size can be difficult, especially because the choice may be method-dependent. Further, sample size choice can have unexpected effects on specificity. RESULTS In this paper, we report on a systematic, quantitative approach to study the effect of sample size on the reproducibility of the results from 13 gene set analysis methods. We also investigate the impact of sample size on the specificity of these methods. Rather than relying on synthetic data, the proposed approach uses real expression datasets to offer an accurate and reliable evaluation. CONCLUSION Our findings show that, as a general pattern, the results of gene set analysis become more reproducible as sample size increases. However, the extent of reproducibility and the rate at which it increases vary from method to method. In addition, even in the absence of differential expression, some gene set analysis methods report a large number of false positives, and increasing sample size does not lead to reducing these false positives. The results of this research can be used when selecting a gene set analysis method from those available.
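The abstract reports that reproducibility rises with sample size but does not define the measure. One simple way to quantify it, assumed here, is the mean pairwise Jaccard index between the lists of significant gene sets from repeated analyses, for example on independent subsamples of the same size. The sketch below is illustrative only. It is not necessarily the metric used in the paper, and its function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty result lists count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(results):
    """Average Jaccard index over all pairs of significant-gene-set lists."""
    pairs = list(combinations(results, 2))
    if not pairs:
        raise ValueError("need at least two result lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Computing this score at several subsample sizes, separately for each gene set analysis method, would trace how reproducibility grows with sample size.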
Affiliation(s)
- Farhad Maleki
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada
- Katie Ovens
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada
- Ian McQuillan
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada
- Anthony J Kusalik
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada
46
Glazko G, Zybailov B, Emmert-Streib F, Baranova A, Rahmatallah Y. Proteome-transcriptome alignment of molecular portraits achieved by self-contained gene set analysis: Consensus colon cancer subtypes case study. PLoS One 2019; 14:e0221444. [PMID: 31437237 PMCID: PMC6705791 DOI: 10.1371/journal.pone.0221444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/01/2019] [Accepted: 08/06/2019] [Indexed: 01/10/2023]
Abstract
Gene set analysis (GSA) has become a common methodology for analyzing transcriptomics data. However, self-contained GSA techniques are rarely, if ever, used for proteomics data analysis. Here we present a self-contained proteome-level GSA of four consensus molecular subtypes (CMSs) previously established by transcriptome dissection of colon carcinoma specimens. Despite notable differences in the structure of proteomics and transcriptomics data, many pathway-wide characteristic features of CMSs found at the mRNA level were reproduced at the protein level. In particular, CMS1 features show heavy involvement of the immune system as well as of pathways related to mismatch repair, DNA replication and proteasome function, while CMS4 tumors upregulate the complement pathway and proteins participating in epithelial-to-mesenchymal transition (EMT). In addition, protein-level GSA yielded a set of novel observations visible at the proteome but not at the transcriptome level. These include possible involvement of major histocompatibility complex II (MHC-II) antigens in the known immunogenicity of CMS1 and a connection between cholesterol trafficking and the regulation of integrin-linked kinase (ILK) in CMS3. Overall, this study demonstrates the utility of self-contained GSA approaches as a critical tool for analyzing proteomics data in general and for dissecting protein-level molecular portraits of human tumors in particular.
Affiliation(s)
- Galina Glazko
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States of America
- Boris Zybailov
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, United States of America
- Frank Emmert-Streib
- Computational Medicine and Statistical Learning Laboratory, Tampere University of Technology, Korkeakoulunkatu, Tampere, Finland
- Ancha Baranova
- School of Systems Biology, George Mason University, Manassas, VA, United States of America
- Research Center for Medical Genetics, Moscow, Russia
- Yasir Rahmatallah
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States of America
47
Abstract
The aim of this study was to explore the long non-coding RNA (lncRNA) expression pattern of non-small cell lung cancer (NSCLC) on a genome-wide scale and to investigate the potential biological function of these lncRNAs in NSCLC. LncRNAs were profiled by microarray in 6 pairs of NSCLC and matched adjacent non-tumor lung tissues (NTL). A validation cohort was obtained from The Cancer Genome Atlas (TCGA) database, and the diagnostic and prognostic value of LINC01614 in NSCLC was analyzed. Gene set enrichment analysis (GSEA) was used to predict the potential molecular mechanism of LINC01614, one of the identified lncRNAs. A total of 1392 differentially expressed lncRNAs were identified. LINC01614 was the most aberrantly expressed lncRNA in NSCLC compared with NTL. We confirmed that LINC01614 was significantly upregulated in NSCLC patients from the TCGA database, in both adenocarcinoma and squamous cell carcinoma. High expression of LINC01614 indicated poor overall survival of NSCLC patients. At a specificity of 95%, LINC01614 discriminated NSCLC tissues from normal tissues with a sensitivity of 93%. Furthermore, LINC01614 expression levels were associated with tumor stage but not with age or sex. Additionally, GSEA indicated that LINC01614 might be involved in TGF-β-, P53- and IGF-IR-mediated, Wnt, and RTK/Ras/MAPK signaling pathways. lncRNAs may play key roles in the development of NSCLC. LINC01614 was the most aberrantly expressed lncRNA in NSCLC tissues in our experiment and was also significantly differentially expressed in NSCLC patients from the TCGA database. LINC01614 could be a prognostic indicator and has the potential to be a diagnostic biomarker of NSCLC.
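The abstract reports a sensitivity of 93% at a specificity of 95% but does not say how the cut-off was chosen. Below is a minimal sketch of one common convention, assumed here rather than taken from the study. The threshold is set at the empirical 95th percentile of normal-tissue expression values, and sensitivity is the fraction of tumor samples scoring above it. The function name and inputs are hypothetical.

```python
import math


def sensitivity_at_specificity(tumor_scores, normal_scores, target_specificity=0.95):
    """Call a sample positive when its score exceeds a threshold taken from normals.

    The threshold is the smallest normal-tissue score such that at least
    `target_specificity` of normal samples score at or below it.
    Returns (threshold, sensitivity, achieved_specificity).
    """
    if not tumor_scores or not normal_scores:
        raise ValueError("both groups need at least one sample")
    if not 0.0 < target_specificity <= 1.0:
        raise ValueError("target_specificity must be in (0, 1]")
    ordered = sorted(normal_scores)
    # Subtract a tiny epsilon so floating-point round-up cannot skip a rank.
    index = max(math.ceil(target_specificity * len(ordered) - 1e-9) - 1, 0)
    threshold = ordered[index]
    specificity = sum(s <= threshold for s in normal_scores) / len(normal_scores)
    sensitivity = sum(s > threshold for s in tumor_scores) / len(tumor_scores)
    return threshold, sensitivity, specificity
```

With ties in the normal scores, the achieved specificity can exceed the target, which is why the function reports it separately.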
48
Abstract
BACKGROUND Set enrichment methods are commonly used to analyze high-dimensional molecular data and gain biological insight into molecular or clinical phenotypes. One important category of analysis methods employs an enrichment score, which is created from ranked univariate correlations between phenotype and each molecular attribute. Estimates of the significance of the associations are determined via a null distribution generated from phenotype permutation. We investigate some statistical properties of this method and demonstrate how alternative assessments of enrichment can be used to increase the statistical power of such analyses to detect associations between phenotype and biological processes and pathways. RESULTS For this category of set enrichment analysis, the null distribution is largely independent of the number of samples with available molecular data. Hence, provided that the sample cohort is not too small, we show that increased statistical power to identify associations between biological processes and phenotype can be achieved by splitting the cohort into two halves and using the average of the enrichment scores evaluated for each half as an alternative test statistic. Further, we demonstrate that this principle can be extended by averaging over multiple random splits of the cohort into halves. This enables the calculation of an enrichment statistic and associated p value of arbitrary precision, independent of the exact random splits used. CONCLUSIONS It is possible to increase the statistical power of gene set enrichment analyses that employ enrichment scores created from running sums of univariate phenotype-attribute correlations and phenotype-permutation generated null distributions. This increase can be achieved by using alternative test statistics that average enrichment scores calculated for splits of the dataset.
Apart from the special case of a close balance between up- and down-regulated genes within a gene set, statistical power can be improved, or at least maintained, by this method down to small sample sizes, where accurate assessment of univariate phenotype-gene correlations becomes unfeasible.
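The split-half statistic described above can be sketched in Python as follows. Several details are assumptions not stated in the abstract:

- per-gene Pearson correlation with the phenotype as the ranking metric;
- a GSEA-style weighted running sum with weight exponent 1 as the enrichment score;
- a two-sided comparison of absolute scores;
- the default numbers of splits and permutations.

The function names are hypothetical. This is a slow, pure-Python illustration, not the authors' implementation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; 0.0 when either input is constant (no association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def enrichment_score(expr, phenotype, gene_set, samples):
    """Weighted running-sum enrichment score computed on a subset of samples.

    expr maps gene -> list of values (one per sample); phenotype is a list.
    Genes are ranked by correlation with the phenotype; hits step up in
    proportion to |correlation|, misses step down uniformly.
    """
    y = [phenotype[s] for s in samples]
    corr = {g: pearson([v[s] for s in samples], y) for g, v in expr.items()}
    ranked = sorted(corr, key=corr.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset of the genes")
    hit_weight = sum(abs(corr[g]) for g in hits)
    running = best = 0.0
    for g in ranked:
        if g in gene_set:
            # Fall back to equal steps if every hit has zero correlation.
            running += abs(corr[g]) / hit_weight if hit_weight > 0 else 1.0 / len(hits)
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def split_half_score(expr, phenotype, gene_set, n_splits, rng):
    """Average of the two half-cohort scores, averaged again over random splits."""
    samples = list(range(len(phenotype)))
    total = 0.0
    for _ in range(n_splits):
        rng.shuffle(samples)
        half = len(samples) // 2
        first, second = samples[:half], samples[half:]
        total += 0.5 * (enrichment_score(expr, phenotype, gene_set, first)
                        + enrichment_score(expr, phenotype, gene_set, second))
    return total / n_splits


def permutation_p_value(expr, phenotype, gene_set, n_perm=200, n_splits=10, seed=0):
    """Phenotype-permutation p-value for the split-half statistic (two-sided)."""
    rng = random.Random(seed)
    observed = split_half_score(expr, phenotype, gene_set, n_splits, rng)
    permuted = list(phenotype)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        null = split_half_score(expr, permuted, gene_set, n_splits, rng)
        if abs(null) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

The null statistic is built with the same split-and-average procedure as the observed one, so both are compared on the same scale. Raising `n_splits` reduces the dependence on any particular random split, in line with the abstract's point about averaging over many splits.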
49
Chen L, Toke NH, Luo S, Vasoya RP, Fullem RL, Parthasarathy A, Perekatt AO, Verzi MP. A reinforcing HNF4-SMAD4 feed-forward module stabilizes enterocyte identity. Nat Genet 2019; 51:777-785. [PMID: 30988513 PMCID: PMC6650150 DOI: 10.1038/s41588-019-0384-0] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 06/11/2018] [Accepted: 02/28/2019] [Indexed: 12/30/2022]
Abstract
BMP/SMAD signaling is a crucial regulator of intestinal differentiation [1–4]. However, the molecular underpinnings of the BMP pathway in this context are unknown. Here, we characterize the mechanism by which BMP/SMAD signaling drives enterocyte differentiation. We establish that the transcription factor HNF4A acts redundantly with an intestine-restricted HNF4 paralog, HNF4G, to activate enhancer chromatin and upregulate the majority of transcripts enriched in the differentiated epithelium; cells fail to differentiate upon double knockout of both HNF4 paralogs. Furthermore, we show that SMAD4 and HNF4 function via a reinforcing feed-forward loop, activating each other’s expression and co-binding to regulatory elements of differentiation genes. This feed-forward regulatory module promotes and stabilizes enterocyte cell identity; disruption of the HNF4-SMAD4 module results in loss of enterocyte fate in favor of progenitor and secretory cell lineages. This intersection of signaling and transcriptional control provides a framework to understand regenerative tissue homeostasis, particularly in tissues with inherent cellular plasticity [5].
Affiliation(s)
- Lei Chen
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
- Natalie H Toke
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Shirley Luo
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Roshan P Vasoya
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Robert L Fullem
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Aditya Parthasarathy
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Ansu O Perekatt
- Department of Chemistry and Chemical Biology, Stevens Institute of Technology, Hoboken, NJ, USA
- Michael P Verzi
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
50
Qin W, Wang X, Zhao H, Lu H. A Novel Joint Gene Set Analysis Framework Improves Identification of Enriched Pathways in Cross Disease Transcriptomic Analysis. Front Genet 2019; 10:293. [PMID: 31031796 PMCID: PMC6473067 DOI: 10.3389/fgene.2019.00293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/21/2018] [Accepted: 03/19/2019] [Indexed: 12/25/2022]
Abstract
Motivation: Gene set enrichment analysis is a widely accepted expression analysis tool that aims to detect coordinated expression changes within predefined gene sets rather than in individual genes. Compared with individual differentially expressed (DE) gene analysis, gene set analysis gives more reproducible and interpretable results. It can also detect small but consistent changes within a gene set that DE gene analysis cannot. There have been many successful applications of gene set analysis in human diseases. However, when the sample size of a disease study is small and no other public data sets of the same disease are available, the analysis lacks the power to detect pathways important to the disease. Results: We have developed a novel joint gene set analysis statistical framework that aims to improve the power of identifying enriched gene sets by integrating multiple data sets from similar diseases. Through comprehensive simulation studies, we demonstrated that our proposed framework obtained much better AUC scores than single data set analysis and another meta-analysis method in identifying enriched pathways. When applied to two real data sets, the proposed framework retained the enriched gene sets identified by single data set analysis and exclusively obtained up to 200% more disease-related gene sets. This demonstrates the improved identification power gained through information shared between similar diseases. We expect that the proposed framework will enable researchers to better explore public data sets when the sample size of their study is limited.
Affiliation(s)
- Wenyi Qin
- Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai Jiaotong University, Shanghai, China
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States
- Department of Genetics, School of Medicine, Yale University, New Haven, CT, United States
- Xujun Wang
- Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiaotong University, Shanghai, China
- Hongyu Zhao
- Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiaotong University, Shanghai, China
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, United States
- Hui Lu
- Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai Jiaotong University, Shanghai, China
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States
- Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiaotong University, Shanghai, China
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, United States