1
Tan J, Huyck M, Hu D, Zelaya RA, Hogan DA, Greene CS. ADAGE signature analysis: differential expression analysis with data-defined gene sets. BMC Bioinformatics 2017; 18:512. [PMID: 29166858 PMCID: PMC5700673 DOI: 10.1186/s12859-017-1905-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/27/2017] [Accepted: 11/01/2017] [Indexed: 12/18/2022]
Abstract
BACKGROUND Gene set enrichment analysis and overrepresentation analysis are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data.
RESULTS Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are learned directly from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on bronchial epithelial cells with a cystic fibrosis genotype. We mapped active signatures in the dataset to KEGG pathways and compared them with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr.
CONCLUSIONS We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.
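The two steps described in this abstract (extract signatures from a trained model, then find signatures that are differentially active between conditions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ADAGEpath implementation: it assumes one node's gene weights are already available from a trained model, and the 2.5-standard-deviation cutoff for high-weight genes, the weighted-sum activity, the Welch t statistic, the |t| cutoff and all function names are hypothetical choices.

```python
from math import copysign, inf, sqrt
from statistics import mean, pstdev, variance


def high_weight_genes(weights, n_sd=2.5):
    """Split one node's gene weights into positive- and negative-side signatures.

    weights: dict mapping gene -> weight for a single hidden node (assumed input).
    n_sd: hypothetical cutoff, in standard deviations from the mean weight.
    """
    vals = list(weights.values())
    mu, sd = mean(vals), pstdev(vals)
    pos = {g for g, w in weights.items() if w > mu + n_sd * sd}
    neg = {g for g, w in weights.items() if w < mu - n_sd * sd}
    return pos, neg


def node_activity(sample, weights):
    """Node activity in one sample: weighted sum of expression (missing genes count as 0)."""
    return sum(w * sample.get(g, 0.0) for g, w in weights.items())


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: either no difference or an unbounded one.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def differential_nodes(nodes, control, treated, t_cut=3.0):
    """Nodes whose activity differs between groups (|t| >= t_cut, a hypothetical cutoff)."""
    hits = {}
    for name, weights in nodes.items():
        ctrl = [node_activity(s, weights) for s in control]
        trt = [node_activity(s, weights) for s in treated]
        t = welch_t(trt, ctrl)
        if abs(t) >= t_cut:
            hits[name] = t
    return hits
```

A real analysis would typically use a moderated test and multiple-testing correction across all nodes; the plain Welch statistic is used here only to keep the sketch dependency-free.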
Affiliation(s)
- Jie Tan
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
- Matthew Huyck
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
- Dongbo Hu
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- René A Zelaya
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Deborah A Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA
2
Tan J, Doing G, Lewis KA, Price CE, Chen KM, Cady KC, Perchuk B, Laub MT, Hogan DA, Greene CS. Unsupervised Extraction of Stable Expression Signatures from Public Compendia with an Ensemble of Neural Networks. Cell Syst 2017; 5:63-71.e6. [PMID: 28711280 PMCID: PMC5532071 DOI: 10.1016/j.cels.2017.06.003] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 12/02/2016] [Revised: 04/11/2017] [Accepted: 06/08/2017] [Indexed: 01/18/2023]
Abstract
Cross-experiment comparisons in public data compendia are challenged by unmatched conditions and technical noise. The ADAGE method, which performs unsupervised integration with denoising autoencoder neural networks, can identify biological patterns, but because ADAGE models, like many neural networks, are over-parameterized, different ADAGE models perform equally well. To enhance model robustness and better build signatures consistent with biological pathways, we developed an ensemble ADAGE (eADAGE) that integrated stable signatures across models. We applied eADAGE to a compendium of Pseudomonas aeruginosa gene expression profiling experiments performed in 78 media. eADAGE revealed a phosphate starvation response controlled by PhoB in media with moderate phosphate and predicted that a second stimulus provided by the sensor kinase, KinB, is required for this PhoB activation. We validated this relationship using both targeted and unbiased genetic approaches. eADAGE, which captures stable biological patterns, enables cross-experiment comparisons that can highlight measured but undiscovered relationships.
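The ensemble idea can be illustrated in a few lines: pool node weight vectors from several independently trained models, group vectors that are highly correlated, and keep one consensus vector for each group that recurs across models. This is a generic sketch under stated assumptions (Pearson correlation, greedy grouping against each group's first member, a hypothetical threshold of 0.8, element-wise averaging as the consensus); it is not necessarily the clustering procedure eADAGE itself uses.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def consensus_signatures(models, min_r=0.8, min_models=2):
    """Greedily group correlated node weight vectors pooled from several models.

    models: list of models; each model is a list of node weight vectors, all
            indexed over the same ordered gene list (assumed input format).
    min_r: hypothetical correlation needed to join an existing group.
    min_models: keep a group only if it holds nodes from at least this many
                distinct models, a simple stand-in for "stable across models".
    """
    groups = []  # each group: list of (model_index, vector); the first entry is the seed
    for m_idx, nodes in enumerate(models):
        for vec in nodes:
            for group in groups:
                if pearson(group[0][1], vec) >= min_r:
                    group.append((m_idx, vec))
                    break
            else:  # no sufficiently similar group found: start a new one
                groups.append([(m_idx, vec)])
    consensus = []
    for group in groups:
        if len({m for m, _ in group}) >= min_models:
            # Consensus signature: element-wise mean of the grouped vectors.
            consensus.append([mean(col) for col in zip(*(v for _, v in group))])
    return consensus
```

Greedy grouping depends on input order; a hierarchical or medoid-based clustering would be a more order-independent design choice at the cost of extra code.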
Affiliation(s)
- Jie Tan
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Georgia Doing
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Kimberley A Lewis
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Courtney E Price
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Kathleen M Chen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Kyle C Cady
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Howard Hughes Medical Institute, Cambridge, MA, USA
- Barret Perchuk
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Howard Hughes Medical Institute, Cambridge, MA, USA
- Michael T Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Howard Hughes Medical Institute, Cambridge, MA, USA
- Deborah A Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
3
K1 and K15 of Kaposi's Sarcoma-Associated Herpesvirus Are Partial Functional Homologues of Latent Membrane Protein 2A of Epstein-Barr Virus. J Virol 2015; 89:7248-61. [PMID: 25948739 DOI: 10.1128/jvi.00839-15] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 03/31/2015] [Accepted: 04/27/2015] [Indexed: 12/22/2022]
Abstract
The human herpesviruses Epstein-Barr virus (EBV) and Kaposi's sarcoma-associated herpesvirus (KSHV) are associated with Hodgkin's lymphoma (HL) and primary effusion lymphoma (PEL), respectively, which are B cell malignancies that originate from germinal center B cells. PEL cells, as well as a quarter of EBV-positive HL tumor cells, do not express a genuine B cell receptor (BCR), a situation incompatible with the survival of normal B cells. EBV encodes LMP2A, one of its viral latent membrane proteins, which likely replaces the BCR's survival signaling in HL. Whether KSHV encodes a viral BCR mimic that contributes to oncogenesis is not known because an experimental model of KSHV-mediated B cell transformation is lacking. We addressed this uncertainty with mutant EBVs encoding the KSHV genes K1 or K15 in lieu of LMP2A and infected primary BCR-negative (BCR(-)) human B cells with them. We confirmed that the survival of BCR(-) B cells and their proliferation depended on an active LMP2A signal. Like LMP2A, expression of K1 and K15 led to the survival of BCR(-) B cells prone to apoptosis, supported their proliferation, and regulated a similar set of cellular target genes. The K1- and K15-encoded proteins appear to have noncomplementing, redundant functions in this model, but our findings suggest that both KSHV proteins can replace LMP2A's key activities contributing to the survival, activation, and proliferation of BCR(-) PEL cells in vivo.
IMPORTANCE Several herpesviruses encode oncogenes that are receptor-like proteins. Often, they are constitutively active, providing important functions to the latently infected cells. LMP2A of Epstein-Barr virus (EBV) is such a receptor; it mimics an activated B cell receptor (BCR). K1 and K15, related receptors of Kaposi's sarcoma-associated herpesvirus (KSHV) expressed in virus-associated tumors, have less obvious functions. We found in infection experiments that both viral receptors of KSHV can replace LMP2A and deliver functions similar to those of the endogenous BCR. K1, K15, and LMP2A also control the expression of a related set of cellular genes in primary human B cells, the target cells of EBV and KSHV. The observed phenotypes, as well as the known characteristics of these genes, argue for their contributions to cellular survival, B cell activation, and proliferation. Our findings provide one possible explanation for the tumorigenicity of KSHV, which poses a severe problem in immunocompromised patients.
4
Wexler EM, Rosen E, Lu D, Osborn GE, Martin E, Raybould H, Geschwind DH. Genome-wide analysis of a Wnt1-regulated transcriptional network implicates neurodegenerative pathways. Sci Signal 2012; 4:ra65. [PMID: 21971039 DOI: 10.1126/scisignal.2002282] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Indexed: 12/23/2022]
Abstract
Wnt proteins are critical to mammalian brain development and function. The canonical Wnt signaling pathway involves the stabilization and nuclear translocation of β-catenin; however, Wnt also signals through alternative, noncanonical pathways. To gain a systems-level, genome-wide view of Wnt signaling, we analyzed Wnt1-stimulated changes in gene expression by transcriptional microarray analysis in cultured human neural progenitor (hNP) cells at multiple time points over a 72-hour time course. We observed a widespread oscillatory-like pattern of changes in gene expression, involving components of both the canonical and the noncanonical Wnt signaling pathways. A higher-order, systems-level analysis that combined independent component analysis, waveform analysis, and mutual information-based network construction revealed effects on pathways related to cell death and neurodegenerative disease. Wnt effectors were tightly clustered with presenilin1 (PSEN1) and granulin (GRN), which cause dominantly inherited forms of Alzheimer's disease and frontotemporal dementia (FTD), respectively. We further explored a potential link between Wnt1 and GRN and found that Wnt1 decreased GRN expression by hNPs. Conversely, GRN knockdown increased WNT1 expression, demonstrating that Wnt and GRN reciprocally regulate each other. Finally, we provided in vivo validation of the in vitro findings by analyzing gene expression data from individuals with FTD. These unbiased and genome-wide analyses provide evidence for a connection between Wnt signaling and the transcriptional regulation of neurodegenerative disease genes.
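Mutual information-based network construction, as mentioned in this abstract, typically starts by estimating mutual information (MI) between pairs of expression profiles and keeping pairs above a threshold as network edges. The sketch below is a minimal illustration assuming equal-width binning into a hypothetical three bins, a plug-in (histogram) MI estimator, and a hypothetical edge threshold; the study's actual estimator, preprocessing and thresholds are not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import log2


def discretize(values, n_bins=3):
    """Equal-width binning of one expression profile into integer bin labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the top bin rather than bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=3):
    """Plug-in MI estimate (in bits) between two equally long expression profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mi_network(profiles, threshold=0.5, n_bins=3):
    """Gene pairs whose MI exceeds a hypothetical threshold, as (gene1, gene2, mi)."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        mi = mutual_information(profiles[g1], profiles[g2], n_bins)
        if mi > threshold:
            edges.append((g1, g2, mi))
    return edges
```

Unlike correlation, MI also picks up nonlinear dependence, which is one reason it is popular for network construction; plug-in estimates are biased upward on short time courses, so in practice thresholds are usually set by permutation.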
Affiliation(s)
- Eric M Wexler
- Department of Psychiatry, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90024, USA.
5
An ensemble approach for inferring semi-quantitative regulatory dynamics for the differentiation of mouse embryonic stem cells using prior knowledge. Adv Exp Med Biol 2012; 736:247-60. [PMID: 22161333 DOI: 10.1007/978-1-4419-7210-1_14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
The differentiation of embryonic stem cells (ESCs) is increasingly a focus for systems biologists, not only because of mechanistic interest but also because it is expected to play an increasingly important role in regenerative medicine, in particular with the advent of induced pluripotent stem cells. ESCs give rise to the three germ layers and therefore to all tissues and organs. Here, we present a computational method for inferring regulatory interactions between the genes involved in ESC differentiation based on time-resolved microarray profiles. Fully quantitative methods are usually not applicable to such large-scale data; on the other hand, purely qualitative methods may fail to capture some of the more detailed regulation. Our method combines the beneficial aspects of qualitative and quantitative (ODE-based) modeling approaches by searching for quantitative interaction coefficients in a discrete, qualitative state space. We further optimize over an ensemble of networks to detect essential properties and compare networks with respect to robustness. Applied to a toy model, our method reconstructs the original network and outperforms an entirely discrete Boolean approach. In particular, we show that including prior knowledge leads to more accurate results. Applied to data from differentiating mouse ESCs, our method reveals new regulatory interactions; in particular, we confirm the activation of Foxh1 through Oct4, mediating Nodal signaling.
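To make "quantitative interaction coefficients in a discrete state space" concrete, the sketch below enumerates coefficients drawn from a small discrete set for a toy two-gene network, simulates each candidate with a simple ODE, scores it against a target time course, and keeps an ensemble of the best-scoring networks; optional prior knowledge fixes some coefficients. The sigmoid ODE form, the coefficient set {-1, 0, 1}, Euler integration, the gain, the ensemble size and the `prior` format are all assumptions for illustration, not the published method.

```python
from itertools import product
from math import exp

COEFFS = (-1.0, 0.0, 1.0)  # hypothetical discrete interaction strengths


def simulate(w, x0, steps=50, dt=0.1, gain=5.0):
    """Euler-integrate dx_i/dt = sigmoid(gain * sum_j w[i][j] * x_j) - x_i."""
    n = len(x0)
    x = list(x0)
    traj = [tuple(x)]
    for _ in range(steps):
        inputs = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * (1.0 / (1.0 + exp(-gain * inputs[i])) - x[i]) for i in range(n)]
        traj.append(tuple(x))
    return traj


def sse(a, b):
    """Sum of squared errors between two trajectories of equal shape."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ensemble_fit(target, x0, keep=3, prior=None):
    """Score every coefficient matrix; return the `keep` best (score, matrix) pairs.

    prior: optional dict {(i, j): coefficient} of known interactions; candidates
           that contradict it are skipped (a hypothetical way to use prior knowledge).
    """
    n = len(x0)
    cells = [(i, j) for i in range(n) for j in range(n)]
    results = []
    for values in product(COEFFS, repeat=len(cells)):
        w = [[0.0] * n for _ in range(n)]
        for (i, j), v in zip(cells, values):
            w[i][j] = v
        if prior and any(w[i][j] != c for (i, j), c in prior.items()):
            continue
        results.append((sse(simulate(w, x0), target), w))
    results.sort(key=lambda r: r[0])
    return results[:keep]
```

Exhaustive enumeration grows as 3^(n^2) and is only feasible for toy networks; larger networks would need a heuristic search, which is where restricting the space with prior knowledge pays off.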
6
Skrzypczak M, Goryca K, Rubel T, Paziewska A, Mikula M, Jarosz D, Pachlewski J, Oledzki J, Ostrowski J. Modeling oncogenic signaling in colon tumors by multidirectional analyses of microarray data directed for maximization of analytical reliability. PLoS One 2010; 5. [PMID: 20957034 PMCID: PMC2948500 DOI: 10.1371/journal.pone.0013091] [Citation(s) in RCA: 270] [Impact Index Per Article: 19.3] [Received: 03/17/2010] [Accepted: 09/08/2010] [Indexed: 12/16/2022]
Abstract
Background Clinical progression of colorectal cancers (CRC) may occur in parallel with distinctive signaling alterations. We designed multidirectional analyses integrating microarray-based data with biostatistics and bioinformatics to elucidate the signaling and metabolic alterations underlying CRC development in the adenoma-carcinoma sequence.
Methodology/Principal Findings Studies were performed on normal mucosa, adenoma, and carcinoma samples obtained during surgery or colonoscopy. Collections of cryostat sections prepared from the tissue samples were evaluated by a pathologist to control the relative cell type content. The measurements were done using Affymetrix GeneChip HG-U133plus2, and probe set data were generated using two normalization algorithms: MAS5.0 and GCRMA with least-variant set (LVS). The data were evaluated using pairwise comparisons and decomposition into singular value decomposition (SVD) modes. The method selected for the functional analysis used the Kolmogorov-Smirnov test. Expression profiles obtained from 105 samples of whole tissue sections were used to establish oncogenic signaling alterations in the progression of CRC, while those representing 40 microdissected specimens were used to select differences in KEGG pathways between epithelium and mucosa. Based on a consensus of the results obtained with two normalization algorithms and two probe set sorting criteria, we identified 14 and 17 KEGG signaling and metabolic pathways that are significantly altered between normal and tumor samples and between benign and malignant tumors, respectively. Several of them were also selected from the raw microarray data of 2 recently published studies (GSE4183 and GSE8671).
Conclusion/Significance Although the proposed strategy is computationally complex and labor-intensive, it may reduce the number of false results.
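The abstract names the Kolmogorov-Smirnov test as the basis of the functional analysis. One common way to apply it to pathways is to compare the scores (for example, differential expression statistics) of a pathway's genes with those of all other genes; the statistic D is the largest gap between the two empirical distribution functions. This minimal sketch computes only the two-sample D statistic; the gene scores, pathway membership and any significance procedure are assumptions, and it is not claimed to be the exact procedure used in the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # empirical CDF of a at x
        fb = bisect_right(b, x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def pathway_ks(scores, pathway_genes):
    """D statistic comparing scores of pathway genes with those of all remaining genes.

    scores: dict gene -> score (hypothetical input); pathway_genes: set of gene IDs.
    """
    inside = [s for g, s in scores.items() if g in pathway_genes]
    outside = [s for g, s in scores.items() if g not in pathway_genes]
    return ks_statistic(inside, outside)
```

Because the empirical CDFs only change at observed values, checking every pooled observation is enough to find the maximum gap. A p-value could then be obtained by permuting gene labels or from the asymptotic Kolmogorov distribution.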
Affiliation(s)
- Magdalena Skrzypczak
- Department of Gastroenterology and Hepatology, Medical Center for Postgraduate Education, Warsaw, Poland
- Krzysztof Goryca
- Department of Gastroenterology and Hepatology, Medical Center for Postgraduate Education, Warsaw, Poland
- Tymon Rubel
- Laboratory of Bioinformatics and Systems Biology, Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
- Agnieszka Paziewska
- Department of Gastroenterology and Hepatology, Medical Center for Postgraduate Education, Warsaw, Poland
- Michal Mikula
- Department of Oncological Genetics, Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
- Dorota Jarosz
- Department of Gastroenterology and Hepatology, Medical Center for Postgraduate Education, Warsaw, Poland
- Jacek Pachlewski
- Department of Gastroenterology and Hepatology, Medical Center for Postgraduate Education, Warsaw, Poland
- Janusz Oledzki
- Department of Colorectal Cancer, Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
- Jerzy Ostrowski
- Department of Gastroenterology and Hepatology, Medical Center for Postgraduate Education, Warsaw, Poland
- Department of Oncological Genetics, Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
7
Engreitz JM, Daigle BJ, Marshall JJ, Altman RB. Independent component analysis: mining microarray data for fundamental human gene expression modules. J Biomed Inform 2010; 43:932-44. [PMID: 20619355 DOI: 10.1016/j.jbi.2010.07.001] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Received: 12/10/2009] [Revised: 06/28/2010] [Accepted: 07/02/2010] [Indexed: 11/28/2022]
Abstract
As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anti-cancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.
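Annotating a component with Gene Ontology terms usually reduces to an over-representation test: take the component's most strongly weighted genes and ask whether a GO term's genes are over-represented among them. The sketch below uses a one-sided hypergeometric tail probability; selecting a hypothetical fixed number `k` of top genes by absolute weight, and omitting multiple-testing correction, are simplifications, and the study's exact annotation procedure may differ.

```python
from math import comb


def top_genes(weights, k):
    """The k genes with the largest absolute weight in one component."""
    return set(sorted(weights, key=lambda g: abs(weights[g]), reverse=True)[:k])


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def term_enrichment(weights, term_genes, k):
    """One-sided over-representation p-value of a GO term among a component's top genes.

    weights: dict gene -> component weight (the universe is every gene in it);
    term_genes: set of genes annotated to the term (hypothetical input).
    """
    universe = set(weights)
    term = term_genes & universe
    selected = top_genes(weights, k)
    overlap = len(selected & term)
    return hypergeom_sf(overlap, len(universe), len(term), len(selected))
```

Using exact integer binomial coefficients avoids the underflow issues of computing factorials in floating point, at the cost of speed on very large gene universes.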
Affiliation(s)
- Jesse M Engreitz
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
8
Splettstoesser WD, Seibold E, Zeman E, Trebesius K, Podbielski A. Rapid differentiation of Francisella species and subspecies by fluorescent in situ hybridization targeting the 23S rRNA. BMC Microbiol 2010; 10:72. [PMID: 20205957 PMCID: PMC2844405 DOI: 10.1186/1471-2180-10-72] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 09/03/2009] [Accepted: 03/08/2010] [Indexed: 11/18/2022]
Abstract
Background Francisella (F.) tularensis is the causative agent of tularemia. Due to its low infectious dose, ease of dissemination and high case fatality rate, F. tularensis was the subject of diverse biological weapons programs and is among the top six agents with high potential for misuse in bioterrorism. Microbiological diagnosis is cumbersome and time-consuming. Methods for the direct detection of the pathogen (immunofluorescence, PCR) have been developed but are restricted to reference laboratories.
Results The complete 23S rRNA genes of representative strains of F. philomiragia and all subspecies of F. tularensis were sequenced. Single nucleotide polymorphisms at the species and subspecies levels were confirmed by partial amplification and sequencing of 24 additional strains. Fluorescent in situ hybridization (FISH) assays were established using species- and subspecies-specific probes. Different FISH protocols allowed the positive identification of all 4 F. philomiragia strains and of more than 40 F. tularensis strains tested. By combining different probes, it was possible to differentiate the F. tularensis subspecies holarctica, tularensis, mediasiatica and novicida. No cross-reactivity with strains of 71 clinically relevant bacterial species was observed. FISH was also successfully applied to detect different F. tularensis strains in infected cells or tissue samples. In blood culture systems spiked with F. tularensis, bacterial cells of different subspecies could be separated within single samples.
Conclusion We showed that FISH targeting the 23S rRNA gene is a rapid and versatile method for the identification and differentiation of F. tularensis isolates from both laboratory cultures and clinical samples.
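Differentiating subspecies by combining probes amounts to reading a pattern of positive and negative hybridization signals against a reference table. The sketch below models a DNA probe as binding when its reverse complement (written as RNA) occurs exactly in the target rRNA sequence, ignoring mismatch tolerance, hybridization stringency and probe accessibility, and then matches the observed signal pattern to the table. All probe names, sequences and patterns are made-up toy values, not the probes used in the study.

```python
# DNA probe base -> complementary RNA base (T in the probe pairs with A in rRNA).
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def target_site(probe):
    """RNA site (5'->3') that a 5'->3' DNA probe would hybridize to."""
    return probe.translate(DNA_TO_RNA_COMPLEMENT)[::-1]


def signal_pattern(rrna, probes):
    """Which probes would bind the target rRNA (exact complementary match only)."""
    return {name: target_site(seq) in rrna for name, seq in probes.items()}


def identify(rrna, probes, reference):
    """Reference labels whose expected signal pattern equals the observed one."""
    observed = signal_pattern(rrna, probes)
    return [label for label, expected in reference.items() if expected == observed]
```

Example with toy (hypothetical) data: a species-level probe that binds both toy subspecies, plus one subspecies-level probe whose site differs by a single nucleotide.

```python
probes = {"species_probe": "GGCATT", "subA_probe": "TTTGGG"}
reference = {
    "toy subspecies A": {"species_probe": True, "subA_probe": True},
    "toy subspecies B": {"species_probe": True, "subA_probe": False},
}
print(identify("GGAAUGCCUUCCCAAAGG", probes, reference))  # ['toy subspecies A']
```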
Affiliation(s)
- Wolf D Splettstoesser
- Bundeswehr Institute of Microbiology, German Reference Laboratory for Tularemia, Neuherbergstr 11, 80937 Munich, Germany.