1
Jiang L, Qu S, Yu Z, Wang J, Liu X. MOASL: Predicting drug mechanism of actions through similarity learning with transcriptomic signature. Comput Biol Med 2024; 169:107853. [PMID: 38104518 DOI: 10.1016/j.compbiomed.2023.107853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/02/2023] [Accepted: 12/11/2023] [Indexed: 12/19/2023]
Abstract
Understanding the mechanisms of action (MOAs) of compounds is crucial in drug discovery. A common step in drug MOA annotation is to query the dysregulated gene signatures induced by drugs against a reference library of pre-defined signatures. However, traditional similarity-based computational strategies face challenges when dealing with high-dimensional and noisy transcriptional signature data. To address this issue, we introduce MOASL (MOAs prediction via Similarity Learning), a novel approach that uses contrastive learning to automatically learn similarity embeddings among signatures with shared MOAs. We evaluated the accuracy of signature matching with MOASL on various transcriptional activity score (TAS) datasets and individual cell lines. The results show that MOASL achieved higher performance than several statistical and machine learning methods. Furthermore, we provided the rationale of our model by visualizing the signature annotation procedure. Using MOASL, the MOA label of a query signature can be conveniently defined by calculating the similarity between the query embedding and the reference embeddings. Finally, we applied MOASL to repurpose thousands of compounds as glucocorticoid receptor (GR) agonists, accurately identifying 8 out of the top 10 compounds. MOASL is accessible on GitHub at https://github.com/jianglikun/MOASL, empowering researchers and practitioners in the field of drug discovery to predict the MOAs of drugs.
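The final annotation step described in this abstract, labeling a query by its similarity to reference embeddings, can be sketched as follows. This is a minimal illustration and not the MOASL implementation: the three-dimensional embeddings are made up, cosine similarity is an assumed choice of metric, and the function names `cosine` and `annotate` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def annotate(query, references):
    """Return (label, similarity) of the reference most similar to the query.

    `references` is a list of (label, embedding) pairs.
    """
    scored = [(cosine(query, emb), label) for label, emb in references]
    best_score, best_label = max(scored)
    return best_label, best_score


# Toy reference library (illustrative values only, not real embeddings).
references = [
    ("GR agonist", [0.9, 0.1, 0.0]),
    ("HDAC inhibitor", [0.0, 0.8, 0.6]),
]
label, score = annotate([1.0, 0.2, 0.0], references)
print(label, round(score, 3))
```

In practice the embeddings would come from a trained model, and a nearest-neighbor vote over several references may be more robust than taking a single best match.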
Affiliation(s)
- Likun Jiang
  - Department of Computer Science, Xiamen University, Xiamen 361005, PR China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, PR China
- Susu Qu
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, PR China; Chinese Institute for Brain Research, Beijing 102206, PR China
- Zhengqiu Yu
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, PR China; School of Medicine, Xiamen University, Xiamen 361005, PR China
- Jianmin Wang
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, South Korea
- Xiangrong Liu
  - Department of Computer Science, Xiamen University, Xiamen 361005, PR China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, PR China
2
Shah I, Bundy J, Chambers B, Everett LJ, Haggard D, Harrill J, Judson RS, Nyffeler J, Patlewicz G. Navigating Transcriptomic Connectivity Mapping Workflows to Link Chemicals with Bioactivities. Chem Res Toxicol 2022; 35:1929-1949. [PMID: 36301716 PMCID: PMC10483698 DOI: 10.1021/acs.chemrestox.2c00245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]
Abstract
Screening new compounds for potential bioactivities against cellular targets is vital for drug discovery and chemical safety. Transcriptomics offers an efficient approach for assessing global gene expression changes, but interpreting chemical mechanisms from these data is often challenging. Connectivity mapping is a potential data-driven avenue for linking chemicals to mechanisms based on the observation that many biological processes are associated with unique gene expression signatures (gene signatures). However, mining the effects of a chemical on gene signatures for biological mechanisms is challenging because transcriptomic data contain thousands of noisy genes. New connectivity mapping approaches seeking to distinguish signal from noise continue to be developed, spurred by the promise of discovering chemical mechanisms, new drugs, and disease targets from burgeoning transcriptomic data. Here, we analyze these approaches in terms of diverse transcriptomic technologies, public databases, gene signatures, pattern-matching algorithms, and statistical evaluation criteria. To navigate the complexity of connectivity mapping, we propose a harmonized scheme to coherently organize and compare published workflows. We first standardize concepts underlying transcriptomic profiles and gene signatures based on various transcriptomic technologies such as microarrays, RNA-Seq, and L1000 and discuss the widely used data sources such as Gene Expression Omnibus, ArrayExpress, and MSigDB. Next, we generalize connectivity mapping as a pattern-matching task for finding similarity between a query (e.g., transcriptomic profile for new chemical) and a reference (e.g., gene signature of known target). Published pattern-matching approaches fall into two main categories: vector-based approaches, which use metrics such as correlation and the Jaccard index, and aggregation-based approaches, which use parametric and nonparametric statistics (e.g., gene set enrichment analysis). The statistical methods for evaluating the performance of different approaches are described, along with comparisons reported in the literature on benchmark transcriptomic data sets. Lastly, we review connectivity mapping applications in toxicology and offer guidance on evaluating chemical-induced toxicity with concentration-response transcriptomic data. In addition to serving as a high-level guide and tutorial for understanding and implementing connectivity mapping workflows, we hope this review will stimulate new algorithms for evaluating chemical safety and drug discovery using transcriptomic data.
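The two vector-based metrics named in this abstract, correlation and the Jaccard index, can be illustrated with a short sketch. It assumes Pearson correlation as the correlation variant and uses invented profile values and placeholder gene names; the function names `pearson` and `jaccard` are hypothetical and do not come from the review.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def jaccard(genes_a, genes_b):
    """Jaccard index between two gene sets: |A & B| / |A | B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data: log fold changes over the same four genes (illustrative only).
query_profile = [2.0, -1.0, 0.5, -2.0]
reference_profile = [1.0, -0.5, 0.25, -1.0]

# Toy up-regulated gene sets with placeholder gene names.
query_up = {"GENE_A", "GENE_B", "GENE_C"}
reference_up = {"GENE_B", "GENE_C", "GENE_D"}

print(pearson(query_profile, reference_profile))
print(jaccard(query_up, reference_up))
```

Correlation compares full profiles gene by gene, whereas the Jaccard index only compares set membership, so it ignores effect size and direction unless up- and down-regulated sets are scored separately.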
Affiliation(s)
- Imran Shah
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Joseph Bundy
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Bryant Chambers
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Logan J. Everett
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Derik Haggard
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Joshua Harrill
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Richard S. Judson
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Johanna Nyffeler
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
  - Oak Ridge Institute for Science and Education (ORISE) Postdoctoral Fellow, Oak Ridge, Tennessee, 37831, US
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
3
Lin K, Li L, Dai Y, Wang H, Teng S, Bao X, Lu ZJ, Wang D. A comprehensive evaluation of connectivity methods for L1000 data. Brief Bioinform 2019; 21:2194-2205. [DOI: 10.1093/bib/bbz129] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/08/2019] [Revised: 08/26/2019] [Accepted: 09/14/2019] [Indexed: 01/08/2023] Open
Abstract
The methodologies for evaluating similarities between gene expression profiles of different perturbagens are key to understanding the mechanisms of action (MoAs) of unknown compounds and finding new indications for existing drugs. L1000-based next-generation Connectivity Map (CMap) data is more than a thousand-fold scale-up of the CMap pilot dataset. Although several systematic evaluations have been performed individually to assess the accuracy of the methodologies for the CMap pilot study, the performance of these methodologies needs to be re-evaluated for the L1000 data. Here, using the drug–drug similarities from the Drug Repurposing Hub database as a benchmark standard, we evaluated six popular published methods for the prediction performance of drug–drug relationships based on the partial area under the receiver operating characteristic (ROC) curve at false positive rates of 0.001, 0.005 and 0.01 (AUC0.001, AUC0.005 and AUC0.01). The similarity evaluating algorithm called ZhangScore was generally superior to other methods and exhibited the highest accuracy at gene signature sizes ranging from 10 to 200. Further, we tested these methods with an experimentally derived gene signature related to estrogen in breast cancer cells, and the results confirmed that ZhangScore was more accurate than other methods. Moreover, based on scoring results of ZhangScore for the gene signature of TOP2A knockdown, in addition to well-known TOP2A inhibitors, we identified a number of potential inhibitors, at least two of which were the subject of previous investigation. Our studies provide potential guidelines for researchers to choose a suitable connectivity method. The six connectivity methods used in this report have been implemented in an R package (https://github.com/Jasonlinchina/RCSM).
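The evaluation metric used in this abstract, the partial area under the ROC curve up to a low false positive rate, can be sketched as below. This is an assumption-laden illustration rather than the authors' code: it returns the raw (unnormalized) area, assumes all scores are distinct, and uses a made-up six-item ranking with a cutoff of 0.2 instead of the paper's 0.001 to 0.01, since a toy example has too few negatives for such small cutoffs. The function name `partial_auc` is hypothetical.

```python
def partial_auc(scores, labels, max_fpr):
    """Raw area under the ROC curve for false positive rates in [0, max_fpr].

    `labels` holds 1 for true drug-drug pairs and 0 otherwise.
    Assumes distinct scores, so the ROC curve is a simple step function.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, lab in pairs if lab)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    area = 0.0
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    for _, is_pos in pairs:
        if is_pos:
            tp += 1
        else:
            fp += 1
        fpr, tpr = fp / n_neg, tp / n_pos
        if fpr > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            frac = (max_fpr - prev_fpr) / (fpr - prev_fpr)
            tpr_at_cut = prev_tpr + frac * (tpr - prev_tpr)
            area += (max_fpr - prev_fpr) * (prev_tpr + tpr_at_cut) / 2
            return area
        # Trapezoid between consecutive ROC points.
        area += (fpr - prev_fpr) * (prev_tpr + tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Toy ranking: similarity scores and whether each pair is a known match.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

print(partial_auc(scores, labels, max_fpr=0.2))
print(partial_auc(scores, labels, max_fpr=1.0))  # full AUC
```

Restricting the area to very low false positive rates rewards methods that place true matches at the very top of the ranking, which is the part of the list a researcher would actually inspect.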
Affiliation(s)
- Kequan Lin
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
- Lu Li
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yifei Dai
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Huili Wang
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Shuaishuai Teng
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Xilinqiqige Bao
  - International Mongolian Hospital of Inner Mongolia, Hohhot 010065, China
- Zhi John Lu
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Center of Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Dong Wang
  - School of Medicine, Tsinghua University, Beijing 100084, China
  - Center of Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China
  - National Collaborative Innovation Center for Biotherapy, Tsinghua University, Beijing 100084, China
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
4
Keenan AB, Wojciechowicz ML, Wang Z, Jagodnik KM, Jenkins SL, Lachmann A, Ma'ayan A. Connectivity Mapping: Methods and Applications. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021211] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/09/2022]
Abstract
Connectivity mapping resources consist of signatures representing changes in cellular state following systematic small-molecule, disease, gene, or other forms of perturbation. Such resources enable the characterization of signatures from novel perturbations based on similarity; provide a global view of the space of many themed perturbations; and make it possible to predict cellular, tissue, and organismal phenotypes for perturbagens. A signature search engine enables hypothesis generation by finding connections between query signatures and the database of signatures. This framework has been used to identify connections between small molecules and their targets, to discover cell-specific responses to perturbations and ways to reverse disease expression states with small molecules, and to predict small-molecule mimickers for existing drugs. This review provides a historical perspective and the current state of connectivity mapping resources with a focus on both methodology and community implementations.
Affiliation(s)
- Alexandra B. Keenan
  - Department of Pharmacological Sciences and Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Megan L. Wojciechowicz
  - Department of Pharmacological Sciences and Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zichen Wang
  - Department of Pharmacological Sciences and Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Kathleen M. Jagodnik
  - Department of Pharmacological Sciences and Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sherry L. Jenkins
  - Department of Pharmacological Sciences and Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Alexander Lachmann
  - Department of Pharmacological Sciences and Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Avi Ma'ayan
  - Department of Pharmacological Sciences and Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
5
Ding Y, Li H, He X, Liao W, Yi Z, Yi J, Chen Z, Moore DJ, Yi Y, Xiang W. Identification of a gene-expression predictor for diagnosis and personalized stratification of lupus patients. PLoS One 2018; 13:e0198325. [PMID: 29975701 PMCID: PMC6033382 DOI: 10.1371/journal.pone.0198325] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/10/2017] [Accepted: 05/17/2018] [Indexed: 11/29/2022] Open
Abstract
Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by a wide spectrum of clinical manifestations and degrees of severity. Few genomic biomarkers for SLE have been validated and employed to inform clinical classifications and decisions. To discover and assess the gene-expression based SLE predictors in published studies, we performed a meta-analysis using our established signature database and a data similarity-driven strategy. From 13 training data sets on SLE gene-expression studies, we identified a SLE meta-signature (SLEmetaSig100) containing 100 concordant genes that are involved in DNA sensors and the IFN signaling pathway. We rigorously examined SLEmetaSig100 with both retrospective and prospective validation in two independent data sets. Using unsupervised clustering, we retrospectively elucidated that SLEmetaSig100 could classify clinical samples into two groups that correlated with SLE disease status and disease activities. More importantly, SLEmetaSig100 enabled personalized stratification demonstrating its ability to prospectively predict SLE disease at the individual patient level. To evaluate the performance of SLEmetaSig100 in predicting SLE, we predicted 1,171 testing samples to be either non-SLE or SLE with positive predictive value (97–99%), specificity (85%-84%), and sensitivity (60–84%). Our study suggests that SLEmetaSig100 has enhanced predictive value to facilitate current SLE clinical classification and provides personalized disease activity monitoring.
Affiliation(s)
- Yan Ding
  - Department of Dermatology, Hainan Provincial Dermatology Disease Hospital, Haikou, China
- Hongai Li
  - Pediatrics, The Hainan Affiliated Hospital of University of South China, Haikou, China
- Xiaojie He
  - Department of Nephropathy, Children’s Medical Center, The Second Xiangya Hospital, Central South University, Changsha, China
- Wang Liao
  - Department of Cardiology, Hainan General Hospital, Haikou, China
- Zhuwen Yi
  - Department of Nephropathy, Children’s Medical Center, The Second Xiangya Hospital, Central South University, Changsha, China
- Jia Yi
  - Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, NC, United States of America
- Zhibin Chen
  - Department of Microbiology and Immunology, University of Miami Miller School of Medicine, Miami, FL, United States of America
- Daniel J. Moore
  - Departments of Pediatrics and Pathology, Microbiology, and Immunology, Vanderbilt University, Nashville, TN, United States of America
- Yajun Yi
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
  - * E-mail: (WX); (YY)
- Wei Xiang
  - Department of Pediatrics, Maternal and Child Health Care Hospital of Hainan Province, Haikou, China
  - * E-mail: (WX); (YY)
6
Abstract
Motivation: Several large-scale efforts have been made to collect gene expression signatures from a variety of biological conditions, such as response of cell lines to treatment with drugs, or tumor samples with different characteristics. These gene signature collections are utilized through bioinformatics tools for 'signature matching', whereby a researcher studying an expression profile can identify previously cataloged biological conditions most related to their profile. Signature matching tools typically retrieve from the collection the signature that has highest similarity to the user-provided profile. Alternatively, classification models may be applied where each biological condition in the signature collection is a class label; however, such models are trained on the collection of available signatures and may not generalize to the novel cellular context or cell line of the researcher's expression profile. Results: We present an advanced multi-way classification algorithm for signature matching, called SigMat, that is trained on a large signature collection from a well-studied cellular context, but can also classify signatures from other cell types by relying on an additional, small collection of signatures representing the target cell type. It uses these 'tuning data' to learn two additional parameters that help adapt its predictions for other cellular contexts. SigMat outperforms other similarity scores and classification methods in identifying the correct label of a query expression profile from as many as 244 or 500 candidate classes (drug treatments) cataloged by the LINCS L1000 project. SigMat retains its high accuracy in cross-cell line applications even when the amount of tuning data is severely limited. Availability and implementation: SigMat is available on GitHub at https://github.com/JinfengXiao/SigMat. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jinfeng Xiao
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Charles Blatti
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Saurabh Sinha
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - To whom correspondence should be addressed.
7
Laine JE, Bailey KA, Olshan AF, Smeester L, Drobná Z, Stýblo M, Douillet C, García-Vargas G, Rubio-Andrade M, Pathmasiri W, McRitchie S, Sumner SJ, Fry RC. Neonatal Metabolomic Profiles Related to Prenatal Arsenic Exposure. Environmental Science & Technology 2017; 51:625-633. [PMID: 27997141 PMCID: PMC5460981 DOI: 10.1021/acs.est.6b04374] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 05/19/2023]
Abstract
Prenatal inorganic arsenic (iAs) exposure is associated with health effects evident at birth and later in life. An understanding of the relationship between prenatal iAs exposure and alterations in the neonatal metabolome could reveal critical molecular modifications, potentially underpinning disease etiologies. In this study, nuclear magnetic resonance (NMR) spectroscopy-based metabolomic analysis was used to identify metabolites in neonate cord serum associated with prenatal iAs exposure in participants from the Biomarkers of Exposure to ARsenic (BEAR) pregnancy cohort, in Gómez Palacio, Mexico. Through multivariable linear regression, ten cord serum metabolites were identified as significantly associated with total urinary iAs and/or iAs metabolites, measured as %iAs, %monomethylated arsenicals (MMAs), and %dimethylated arsenicals (DMAs). A total of 17 metabolites were identified as significantly associated with total iAs and/or iAs metabolites in cord serum. These metabolites are indicative of changes in important biochemical pathways such as vitamin metabolism, the citric acid (TCA) cycle, and amino acid metabolism. These data highlight that maternal biotransformation of iAs and neonatal levels of iAs and its metabolites are associated with differences in neonate cord metabolomic profiles. The results demonstrate the potential utility of metabolites as biomarkers/indicators of in utero environmental exposure.
Affiliation(s)
- Jessica E. Laine
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Kathryn A. Bailey
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Andrew F. Olshan
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Lisa Smeester
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Zuzana Drobná
  - Department of Biological Sciences, College of Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
- Miroslav Stýblo
  - Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Christelle Douillet
  - Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Gonzalo García-Vargas
  - Facultad de Medicina, Universidad Juarez del Estado de Durango, Gómez Palacio, Durango 35050, Mexico
- Marisela Rubio-Andrade
  - Facultad de Medicina, Universidad Juarez del Estado de Durango, Gómez Palacio, Durango 35050, Mexico
- Wimal Pathmasiri
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Susan McRitchie
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Susan J. Sumner
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Rebecca C. Fry
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
8
Vora NL, Smeester L, Boggess K, Fry RC. Investigating the Role of Fetal Gene Expression in Preterm Birth. Reprod Sci 2016; 24:824-828. [PMID: 27678095 DOI: 10.1177/1933719116670038] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
Second-trimester amniotic fluid supernatant (AFS) contains cell-free fetal RNA (cffRNA) transcripts that can provide information about fetal gene expression. In a retrospective case-control study, we measured second-trimester fetal gene expression using cffRNA extracted from AFS in women who had spontaneous preterm birth (sPTB) <34 weeks and in women who delivered >37 weeks. We extracted cffRNA from AFS of women with singletons who had second-trimester genetic amniocenteses. Twenty-one gravidas who had sPTB and 21 term controls were matched 1:1 for maternal age, fetal sex, race, gestational age (GA) at the time of amniocentesis, and medication exposure. Cell-free fetal RNA was extracted and hybridized to a customized 65-gene NanoString panel containing genes related to oxidative stress, inflammation, and the hypothalamic-pituitary-adrenal (HPA) axis, including 15 housekeeping genes. Two models were run, 1 examining sPTB in relation to case/control status and 1 examining sPTB in relation to GA as a continuous variable. Among cases, the gene expression of nitric oxide synthase 1 (NOS1), d-aspartate oxidase (DDO), and beta-2-microglobulin (B2M) was higher than in controls (P value < .05; false discovery rate-corrected Q value of ≤0.10). Nitric oxide synthase 1 and DDO are genes associated with oxidative stress; B2M is a marker of the fetal inflammatory response. Fetal HPA gene expression is not associated with GA at delivery or sPTB in second-trimester AFS. Alterations of fetal gene expression related to inflammation and oxidative stress antedate clinical symptoms and may be useful for early identification of patients at risk of having an sPTB.
Affiliation(s)
- Neeta L Vora
  - Division of Maternal-Fetal Medicine, Department of Obstetrics & Gynecology, University of North Carolina School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lisa Smeester
  - Department of Environmental Sciences and Engineering, UNC Gillings School of Global Public Health, Chapel Hill, NC, USA
- Kim Boggess
  - Division of Maternal-Fetal Medicine, Department of Obstetrics & Gynecology, University of North Carolina School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Rebecca C Fry
  - Department of Environmental Sciences and Engineering, UNC Gillings School of Global Public Health, Chapel Hill, NC, USA
9
Jung S, Bi Y, Davuluri RV. Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping. BMC Genomics 2015; 16 Suppl 11:S3. [PMID: 26576613 PMCID: PMC4652565 DOI: 10.1186/1471-2164-16-s11-s3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Many supervised learning algorithms have been applied in deriving gene signatures for patient stratification from gene expression data. However, transferring the multi-gene signatures from one analytical platform to another without loss of classification accuracy is a major challenge. Here, we compared three unsupervised data discretization methods (Equal-width binning, Equal-frequency binning, and k-means clustering) in accurately classifying the four known subtypes of glioblastoma multiforme (GBM) when the classification algorithms were trained on the isoform-level gene expression profiles from the exon-array platform and tested on the corresponding profiles from RNA-seq data. RESULTS: We applied an integrated machine learning framework that involves three sequential steps: feature selection, data discretization, and classification. For models trained and tested on exon-array data, the addition of the data discretization step led to robust and accurate predictive models with fewer variables in the final models. For models trained on exon-array data and tested on RNA-seq data, the addition of the data discretization step dramatically improved the classification accuracies, with Equal-frequency binning showing the highest improvement, with more than 90% accuracy for all the models with features chosen by Random Forest based feature selection. Overall, the SVM classifier coupled with Equal-frequency binning achieved the best accuracy (> 95%). Without data discretization, however, only 73.6% accuracy was achieved at most. CONCLUSIONS: The classification algorithms, trained and tested on data from the same platform, yielded similar accuracies in predicting the four GBM subgroups. However, when dealing with cross-platform data, from exon-array to RNA-seq, the classifiers yielded stable models with the highest classification accuracies on data transformed by Equal-frequency binning. The approach presented here is generally applicable to other cancer types for classification and identification of molecular subgroups by integrating data across different gene expression platforms.
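Two of the discretization methods compared in this abstract, equal-width and equal-frequency binning, can be contrasted with a short sketch. It is an illustration under stated assumptions and not the authors' pipeline: the expression values are invented, three bins are chosen arbitrarily, and the function names `equal_width_bins` and `equal_frequency_bins` are hypothetical.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single bin
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it to the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Bins are assigned by rank, so tied values may fall in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Toy expression values with two large outliers (illustrative only).
values = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]

print(equal_width_bins(values, 3))      # outliers stretch the bin edges
print(equal_frequency_bins(values, 3))  # depends only on rank order
```

Because equal-frequency binning depends only on the rank of each value, it is insensitive to differences in dynamic range, which is one plausible reason it could transfer well between platforms with different measurement scales.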
Affiliation(s)
- Segun Jung
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Yingtao Bi
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Ramana V Davuluri
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
10
Yi Y, Polosukhina D, Love HD, Hembd A, Pickup M, Moses HL, Lovvorn HN, Zent R, Clark PE. A Murine Model of K-RAS and β-Catenin Induced Renal Tumors Expresses High Levels of E2F1 and Resembles Human Wilms Tumor. J Urol 2015; 194:1762-70. [PMID: 25934441 DOI: 10.1016/j.juro.2015.04.090] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Accepted: 04/13/2015] [Indexed: 01/05/2023]
Abstract
PURPOSE Wilms tumor is the most common renal neoplasm of childhood. We previously found that restricted activation of the WNT/β-catenin pathway in renal epithelium late in kidney development is sufficient to induce small primitive neoplasms with features of epithelial Wilms tumor. Metastatic disease progression required simultaneous addition of an activating mutation of the oncogene K-RAS. We sought to define the molecular pathways activated in this process and their relationship to human renal malignancies. MATERIALS AND METHODS Affymetrix® expression microarray data from murine kidneys with activation of K-ras and/or Ctnnb1 (β-catenin) restricted to renal epithelium were analyzed and compared to publicly available expression data on normal and neoplastic human renal tissue. Target genes were verified by immunoblot and immunohistochemistry. RESULTS Mouse kidney tumors with activation of K-ras and Ctnnb1, and human renal malignancies had similar mRNA expression signatures and were associated with activation of networks centered on β-catenin and TP53. Up-regulation of WNT/β-catenin targets (MYC, Survivin, FOXA2, Axin2 and Cyclin D1) was confirmed by immunoblot. K-RAS/β-catenin murine kidney tumors were more similar to human Wilms tumor than to other renal malignancies and demonstrated activation of a TP53 dependent network of genes, including the transcription factor E2F1. Up-regulation of E2F1 was confirmed in murine and human Wilms tumor samples. CONCLUSIONS Simultaneous activation of K-RAS and β-catenin in embryonic renal epithelium leads to neoplasms similar to human Wilms tumor and associated with activation of TP53 and up-regulation of E2F1. Further studies are warranted to evaluate the role of TP53 and E2F1 in human Wilms tumor.
Affiliation(s)
- Yajun Yi
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Dina Polosukhina
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Harold D Love
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Austin Hembd
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Michael Pickup
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Harold L Moses
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Harold N Lovvorn
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Roy Zent
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Peter E Clark
- Department of Urologic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
|
11
|
Ni M, Ye F, Zhu J, Li Z, Yang S, Yang B, Han L, Wu Y, Chen Y, Li F, Wang S, Bo X. ExpTreeDB: web-based query and visualization of manually annotated gene expression profiling experiments of human and mouse from GEO. Bioinformatics 2014; 30:3379-86. [PMID: 25152233 DOI: 10.1093/bioinformatics/btu560] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/16/2022]
Abstract
MOTIVATION Numerous public microarray datasets are valuable resources for the scientific community. Several online tools have made great strides in using these data by querying related datasets with users' own gene signatures or expression profiles. However, dataset annotation and the presentation of results still need to be improved. RESULTS ExpTreeDB is a database that allows for queries on human and mouse microarray experiments from Gene Expression Omnibus with gene signatures or profiles. Compared with similar applications, ExpTreeDB pays more attention to dataset annotations and result visualization. We introduced a multiple-level annotation system to depict and organize original experiments. For example, a tamoxifen-treated cell line experiment is hierarchically annotated as 'agent→drug→estrogen receptor antagonist→tamoxifen'. Consequently, retrieved results are displayed as interactive tree-structured graphics, which provide an overview of related experiments and may point users to key items of interest. AVAILABILITY AND IMPLEMENTATION The database is freely available at http://biotech.bmi.ac.cn/ExpTreeDB. The web site is implemented in Perl, PHP, R, MySQL and Apache.
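The multiple-level annotation described in this abstract can be pictured as a tree built from annotation paths. The sketch below is only an illustration of that idea, not ExpTreeDB's implementation: the function names are hypothetical, and only the tamoxifen path comes from the abstract; the other two paths are invented for the example.

```python
from typing import Dict, List

# A tree node maps an annotation term to its child terms.
Tree = Dict[str, "Tree"]


def build_annotation_tree(paths: List[str], sep: str = "→") -> Tree:
    """Merge annotation paths into one nested dict keyed by annotation term."""
    root: Tree = {}
    for path in paths:
        node = root
        for term in path.split(sep):
            # Descend into the existing child, or create it on first sight.
            node = node.setdefault(term.strip(), {})
    return root


def render(tree: Tree, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, children sorted by name."""
    lines: List[str] = []
    for term in sorted(tree):
        lines.append("  " * depth + term)
        lines.extend(render(tree[term], depth + 1))
    return lines


annotation_paths = [
    "agent→drug→estrogen receptor antagonist→tamoxifen",    # from the abstract
    "agent→drug→estrogen receptor antagonist→fulvestrant",  # hypothetical entry
    "agent→cytokine→interferon",                            # hypothetical entry
]
annotation_tree = build_annotation_tree(annotation_paths)
print("\n".join(render(annotation_tree)))
```

Experiments that share a prefix of the path end up under the same branch, which is what lets a tree view group related experiments.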
Affiliation(s)
- Ming Ni
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Fuqiang Ye
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Juanjuan Zhu
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Zongwei Li
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Shuai Yang
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Bite Yang
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Lu Han
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Yongge Wu
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Ying Chen
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Fei Li
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Shengqi Wang
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
- Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
|
12
|
Xiang Y, Qiu Q, Jiang M, Jin R, Lehmann BD, Strand DW, Jovanovic B, DeGraff DJ, Zheng Y, Yousif DA, Simmons CQ, Case TC, Yi J, Cates JM, Virostko J, He X, Jin X, Hayward SW, Matusik RJ, George AL, Yi Y. SPARCL1 suppresses metastasis in prostate cancer. Mol Oncol 2013; 7:1019-30. [PMID: 23916135 DOI: 10.1016/j.molonc.2013.07.008] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 06/18/2013] [Accepted: 07/09/2013] [Indexed: 01/08/2023] Open
Abstract
PURPOSE Metastasis, the main cause of death from cancer, remains poorly understood at the molecular level. EXPERIMENTAL DESIGN Based on a pattern of reduced expression in human prostate cancer tissues and tumor cell lines, a candidate suppressor gene (SPARCL1) was identified. We used in vitro approaches to determine whether overexpression of SPARCL1 affects cell growth, migration, and invasiveness. We then employed xenograft mouse models to analyze the impact of SPARCL1 on prostate cancer cell growth and metastasis in vivo. RESULTS SPARCL1 expression did not inhibit tumor cell proliferation in vitro. By contrast, SPARCL1 did suppress tumor cell migration and invasiveness in vitro and tumor metastatic growth in vivo, conferring improved survival in xenograft mouse models. CONCLUSIONS We present the first in vivo data suggesting that SPARCL1 suppresses metastasis of prostate cancer.
Affiliation(s)
- Yuzhu Xiang
- Department of Medicine, Vanderbilt University, Nashville, TN 37232-0275, USA; Minimally Invasive Urology Center, Provincial Hospital Affiliated to Shandong University, Jinan 250021, China.
|
13
|
A data similarity-based strategy for meta-analysis of transcriptional profiles in cancer. PLoS One 2013; 8:e54979. [PMID: 23383020 PMCID: PMC3558433 DOI: 10.1371/journal.pone.0054979] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/29/2012] [Accepted: 12/22/2012] [Indexed: 11/22/2022] Open
Abstract
Background Robust transcriptional signatures in cancer can be identified by data similarity-driven meta-analysis of gene expression profiles. An unbiased data integration and interrogation strategy has not previously been available. Methods and Findings We implemented and performed a large meta-analysis of breast cancer gene expression profiles from 223 datasets containing 10,581 human breast cancer samples using a novel data similarity-based approach (iterative EXALT). Cancer gene expression signatures extracted from individual datasets were clustered by data similarity and consolidated into a meta-signature with a recurrent and concordant gene expression pattern. A retrospective survival analysis was performed to evaluate the predictive power of a novel meta-signature deduced from transcriptional profiling studies of human breast cancer. Validation cohorts consisting of 6,011 breast cancer patients from 21 different breast cancer datasets and 1,110 patients with other malignancies (lung and prostate cancer) were used to test the robustness of our findings. During the iterative EXALT analysis, 633 signatures were grouped by their data similarity and formed 121 signature clusters. From the 121 signature clusters, we identified a unique meta-signature (BRmet50) based on a cluster of 11 signatures sharing a phenotype related to highly aggressive breast cancer. In patients with breast cancer, there was a significant association between BRmet50 and disease outcome, and the prognostic power of BRmet50 was independent of common clinical and pathologic covariates. Furthermore, the prognostic value of BRmet50 was not specific to breast cancer, as it also predicted survival in prostate and lung cancers. Conclusions We have established and implemented a novel data similarity-driven meta-analysis strategy. Using this approach, we identified a transcriptional meta-signature (BRmet50) in breast cancer, and the prognostic performance of BRmet50 was robust and applicable across a wide range of cancer-patient populations.
|
14
|
Kim J, Patel K, Jung H, Kuo WP, Ohno-Machado L. AnyExpress: integrated toolkit for analysis of cross-platform gene expression data using a fast interval matching algorithm. BMC Bioinformatics 2011; 12:75. [PMID: 21410990 PMCID: PMC3076267 DOI: 10.1186/1471-2105-12-75] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/08/2010] [Accepted: 03/17/2011] [Indexed: 12/04/2022] Open
Abstract
Background Cross-platform analysis of gene expression data requires multiple, intricate processes at different layers with various platforms. However, existing tools handle only a single platform and are not flexible enough to support custom changes, which arise from the new statistical methods, updated versions of reference data, and better platforms released every month or year. Current tools are so tightly coupled with reference information, such as reference genome, transcriptome database, and SNP, which are often erroneous or outdated, that the output results are incorrect and misleading. Results We developed AnyExpress, a software package that combines cross-platform gene expression data using a fast interval-matching algorithm. Supported platforms include next-generation-sequencing technology, microarray, SAGE, MPSS, and more. Users can define custom target transcriptome database references for probe/read mapping in any species, as well as criteria to remove undesirable probes/reads. AnyExpress offers scalable processing features such as binding, normalization, and summarization that are not present in existing software tools. As a case study, we applied AnyExpress to published Affymetrix microarray and Illumina NGS RNA-Seq data from human kidney and liver. The mean within-platform correlation coefficient was 0.98 for kidney and liver samples. The mean cross-platform correlation coefficient was 0.73. These results confirmed those of the original and secondary studies. Applying filtering produced higher agreement between microarray and NGS, according to an agreement index calculated from differentially expressed genes. Conclusion AnyExpress can combine cross-platform gene expression data, process data from both open and closed platforms, select a custom target reference, filter out undesirable probes or reads based on custom-defined biological features, and perform quantile normalization on a large number of microarray samples. AnyExpress is fast, comprehensive, flexible, and freely available at http://anyexpress.sourceforge.net.
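Quantile normalization, mentioned in the conclusion, forces every sample onto the same value distribution so that samples from different platforms become comparable. The sketch below shows the textbook procedure in plain Python; it is not AnyExpress code, the function name and toy values are made up for illustration, and ties are broken by position rather than averaged.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Give every sample the same distribution: the mean of the sorted samples."""
    n = len(samples[0])
    if any(len(sample) != n for sample in samples):
        raise ValueError("all samples must measure the same number of genes")

    # Reference distribution: average of the k-th smallest value across samples.
    sorted_samples = [sorted(sample) for sample in samples]
    reference = [
        sum(column[k] for column in sorted_samples) / len(samples)
        for k in range(n)
    ]

    normalized: List[List[float]] = []
    for sample in samples:
        # Gene indices ordered from lowest to highest expression in this sample.
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy expression values for four genes on two hypothetical platforms.
microarray = [5.0, 2.0, 3.0, 4.0]
rna_seq = [4.0, 1.0, 4.5, 2.0]
normalized_microarray, normalized_rna_seq = quantile_normalize([microarray, rna_seq])
print(normalized_microarray)
print(normalized_rna_seq)
```

Each gene keeps its rank within its own sample, and only the values attached to those ranks are replaced by the shared reference.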
Affiliation(s)
- Jihoon Kim
- Division of Biomedical Informatics, University of California, San Diego, CA, USA
|
15
|
Baron D, Dubois E, Bihouée A, Teusan R, Steenman M, Jourdon P, Magot A, Péréon Y, Veitia R, Savagner F, Ramstein G, Houlgatte R. Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns. BMC Genomics 2011; 12:113. [PMID: 21324190 PMCID: PMC3049149 DOI: 10.1186/1471-2164-12-113] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 05/26/2010] [Accepted: 02/16/2011] [Indexed: 12/12/2022] Open
Abstract
Background DNA microarray technology has had a great impact on muscle research, and microarray gene expression data have been widely used to identify gene signatures characteristic of the studied conditions. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and combine data across multiple studies. Meta-analysis of transcriptome data is a valuable way to achieve this: it highlights gene signatures conserved across multiple independent studies. However, it is made difficult by the diversity of the available data: different microarray platforms, different gene nomenclature, different species studied, etc. Description We have developed a system dedicated to muscle transcriptome data. This system comprises a collection of microarray data as well as a query tool. The latter allows the user to extract similar clusters of co-expressed genes from the database, using an input gene list. Common and relevant gene signatures can thus be searched more easily. The dedicated database consists of a large compendium of public data (more than 500 data sets) related to muscle (skeletal and heart). These studies cover seven animal species, both invertebrates (Drosophila melanogaster, Caenorhabditis elegans) and vertebrates (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus). After a renormalization step, clusters of co-expressed genes were identified in each dataset. The lists of co-expressed genes were annotated using a unified re-annotation procedure. These gene lists were compared to find significant overlaps between studies. Conclusions Applied to this large compendium of data sets, meta-analyses demonstrated that conserved patterns between species could be identified. Focusing on a specific pathology (Duchenne Muscular Dystrophy), we validated results across independent studies and revealed robust biomarkers and new pathways of interest. The meta-analyses performed with MADMuscle show the usefulness of this approach. Our method can be applied to all public transcriptome data.
|
16
|
Freudenberg JM, Sivaganesan S, Phatak M, Shinde K, Medvedovic M. Generalized random set framework for functional enrichment analysis using primary genomics datasets. Bioinformatics 2010; 27:70-7. [PMID: 20971985 DOI: 10.1093/bioinformatics/btq593] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]
Abstract
MOTIVATION Functional enrichment analysis using primary genomics datasets is an emerging approach to complement established methods for functional enrichment based on predefined lists of functionally related genes. Currently used methods depend on creating lists of 'significant' and 'non-significant' genes based on ad hoc significance cutoffs. This can lead to loss of statistical power and can introduce biases affecting the interpretation of experimental results. RESULTS We developed and validated a new statistical framework, generalized random set (GRS) analysis, for comparing the genomic signatures in two datasets without the need for gene categorization. In our tests, GRS produced correct measures of statistical significance, and it showed dramatic improvement in the statistical power over other methods currently used in this setting. We also developed a procedure for identifying genes driving the concordance of the genomics profiles and demonstrated a dramatic improvement in functional coherence of genes identified in such analysis. AVAILABILITY GRS can be downloaded as part of the R package CLEAN from http://ClusterAnalysis.org/. An online implementation is available at http://GenomicsPortals.org/.
Affiliation(s)
- Johannes M Freudenberg
- Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
|
17
|
Vazquez M, Nogales-Cadenas R, Arroyo J, Botías P, García R, Carazo JM, Tirado F, Pascual-Montano A, Carmona-Saez P. MARQ: an online tool to mine GEO for experiments with similar or opposite gene expression signatures. Nucleic Acids Res 2010; 38:W228-32. [PMID: 20513648 PMCID: PMC2896165 DOI: 10.1093/nar/gkq476] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022] Open
Abstract
The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es.
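To make the idea of "similar or opposite" signatures concrete, the sketch below scores a query of over- and under-expressed genes against reference signatures with a simple signed overlap. This is an assumed toy score chosen for illustration, not MARQ's actual scoring method; the gene names, condition names, and function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A signature is a pair: (over-expressed genes, under-expressed genes).
Signature = Tuple[Set[str], Set[str]]


def signed_overlap(query: Signature, reference: Signature) -> float:
    """Score in [-1, 1]: positive for similar, negative for opposite patterns."""
    q_up, q_down = query
    r_up, r_down = reference
    total = len(q_up) + len(q_down)
    if total == 0:
        return 0.0
    # Genes changing in the same direction in query and reference.
    concordant = len(q_up & r_up) + len(q_down & r_down)
    # Genes changing in opposite directions.
    discordant = len(q_up & r_down) + len(q_down & r_up)
    return (concordant - discordant) / total


def rank_signatures(
    query: Signature, database: Dict[str, Signature]
) -> List[Tuple[str, float]]:
    """Order reference signatures from most similar to most opposite."""
    scores = [(name, signed_overlap(query, sig)) for name, sig in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: gene and condition names are placeholders.
query: Signature = ({"G1", "G2", "G3"}, {"G4", "G5"})
database: Dict[str, Signature] = {
    "similar_condition": ({"G1", "G2", "G6"}, {"G4", "G7"}),
    "opposite_condition": ({"G4", "G5"}, {"G1", "G2", "G3"}),
    "unrelated_condition": ({"G8"}, {"G9"}),
}
ranked = rank_signatures(query, database)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

Both ends of the ranking are informative: the top hits suggest conditions that mimic the query, while the bottom hits suggest conditions that reverse it.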
Affiliation(s)
- Miguel Vazquez
- Software Engineering Department, Facultad de Informatica, Universidad Complutense de Madrid, Madrid, Spain
|
18
|
|
19
|
Reina-Pinto JJ, Voisin D, Teodor R, Yephremov A. Probing differentially expressed genes against a microarray database for in silico suppressor/enhancer and inhibitor/activator screens. Plant J 2010; 61:166-75. [PMID: 19811619 DOI: 10.1111/j.1365-313x.2009.04043.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/03/2023]
Abstract
High-density oligonucleotide arrays are widely used for analysis of gene expression on a genomic scale, but the generated data remain largely inaccessible for comparative analysis purposes. Similarity searches in databases with differentially expressed gene (DEG) lists may be used to assign potential functions to new genes and to identify potential chemical inhibitors/activators and genetic suppressors/enhancers. Although this is a very promising concept, it requires the compatibility and validity of the DEG lists to be significantly improved. Using Arabidopsis and human datasets, we have developed guidelines for the performance of similarity searches against databases that collect microarray data. We found that, in comparison with many other methods, a rank-product analysis achieves a higher degree of inter- and intra-laboratory consistency of DEG lists, and is advantageous for assessing similarities and differences between them. To support this concept, we developed a tool called MASTA (microarray overlap search tool and analysis), and re-analyzed over 600 Arabidopsis microarray expression datasets. This revealed that large-scale searches produce reliable intersections between DEG lists that prove to be useful for genetic analysis, thus aiding in the characterization of cellular and molecular mechanisms. We show that this approach can be used to discover unexpected connections and to illuminate unanticipated interactions between individual genes.
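The rank-product statistic referred to above can be sketched in a few lines: each gene is ranked within every replicate, and the ranks are combined by a geometric mean, so genes that rank consistently high get a small score. This is a generic illustration of the statistic, not the MASTA implementation; the function names and fold-change values are invented, the toy data contain no ties, and the permutation step usually used to assess significance is omitted.

```python
import math
from typing import Dict, List


def rank_within_replicate(fold_changes: Dict[str, float]) -> Dict[str, int]:
    """Rank genes in one replicate; rank 1 is the most up-regulated gene."""
    ordered = sorted(fold_changes, key=lambda gene: fold_changes[gene], reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def rank_product(replicates: List[Dict[str, float]]) -> Dict[str, float]:
    """Geometric mean of each gene's ranks across replicates (lower is stronger)."""
    ranks = [rank_within_replicate(replicate) for replicate in replicates]
    k = len(replicates)
    # Assumes every replicate measures the same set of genes.
    return {
        gene: math.prod(r[gene] for r in ranks) ** (1.0 / k)
        for gene in replicates[0]
    }


# Toy log fold changes for four placeholder genes in three replicates.
replicates = [
    {"A": 3.0, "B": 1.5, "C": -0.5, "D": 0.1},
    {"A": 2.5, "B": 0.2, "C": -1.0, "D": 1.0},
    {"A": 2.8, "B": 1.1, "C": 0.0, "D": 0.3},
]
scores = rank_product(replicates)
deg_order = sorted(scores, key=lambda gene: scores[gene])
print(deg_order)
```

Because only ranks enter the score, the result is insensitive to scale differences between replicates or laboratories, which is consistent with the consistency argument made in the abstract.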
Affiliation(s)
- José J Reina-Pinto
- Max-Planck-Institut für Züchtungsforschung, Carl-von-Linné-Weg 10, 50829 Köln, Germany
|
20
|
Wu J, Qiu Q, Xie L, Fullerton J, Yu J, Shyr Y, George AL, Yi Y. Web-based interrogation of gene expression signatures using EXALT. BMC Bioinformatics 2009; 10:420. [PMID: 20003458 PMCID: PMC2799423 DOI: 10.1186/1471-2105-10-420] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 06/29/2009] [Accepted: 12/14/2009] [Indexed: 12/31/2022] Open
Abstract
Background Widespread use of high-throughput techniques such as microarrays to monitor gene expression levels has resulted in an explosive growth of data sets in public domains. Integration and exploration of these complex and heterogeneous data have become a major challenge. Results The EXALT (EXpression signature AnaLysis Tool) online program enables meta-analysis of gene expression profiles derived from publicly accessible sources. Searches can be executed online against two large databases currently containing more than 28,000 gene expression signatures derived from GEO (Gene Expression Omnibus) and published expression profiles of human cancer. Comparisons among gene expression signatures can be performed with homology analysis and co-expression analysis. Results can be visualized instantly in a plot or a heat map. Three typical use cases are illustrated. Conclusions The EXALT online program is uniquely suited for discovering relationships among transcriptional profiles and searching gene expression patterns derived from diverse physiological and pathological settings. The EXALT online program is freely available for non-commercial users from http://seq.mc.vanderbilt.edu/exalt/.
Affiliation(s)
- Jun Wu
- Department of Medicine, Vanderbilt University, Nashville, TN 37232-0275, USA
|
21
|
Yi Y, Nandana S, Case T, Nelson C, Radmilovic T, Matusik RJ, Tsuchiya KD. Candidate metastasis suppressor genes uncovered by array comparative genomic hybridization in a mouse allograft model of prostate cancer. Mol Cytogenet 2009; 2:18. [PMID: 19781100 PMCID: PMC2761934 DOI: 10.1186/1755-8166-2-18] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/06/2009] [Accepted: 09/26/2009] [Indexed: 12/02/2022] Open
Abstract
Background The purpose of this study was to identify candidate metastasis suppressor genes from a mouse allograft model of prostate cancer (NE-10). This allograft model originally developed metastases by twelve weeks after implantation in male athymic nude mice, but lost the ability to metastasize after a number of in vivo passages. We performed high resolution array comparative genomic hybridization on the metastasizing and non-metastasizing allografts to identify chromosome imbalances that differed between the two groups of tumors. Results This analysis uncovered a deletion on chromosome 2 that differed between the metastasizing and non-metastasizing tumors. Bioinformatics filters were employed to mine this region of the genome for candidate metastasis suppressor genes. Of the 146 known genes that reside within the region of interest on mouse chromosome 2, four candidate metastasis suppressor genes (Slc27a2, Mall, Snrpb, and Rassf2) were identified. Quantitative expression analysis confirmed decreased expression of these genes in the metastasizing compared to non-metastasizing tumors. Conclusion This study presents combined genomics and bioinformatics approaches for identifying potential metastasis suppressor genes. The genes identified here are candidates for further studies to determine their functional role in inhibiting metastases in the NE-10 allograft model and human prostate cancer.
Affiliation(s)
- Yajun Yi
- Clinical Research Division, Fred Hutchinson Cancer Research Center and Department of Laboratories, Seattle Children's Hospital, WA, USA.
|
22
|
Yu Y, Tu K, Zheng S, Li Y, Ding G, Ping J, Hao P, Li Y. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction. BMC Bioinformatics 2009; 10:264. [PMID: 19703314 PMCID: PMC2745391 DOI: 10.1186/1471-2105-10-264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/13/2009] [Accepted: 08/25/2009] [Indexed: 12/05/2022] Open
Abstract
Background In the post-genomic era, the development of high-throughput gene expression detection technology provides huge amounts of experimental data, which challenges the traditional pipelines for data processing and analysis in scientific research. Results In our work, we integrated gene expression information from Gene Expression Omnibus (GEO), biomedical ontology from Medical Subject Headings (MeSH) and signaling pathway knowledge from sigPathway entries to develop a context mining tool for gene expression analysis, GEOGLE. GEOGLE offers a rapid and convenient way of searching for relevant experimental datasets, pathways and biological terms according to multiple types of queries, including biomedical vocabularies, GDS IDs, gene IDs, pathway names and signature lists. Moreover, GEOGLE summarizes the signature genes from a subset of GDSes and estimates the correlation between gene expression and the phenotypic distinction with an integrated p value. Conclusion By searching expression data globally, this approach may extend the traditional way of collecting heterogeneous gene expression experiment data. GEOGLE is a novel tool that gives researchers a quantitative way to understand the correlation between gene expression and phenotypic distinction through meta-analysis of gene expression datasets from different experiments, as well as the biological meaning behind it. The web site and user guide of GEOGLE are available at:
Affiliation(s)
- Yao Yu
- Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China.
|