Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Yi Y, Li C, Miller C, George AL. Strategy for encoding and comparison of gene expression signatures. Genome Biol 2007;8:R133. [PMID: 17612401 PMCID: PMC2323223 DOI: 10.1186/gb-2007-8-7-r133] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 04/11/2007] [Revised: 06/13/2007] [Accepted: 07/05/2007] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Jiang L, Qu S, Yu Z, Wang J, Liu X. MOASL: Predicting drug mechanism of actions through similarity learning with transcriptomic signature. Comput Biol Med 2024;169:107853. [PMID: 38104518 DOI: 10.1016/j.compbiomed.2023.107853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/02/2023] [Accepted: 12/11/2023] [Indexed: 12/19/2023]

Shah I, Bundy J, Chambers B, Everett LJ, Haggard D, Harrill J, Judson RS, Nyffeler J, Patlewicz G. Navigating Transcriptomic Connectivity Mapping Workflows to Link Chemicals with Bioactivities. Chem Res Toxicol 2022;35:1929-1949. [PMID: 36301716 PMCID: PMC10483698 DOI: 10.1021/acs.chemrestox.2c00245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]

Abstract

Screening new compounds for potential bioactivities against cellular targets is vital for drug discovery and chemical safety. Transcriptomics offers an efficient approach for assessing global gene expression changes, but interpreting chemical mechanisms from these data is often challenging. Connectivity mapping is a potential data-driven avenue for linking chemicals to mechanisms based on the observation that many biological processes are associated with unique gene expression signatures (gene signatures). However, mining the effects of a chemical on gene signatures for biological mechanisms is challenging because transcriptomic data contain thousands of noisy genes. New connectivity mapping approaches seeking to distinguish signal from noise continue to be developed, spurred by the promise of discovering chemical mechanisms, new drugs, and disease targets from burgeoning transcriptomic data. Here, we analyze these approaches in terms of diverse transcriptomic technologies, public databases, gene signatures, pattern-matching algorithms, and statistical evaluation criteria. To navigate the complexity of connectivity mapping, we propose a harmonized scheme to coherently organize and compare published workflows. We first standardize concepts underlying transcriptomic profiles and gene signatures based on various transcriptomic technologies such as microarrays, RNA-Seq, and L1000 and discuss the widely used data sources such as Gene Expression Omnibus, ArrayExpress, and MSigDB. Next, we generalize connectivity mapping as a pattern-matching task for finding similarity between a query (e.g., transcriptomic profile for new chemical) and a reference (e.g., gene signature of known target). Published pattern-matching approaches fall into two main categories: vector-based approaches use metrics such as correlation and the Jaccard index, and aggregation-based approaches use parametric and nonparametric statistics (e.g., gene set enrichment analysis). The statistical methods for evaluating the performance of different approaches are described, along with comparisons reported in the literature on benchmark transcriptomic data sets. Lastly, we review connectivity mapping applications in toxicology and offer guidance on evaluating chemical-induced toxicity with concentration-response transcriptomic data. In addition to serving as a high-level guide and tutorial for understanding and implementing connectivity mapping workflows, we hope this review will stimulate new algorithms for evaluating chemical safety and drug discovery using transcriptomic data.
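
As a rough illustration of the vector-based category described in the abstract above, the following Python sketch scores a query profile against reference signatures using Pearson correlation and the Jaccard index. The function names, the dictionary layout (gene symbol mapped to an expression change), and the idea of ranking references by correlation over shared genes are assumptions made for illustration; they are not taken from the review or from any specific published workflow.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n < 2:
        return 0.0  # too few points to define a correlation
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / math.sqrt(vx * vy)


def jaccard(genes_a, genes_b):
    """Jaccard index of two gene sets: |A & B| / |A | B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_references(query, references):
    """Rank reference signatures by correlation with the query.

    query: dict mapping gene -> value (hypothetical layout)
    references: dict mapping signature name -> dict of gene -> value
    Only genes present in both query and reference are compared.
    """
    scores = {}
    for name, ref in references.items():
        shared = sorted(set(query) & set(ref))
        scores[name] = pearson([query[g] for g in shared],
                               [ref[g] for g in shared])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A positive score suggests the query moves in the same direction as the reference and a negative score suggests the opposite direction. An aggregation-based method such as gene set enrichment analysis would replace `pearson` with a rank-based enrichment statistic.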

Lin K, Li L, Dai Y, Wang H, Teng S, Bao X, Lu ZJ, Wang D. A comprehensive evaluation of connectivity methods for L1000 data. Brief Bioinform 2019;21:2194-2205. [DOI: 10.1093/bib/bbz129] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/08/2019] [Revised: 08/26/2019] [Accepted: 09/14/2019] [Indexed: 01/08/2023] Open

Abstract

The methodologies for evaluating similarities between gene expression profiles of different perturbagens are the key to understanding mechanisms of action (MoAs) of unknown compounds and finding new indications for existing drugs. L1000-based next-generation Connectivity Map (CMap) data is more than a thousand-fold scale-up of the CMap pilot dataset. Although several systematic evaluations have been performed individually to assess the accuracy of the methodologies for the CMap pilot study, the performance of these methodologies needs to be re-evaluated for the L1000 data. Here, using the drug–drug similarities from the Drug Repurposing Hub database as a benchmark standard, we evaluated six popular published methods for the prediction performance of drug–drug relationships based on the partial area under the receiver operating characteristic (ROC) curve at false positive rates of 0.001, 0.005 and 0.01 (AUC0.001, AUC0.005 and AUC0.01). The similarity evaluating algorithm called ZhangScore was generally superior to other methods and exhibited the highest accuracy at gene signature sizes ranging from 10 to 200. Further, we tested these methods with an experimentally derived gene signature related to estrogen in breast cancer cells, and the results confirmed that ZhangScore was more accurate than other methods. Moreover, based on scoring results of ZhangScore for the gene signature of TOP2A knockdown, in addition to well-known TOP2A inhibitors, we identified a number of potential inhibitors, at least two of which were the subject of previous investigation. Our studies provide potential guidelines for researchers to choose a suitable connectivity method. The six connectivity methods used in this report have been implemented in an R package (https://github.com/Jasonlinchina/RCSM).
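
The evaluation metric named in the abstract above, the partial area under the ROC curve up to a small false positive rate, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the area is left unnormalized, and tied scores are grouped into a single ROC step. The paper may have used a different normalization or tie-handling rule.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    scores: similarity scores, higher means "more likely a true pair"
    labels: truthy for true drug-drug relationships, falsy otherwise
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, lab in pairs if lab)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every item that shares this score so ties form one step.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(scores, labels, max_fpr):
    """Unnormalized area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the curve at the FPR cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area
```

Calling `partial_auc(scores, labels, 0.01)` corresponds to the quantity the abstract denotes AUC0.01, up to whatever normalization the authors applied. With `max_fpr=1.0` the function returns the ordinary full AUC.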

Keenan AB, Wojciechowicz ML, Wang Z, Jagodnik KM, Jenkins SL, Lachmann A, Ma'ayan A. Connectivity Mapping: Methods and Applications. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021211] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/09/2022]

Ding Y, Li H, He X, Liao W, Yi Z, Yi J, Chen Z, Moore DJ, Yi Y, Xiang W. Identification of a gene-expression predictor for diagnosis and personalized stratification of lupus patients. PLoS One 2018;13:e0198325. [PMID: 29975701 PMCID: PMC6033382 DOI: 10.1371/journal.pone.0198325] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/10/2017] [Accepted: 05/17/2018] [Indexed: 11/29/2022] Open

Xiao J, Blatti C, Sinha S. SigMat: a classification scheme for gene signature matching. Bioinformatics 2018;34:i547-i554. [PMID: 29950002 PMCID: PMC6022536 DOI: 10.1093/bioinformatics/bty251] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022] Open

Abstract

Motivation

Several large-scale efforts have been made to collect gene expression signatures from a variety of biological conditions, such as response of cell lines to treatment with drugs, or tumor samples with different characteristics. These gene signature collections are utilized through bioinformatics tools for 'signature matching', whereby a researcher studying an expression profile can identify previously cataloged biological conditions most related to their profile. Signature matching tools typically retrieve from the collection the signature that has highest similarity to the user-provided profile. Alternatively, classification models may be applied where each biological condition in the signature collection is a class label; however, such models are trained on the collection of available signatures and may not generalize to the novel cellular context or cell line of the researcher's expression profile.

Results

We present an advanced multi-way classification algorithm for signature matching, called SigMat, that is trained on a large signature collection from a well-studied cellular context, but can also classify signatures from other cell types by relying on an additional, small collection of signatures representing the target cell type. It uses these 'tuning data' to learn two additional parameters that help adapt its predictions for other cellular contexts. SigMat outperforms other similarity scores and classification methods in identifying the correct label of a query expression profile from as many as 244 or 500 candidate classes (drug treatments) cataloged by the LINCS L1000 project. SigMat retains its high accuracy in cross-cell line applications even when the amount of tuning data is severely limited.

Availability and implementation

SigMat is available on GitHub at https://github.com/JinfengXiao/SigMat.

Supplementary information

Supplementary data are available at Bioinformatics online.

Laine JE, Bailey KA, Olshan AF, Smeester L, Drobná Z, Stýblo M, Douillet C, García-Vargas G, Rubio-Andrade M, Pathmasiri W, McRitchie S, Sumner SJ, Fry RC. Neonatal Metabolomic Profiles Related to Prenatal Arsenic Exposure. Environ Sci Technol 2017;51:625-633. [PMID: 27997141 PMCID: PMC5460981 DOI: 10.1021/acs.est.6b04374] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 05/19/2023]

Affiliation(s)

Jessica E. Laine Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
Kathryn A. Bailey Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
Andrew F. Olshan Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
Lisa Smeester Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
Zuzana Drobná Department of Biological Sciences, College of Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
Miroslav Stýblo Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
Christelle Douillet Department of Nutrition, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States
Gonzalo García-Vargas Facultad de Medicina, Universidad Juarez del Estado de Durango, Gómez Palacio, Durango 35050, Mexico
Marisela Rubio-Andrade Facultad de Medicina, Universidad Juarez del Estado de Durango, Gómez Palacio, Durango 35050, Mexico
Wimal Pathmasiri RTI International, Research Triangle Park, North Carolina 27709, United States
Susan McRitchie RTI International, Research Triangle Park, North Carolina 27709, United States
Susan J. Sumner RTI International, Research Triangle Park, North Carolina 27709, United States
Rebecca C. Fry Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina 27599, United States

Vora NL, Smeester L, Boggess K, Fry RC. Investigating the Role of Fetal Gene Expression in Preterm Birth. Reprod Sci 2016;24:824-828. [PMID: 27678095 DOI: 10.1177/1933719116670038] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]

Jung S, Bi Y, Davuluri RV. Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping. BMC Genomics 2015;16 Suppl 11:S3. [PMID: 26576613 PMCID: PMC4652565 DOI: 10.1186/1471-2164-16-s11-s3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Many supervised learning algorithms have been applied in deriving gene signatures for patient stratification from gene expression data. However, transferring the multi-gene signatures from one analytical platform to another without loss of classification accuracy is a major challenge. Here, we compared three unsupervised data discretization methods--Equal-width binning, Equal-frequency binning, and k-means clustering--in accurately classifying the four known subtypes of glioblastoma multiforme (GBM) when the classification algorithms were trained on the isoform-level gene expression profiles from exon-array platform and tested on the corresponding profiles from RNA-seq data.

RESULTS

We applied an integrated machine learning framework that involves three sequential steps: feature selection, data discretization, and classification. For models trained and tested on exon-array data, adding the data discretization step led to robust and accurate predictive models with fewer variables in the final models. For models trained on exon-array data and tested on RNA-seq data, adding the data discretization step dramatically improved classification accuracy, with Equal-frequency binning showing the largest improvement: accuracy exceeded 90% for all models with features chosen by Random Forest based feature selection. Overall, the SVM classifier coupled with Equal-frequency binning achieved the best accuracy (> 95%). Without data discretization, however, accuracy reached at most 73.6%.

CONCLUSIONS

The classification algorithms, trained and tested on data from the same platform, yielded similar accuracies in predicting the four GBM subgroups. However, when dealing with cross-platform data, from exon-array to RNA-seq, the classifiers yielded stable models with the highest classification accuracies on data transformed by Equal-frequency binning. The approach presented here is generally applicable to other cancer types for classification and identification of molecular subgroups by integrating data across different gene expression platforms.
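
Two of the three discretization methods compared in the abstract above, Equal-width and Equal-frequency binning, can be sketched in a few lines of Python. The function names and the choice to assign ties by sort order are assumptions made for illustration, not the authors' implementation, and k-means clustering is omitted for brevity.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: one bin
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into bin k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Bins depend only on the rank of each value, so any strictly
    increasing rescaling of the data gives the same assignment.
    Tied values are split by their original order (an arbitrary choice).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // n
    return bins
```

Because Equal-frequency binning uses ranks alone, expression values measured on two platforms with different dynamic ranges map to the same bins whenever their ordering agrees. This is one plausible reading of why the abstract reports it as the most stable choice across exon-array and RNA-seq data.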

Yi Y, Polosukhina D, Love HD, Hembd A, Pickup M, Moses HL, Lovvorn HN, Zent R, Clark PE. A Murine Model of K-RAS and β-Catenin Induced Renal Tumors Expresses High Levels of E2F1 and Resembles Human Wilms Tumor. J Urol 2015;194:1762-70. [PMID: 25934441 DOI: 10.1016/j.juro.2015.04.090] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Accepted: 04/13/2015] [Indexed: 01/05/2023]

Ni M, Ye F, Zhu J, Li Z, Yang S, Yang B, Han L, Wu Y, Chen Y, Li F, Wang S, Bo X. ExpTreeDB: web-based query and visualization of manually annotated gene expression profiling experiments of human and mouse from GEO. Bioinformatics 2014;30:3379-86. [PMID: 25152233 DOI: 10.1093/bioinformatics/btu560] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/16/2022]

Affiliation(s)

Ming Ni Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Fuqiang Ye Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Juanjuan Zhu Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Zongwei Li Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Shuai Yang Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Bite Yang Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Lu Han Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Yongge Wu Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Ying Chen Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Fei Li Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Shengqi Wang Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China
Xiaochen Bo Beijing Institute of Radiation Medicine, Beijing 100850, College of Life Sciences, Jilin University, Changchun 130012 and Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China

Xiang Y, Qiu Q, Jiang M, Jin R, Lehmann BD, Strand DW, Jovanovic B, DeGraff DJ, Zheng Y, Yousif DA, Simmons CQ, Case TC, Yi J, Cates JM, Virostko J, He X, Jin X, Hayward SW, Matusik RJ, George AL, Yi Y. SPARCL1 suppresses metastasis in prostate cancer. Mol Oncol 2013;7:1019-30. [PMID: 23916135 DOI: 10.1016/j.molonc.2013.07.008] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 06/18/2013] [Accepted: 07/09/2013] [Indexed: 01/08/2023] Open

A data similarity-based strategy for meta-analysis of transcriptional profiles in cancer. PLoS One 2013;8:e54979. [PMID: 23383020 PMCID: PMC3558433 DOI: 10.1371/journal.pone.0054979] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/29/2012] [Accepted: 12/22/2012] [Indexed: 11/22/2022] Open

Abstract

Background

Robust transcriptional signatures in cancer can be identified by data similarity-driven meta-analysis of gene expression profiles. An unbiased data integration and interrogation strategy has not previously been available.

Methods and Findings

We implemented and performed a large meta-analysis of breast cancer gene expression profiles from 223 datasets containing 10,581 human breast cancer samples using a novel data similarity-based approach (iterative EXALT). Cancer gene expression signatures extracted from individual datasets were clustered by data similarity and consolidated into a meta-signature with a recurrent and concordant gene expression pattern. A retrospective survival analysis was performed to evaluate the predictive power of a novel meta-signature deduced from transcriptional profiling studies of human breast cancer. Validation cohorts consisting of 6,011 breast cancer patients from 21 different breast cancer datasets and 1,110 patients with other malignancies (lung and prostate cancer) were used to test the robustness of our findings. During the iterative EXALT analysis, 633 signatures were grouped by their data similarity and formed 121 signature clusters. From the 121 signature clusters, we identified a unique meta-signature (BRmet50) based on a cluster of 11 signatures sharing a phenotype related to highly aggressive breast cancer. In patients with breast cancer, there was a significant association between BRmet50 and disease outcome, and the prognostic power of BRmet50 was independent of common clinical and pathologic covariates. Furthermore, the prognostic value of BRmet50 was not specific to breast cancer, as it also predicted survival in prostate and lung cancers.

Conclusions

We have established and implemented a novel data similarity-driven meta-analysis strategy. Using this approach, we identified a transcriptional meta-signature (BRmet50) in breast cancer, and the prognostic performance of BRmet50 was robust and applicable across a wide range of cancer-patient populations.

Kim J, Patel K, Jung H, Kuo WP, Ohno-Machado L. AnyExpress: integrated toolkit for analysis of cross-platform gene expression data using a fast interval matching algorithm. BMC Bioinformatics 2011;12:75. [PMID: 21410990 PMCID: PMC3076267 DOI: 10.1186/1471-2105-12-75] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/08/2010] [Accepted: 03/17/2011] [Indexed: 12/04/2022] Open

Abstract

Background

Cross-platform analysis of gene expression data requires multiple, intricate processes at different layers with various platforms. However, existing tools handle only a single platform and are not flexible enough to support custom changes, which arise from new statistical methods, updated versions of reference data, and better platforms released every month or year. Current tools are so tightly coupled with reference information, such as reference genome, transcriptome database, and SNP, which are often erroneous or outdated, that the output results are incorrect and misleading.

Results

We developed AnyExpress, a software package that combines cross-platform gene expression data using a fast interval-matching algorithm. Supported platforms include next-generation-sequencing technology, microarray, SAGE, MPSS, and more. Users can define custom target transcriptome database references for probe/read mapping in any species, as well as criteria to remove undesirable probes/reads.

AnyExpress offers scalable processing features such as binning, normalization, and summarization that are not present in existing software tools.

As a case study, we applied AnyExpress to published Affymetrix microarray and Illumina NGS RNA-Seq data from human kidney and liver. The mean within-platform correlation coefficient was 0.98 for kidney and liver samples. The mean cross-platform correlation coefficient was 0.73. These results confirmed those of the original and secondary studies. Applying filtering produced higher agreement between microarray and NGS, according to an agreement index calculated from differentially expressed genes.

Conclusion

AnyExpress can combine cross-platform gene expression data, process data from both open- and closed-platforms, select a custom target reference, filter out undesirable probes or reads based on custom-defined biological features, and perform quantile-normalization with a large number of microarray samples. AnyExpress is fast, comprehensive, flexible, and freely available at http://anyexpress.sourceforge.net.

Baron D, Dubois E, Bihouée A, Teusan R, Steenman M, Jourdon P, Magot A, Péréon Y, Veitia R, Savagner F, Ramstein G, Houlgatte R. Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns. BMC Genomics 2011;12:113. [PMID: 21324190 PMCID: PMC3049149 DOI: 10.1186/1471-2164-12-113] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 05/26/2010] [Accepted: 02/16/2011] [Indexed: 12/12/2022] Open

Abstract

Background

DNA microarray technology has had a great impact on muscle research, and microarray gene expression data have been widely used to identify gene signatures characteristic of the studied conditions. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and combine data across multiple studies. Meta-analysis of transcriptome data is a valuable method for doing so. It makes it possible to highlight gene signatures conserved across multiple independent studies. However, its use is made difficult by the diversity of the available data: different microarray platforms, different gene nomenclature, different species studied, etc.

Description

We have developed a system tool dedicated to muscle transcriptome data. This system comprises a collection of microarray data as well as a query tool. The latter allows the user to extract similar clusters of co-expressed genes from the database, using an input gene list. Common and relevant gene signatures can thus be searched more easily. The dedicated database consists of a large compendium of public data (more than 500 data sets) related to muscle (skeletal and heart). These studies included seven different animal species from invertebrates (Drosophila melanogaster, Caenorhabditis elegans) and vertebrates (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus). After a renormalization step, clusters of co-expressed genes were identified in each dataset. The lists of co-expressed genes were annotated using a unified re-annotation procedure. These gene lists were compared to find significant overlaps between studies.

Conclusions

Applied to this large compendium of data sets, meta-analyses demonstrated that conserved patterns between species could be identified. Focusing on a specific pathology (Duchenne Muscular Dystrophy) we validated results across independent studies and revealed robust biomarkers and new pathways of interest. The meta-analyses performed with MADMuscle show the usefulness of this approach. Our method can be applied to all public transcriptome data.

Freudenberg JM, Sivaganesan S, Phatak M, Shinde K, Medvedovic M. Generalized random set framework for functional enrichment analysis using primary genomics datasets. Bioinformatics 2010;27:70-7. [PMID: 20971985 DOI: 10.1093/bioinformatics/btq593] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]

Vazquez M, Nogales-Cadenas R, Arroyo J, Botías P, García R, Carazo JM, Tirado F, Pascual-Montano A, Carmona-Saez P. MARQ: an online tool to mine GEO for experiments with similar or opposite gene expression signatures. Nucleic Acids Res 2010;38:W228-32. [PMID: 20513648 PMCID: PMC2896165 DOI: 10.1093/nar/gkq476] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022] Open

Ontology engineering. Nat Biotechnol 2010;28:128-30. [PMID: 20139945 DOI: 10.1038/nbt0210-128] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 01/08/2023]

Reina-Pinto JJ, Voisin D, Teodor R, Yephremov A. Probing differentially expressed genes against a microarray database for in silico suppressor/enhancer and inhibitor/activator screens. Plant J 2010;61:166-75. [PMID: 19811619 DOI: 10.1111/j.1365-313x.2009.04043.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/03/2023]

Wu J, Qiu Q, Xie L, Fullerton J, Yu J, Shyr Y, George AL, Yi Y. Web-based interrogation of gene expression signatures using EXALT. BMC Bioinformatics 2009;10:420. [PMID: 20003458 PMCID: PMC2799423 DOI: 10.1186/1471-2105-10-420] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 06/29/2009] [Accepted: 12/14/2009] [Indexed: 12/31/2022] Open

Yi Y, Nandana S, Case T, Nelson C, Radmilovic T, Matusik RJ, Tsuchiya KD. Candidate metastasis suppressor genes uncovered by array comparative genomic hybridization in a mouse allograft model of prostate cancer. Mol Cytogenet 2009;2:18. [PMID: 19781100 PMCID: PMC2761934 DOI: 10.1186/1755-8166-2-18] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/06/2009] [Accepted: 09/26/2009] [Indexed: 12/02/2022] Open

Yu Y, Tu K, Zheng S, Li Y, Ding G, Ping J, Hao P, Li Y. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction. BMC Bioinformatics 2009;10:264. [PMID: 19703314 PMCID: PMC2745391 DOI: 10.1186/1471-2105-10-264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/13/2009] [Accepted: 08/25/2009] [Indexed: 12/05/2022] Open