1
Zheng T, Ni Y, Li J, Chow BKC, Panagiotou G. Designing Dietary Recommendations Using System Level Interactomics Analysis and Network-Based Inference. Front Physiol 2017; 8:753. [PMID: 29033850 PMCID: PMC5625024 DOI: 10.3389/fphys.2017.00753] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/07/2017] [Accepted: 09/19/2017] [Indexed: 12/14/2022]
Abstract
Background: A range of computational methods that rely on the analysis of genome-wide expression datasets have been developed and successfully used for drug repositioning. These methods rest on the hypothesis that introducing a factor (in this case, a drug molecule) that reverses the disease gene expression signature will lead to a therapeutic effect. However, it has also been shown that globally reversing the disease expression signature is not a prerequisite for drug activity. On the other hand, the basic idea of significant anti-correlation in expression profiles could have great value for establishing diet-disease associations and could provide new insights into the role of dietary interventions in disease. Methods: We performed an integrated analysis of publicly available gene expression profiles for foods, diseases and drugs by calculating pairwise similarity scores for diet and disease gene expression signatures and characterizing their topological features in protein-protein interaction networks. Results: We identified 485 diet-disease pairs where diet could positively influence disease development and 472 pairs where specific diets should be avoided in a disease state. Multiple lines of evidence suggest that orange, whey and coconut fat could be beneficial for psoriasis, lung adenocarcinoma and macular degeneration, respectively. On the other hand, a fructose-rich diet should be restricted in patients with chronic intermittent hypoxia and ovarian cancer. Since humans normally do not consume foods in isolation, we also applied different algorithms to predict synergism; as a result, 58 food pairs were predicted. Interestingly, the diets identified as anti-correlated with diseases showed a topological proximity to the disease proteins similar to that of the corresponding drugs.
Conclusions: We provide a computational framework for establishing diet-disease associations and additional information on the role of diet in disease development. Given the complexity of analyzing the food composition and eating patterns of individuals, our in silico analysis, using large-scale gene expression datasets and network-based topological features, may serve as a proof-of-concept in nutritional systems biology for identifying diet-disease relationships and subsequently designing dietary recommendations.
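The pairwise similarity scoring described in this abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure was used, so the Spearman rank correlation below is an assumption, and the gene names, fold-change values and function names (`rank`, `spearman`, `signature_similarity`) are hypothetical. A strongly negative score marks a diet signature that is anti-correlated with a disease signature.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (assumes non-constant input)."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def signature_similarity(diet, disease):
    """Correlate two {gene: log fold change} signatures over their shared genes."""
    shared = sorted(set(diet) & set(disease))
    return spearman([diet[g] for g in shared], [disease[g] for g in shared])

# Toy, made-up log fold changes (not data from the paper).
disease = {"G1": 2.0, "G2": 1.0, "G3": -0.5, "G4": -1.5}
diet = {"G1": -1.8, "G2": -0.7, "G3": 0.4, "G4": 1.2, "G5": 0.1}
score = signature_similarity(diet, disease)
print(f"similarity score: {score:.2f}")
```

In a real analysis the score would be computed for every diet-disease pair and its significance assessed, for example by permutation, before calling a pair anti-correlated.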
Affiliation(s)
- Tingting Zheng
- Systems Biology and Bioinformatics Group, Faculty of Science, School of Biological Sciences, The University of Hong Kong, Hong Kong, Hong Kong
- Yueqiong Ni
- Systems Biology and Bioinformatics Group, Faculty of Science, School of Biological Sciences, The University of Hong Kong, Hong Kong, Hong Kong
- Jun Li
- Systems Biology and Bioinformatics Group, Faculty of Science, School of Biological Sciences, The University of Hong Kong, Hong Kong, Hong Kong
- Billy K C Chow
- Faculty of Science, School of Biological Sciences, The University of Hong Kong, Hong Kong, Hong Kong
- Gianni Panagiotou
- Systems Biology and Bioinformatics Group, Faculty of Science, School of Biological Sciences, The University of Hong Kong, Hong Kong, Hong Kong; Department of Systems Biology and Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute, Jena, Germany
2
Yang XH, Wang B, Cunningham JM. Identification of epigenetic modifications that contribute to pathogenesis in therapy-related AML: Effective integration of genome-wide histone modification with transcriptional profiles. BMC Med Genomics 2015; 8 Suppl 2:S6. [PMID: 26043758 PMCID: PMC4460748 DOI: 10.1186/1755-8794-8-s2-s6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/09/2023]
Abstract
Background: Therapy-related, secondary acute myeloid leukemia (t-AML) is an increasingly frequent complication of intensive chemotherapy. This malignancy is often characterized by abnormalities of chromosome 7, including large deletions or chromosomal loss. A variety of studies suggest that decreased expression of the EZH2 gene located at 7q36.1 is critical in disease pathogenesis. This histone methyltransferase has been implicated in transcriptional repression through modifying histone H3 on lysine 27 (H3K27). However, the critical target genes of EZH2 and their regulatory roles remain unclear. Method: To characterize the subset of EZH2 target genes that might contribute to t-AML pathogenesis, we developed a novel computational analysis to integrate tissue-specific histone modifications and genome-wide transcriptional regulation. Initial integrative analysis utilized a novel "seq2gene" strategy to broadly explore the target genes of regions enriched in chromatin immunoprecipitation sequencing (ChIP-seq). By combining seq2gene with our Phenotype-Genotype-Network (PGNet) algorithm, we enriched genes with similar expression profiles and genomic or functional characteristics into "biomodules". Results: Initial studies identified SEMA3A (semaphorin 3A) as a novel oncogenic candidate that is regulated by EZH2-silencing, using data derived from both normal and leukemic cell lines as well as murine cells deficient in EZH2. A microsatellite marker at the SEMA3A promoter has been associated with chemosensitivity and radiosensitivity. Notably, our subsequent studies in primary t-AML demonstrate an expected up-regulation of SEMA3A that is EZH2-modulated. Furthermore, we have identified three biomodules that are co-expressed with SEMA3A and up-regulated in t-AML, one of which consists of previously characterized EZH2-repressed gene targets. The other two biomodules include MAPK8 and TATA box targets.
Together, our studies suggest an important role for EZH2 targets in t-AML pathogenesis that warrants further study. Conclusion: These computational algorithms and systems biology strategies will enhance the knowledge discovery and hypothesis-driven analysis of multiple next generation sequencing data, for t-AML and other complex diseases.
3
Vander Jagt CJ, Whitley JC, Cocks BG, Goddard ME. Gene expression in the mammary gland of the tammar wallaby during the lactation cycle reveals conserved mechanisms regulating mammalian lactation. Reprod Fertil Dev 2015; 28:RD14210. [PMID: 25701950 DOI: 10.1071/rd14210] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/16/2014] [Accepted: 12/21/2014] [Indexed: 12/16/2022]
Abstract
The tammar wallaby (Macropus eugenii), an Australian marsupial, has evolved a different lactation strategy compared with eutherian mammals, making it a valuable comparative model for lactation studies. The tammar mammary gland was investigated for changes in gene expression during key stages of the lactation cycle using microarrays. Differentially regulated genes were identified, annotated and subsequent gene ontologies, pathways and molecular networks analysed. Major milk-protein gene expression changes during lactation were in accord with changes in milk-protein secretion. However, other gene expression changes included changes in genes affecting mRNA stability, hormone and cytokine signalling and genes for transport and metabolism of amino acids and lipids. Some genes with large changes in expression have poorly known roles in lactation. For instance, SIM2 was upregulated at lactation initiation and may inhibit proliferation and involution of mammary epithelial cells, while FUT8 was upregulated in Phase 3 of lactation and may support the large increase in milk volume that occurs at this point in the lactation cycle. This pattern of regulation has not previously been reported and suggests that these genes may play a crucial regulatory role in marsupial milk production and are likely to play a related role in other mammals.
4
Lai L, Ge SX. Meta-analysis of gene expression signatures reveals hidden links among diverse biological processes in Arabidopsis. PLoS One 2014; 9:e108567. [PMID: 25398003 PMCID: PMC4232243 DOI: 10.1371/journal.pone.0108567] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/30/2013] [Accepted: 09/01/2014] [Indexed: 11/29/2022]
Abstract
The model plant Arabidopsis has been well-studied using high-throughput genomics technologies, which usually generate lists of differentially expressed genes under various conditions. Our group recently collected 1065 gene lists from 397 gene expression studies as a knowledgebase for pathway analysis. Here we systematically analyzed these gene lists by computing overlaps in all-vs.-all comparisons. We identified 16,261 statistically significant overlaps, represented by an undirected network in which nodes correspond to gene lists and edges indicate significant overlaps. The network highlights the correlation across the gene expression signatures of the diverse biological processes. We also partitioned the main network into 20 sub-networks, representing groups of highly similar expression signatures. These are common sets of genes that were co-regulated under different treatments or conditions and are often related to specific biological themes. Overall, our result suggests that diverse gene expression signatures are highly interconnected in a modular fashion.
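The all-vs.-all overlap analysis described in this abstract can be sketched as follows. The abstract does not name the statistical test, so the upper-tail hypergeometric test and the Bonferroni correction used here are assumptions, and the list names, gene identifiers and universe size are made up for illustration.

```python
from itertools import combinations
from math import comb

def overlap_pvalue(list_a, list_b, universe_size):
    """Upper-tail hypergeometric p-value P(X >= k) for the overlap of two gene lists."""
    a, b = set(list_a), set(list_b)
    k = len(a & b)
    total = comb(universe_size, len(b))
    tail = sum(
        comb(len(a), i) * comb(universe_size - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    return tail / total

def overlap_network(gene_lists, universe_size, alpha=0.05):
    """Return edges (name_a, name_b, p) for pairs whose overlap is significant.

    Nodes are gene lists; an edge marks a significant overlap after a
    Bonferroni correction over all pairwise tests (an assumed choice).
    """
    n_tests = len(gene_lists) * (len(gene_lists) - 1) // 2
    edges = []
    for (name_a, a), (name_b, b) in combinations(gene_lists.items(), 2):
        p = overlap_pvalue(a, b, universe_size)
        if p * n_tests <= alpha:
            edges.append((name_a, name_b, p))
    return edges

# Toy gene lists over a hypothetical universe of 100 genes.
gene_lists = {
    "heat": [f"G{i}" for i in range(1, 11)],
    "drought": [f"G{i}" for i in range(1, 9)] + ["G11", "G12"],
    "light": [f"G{i}" for i in range(50, 60)],
}
edges = overlap_network(gene_lists, universe_size=100)
for name_a, name_b, p in edges:
    print(name_a, name_b, f"p = {p:.2e}")
```

Exact integer arithmetic with `math.comb` keeps the tail sum stable for small lists; for lists with thousands of genes a log-space or library implementation would be the more practical choice.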
Affiliation(s)
- Liming Lai
- Department of Mathematics and Statistics, South Dakota State University, Brookings, South Dakota, United States of America
- Steven X. Ge
- Department of Mathematics and Statistics, South Dakota State University, Brookings, South Dakota, United States of America
5
Abstract
Joint analyses of high-throughput datasets generate the need to assess the association between two long lists of p-values. In such p-value lists, the vast majority of the features are insignificant. Ideally, contributions of features that are null in both tests should be minimized. However, by random chance their p-values are uniformly distributed between zero and one, and weak correlations of the p-values may exist due to inherent biases in the high-throughput technology used to generate the multiple datasets. Rank-based agreement tests may capture such unwanted effects. Testing contingency tables generated using hard cutoffs may be sensitive to arbitrary threshold choice. We develop a novel method based on feature-level concordance using local false discovery rate. The association score has a straightforward interpretation. In simulation, the method shows higher statistical power to detect association between p-value lists. We demonstrate its utility using real data analysis. The R implementation of the method is available at http://userwww.service.emory.edu/~tyu8/AAPL/.
Affiliation(s)
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
6
Jurman G, Riccadonna S, Visintainer R, Furlanello C. Algebraic comparison of partial lists in bioinformatics. PLoS One 2012; 7:e36540. [PMID: 22615778 PMCID: PMC3355159 DOI: 10.1371/journal.pone.0036540] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 06/10/2011] [Accepted: 04/06/2012] [Indexed: 12/20/2022]
Abstract
The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or to a meta-analysis comparison, it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained, instead of just one list. Here we introduce a method, based on permutations, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated by finding and comparing gene profiles on a large prostate cancer dataset, consisting of two cohorts of patients from different countries, for a total of 455 samples.
7
Jean D, Daubriac J, Le Pimpec-Barthes F, Galateau-Salle F, Jaurand MC. Molecular changes in mesothelioma with an impact on prognosis and treatment. Arch Pathol Lab Med 2012; 136:277-93. [PMID: 22372904 DOI: 10.5858/arpa.2011-0215-ra] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Indexed: 11/06/2022]
Abstract
CONTEXT In recent decades, research on malignant pleural mesothelioma (MPM) has been developed to improve patients' outcomes by increasing the level of confidence in MPM diagnosis and prognosis. OBJECTIVE To summarize data on genetic and epigenetic abnormalities in MPM that may be of interest for a better management of patients with MPM. DATA SOURCES Data were obtained from scientific publications on genetic and epigenetic abnormalities in MPM by studying gene mutations, DNA methylation, and gene and microRNA expression profiling. CONCLUSIONS Molecular changes in MPM consist in altered expression and in activation or inactivation of critical genes in oncogenesis, especially tumor suppressor genes at the INK4 and NF2 loci. Activation of membrane receptor tyrosine kinases and deregulation of signaling pathways related to differentiation, survival, proliferation, apoptosis, cell cycle control, metabolism, migration, and invasion have been demonstrated. Alterations that could be targeted at a global level (methylation) have been recently reported. Experimental research has succeeded especially in abolishing proliferation and triggering apoptosis in MPM cells. So far, targeted clinical approaches focusing on receptor tyrosine kinases have had limited success. Molecular analyses of series of MPM cases have shown that defined alterations are present in MPM subsets, consistent with interindividual variations of molecular alterations, and suggesting that identification of patient subgroups will be essential to develop more specific therapies.
Affiliation(s)
- Didier Jean
- INSERM, U, Université Paris Descartes, UMR-S, Paris, France
8
Tseng GC, Ghosh D, Feingold E. Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res 2012; 40:3785-99. [PMID: 22262733 PMCID: PMC3351145 DOI: 10.1093/nar/gkr1265] [Citation(s) in RCA: 266] [Impact Index Per Article: 22.2] [Indexed: 01/18/2023]
Abstract
With the rapid advances of various high-throughput technologies, generation of ‘-omics’ data is commonplace in almost every biomedical field. Effective data management and analytical approaches are essential to fully decipher the biological knowledge contained in the tremendous amount of experimental data. Meta-analysis, a set of statistical tools for combining multiple studies of a related hypothesis, has become popular in genomic research. Here, we perform a systematic search from PubMed and manual collection to obtain 620 genomic meta-analysis papers, of which 333 microarray meta-analysis papers are summarized as the basis of this paper and the other 249 GWAS meta-analysis papers are discussed in the next companion paper. The review in the present paper focuses on various biological purposes of microarray meta-analysis, databases and software and related statistical procedures. Statistical considerations of such an analysis are further scrutinized and illustrated by a case study. Finally, several open questions are listed and discussed.
Affiliation(s)
- George C Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA.
9
Dozmorov MG, Wren JD. High-throughput processing and normalization of one-color microarrays for transcriptional meta-analyses. BMC Bioinformatics 2011; 12 Suppl 10:S2. [PMID: 22166002 PMCID: PMC3236842 DOI: 10.1186/1471-2105-12-s10-s2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]
Abstract
Background: Microarray experiments are becoming increasingly common in biomedical research, as is their deposition in publicly accessible repositories, such as Gene Expression Omnibus (GEO). As such, there has been a surge of interest in using these microarray data for meta-analytic approaches, whether to increase sample size for a more powerful analysis of a specific disease (e.g. lung cancer) or to re-examine experiments for reasons different from those examined in the initial, publishing study that generated them. For the average biomedical researcher, there are a number of practical barriers to conducting such meta-analyses, such as manually aggregating, filtering and formatting the data. Methods to automatically process large repositories of microarray data into a standardized, directly comparable format will enable easier and more reliable access to microarray data to conduct meta-analyses. Methods: We present a straightforward method, simple but robust against potential outliers, for automatic quality control and pre-processing of tens of thousands of single-channel microarray data files. GEO GDS files are quality checked by comparing parametric distributions and quantile normalized to enable direct comparison of expression levels for subsequent meta-analyses. Results: 13,000 human 1-color experiments were processed to create a single gene expression matrix from which subsets can be extracted to conduct meta-analyses. Interestingly, we found that when conducting a global meta-analysis of gene-gene co-expression patterns across all 13,000 experiments to predict gene function, normalization yielded minimal improvement over using the raw data. Conclusions: Normalization of microarray data appears to be of minimal importance for analyses based on co-expression patterns when the sample size is on the order of thousands of microarray datasets.
Smaller subsets, however, are more prone to aberrations and artefacts, and effective means of automating normalization procedures not only empower meta-analytic approaches but also aid reproducibility by providing a standard way of approaching the problem. Data availability: a matrix containing normalized expression of 20,813 genes across 13,000 experiments is available for download at . Source code for GDS files pre-processing is available from the authors upon request.
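Quantile normalization, the step this abstract relies on to make expression levels directly comparable, can be shown with a short sketch. This is a generic textbook version, not the authors' pipeline (their source code is available only on request); the function name `quantile_normalize` and the toy values are hypothetical, and ties are broken by original order rather than averaged, which is a simplification.

```python
def quantile_normalize(samples):
    """Quantile-normalize equally long samples (one list of expression values each).

    Every sample is forced onto the same reference distribution: the mean,
    across samples, of the values found at each rank position.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]
    reference = [
        sum(sample[i] for sample in sorted_samples) / len(samples)
        for i in range(n_genes)
    ]
    normalized = []
    for sample in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank_pos, gene_idx in enumerate(order):
            out[gene_idx] = reference[rank_pos]
        normalized.append(out)
    return normalized

# Two toy arrays measuring the same three genes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization every array contains exactly the same set of values, so differences between arrays reflect only the rank of each gene within its array.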
Affiliation(s)
- Mikhail G Dozmorov
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation 825 NE 13th Street, Oklahoma City, Oklahoma 73104-5005, USA.
10
Gholami AM, Fellenberg K. Cross-species common regulatory network inference without requirement for prior gene affiliation. Bioinformatics 2010; 26:1082-90. [PMID: 20200011 DOI: 10.1093/bioinformatics/btq096] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cross-species meta-analyses of microarray data usually require prior affiliation of genes based on orthology information that often relies on sequence similarity. RESULTS We present an algorithm merging microarray datasets on the basis of co-expression alone, without any requirement for orthology information to affiliate genes. Combining existing methods such as co-inertia analysis, back-transformation, Hungarian matching and majority voting in an iterative non-greedy hill-climbing approach, it affiliates arrays and genes at the same time, maximizing the co-structure between the datasets. To introduce the method, we demonstrate its performance on two closely and two distantly related datasets of different experimental context and produced on different platforms. Each pair stems from two different species. The resulting cross-species dynamic Bayesian gene networks improve on the networks inferred from each dataset alone by yielding more significant network motifs, as well as more of the interactions already recorded in KEGG and other databases. Also, it is shown that our algorithm converges on the optimal number of nodes for network inference. Being readily extendable to more than two datasets, it provides the opportunity to infer extensive gene regulatory networks. AVAILABILITY AND IMPLEMENTATION Source code (MATLAB and R) freely available for download at http://www.mchips.org/supplements/moghaddasi_source.tgz.
Affiliation(s)
- Amin Moghaddas Gholami
- Chair of Proteomics and Bioanalytics, Center for Integrated Protein Sciences Munich (CIPSM), Technische Universität München, Emil Erlenmeyer Forum 5, 85354 Freising, Germany
11
Liu L, Li Y, Liu B, Li J. A simple yet effective data integration approach to tree-based microarray data classification. Annu Int Conf IEEE Eng Med Biol Soc 2010; 2010:1503-1506. [PMID: 21096367 DOI: 10.1109/iembs.2010.5626842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Different biological labs conduct similar experiments on the same diseases. It is highly desirable to build a model on multiple experimental results rather than on a single one. In this paper, we propose a method for integrating microarray data from multiple sources for building classification models. To test the method, we use three real-world microarray data sets from different labs with different experimental devices and environments. Although microarray data are well known for their inconsistencies across labs, we demonstrate that it is possible to build consistent models using data sets from multiple labs. We report our method, experimental results and observations in the paper.
Affiliation(s)
- Lin Liu
- School of Computer and Information Science, University of South Australia, Mawson Lakes, 5095, Australia.
12
Rogers JV, Price JA, McDougal JN. A review of transcriptomics in cutaneous chemical exposure. Cutan Ocul Toxicol 2009; 28:157-70. [DOI: 10.3109/15569520903157145] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
13
Hu P, Greenwood CMT, Beyene J. Using the ratio of means as the effect size measure in combining results of microarray experiments. BMC Syst Biol 2009; 3:106. [PMID: 19891778 PMCID: PMC2784452 DOI: 10.1186/1752-0509-3-106] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/26/2009] [Accepted: 11/05/2009] [Indexed: 12/19/2022]
Abstract
Background: Development of efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. The widely used effect size models are thought to provide an efficient modeling framework for this purpose, where the measures of association for each study and each gene are combined, weighted by the standard errors. A significant disadvantage of this strategy is that the quality of different data sets may be highly variable, but this information is usually neglected during the integration. Moreover, it is widely known that the estimated standard deviations are probably unstable in the commonly used effect size measures (such as standardized mean difference) when sample sizes in each group are small. Results: We propose a re-parameterization of the traditional mean difference based effect measure by using the log ratio of means as an effect size measure for each gene in each study. The estimated effect sizes for all studies were then combined under two modeling frameworks: the quality-unweighted random effects models and the quality-weighted random effects models. We defined the quality measure as a function of the detection p-value, which indicates whether a transcript is reliably detected or not on the Affymetrix gene chip. The new effect size measure is evaluated and compared under the quality-weighted and quality-unweighted data integration frameworks using simulated data sets, and also in several data sets of prostate cancer patients and controls. We focus on identifying differentially expressed biomarkers for prediction of cancer outcomes.
Conclusion: Our results show that the proposed effect size measure (log ratio of means) has better power to identify differentially expressed genes, and that the detected genes have better performance in predicting cancer outcomes than the commonly used effect size measure, the standardized mean difference (SMD), under both quality-weighted and quality-unweighted data integration frameworks. The new effect size measure and the quality-weighted microarray data integration framework provide efficient ways to combine microarray results.
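The log ratio of means and its combination across studies can be sketched for a single gene as below. This covers only the quality-unweighted case. The first-order delta-method variance and the DerSimonian-Laird estimate of between-study variance are standard choices assumed here, since the abstract does not give the exact estimators. Function names and intensity values are hypothetical.

```python
from math import log, sqrt
from statistics import mean, variance

def log_ratio_of_means(cases, controls):
    """Return (effect, variance) for log(mean(cases) / mean(controls)).

    The variance is the first-order delta-method approximation; it needs
    positive means and at least two observations per group.
    """
    m1, m0 = mean(cases), mean(controls)
    effect = log(m1 / m0)
    var = (variance(cases) / (len(cases) * m1 ** 2)
           + variance(controls) / (len(controls) * m0 ** 2))
    return effect, var

def random_effects(effects, variances):
    """DerSimonian-Laird pooled effect and standard error (needs >= 2 studies)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, sqrt(1 / sum(w_star))

# Made-up intensities for one gene in three studies: (cases, controls).
studies = [
    ([8.0, 12.0, 10.0], [4.0, 6.0, 5.0]),
    ([9.0, 11.0, 13.0], [6.0, 5.0, 7.0]),
    ([7.0, 9.0, 8.0], [5.0, 4.0, 6.0]),
]
per_study = [log_ratio_of_means(cases, controls) for cases, controls in studies]
pooled, se = random_effects([e for e, _ in per_study], [v for _, v in per_study])
print(f"pooled log ratio of means: {pooled:.3f} (SE {se:.3f})")
```

A quality-weighted variant, as the abstract describes, would multiply each study weight by a quality score derived from the detection p-value; that score is not specified in enough detail here to reproduce.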
Affiliation(s)
- Pingzhao Hu
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada.
14
Mechanism-anchored profiling derived from epigenetic networks predicts outcome in acute lymphoblastic leukemia. BMC Bioinformatics 2009; 10 Suppl 9:S6. [PMID: 19761576 PMCID: PMC2745693 DOI: 10.1186/1471-2105-10-s9-s6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022]
Abstract
Background: Current outcome predictors based on "molecular profiling" rely on gene lists selected without consideration for their molecular mechanisms. This study was designed to demonstrate that we could learn about genes related to a specific mechanism and further use this knowledge to predict outcome in patients – a paradigm shift towards accurate "mechanism-anchored profiling". We propose a novel algorithm, PGnet, which predicts a tripartite mechanism-anchored network associated with epigenetic regulation consisting of phenotypes, genes and mechanisms. Genes termed GEMs in this network meet all of the following criteria: (i) they are co-expressed with genes known to be involved in the biological mechanism of interest, (ii) they are also differentially expressed between distinct phenotypes relevant to the study, and (iii) as a biomodule, genes correlate with both the mechanism and the phenotype. Results: This proof-of-concept study, which focuses on epigenetic mechanisms, was conducted in a well-studied set of 132 acute lymphoblastic leukemia (ALL) microarrays annotated with nine distinct phenotypes and three measures of response to therapy. We used established parametric and nonparametric statistics to derive the PGnet tripartite network that consisted of 10 phenotypes and 33 significant clusters of GEMs comprising 535 distinct genes. The significance of PGnet was estimated from empirical p-values, and a robust subnetwork derived from ALL outcome data was produced by repeated random sampling. The evaluation of the derived robust network for predicting outcome (relapse of ALL) was significant (p = 3%), using one hundred three-fold cross-validations and the shrunken centroids classifier. Conclusion: To our knowledge, this is the first method to predict co-expression networks of genes associated with epigenetic mechanisms and to demonstrate an inherent capability to predict therapeutic outcome.
This PGnet approach can be applied to any regulatory mechanisms including transcriptional or microRNA regulation in order to derive predictive molecular profiles that are mechanistically anchored. The implementation of PGnet in R is freely available at .
15
Wren JD. A global meta-analysis of microarray expression data to predict unknown gene functions and estimate the literature-data divide. Bioinformatics 2009; 25:1694-701. [PMID: 19447786 DOI: 10.1093/bioinformatics/btp290] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Approximately 9334 (37%) of human genes have no publications documenting their function and, for those that are published, the number of publications per gene is highly skewed. Furthermore, for reasons not clear, the entry of new gene names into the literature has slowed in recent years. If we are to better understand human/mammalian biology and complete the catalog of human gene function, it is important to finish predicting putative functions for these genes based upon existing experimental evidence. RESULTS A global meta-analysis (GMA) of all publicly available GEO two-channel human microarray datasets (3551 experiments total) was conducted to identify genes with recurrent, reproducible patterns of co-regulation across different conditions. Patterns of co-expression were divided into parallel (i.e. genes are up- and down-regulated together) and anti-parallel. Several ranking methods to predict a gene's function based on its top 20 co-expressed gene pairs were compared. In the best method, 34% of predicted Gene Ontology (GO) categories matched exactly with the known GO categories for approximately 5000 genes analyzed versus only 3% for random gene sets. Only 2.4% of co-expressed gene pairs were found as co-occurring gene pairs in MEDLINE. CONCLUSIONS Via a GO enrichment analysis, genes co-expressed in parallel with the query gene were frequently associated with the same GO categories, whereas anti-parallel genes were not. Combining parallel and anti-parallel genes for analysis resulted in fewer significant GO categories, suggesting they are best analyzed separately. Expression databases contain much unexpected genetic knowledge that has not yet been reported in the literature. A total of 1642 human genes with unknown function were differentially expressed in at least 30 experiments. AVAILABILITY Data matrix available upon request.
Affiliation(s)
- Jonathan D Wren
- Arthritis and Immunology Research Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK 73104-5005, USA.
16
Greillier L, Baas P, Welch JJ, Hasan B, Passioukov A. Biomarkers for malignant pleural mesothelioma: current status. Mol Diagn Ther 2009; 12:375-90. [PMID: 19035624 DOI: 10.1007/bf03256303] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022]
Abstract
Malignant pleural mesothelioma (MPM) is an aggressive tumor with poor prognosis, whose main etiology is exposure to asbestos fibers. The incidence of MPM is anticipated to increase worldwide during the first half of this century. For various reasons, MPM is difficult to diagnose and is notoriously refractory to most treatments. However, recently two active chemotherapy regimens have been demonstrated to significantly increase survival in patients with MPM, and several therapeutic agents and strategies are currently under evaluation. Researchers have actively sought MPM biomarkers for more than 20 years. Biomarkers would be helpful in managing three clinical aspects of MPM: early diagnosis, prognosis, and treatment outcome prediction. The aims of the present review are to summarize the published and recently presented data on MPM biomarkers and to identify the prospects for future translational research projects. Among the 'classical' diagnostic biomarkers measured in biological fluids, such as cytokeratins and cell surface antigens, none discriminate patients with MPM from those with other malignancies and nonmalignant diseases. Osteopontin, soluble mesothelin, and megakaryocyte potentiating factor (MPF) appear to be the most promising of the recent biomarkers, but are still subject to some limitations. Osteopontin lacks specificity for mesothelioma, while both soluble mesothelin and MPF lack sensitivity for detecting non-epithelial subtypes. Panels consisting of a small set of biomarkers do not improve the diagnostic yield, and results from molecular profiling are too preliminary to be brought into daily clinical practice. While a large number of biomarkers have been assessed in biological fluids and tumor tissue for their prognostic value, none have had a widespread impact on clinical practice.
In contrast, data concerning predictive biomarkers are very limited, even though they are most interesting from the perspective of clinicians. Additional prospective studies, in large and independent samples of patients, with rigorous statistical methodology and standardized laboratory techniques are now warranted to validate and define the precise value of diagnostic and prognostic MPM biomarkers. Future research efforts should focus on biomarkers predictive of the efficacy and toxicity of standard chemotherapy. Translational research should be systematically incorporated into the design of clinical trials assessing new targeted agents in MPM.
Affiliation(s)
- Laurent Greillier
- European Organisation for Research and Treatment of Cancer (EORTC), Headquarters, Brussels, Belgium.
|
17
|
Hamid JS, Hu P, Roslin NM, Ling V, Greenwood CMT, Beyene J. Data integration in genetics and genomics: methods and challenges. HUMAN GENOMICS AND PROTEOMICS : HGP 2009; 2009. [PMID: 20948564 PMCID: PMC2950414 DOI: 10.4061/2009/869093] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/25/2008] [Accepted: 12/01/2008] [Indexed: 01/18/2023]
Abstract
Due to rapid technological advances, various types of genomic and proteomic data with different sizes, formats, and structures have become available. Among them are gene expression, single nucleotide polymorphism, copy number variation, and protein-protein/gene-gene interactions. Each of these distinct data types provides a different, partly independent and complementary, view of the whole genome. However, understanding functions of genes, proteins, and other aspects of the genome requires more information than provided by each of the datasets. Integrating data from different sources is, therefore, an important part of current research in genomics and proteomics. Data integration also plays important roles in combining clinical, environmental, and demographic data with high-throughput genomic data. Nevertheless, the concept of data integration is not well defined in the literature and it may mean different things to different researchers. In this paper, we first propose a conceptual framework for integrating genetic, genomic, and proteomic data. The framework captures fundamental aspects of data integration and is developed taking the key steps in genetic, genomic, and proteomic data fusion. Secondly, we provide a review of some of the most commonly used current methods and approaches for combining genomic data with focus on the statistical aspects.
Affiliation(s)
- Jemila S Hamid
- Biostatistics Methodology Unit, The Hospital for Sick Children Research Institute, 555 University Avenue, Toronto, ON, Canada M5G 1X8
|
18
|
Meta-analysis of genome-wide expression patterns associated with behavioral maturation in honey bees. BMC Genomics 2008; 9:503. [PMID: 18950506 PMCID: PMC2582039 DOI: 10.1186/1471-2164-9-503] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2008] [Accepted: 10/24/2008] [Indexed: 11/22/2022] Open
Abstract
Background The information from multiple microarray experiments can be integrated in an objective manner via meta-analysis. However, multiple meta-analysis approaches are available and their relative strengths have not been directly compared using experimental data in the context of different gene expression scenarios and studies with different degrees of relationship. This study investigates the complementary advantages of meta-analysis approaches to integrate information across studies, and further mine the transcriptome for genes that are associated with complex processes such as behavioral maturation in honey bees. Behavioral maturation and division of labor in honey bees are related to changes in the expression of hundreds of genes in the brain. The information from various microarray studies comparing the expression of genes at different maturation stages in honey bee brains was integrated using complementary meta-analysis approaches. Results Comparison of lists of genes with significant differential expression across studies failed to identify genes with consistent patterns of expression that were below the selected significance threshold, or identified genes with significant yet inconsistent patterns. The meta-analytical framework supported the identification of genes with consistent overall expression patterns and eliminated genes that exhibited contradictory expression patterns across studies. Sample-level meta-analysis of normalized gene-expression can detect more differentially expressed genes than the study-level meta-analysis of estimates for genes that were well described by similar model parameter estimates across studies and had small variation across studies. Furthermore, study-level meta-analysis was well suited for genes that exhibit consistent patterns across studies, genes that had substantial variation across studies, and genes that did not conform to the assumptions of the sample-level meta-analysis. 
Meta-analyses confirmed previously reported genes and helped identify genes (e.g. Tomosyn, Chitinase 5, Adar, Innexin 2, Transferrin 1, Sick, Oatp26F) and Gene Ontology categories (e.g. purine nucleotide binding) not previously associated with maturation in honey bees. Conclusion This study demonstrated that a combination of meta-analytical approaches best addresses the highly dimensional nature of genome-wide microarray studies. As expected, the integration of gene expression information from microarray studies using meta-analysis enhanced the characterization of the transcriptome of complex biological processes.
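As a rough illustration of the study-level approach described above, the sketch below pools per-study effect estimates for one gene. The abstract does not name the estimator used, so fixed-effect inverse-variance weighting and Cochran's Q as a consistency check are assumptions made here, and the function name `fixed_effect_meta` and the numbers are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(estimates, std_errors):
    """Pool per-study estimates with inverse-variance weights.

    Returns (pooled_estimate, pooled_standard_error, cochran_q).
    A large Q relative to its degrees of freedom (number of studies - 1)
    suggests the studies disagree about the effect.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, pooled_se, q


# Hypothetical log fold-changes for one gene in three studies (consistent pattern).
consistent = fixed_effect_meta([0.8, 0.7, 0.9], [0.2, 0.2, 0.2])

# Hypothetical gene with contradictory patterns across two studies.
contradictory = fixed_effect_meta([0.8, -0.7], [0.2, 0.2])

print(consistent)
print(contradictory)
```

In the contradictory case the pooled estimate is near zero and Q is large, which is one way a meta-analytical framework can set aside genes with conflicting patterns instead of reporting them as significant.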
|
19
|
Abstract
INTRODUCTION An expanding understanding of the importance of angiogenesis in oncology and the development of numerous angiogenesis inhibitors are driving the search for biomarkers of angiogenesis. We review currently available candidate biomarkers and surrogate markers of anti-angiogenic agent effect. DISCUSSION A number of invasive, minimally invasive, and non-invasive tools are described with their potential benefits and limitations. Diverse markers can evaluate tumor tissue or biological fluids, or specialized imaging modalities. CONCLUSIONS The inclusion of these markers into clinical trials may provide insight into appropriate dosing for desired biological effects, appropriate timing of additional therapy, prediction of individual response to an agent, insight into the interaction of chemotherapy and radiation following exposure to these agents, and perhaps most importantly, a better understanding of the complex nature of angiogenesis in human tumors. While many markers have potential for clinical use, it is not yet clear which marker or combination of markers will prove most useful.
Affiliation(s)
- Aaron P Brown
- National Institutes of Health, Building 10/3B42, Bethesda, MD 20892, USA
|
20
|
Weston DJ, Gunter LE, Rogers A, Wullschleger SD. Connecting genes, coexpression modules, and molecular signatures to environmental stress phenotypes in plants. BMC SYSTEMS BIOLOGY 2008; 2:16. [PMID: 18248680 PMCID: PMC2277374 DOI: 10.1186/1752-0509-2-16] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/24/2007] [Accepted: 02/04/2008] [Indexed: 11/21/2022]
Abstract
Background One of the eminent opportunities afforded by modern genomic technologies is the potential to provide a mechanistic understanding of the processes by which genetic change translates to phenotypic variation and the resultant appearance of distinct physiological traits. Indeed much progress has been made in this area, particularly in biomedicine where functional genomic information can be used to determine the physiological state (e.g., diagnosis) and predict phenotypic outcome (e.g., patient survival). Ecology currently lacks an analogous approach where genomic information can be used to diagnose the presence of a given physiological state (e.g., stress response) and then predict likely phenotypic outcomes (e.g., stress duration and tolerance, fitness). Results Here, we demonstrate that a compendium of genomic signatures can be used to classify the plant abiotic stress phenotype in Arabidopsis according to the architecture of the transcriptome, and then be linked with gene coexpression network analysis to determine the underlying genes governing the phenotypic response. Using this approach, we confirm the existence of known stress responsive pathways and marker genes, report a common abiotic stress responsive transcriptome and relate phenotypic classification to stress duration. Conclusion Linking genomic signatures to gene coexpression analysis provides a unique method of relating an observed plant phenotype to changes in gene expression that underlie that phenotype. Such information is critical to current and future investigations in plant biology and, in particular, to evolutionary ecology, where a mechanistic understanding of adaptive physiological responses to abiotic stress can provide researchers with a tool of great predictive value in understanding species and population level adaptation to climate change.
Affiliation(s)
- David J Weston
- Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6422, USA.
|
21
|
Burgun A, Bodenreider O. Accessing and integrating data and knowledge for biomedical research. Yearb Med Inform 2008:91-101. [PMID: 18660883 PMCID: PMC2553094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023] Open
Abstract
OBJECTIVES To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and survey current efforts to address these issues. METHODS Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge. RESULTS New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies. CONCLUSION As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research.
Affiliation(s)
- A Burgun
- Département d'Information Médicale, CHU Pontchaillou, rue Henri Le Guilloux, F-35033 Rennes Cedex, France.
|