1
Wang KYX, Menzies AM, Silva IP, Wilmott JS, Yan Y, Wongchenko M, Kefford RF, Scolyer RA, Long GV, Tarr G, Mueller S, Yang JYH. bcGST-an interactive bias-correction method to identify over-represented gene-sets in boutique arrays. Bioinformatics 2020; 35:1350-1357. [PMID: 30215668 DOI: 10.1093/bioinformatics/bty783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/27/2017] [Revised: 07/31/2018] [Accepted: 09/11/2018] [Indexed: 02/01/2023]
Abstract
MOTIVATION: Gene annotation and pathway databases such as Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes are important tools in Gene-Set Test (GST) that describe gene biological functions and associated pathways. GST aims to establish an association relationship between a gene-set of interest and an annotation. Importantly, GST tests for over-representation of genes in an annotation term. One implicit assumption of GST is that the gene expression platform captures the complete or a very large proportion of the genome. However, this assumption is neither satisfied for the increasingly popular boutique array nor the custom designed gene expression profiling platform. Specifically, conventional GST is no longer appropriate due to the gene-set selection bias induced during the construction of these platforms.
RESULTS: We propose bcGST, a bias-corrected GST by introducing bias-correction terms in the contingency table needed for calculating the Fisher's Exact Test. The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. We illustrate the practicality of bcGST and its stability through multiple differential gene expression analyses in melanoma and the Cancer Genome Atlas cancer studies.
AVAILABILITY AND IMPLEMENTATION: The bcGST method is made available as a Shiny web application at http://shiny.maths.usyd.edu.au/bcGST/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
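For orientation, the conventional (uncorrected) gene-set test that this abstract builds on can be sketched as a one-sided Fisher's Exact Test on a 2x2 contingency table. This is only a minimal illustration of the standard test: the function name, argument names and example counts are hypothetical, and the bias-correction terms of bcGST itself are not given in the abstract and are not reproduced here.

```python
from math import comb


def overrepresentation_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test: P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the assumed universe (e.g. genes measured on the platform)
    K: genes in the universe annotated to the term
    n: genes in the gene-set of interest
    k: genes in the gene-set that are annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent with a 2x2 contingency table")
    # Sum the hypergeometric probabilities of tables at least as extreme as k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


# Same gene-set size and overlap under two hypothetical universes
# (illustrative numbers only, not taken from the paper).
p_genome = overrepresentation_pvalue(k=8, n=50, K=200, N=20000)
p_boutique = overrepresentation_pvalue(k=8, n=50, K=60, N=500)
print(p_genome, p_boutique)
```

The two calls show why the choice of universe matters: the same overlap looks far more surprising against a genome-wide background than against a small, curated array, which is the kind of selection bias the abstract describes.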
Affiliation(s)
- Kevin Y X Wang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Alexander M Menzies
  - Melanoma Institute of Australia, Wollstonecraft, NSW, Australia
  - Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
  - Royal North Shore Hospital, Sydney, NSW, Australia
- Ines P Silva
  - Melanoma Institute of Australia, Wollstonecraft, NSW, Australia
- James S Wilmott
  - Melanoma Institute of Australia, Wollstonecraft, NSW, Australia
- Yibing Yan
  - Genentech Inc, South San Francisco, CA, USA
- Richard F Kefford
  - Melanoma Institute of Australia, Wollstonecraft, NSW, Australia
  - Department of Clinical Medicine, Macquarie University, Sydney, NSW, Australia
- Richard A Scolyer
  - Melanoma Institute of Australia, Wollstonecraft, NSW, Australia
  - Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
  - Royal Prince Alfred Hospital, Sydney, NSW, Australia
- Georgina V Long
  - Melanoma Institute of Australia, Wollstonecraft, NSW, Australia
  - Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
  - Royal North Shore Hospital, Sydney, NSW, Australia
- Garth Tarr
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Samuel Mueller
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Jean Y H Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
  - The Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
2
Ghazanfar S, Strbenac D, Ormerod JT, Yang JYH, Patrick E. DCARS: differential correlation across ranked samples. Bioinformatics 2019; 35:823-829. [PMID: 30102408 DOI: 10.1093/bioinformatics/bty698] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/19/2018] [Revised: 07/19/2018] [Accepted: 08/07/2018] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Genes act as a system and not in isolation. Thus, it is important to consider coordinated changes of gene expression rather than single genes when investigating biological phenomena such as the aetiology of cancer. We have developed an approach, called Differential Correlation across Ranked Samples (DCARS), for quantifying how changes in the association between pairs of genes may inform the outcome of interest. Modelling gene correlation across a continuous sample ranking does not require the dichotomisation of samples into two distinct classes and can identify differences in gene correlation across early, mid or late stages of the outcome of interest.
RESULTS: When we evaluated DCARS against the typical Fisher Z-transformation test for differential correlation, as well as a typical approach testing for interaction within a linear model, on real TCGA data, DCARS significantly ranked gene pairs containing known cancer genes more highly across several cancers. Similar results were found in our simulation study. DCARS was applied to 13 cancer datasets in TCGA, revealing several distinct relationships for which survival ranking was found to be associated with a change in correlation between genes. Furthermore, we demonstrated that DCARS can be used in conjunction with network analysis techniques to extract biological meaning from multi-layered and complex data.
AVAILABILITY AND IMPLEMENTATION: The DCARS R package, sample data and Supplementary Files are available at https://github.com/shazanfar/DCARS. Publicly available data from The Cancer Genome Atlas (TCGA) were accessed using the TCGAbiolinks R package.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
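The baseline that DCARS is compared against, the Fisher Z-transformation test for differential correlation between two sample groups, can be sketched as follows. This is a minimal illustration of the textbook two-group test only, not of DCARS itself; the function name and the example values are hypothetical.

```python
import math


def fisher_z_diff_corr(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided test of H0: rho1 == rho2 for two independent sample groups.

    r1, r2: Pearson correlations of the same gene pair in each group
    n1, n2: number of samples in each group (each must exceed 3)
    Returns (z statistic, two-sided p-value under a standard normal null).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 samples")
    if not (-1 < r1 < 1 and -1 < r2 < 1):
        raise ValueError("correlations must lie strictly between -1 and 1")
    # Fisher transformation: atanh(r) is approximately normal with variance 1/(n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical gene pair: strongly correlated in one group, weakly in the other.
z_stat, p_value = fisher_z_diff_corr(r1=0.8, n1=50, r2=0.2, n2=50)
print(z_stat, p_value)
```

Note that this test requires splitting samples into two classes up front, which is the dichotomisation step the abstract says DCARS avoids by working across a continuous sample ranking.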
Affiliation(s)
- Shila Ghazanfar
  - The Judith and David Coffey Life Lab, Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
- Dario Strbenac
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
- John T Ormerod
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Richard Berry Building, The University of Melbourne, Melbourne, Parkville, VIC, Australia
- Jean Y H Yang
  - The Judith and David Coffey Life Lab, Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
- Ellis Patrick
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  - Westmead Institute for Medical Research, University of Sydney, Westmead, NSW, Australia
3
Zhang Z, Zhao L, Wei X, Guo Q, Zhu X, Wei R, Yin X, Zhang Y, Wang B, Li X. Integrated bioinformatic analysis of microarray data reveals shared gene signature between MDS and AML. Oncol Lett 2018; 16:5147-5159. [PMID: 30214614 PMCID: PMC6126153 DOI: 10.3892/ol.2018.9237] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/02/2018] [Accepted: 06/20/2018] [Indexed: 12/19/2022]
Abstract
Myeloid disorders, especially myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML), cause significant morbidity and high mortality worldwide. Despite numerous attempts, the common molecular events underlying the development of MDS and AML remain to be established. In the present study, 18 microarray datasets were selected, and a meta-analysis was conducted to identify shared gene signatures and biological processes between MDS and AML. Using NetworkAnalyst, 191 upregulated and 139 downregulated genes were identified in MDS and AML, among which PTH2R, TEC, and GPX1 were the most upregulated genes, while MME, RAG1, and CD79B were the most downregulated. Comprehensive functional enrichment analyses revealed that oncogenic signaling-related pathways, such as fibroblast growth factor receptor (FGFR) signaling, were the most upregulated biological processes, while immune response-related events, such as the interleukin-6/interferon signaling pathway and the B cell receptor signaling pathway, were the most downregulated. Network-based meta-analysis ascertained that HSP90AA1 and CUL1 were the most important hub genes. Our study has largely clarified the link between MDS and AML in terms of potential pathways and genetic markers, which sheds light on the molecular mechanisms underlying the development and transition of MDS and AML, and facilitates the understanding of novel diagnostic, therapeutic and prognostic biomarkers.
Affiliation(s)
- Zhen Zhang
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
- Lin Zhao
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
- Xijin Wei
  - Department of Peripheral Vascular Surgery, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, Shandong 250011, P.R. China
- Qiang Guo
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
- Xiaoxiao Zhu
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
- Ran Wei
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
- Xunqiang Yin
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
  - School of Medicine and Life Sciences, University of Jinan-Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
- Yunhong Zhang
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
  - School of Medicine and Life Sciences, University of Jinan-Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
- Bin Wang
  - Department of Peripheral Vascular Surgery, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, Shandong 250011, P.R. China
- Xia Li
  - Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, Shandong 250062, P.R. China
4
Vafaee F, Diakos C, Kirschner MB, Reid G, Michael MZ, Horvath LG, Alinejad-Rokny H, Cheng ZJ, Kuncic Z, Clarke S. A data-driven, knowledge-based approach to biomarker discovery: application to circulating microRNA markers of colorectal cancer prognosis. NPJ Syst Biol Appl 2018; 4:20. [PMID: 29872543 PMCID: PMC5981448 DOI: 10.1038/s41540-018-0056-1] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/01/2016] [Revised: 04/11/2018] [Accepted: 05/04/2018] [Indexed: 02/08/2023]
Abstract
Recent advances in high-throughput technologies have provided an unprecedented opportunity to identify molecular markers of disease processes. This plethora of complex-omics data has simultaneously complicated the problem of extracting meaningful molecular signatures and opened up new opportunities for more sophisticated integrative and holistic approaches. In this era, effective integration of data-driven and knowledge-based approaches for biomarker identification has been recognised as key to improving the identification of high-performance biomarkers, and necessary for translational applications. Here, we have evaluated the role of circulating microRNA as a means of predicting the prognosis of patients with colorectal cancer, which is the second leading cause of cancer-related death worldwide. We have developed a multi-objective optimisation method that effectively integrates a data-driven approach with the knowledge obtained from the microRNA-mediated regulatory network to identify robust plasma microRNA signatures which are reliable in terms of predictive power as well as functional relevance. The proposed multi-objective framework has the capacity to adjust for conflicting biomarker objectives and to incorporate heterogeneous information facilitating systems approaches to biomarker discovery. We have found a prognostic signature of colorectal cancer comprising 11 circulating microRNAs. The identified signature predicts the patients' survival outcome and targets pathways underlying colorectal cancer progression. The altered expression of the identified microRNAs was confirmed in an independent public data set of plasma samples of patients in early stage vs advanced colorectal cancer. Furthermore, the generality of the proposed method was demonstrated across three publicly available miRNA data sets associated with biomarker studies in other diseases.
Affiliation(s)
- Fatemeh Vafaee
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2033 Australia
- Connie Diakos
  - Kolling Institute of Medical Research, University of Sydney, Royal North Shore Hospital, Reserve Road, St Leonards, NSW 2065 Australia
- Glen Reid
  - Asbestos Diseases Research Institute, Hospital Road, Concord, NSW 2139 Australia
  - Sydney Medical School, University of Sydney, Sydney, NSW 2050 Australia
- Michael Z. Michael
  - Flinders Centre for Innovation in Cancer, Flinders Medical Centre, Flinders University, Adelaide, SA 5042 Australia
- Lisa G. Horvath
  - Sydney Medical School, University of Sydney, Sydney, NSW 2050 Australia
  - Chris O’Brien Lifehouse, Missenden Road, Camperdown, NSW 2050 Australia
  - Royal Prince Alfred Hospital, Camperdown, NSW 2050 Australia
- Zhangkai Jason Cheng
  - Charles Perkins Centre, University of Sydney, Sydney, NSW 2006 Australia
  - School of Physics, University of Sydney, Sydney, NSW 2006 Australia
- Zdenka Kuncic
  - Charles Perkins Centre, University of Sydney, Sydney, NSW 2006 Australia
  - School of Physics, University of Sydney, Sydney, NSW 2006 Australia
- Stephen Clarke
  - Kolling Institute of Medical Research, University of Sydney, Royal North Shore Hospital, Reserve Road, St Leonards, NSW 2065 Australia
5
Yan W, Xue W, Chen J, Hu G. Biological Networks for Cancer Candidate Biomarkers Discovery. Cancer Inform 2016; 15:1-7. [PMID: 27625573 PMCID: PMC5012434 DOI: 10.4137/cin.s39458] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/18/2016] [Revised: 06/06/2016] [Accepted: 06/16/2016] [Indexed: 12/16/2022]
Abstract
Due to its extraordinary heterogeneity and complexity, cancer is often proposed as a model case of a systems biology disease or network disease. There is a critical need for effective biomarkers for cancer diagnosis and/or outcome prediction from system-level analyses. Methods based on integrating omics data into networks have the potential to revolutionize the identification of cancer biomarkers. Deciphering the biological networks underlying cancer is undoubtedly important for understanding the molecular mechanisms of the disease and identifying effective biomarkers. In this review, the networks constructed for cancer biomarker discovery from data at different omics levels are described and illustrated with recent advances in the field.
Affiliation(s)
- Wenying Yan
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Wenjin Xue
  - Department of Electrical Engineering, Technician College of Taizhou, Taizhou, Jiangsu, China
- Jiajia Chen
  - School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, Suzhou, China
- Guang Hu
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
6
Jeanquartier F, Jean-Quartier C, Kotlyar M, Tokar T, Hauschild AC, Jurisica I, Holzinger A. Machine Learning for In Silico Modeling of Tumor Growth. Lecture Notes in Computer Science 2016. [DOI: 10.1007/978-3-319-50478-0_21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022]
7
Xia J, Gill EE, Hancock REW. NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat Protoc 2015; 10:823-44. [PMID: 25950236 DOI: 10.1038/nprot.2015.052] [Citation(s) in RCA: 598] [Impact Index Per Article: 66.4] [Indexed: 12/19/2022]
Abstract
Meta-analysis of gene expression data sets is increasingly performed to help identify robust molecular signatures and to gain insights into underlying biological processes. The complicated nature of such analyses requires both advanced statistics and innovative visualization strategies to support efficient data comparison, interpretation and hypothesis generation. NetworkAnalyst (http://www.networkanalyst.ca) is a comprehensive web-based tool designed to allow bench researchers to perform various common and complex meta-analyses of gene expression data via an intuitive web interface. By coupling well-established statistical procedures with state-of-the-art data visualization techniques, NetworkAnalyst allows researchers to easily navigate large complex gene expression data sets to determine important features, patterns, functions and connections, thus leading to the generation of new biological hypotheses. This protocol provides a step-wise description of how to effectively use NetworkAnalyst to perform network analysis and visualization from gene lists; to perform meta-analysis on gene expression data while taking into account multiple metadata parameters; and, finally, to perform a meta-analysis of multiple gene expression data sets. NetworkAnalyst is designed to be accessible to biologists rather than to specialist bioinformaticians. The complete protocol can be executed in ∼1.5 h. Compared with other similar web-based tools, NetworkAnalyst offers a unique visual analytics experience that enables data analysis within the context of protein-protein interaction networks, heatmaps or chord diagrams. All of these analysis methods provide the user with supporting statistical and functional evidence.
Affiliation(s)
- Jianguo Xia
  - Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada
  - Institute of Parasitology, and Department of Animal Science, McGill University, Ste. Ann de Bellevue, Québec, Canada
  - Department of Microbiology and Immunology, McGill University, Montreal, Québec, Canada
- Erin E Gill
  - Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada
- Robert E W Hancock
  - Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada
  - Wellcome Trust Sanger Institute, Hinxton, United Kingdom
8
Barter RL, Schramm SJ, Mann GJ, Yang YH. Network-based biomarkers enhance classical approaches to prognostic gene expression signatures. BMC Systems Biology 2014; 8 Suppl 4:S5. [PMID: 25521200 PMCID: PMC4290694 DOI: 10.1186/1752-0509-8-s4-s5] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 12/23/2022]
Abstract
BACKGROUND: Classical approaches to predicting patient clinical outcome via gene expression information are primarily based on differential expression of unrelated genes (single-gene approaches) or genes related by, for example, biologic pathway or function (gene-sets). Recently, network-based approaches utilising interaction information between genes have emerged. An open problem is whether such approaches add value to the more traditional methods of signature modelling. We explored this question via comparison of the most widely employed single-gene, gene-set, and network-based methods, using gene expression microarray data from two different cancers: melanoma and ovarian. We considered two kinds of network approaches. The first of these identifies informative genes using gene expression and network connectivity information combined, the latter drawn from prior knowledge of protein-protein interactions. The second approach focuses on identification of informative sub-networks (small networks of interacting proteins, again from prior knowledge networks). For all methods we performed 100 rounds of 5-fold cross-validation under 3 different classifiers. For network-based approaches, we considered two different protein-protein interaction networks. We quantified resulting patterns of misclassification and discussed the relative value of each relative to ongoing development of prognostic biomarkers.
RESULTS: We found that single-gene, gene-set and network methods yielded similar error rates in melanoma and ovarian cancer data. Crucially, however, our novel and detailed patient-level analyses revealed that the different methods were correctly classifying alternate subsets of patients in each cohort. We also found that the network-based NetRank feature selection method was the most stable.
CONCLUSIONS: Next-generation methods of gene expression signature modelling harness data from external networks and are foreshadowed as a standard mode of analysis. But what do they add to traditional approaches? Our findings indicate there is value in the way in which different subspaces of the patient sample are captured differently among the various methods, highlighting the possibility of 'combination' classifiers capable of identifying which patients will be more accurately classified by one particular method over another. We have seen this clearly for the first time because of our in-depth analysis at the level of individual patients.
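The evaluation scheme mentioned in this abstract, 100 rounds of 5-fold cross-validation, can be sketched as a generator of train/test index splits. This is a generic illustration of repeated k-fold splitting under assumed defaults; the function name, the seed and the shuffling strategy are hypothetical and are not taken from the paper.

```python
import random
from typing import Iterator


def repeated_kfold(
    n_samples: int, k: int = 5, rounds: int = 100, seed: int = 0
) -> Iterator[tuple[int, int, list[int], list[int]]]:
    """Yield (round, fold, train_indices, test_indices) for repeated k-fold CV."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    rng = random.Random(seed)
    for r in range(rounds):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition in every round
        folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train, test


# Each patient lands in exactly one test fold per round, so per-patient
# misclassification can be tallied across rounds.
for rnd, fold, train_idx, test_idx in repeated_kfold(n_samples=23, rounds=2):
    print(rnd, fold, len(train_idx), len(test_idx))
```

Because every sample is tested once per round, repeating the procedure gives a per-patient error frequency, which is the kind of patient-level view of misclassification the abstract emphasises.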
Affiliation(s)
- Rebecca L Barter
  - School of Mathematics and Statistics at The University of Sydney, F07, The University of Sydney, NSW, 2006, Australia
- Sarah-Jane Schramm
  - Westmead Millennium Institute at The University of Sydney, 176 Hawkesbury Road, Westmead, NSW, 2145, Australia
  - Melanoma Institute Australia, 40 Rocklands Rd, North Sydney, NSW, 2060, Australia
- Graham J Mann
  - Westmead Millennium Institute at The University of Sydney, 176 Hawkesbury Road, Westmead, NSW, 2145, Australia
  - Melanoma Institute Australia, 40 Rocklands Rd, North Sydney, NSW, 2060, Australia
- Yee Hwa Yang
  - School of Mathematics and Statistics at The University of Sydney, F07, The University of Sydney, NSW, 2006, Australia
  - Melanoma Institute Australia, 40 Rocklands Rd, North Sydney, NSW, 2060, Australia
9
Schramm SJ, Jayaswal V, Goel A, Li SS, Yang YH, Mann GJ, Wilkins MR. Molecular interaction networks for the analysis of human disease: utility, limitations, and considerations. Proteomics 2014; 13:3393-405. [PMID: 24166987 DOI: 10.1002/pmic.201200570] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/18/2012] [Revised: 09/11/2013] [Accepted: 10/07/2013] [Indexed: 01/01/2023]
Abstract
High-throughput '-omics' data can be combined with large-scale molecular interaction networks, for example, protein-protein interaction networks, to provide a unique framework for the investigation of human molecular biology. Interest in these integrative '-omics' methods is growing rapidly because of their potential to understand complexity and association with disease; such approaches have a focus on associations between phenotype and "network-type." The potential of this research is enticing, yet there remain a series of important considerations. Here, we discuss interaction data selection, data quality, the relative merits of using data from large high-throughput studies versus a meta-database of smaller literature-curated studies, and possible issues of sociological or inspection bias in interaction data. Other work underway, especially international consortia to establish data formats, quality standards and address data redundancy, and the improvements these efforts are making to the field, is also evaluated. We present options for researchers intending to use large-scale molecular interaction networks as a functional context for protein or gene expression data, including microRNAs, especially in the context of human disease.
Affiliation(s)
- Sarah-Jane Schramm
  - Sydney Medical School, Westmead Millennium Institute for Medical Research, The University of Sydney, Sydney, NSW, Australia
  - Melanoma Institute Australia, Sydney, NSW, Australia