1
Subnetwork representation learning for discovering network biomarkers in predicting lymph node metastasis in early oral cancer. Sci Rep 2021; 11:23992. [PMID: 34907266 PMCID: PMC8671417 DOI: 10.1038/s41598-021-03333-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 11/18/2021] [Indexed: 12/02/2022]
Abstract
Cervical lymph node metastasis is the leading cause of poor prognosis in oral tongue squamous cell carcinoma and also occurs in the early stages. The current clinical diagnosis depends on a physical examination, which is not sufficient to determine whether micrometastasis remains. Transcriptome profiling has shown great potential for predicting micrometastasis by capturing the dynamic activation state of genes. However, there are several technical challenges in using transcriptome data to model patient conditions: (1) an insufficient number of samples compared to the number of genes, (2) complex dependence between the genes that govern the cancer phenotype, and (3) heterogeneity among patients across cohorts that differ geographically and racially. We developed a computational framework to learn the subnetwork representation of the transcriptome to discover network biomarkers and determine the potential of metastasis in early oral tongue squamous cell carcinoma. Our method achieved high accuracy in predicting the potential of metastasis in two geographically and racially different groups of patients. The robustness of the model and the reproducibility of the discovered network biomarkers show great potential as a tool to diagnose lymph node metastasis in early oral cancer.
2
Liu X, Li D, Liu J, Su Z, Li G. RecBic: a fast and accurate algorithm recognizing trend-preserving biclusters. Bioinformatics 2021; 36:5054-5060. [PMID: 32653907 DOI: 10.1093/bioinformatics/btaa630] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/19/2019] [Revised: 06/24/2020] [Accepted: 07/06/2020] [Indexed: 01/09/2023]
Abstract
MOTIVATION Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in the accuracy and efficiency with which they recognize various kinds of complex biclusters submerged in ever-larger datasets. We introduce RecBic, a novel, fast and highly accurate algorithm, to identify various forms of complex biclusters in gene expression datasets. RESULTS We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters where the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster simply by repeatedly comparing real numbers. When tested on simulated datasets in which the elements of implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic was able to identify the implanted biclusters in a nearly perfect manner, outperforming all the compared salient tools in terms of accuracy and robustness to noise and overlaps between the clusters. Moreover, RecBic also showed superiority in identifying functionally related genes in real gene expression datasets. AVAILABILITY AND IMPLEMENTATION Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
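To make the idea of a trend-preserving bicluster concrete, the following is a minimal sketch, not RecBic itself. It assumes that "trend-preserving" means every selected gene ranks the selected conditions in the same order; the abstract does not spell out the exact definition, and the function names and toy matrix are hypothetical.

```python
def column_order(row, cols):
    """Return the selected columns sorted by this row's values (ties broken by column index)."""
    return tuple(sorted(cols, key=lambda c: (row[c], c)))


def is_trend_preserving(matrix, rows, cols):
    """True if every selected row ranks the selected columns in the same order.

    Assumption: this order-preserving test stands in for the paper's definition.
    """
    orders = {column_order(matrix[r], cols) for r in rows}
    return len(orders) <= 1


# Toy expression matrix (rows = genes, columns = conditions); values are made up.
expr = [
    [1.0, 3.0, 2.0, 9.0],
    [10.0, 30.0, 20.0, 0.5],
    [0.2, 0.9, 0.4, 0.1],
    [5.0, 1.0, 4.0, 2.0],
]

# Genes 0-2 share the same trend over conditions 0-2; gene 3 does not.
print(is_trend_preserving(expr, [0, 1, 2], [0, 1, 2]))
print(is_trend_preserving(expr, [0, 1, 2, 3], [0, 1, 2]))
```

Only comparisons between real numbers are needed for this check, which matches the flavor of the description above, but the seed-and-grow search is not reproduced here.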
Affiliation(s)
- Xiangyu Liu
  - Research Center for Mathematics and Interdisciplinary Sciences; School of Mathematics, Shandong University, Jinan 250100, China
- Di Li
  - Research Center for Mathematics and Interdisciplinary Sciences; School of Mathematics, Shandong University, Jinan 250100, China
- Juntao Liu
  - School of Mathematics, Shandong University, Jinan 250100, China
- Zhengchang Su
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Guojun Li
  - Research Center for Mathematics and Interdisciplinary Sciences; School of Mathematics, Shandong University, Jinan 250100, China
3
Yang L, Chen R, Goodison S, Sun Y. An efficient and effective method to identify significantly perturbed subnetworks in cancer. NATURE COMPUTATIONAL SCIENCE 2021; 1:79-88. [PMID: 37346964 PMCID: PMC10284573 DOI: 10.1038/s43588-020-00009-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/21/2020] [Accepted: 12/02/2020] [Indexed: 06/23/2023]
Abstract
The identification of key functional biological networks from high-dimensional genomics data is pivotal for cancer research. Here, we introduce FDRnet, a method for the detection of molecular subnetworks in cancer, which addresses several challenges in pathway analysis. FDRnet detects key subnetworks by solving a mixed-integer linear programming problem, using a given upper bound of false discovery rate (FDR) as a budget constraint, and minimizing a conductance score to find dense subgraphs around seed genes. A large-scale benchmark study was performed on both simulation and cancer genomics data. FDRnet outperformed other methods in the ability to detect functionally homogeneous subnetworks in a scale-free biological network, to control FDRs of the genes in detected subnetworks, to improve computational efficiency and to integrate multi-omics data. By overcoming the limitations of existing approaches, FDRnet can facilitate the detection of key functional pathways in cancer and other genetic diseases.
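The conductance score mentioned above can be illustrated with the standard graph-theoretic definition: the number of edges leaving a node set divided by the smaller of the two degree volumes. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary; whether FDRnet uses exactly this variant is not stated in the abstract, and the mixed-integer program and FDR budget are not reproduced. The function name and toy graph are hypothetical.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph given as node -> set of neighbors.

    conductance = cut(S, V \\ S) / min(vol(S), vol(V \\ S)),
    where vol is the sum of node degrees. Lower values mean a denser,
    better-separated subgraph.
    """
    subset = set(subset)
    cut = 0
    vol_in = 0
    vol_out = 0
    for node, neighbors in adj.items():
        degree = len(neighbors)
        if node in subset:
            vol_in += degree
            cut += sum(1 for n in neighbors if n not in subset)
        else:
            vol_out += degree
    smaller = min(vol_in, vol_out)
    if smaller == 0:
        raise ValueError("conductance is undefined for an empty side")
    return cut / smaller


# Toy network: two triangles joined by a single edge C-D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}

print(conductance(graph, {"A", "B", "C"}))  # one triangle: a single edge leaves the set
print(conductance(graph, {"A", "B"}))       # splitting a triangle cuts two edges
```

A seed-based search would prefer the full triangle over the split one because its conductance is lower.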
Affiliation(s)
- Le Yang
  - Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY, USA
- Runpu Chen
  - Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY, USA
- Steve Goodison
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL, USA
- Yijun Sun
  - Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY, USA
  - Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY, USA
  - Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, USA
4
Adnan N, Lei C, Ruan J. Robust edge-based biomarker discovery improves prediction of breast cancer metastasis. BMC Bioinformatics 2020; 21:359. [PMID: 32998692 PMCID: PMC7526355 DOI: 10.1186/s12859-020-03692-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
Background The abundance of molecular profiling of breast cancer tissues has prompted active research on molecular marker-based early diagnosis of metastasis. Recently there has been a surge of interest in combining gene expression with gene networks, such as the protein-protein interaction (PPI) network, the gene co-expression (CE) network and pathway information, to identify robust and accurate biomarkers for metastasis prediction, reflecting the common belief that cancer is a systems biology disease. However, controversy exists in the literature regarding whether network markers are indeed better features than genes alone for predicting as well as understanding metastasis. We believe many of the existing results may have been biased by overly complicated prediction algorithms, unfair evaluation, and a lack of rigorous statistics. In this study, we propose a simple approach that uses network edges as features, based on two types of networks respectively, and compare their prediction power using three classification algorithms and a rigorous statistical procedure on one of the largest datasets available. To detect biomarkers that are significant for the prediction and to compare the robustness of different feature types, we propose an unbiased and novel procedure to measure feature importance that eliminates the potential bias from factors such as different sample sizes, numbers of features, and class distributions. Results Experimental results reveal that edge-based feature types consistently outperformed the gene-based feature type in random forest and logistic regression models under all performance evaluation metrics, while the prediction accuracy of the edge-based support vector machine (SVM) model was poorer, owing to the larger number of edge features compared to gene features and the lack of feature selection in the SVM model.
Experimental results also show that edge features are much more robust than gene features and the top biomarkers from edge feature types are statistically more significantly enriched in the biological processes that are well known to be related to breast cancer metastasis. Conclusions Overall, this study validates the utility of edge features as biomarkers but also highlights the importance of carefully designed experimental procedures in order to achieve statistically reliable comparison results.
Affiliation(s)
- Nahim Adnan
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, 78249, TX, USA
- Chengwei Lei
  - Department of Computer & Electrical Engineering/Computer Science, California State University, Bakersfield, 9001 Stockdale Highway, Bakersfield, 93311, CA, USA
- Jianhua Ruan
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, 78249, TX, USA
5
Adnan N, Liu Z, Huang THM, Ruan J. Comparative evaluation of network features for the prediction of breast cancer metastasis. BMC Med Genomics 2020; 13:40. [PMID: 32241278 PMCID: PMC7119280 DOI: 10.1186/s12920-020-0676-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Background Discovering a highly accurate and robust gene signature for the prediction of breast cancer metastasis from gene expression profiling of primary tumors is one of the most challenging tasks in reducing the number of deaths in women. Due to the limited success of gene-based features in achieving satisfactory prediction accuracy, many methodologies have been proposed in recent years to develop network-based features by integrating network information with gene expression. However, evaluation results are too inconsistent to confirm the effectiveness of network-based features, because of the many confounding factors involved in the classification model learning process, such as data normalization, dimension reduction, and feature selection. An unbiased comparative evaluation is essential for uncovering the strength of network-based features. Methods In this study, we compared several types of network-based features obtained using different mathematical operators (Mean, Maximum, Minimum, Median, Variance) on a geneset (i.e., a gene and its neighbors in the network) in the protein-protein interaction network and the gene co-expression network for their ability to predict breast cancer metastasis using gene expression data from more than 10 patient cohorts. Results While network-based features are usually statistically more significant than gene-based features, a consistent improvement of prediction performance using network-based features requires a substantial number of patients in the dataset. Contrary to many previous reports, no evidence was found to support the robustness of network-based features, and we argue that some of the reported robustness may be due to the inherent bias associated with node degree in the network. In addition, different types of network features seem to cover different pathways and are complementary to each other.
Consequently, an ensemble classifier combining different network features was proposed and was found to significantly outperform classifiers based on gene-based feature or any single type of network-based features. Conclusions Network-based features and their combination show promise for improving the prediction of breast cancer metastasis but may require a large amount of training data. Robustness claim of network-based features needs to be re-examined with network node degree and other confounding factors in consideration.
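The operator-based features described in the Methods can be sketched as follows. This is an illustrative reading rather than the authors' code: each feature summarizes the expression of a gene together with its network neighbors in one sample. The function name, the toy gene identifiers, and the choice of population variance (rather than sample variance) are assumptions.

```python
import statistics

# Operators named in the abstract; population variance is an assumed choice.
OPERATORS = {
    "mean": statistics.mean,
    "max": max,
    "min": min,
    "median": statistics.median,
    "variance": statistics.pvariance,
}


def network_features(expression, network, operator):
    """Summarize each gene's geneset (the gene plus its neighbors) with one operator.

    expression: gene -> expression value for a single sample
    network:    gene -> list of neighboring genes
    Genes without a measured value are skipped.
    """
    aggregate = OPERATORS[operator]
    features = {}
    for gene, neighbors in network.items():
        geneset = [gene] + list(neighbors)
        values = [expression[g] for g in geneset if g in expression]
        if values:
            features[gene] = aggregate(values)
    return features


# Toy sample and toy network with hypothetical gene identifiers.
sample = {"g1": 2.0, "g2": 4.0, "g3": 6.0}
ppi = {"g1": ["g2", "g3"], "g2": ["g1"], "g3": ["g1"]}

for name in OPERATORS:
    print(name, network_features(sample, ppi, name))
```

Running every operator over the same network yields one feature table per operator, which is the kind of input an ensemble over feature types would combine.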
Affiliation(s)
- Nahim Adnan
  - Department of Computer Science, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
- Zhijie Liu
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX 78230, USA
- Tim H M Huang
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX 78230, USA
- Jianhua Ruan
  - Department of Computer Science, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX 78230, USA
6
Shao B, Bjaanæs MM, Helland Å, Schütte C, Conrad T. EMT network-based feature selection improves prognosis prediction in lung adenocarcinoma. PLoS One 2019; 14:e0204186. [PMID: 30703089 PMCID: PMC6354965 DOI: 10.1371/journal.pone.0204186] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/30/2018] [Accepted: 12/25/2018] [Indexed: 12/16/2022]
Abstract
Various feature selection algorithms have been proposed to identify cancer prognostic biomarkers. In recent years, however, their reproducibility has been criticized. The performance of feature selection algorithms has been shown to be affected by the datasets, underlying networks and evaluation metrics. One of the causes is the curse of dimensionality, which makes it hard to select features that generalize well to independent data. Even the integration of biological networks does not mitigate this issue because the networks are large and many of their components are not relevant for the phenotype of interest. With the availability of multi-omics data, integrative approaches are being developed to build more robust predictive models. In this scenario, the higher data dimensions create greater challenges. We proposed a phenotype relevant network-based feature selection (PRNFS) framework and demonstrated its advantages in lung cancer prognosis prediction. We constructed cancer prognosis relevant networks based on epithelial mesenchymal transition (EMT) and integrated them with different types of omics data for feature selection. With less than 2.5% of the total dimensionality, we obtained EMT prognostic signatures that achieved remarkable prediction performance (average AUC values >0.8), very significant sample stratifications, and meaningful biological interpretations. In addition to finding EMT signatures from different omics data levels, we combined these single-omics signatures into multi-omics signatures, which improved sample stratifications significantly. Both single- and multi-omics EMT signatures were tested on independent multi-omics lung cancer datasets and significant sample stratifications were obtained.
Affiliation(s)
- Borong Shao (corresponding author)
  - Zuse Institute Berlin, Berlin, Germany
  - Dept of mathematics and computer science, Freie Universität Berlin, Berlin, Germany
- Maria Moksnes Bjaanæs
  - Dept of Oncology, Oslo University Hospital, Oslo, Norway
  - Dept of Cancer Genetics, Oslo University Hospital, Oslo, Norway
  - Dept of Clinical Medicine, University of Oslo, Oslo, Norway
- Åslaug Helland
  - Dept of Oncology, Oslo University Hospital, Oslo, Norway
  - Dept of Cancer Genetics, Oslo University Hospital, Oslo, Norway
  - Dept of Clinical Medicine, University of Oslo, Oslo, Norway
- Christof Schütte
  - Zuse Institute Berlin, Berlin, Germany
  - Dept of mathematics and computer science, Freie Universität Berlin, Berlin, Germany
- Tim Conrad
  - Zuse Institute Berlin, Berlin, Germany
  - Dept of mathematics and computer science, Freie Universität Berlin, Berlin, Germany
7
Farahmand S, Goliaei S, Kashani ZRM, Farahmand S. Identifying Cancer Subnetwork Markers Using Game Theory Method. INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS 2019. [DOI: 10.1007/978-981-10-4505-9_17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
8
Farahmand S, Foroughmand-Araabi MH, Goliaei S, Razaghi-Moghadam Z. CytoGTA: A cytoscape plugin for identifying discriminative subnetwork markers using a game theoretic approach. PLoS One 2017; 12:e0185016. [PMID: 28968407 PMCID: PMC5624584 DOI: 10.1371/journal.pone.0185016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/15/2017] [Accepted: 09/04/2017] [Indexed: 01/07/2023]
Abstract
In recent years, analyzing genome-wide expression profiles to find genetic markers has received much attention as a challenging field of research aiming at unveiling biological mechanisms behind complex disorders. The identification of reliable and reproducible markers has lately been achieved by integrating genome-scale functional relationships and transcriptome datasets, and a number of algorithms have been developed to support this strategy. In this paper, we present a promising and easily applicable tool to accomplish this goal, namely CytoGTA, which is a Cytoscape plug-in that relies on an optimistic game theoretic approach (GTA) for identifying subnetwork markers. Given transcriptomic data of two phenotype classes and interactome data, this plug-in offers discriminative markers for the two classes. The high performance of CytoGTA would not have been achieved if the strategy of GTA was not implemented in Cytoscape. This plug-in provides a simple-to-use platform, convenient for biological researchers to interactively work with and visualize the structure of subnetwork markers. CytoGTA is one of the few available Cytoscape plug-ins for marker identification, which shows superior performance to existing methods.
Affiliation(s)
- S. Farahmand
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
  - College of Science and Mathematics, University of Massachusetts Boston, Boston, Massachusetts, United States of America
- Z. Razaghi-Moghadam (corresponding author)
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
9
Auslander N, Wagner A, Oberhardt M, Ruppin E. Data-Driven Metabolic Pathway Compositions Enhance Cancer Survival Prediction. PLoS Comput Biol 2016; 12:e1005125. [PMID: 27673682 PMCID: PMC5038951 DOI: 10.1371/journal.pcbi.1005125] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/22/2016] [Accepted: 08/30/2016] [Indexed: 12/31/2022]
Abstract
Altered cellular metabolism is an important characteristic and driver of cancer. Surprisingly, however, we find here that aggregating individual gene expression using canonical metabolic pathways fails to enhance the classification of noncancerous vs. cancerous tissues and the prediction of cancer patient survival. This supports the notion that metabolic alterations in cancer rewire cellular metabolism through unconventional pathways. Here we present MCF (Metabolic classifier and feature generator), which incorporates gene expression measurements into a human metabolic network to infer new cancer-mediated pathway compositions that enhance cancer vs. adjacent noncancerous tissue classification across five different cancer types. MCF outperforms standard classifiers based on individual gene expression and on canonical human curated metabolic pathways. It successfully builds robust classifiers integrating different datasets of the same cancer type. Reassuringly, the MCF pathways identified lead to metabolites known to be associated with the pertaining specific cancer types. Aggregating gene expression through MCF pathways leads to markedly better predictions of breast cancer patients’ survival in an independent cohort than using the canonical human metabolic pathways (C-index = 0.69 vs. 0.52, respectively). Notably, the survival predictive power of individual MCF pathways strongly correlates with their power in predicting cancer vs. noncancerous samples. The more predictive composite pathways identified via MCF are hence more likely to capture key metabolic alterations occurring in cancer than the canonical pathways characterizing healthy human metabolism. Cancer proliferating cells adapt their metabolism to support the conversion of available nutrients into biomass, which often involves an increased rate of specific metabolic pathways, such as glycolysis. 
Surprisingly, however, we observe that aggregating individual gene expression using canonical human metabolic pathways frequently fails to enhance the classification of noncancerous vs. cancerous tissues and in the task of predicting cancer patient survival. This supports the notion that metabolic alterations in cancer rewire cellular metabolism through unconventional pathways. Here we introduce a novel algorithm (MCF) that aims to identify these cancer-mediated ‘composite’ metabolic pathways by identifying those that best differentiate between cancerous vs. non-cancerous tissues gene expression. Remarkably, MCF successfully builds robust classifiers integrating different datasets of the same cancer type. We further show that the data-driven pathways identified by MCF, in contrast to the canonical literature-based pathways, successfully generate clinically relevant features that are predictive of breast cancer patients’ survival in an independent dataset. Our findings thus suggest that cancer metabolism may be rewired via non-standard composite pathways.
Affiliation(s)
- Noam Auslander (corresponding author)
  - Center for Bioinformatics and Computational Biology and the Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Allon Wagner
  - Department of Electrical Engineering and Computer Science and the Center for Computational Biology, University of California, Berkeley, Berkeley, California, United States of America
- Matthew Oberhardt
  - Center for Bioinformatics and Computational Biology and the Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Eytan Ruppin (corresponding author)
  - Center for Bioinformatics and Computational Biology and the Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
  - The Blavatnik School of Computer Science and the Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
10
Henriques R, Madeira SC. BicNET: Flexible module discovery in large-scale biological networks using biclustering. Algorithms Mol Biol 2016; 11:14. [PMID: 27213009 PMCID: PMC4875761 DOI: 10.1186/s13015-016-0074-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/11/2015] [Accepted: 04/22/2016] [Indexed: 02/08/2023]
Abstract
BACKGROUND Despite the recognized importance of module discovery in biological networks to enhance our understanding of complex biological systems, existing methods generally suffer from two major drawbacks. First, there is a focus on modules where biological entities are strongly connected, leading to the discovery of trivial/well-known modules and to the inaccurate exclusion of biological entities with subtler yet relevant roles. Second, there is a generalized intolerance towards different forms of noise, including uncertainty associated with less-studied biological entities (in the context of literature-driven networks) and experimental noise (in the context of data-driven networks). Although state-of-the-art biclustering algorithms are able to discover modules with varying coherency and robustness to noise, their application for the discovery of non-dense modules in biological networks has been poorly explored and it is further challenged by efficiency bottlenecks. METHODS This work proposes Biclustering NETworks (BicNET), a biclustering algorithm to discover non-trivial yet coherent modules in weighted biological networks with heightened efficiency. Three major contributions are provided. First, we motivate the relevance of discovering network modules given by constant, symmetric, plaid and order-preserving biclustering models. Second, we propose an algorithm to discover these modules and to robustly handle noisy and missing interactions. Finally, we provide new searches to tackle time and memory bottlenecks by effectively exploring the inherent structural sparsity of network data. RESULTS Results in synthetic network data confirm the soundness, efficiency and superiority of BicNET. The application of BicNET on protein interaction and gene interaction networks from yeast, E. coli and Human reveals new modules with heightened biological significance. 
CONCLUSIONS BicNET is, to our knowledge, the first method enabling the efficient unsupervised analysis of large-scale network data for the discovery of coherent modules with parameterizable homogeneity.
Affiliation(s)
- Rui Henriques
  - INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Sara C. Madeira
  - INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
11
Alevyzaki A, Sfakianakis S, Bei ES, Obermayr E, Zeillinger R, Fotiadis D, Zervakis M. Biclustering strategies for genetic marker selection in gynecologic tumor cell lines. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2016:1430-1433. [PMID: 28324944 DOI: 10.1109/embc.2016.7590977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Over the past few decades, great interest has been focused on cell lines derived from tumors because of their usefulness as models for understanding the biology of cancer. At the same time, advanced technologies such as DNA microarrays have been broadly used to study the expression level of thousands of genes in primary tumors or cancer cell lines in a single experiment. Results from microarray analysis approaches have provided valuable insights into the underlying biology and proven useful for tumor classification, prognostication and prediction. Our approach utilizes biclustering methods for the discovery of genes with coherent expression across a subset of conditions (cell lines of a tumor type). More specifically, we present a novel modification of Cheng & Church's algorithm that searches for differences across the studied conditions, but also enforces consistent intensity characteristics of each cluster within each condition. Applied to a gynecologic panel of cell lines, this approach succeeds in deriving discriminant groups of compact biclusters across four types of tumor cell lines. In this form, the proposed approach proves efficient for the derivation of tumor-specific markers.
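For orientation, the score at the heart of the original Cheng & Church algorithm is the mean squared residue of a submatrix, H(I, J) = (1 / |I||J|) Σ (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means. The sketch below computes only this baseline score; the modification proposed in the paper is not described in enough detail in the abstract to reproduce, and the function name and toy matrices are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng & Church mean squared residue of the submatrix matrix[rows][cols].

    A value of 0 means the submatrix is perfectly additive (each row is a
    shifted copy of the others); larger values mean less coherence.
    """
    sub = [[matrix[r][c] for c in cols] for r in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Additive pattern: every row is the first row plus a constant.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
# Opposite trends in the two rows.
incoherent = [[1.0, 2.0], [2.0, 1.0]]

print(mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2]))
print(mean_squared_residue(incoherent, [0, 1], [0, 1]))
```

The original algorithm greedily removes and adds rows and columns to keep this score below a threshold; that search loop is omitted here.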
12
Allahyar A, de Ridder J. FERAL: network-based classifier with application to breast cancer outcome prediction. Bioinformatics 2015; 31:i311-9. [PMID: 26072498 PMCID: PMC4765883 DOI: 10.1093/bioinformatics/btv255] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 12/26/2022]
Abstract
MOTIVATION Breast cancer outcome prediction based on gene expression profiles is an important strategy for personalized patient care. To improve the performance and the consistency of discovered markers of the initial molecular classifiers, network-based outcome prediction methods (NOPs) have been proposed. In spite of the initial claims, recent studies revealed that neither performance nor consistency can be improved using these methods. NOPs typically rely on the construction of meta-genes by averaging the expression of several genes connected in a network that encodes protein interactions or pathway information. In this article, we expose several fundamental issues in NOPs that impede prediction power and the consistency of discovered markers and that obscure biological interpretation. RESULTS To overcome these issues, we propose FERAL, a network-based classifier that hinges upon the Sparse Group Lasso, which performs simultaneous selection of marker genes and training of the prediction model. An important feature of FERAL, and a significant departure from existing NOPs, is that it uses multiple operators to summarize genes into meta-genes. This gives the classifier the opportunity to select the most relevant meta-gene for each gene set. Extensive evaluation revealed that the discovered markers are markedly more stable across independent datasets. Moreover, interpretation of the marker genes detected by FERAL reveals valuable mechanistic insight into the etiology of breast cancer. AVAILABILITY AND IMPLEMENTATION All code is available for download at: http://homepage.tudelft.nl/53a60/resources/FERAL/FERAL.zip.
Affiliation(s)
- Amin Allahyar
  - Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Jeroen de Ridder
  - Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
13
Cha K, Hwang T, Oh K, Yi GS. Discovering transnosological molecular basis of human brain diseases using biclustering analysis of integrated gene expression data. BMC Med Inform Decis Mak 2015; 15 Suppl 1:S7. [PMID: 26043779 PMCID: PMC4460778 DOI: 10.1186/1472-6947-15-s1-s7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
BACKGROUND It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of the human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, attempts to identify significant genes in such brain diseases mostly failed when they used typical methods that depend on differentially expressed genes. RESULTS In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We found 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific to those eight diseases. CONCLUSIONS This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation.
|
14
|
Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Computer Methods and Programs in Biomedicine 2015; 119:163-80. [PMID: 25843807 DOI: 10.1016/j.cmpb.2015.02.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/22/2014] [Revised: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 05/06/2023]
Abstract
Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of independent activation with respect to the same experimental condition rather than of the same regulatory regime. For this reason, traditional techniques have recently been improved by using prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the only input of the algorithm is a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function to be optimized in the scatter search scheme. The measure FracGO is based on biological enrichment, and SimNTO is based on the overlap among GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm on two datasets and show that the algorithm performs better when biological knowledge is integrated. Moreover, an analysis and comparison of the two biological measures is presented, and it is concluded that the differences depend on both the data source and how the annotation file has been built when GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmarks, and an analysis of the overlap among biclusters reveals that the biclusters obtained overlap little. The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
Affiliation(s)
- Juan A Nepomuceno: Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Alicia Troncoso: Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
- Isabel A Nepomuceno-Chamorro: Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Jesús S Aguilar-Ruiz: Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
|
15
|
Bhat A, Dakna M, Mischak H. Integrating proteomics profiling data sets: a network perspective. Methods Mol Biol 2015; 1243:237-53. [PMID: 25384750 DOI: 10.1007/978-1-4939-1872-0_14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
Understanding disease mechanisms often requires complex and accurate integration of cellular pathways and molecular networks. Systems biology offers the possibility of providing a comprehensive map of the cell's intricate wiring network, which can ultimately lead to deciphering the disease phenotype. Here, we describe what biological pathways are, how they function in normal and abnormal cellular systems, and the limitations faced by databases for integrating data, and we highlight how network models are emerging as a powerful integrative framework to understand and interpret the roles of proteins and peptides in diseases.
Affiliation(s)
- Akshay Bhat: Mosaiques-Diagnostics GmbH, Mellendorfer Straße 7-9, D-30625, Hannover, Germany
|
16
|
Wang X, Qian H, Zhang S. Discovery of significant pathways in breast cancer metastasis via module extraction and comparison. IET Syst Biol 2014; 8:47-55. [PMID: 25014225 PMCID: PMC8687293 DOI: 10.1049/iet-syb.2013.0041] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 10/14/2013] [Revised: 12/03/2013] [Accepted: 12/30/2013] [Indexed: 09/29/2023]
Abstract
Discovering significant pathways rather than single genes or small gene sets involved in metastasis is becoming more and more important in the study of breast cancer. Many studies have shed light on this problem. However, most existing works rely on a priori biological information, which may bias the models. The authors propose a new method that detects metastasis-related pathways by identifying and comparing modules in metastasis and non-metastasis gene co-expression networks. The gene co-expression networks are built from Pearson correlation coefficients, and then the modules inferred in these two networks are compared. In the metastasis and non-metastasis networks, 36 and 41 significant modules are identified, respectively. Also, 27.8% (metastasis) and 29.3% (non-metastasis) of the modules are enriched significantly for one or several pathways with p-value <0.05. Many breast cancer genes including RB1, CCND1 and TP53 are included in these identified pathways. Five significant pathways are discovered only in the metastasis network: glycolysis pathway, cell adhesion molecules, focal adhesion, stathmin and breast cancer resistance to antimicrotubule agents, and cytosolic DNA-sensing pathway. The first three pathways have been shown to be closely associated with metastasis. The remaining two can be taken as a guide for future research in breast cancer metastasis.
Affiliation(s)
- Xiaochen Wang: School of Mathematical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Huajie Qian: School of Mathematical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Shuqin Zhang: Center for Computational Systems Biology, School of Mathematical Sciences, Fudan University Shanghai, Shanghai 200433, People's Republic of China
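The network-construction step mentioned in the abstract above (co-expression networks built from Pearson correlation coefficients, one per phenotype) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the hard threshold of 0.8, the toy expression values, and the function names are hypothetical, and the sketch compares edges only, whereas the paper infers and compares modules.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0 here.
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expr, threshold=0.8):
    """Return gene pairs whose |r| meets the (assumed) threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Toy expression profiles (hypothetical values), one dict per phenotype.
metastasis = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
non_metastasis = {"A": [1, 2, 3, 4], "B": [3, 1, 4, 2], "C": [2, 4, 6, 9]}

net_m = coexpression_edges(metastasis)
net_n = coexpression_edges(non_metastasis)

# Edges present only in the metastasis network.
only_in_metastasis = net_m - net_n
```

In this toy data, genes A and B are co-expressed only in the metastasis profiles, so that edge is the one left in `only_in_metastasis`. A module-level comparison would additionally need a community-detection step, which is omitted here.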
|
17
|
Frantzi M, Bhat A, Latosinska A. Clinical proteomic biomarkers: relevant issues on study design & technical considerations in biomarker development. Clin Transl Med 2014; 3:7. [PMID: 24679154 PMCID: PMC3994249 DOI: 10.1186/2001-1326-3-7] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 11/20/2013] [Accepted: 03/06/2014] [Indexed: 12/11/2022]
Abstract
Biomarker research is continuously expanding in the field of clinical proteomics. A combination of different proteomic-based methodologies can be applied depending on the specific clinical context of use. Moreover, current advancements in proteomic analytical platforms are leading to an expansion of biomarker candidates that can be identified. Specifically, mass spectrometric techniques could provide highly valuable tools for biomarker research. Ideally, these advances could provide biomarkers that are clinically applicable for disease diagnosis and/or prognosis. Unfortunately, biomarker candidates generally fail to be implemented in clinical decision making. To improve on this situation, a well-defined study design has to be established, driven by a clear clinical need, while several checkpoints between the different phases of discovery, verification and validation have to be passed in order to increase the probability of establishing valid biomarkers. In this review, we summarize the technical proteomic platforms that are available along the different stages in the biomarker discovery pipeline, exemplified by clinical applications in the field of bladder cancer biomarker research.
Affiliation(s)
- Maria Frantzi: Mosaiques Diagnostics GmbH, Mellendorfer Strasse 7-9, D-30625 Hannover, Germany; Biotechnology Division, Biomedical Research Foundation Academy of Athens, Soranou Ephessiou 4, 115 27 Athens, Greece
- Akshay Bhat: Mosaiques Diagnostics GmbH, Mellendorfer Strasse 7-9, D-30625 Hannover, Germany; Charité-Universitätsmedizin Berlin, Berlin, Germany
- Agnieszka Latosinska: Biotechnology Division, Biomedical Research Foundation Academy of Athens, Soranou Ephessiou 4, 115 27 Athens, Greece; Charité-Universitätsmedizin Berlin, Berlin, Germany
|
18
|
Laubenbacher R, Hinkelmann F, Murrugarra D, Veliz-Cuba A. Algebraic Models and Their Use in Systems Biology. Discrete and Topological Models in Molecular Biology 2014. [DOI: 10.1007/978-3-642-40193-0_21] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
|
19
|
Saha A, Tan AC, Kang J. Automatic context-specific subnetwork discovery from large interaction networks. PLoS One 2014; 9:e84227. [PMID: 24392115 PMCID: PMC3877685 DOI: 10.1371/journal.pone.0084227] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/10/2013] [Accepted: 11/21/2013] [Indexed: 01/18/2023]
Abstract
Genes act in concert via specific networks to drive various biological processes, including progression of diseases such as cancer. Under different phenotypes, different subsets of the gene members of a network participate in a biological process. Single gene analyses are less effective in identifying such core gene members (subnetworks) within a gene set/network, as compared to gene set/network-based analyses. Hence, it is useful to identify a discriminative classifier by focusing on the subnetworks that correspond to different phenotypes. Here we present a novel algorithm to automatically discover the important subnetworks of closely interacting molecules to differentiate between two phenotypes (context) using gene expression profiles. We name it COSSY (COntext-Specific Subnetwork discoverY). It is a non-greedy algorithm and thus unlikely to have local optima problems. COSSY works for any interaction network regardless of the network topology. One added benefit of COSSY is that it can also be used as a highly accurate classification platform which can produce a set of interpretable features.
Affiliation(s)
- Ashis Saha: Department of Computer Science and Engineering, Korea University, Seoul, Korea
- Aik Choon Tan: Department of Medicine/Medical Oncology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Jaewoo Kang: Department of Computer Science and Engineering, Korea University, Seoul, Korea; Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, Korea
|
20
|
Staiger C, Cadot S, Györffy B, Wessels LFA, Klau GW. Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis. Front Genet 2013; 4:289. [PMID: 24391662 PMCID: PMC3870302 DOI: 10.3389/fgene.2013.00289] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 09/15/2013] [Accepted: 11/28/2013] [Indexed: 01/21/2023]
Abstract
Integrating gene expression data with secondary data such as pathway or protein-protein interaction data has been proposed as a promising approach for improved outcome prediction of cancer patients. Methods employing this approach usually aggregate the expression of genes into new composite features, while the secondary data guide this aggregation. Previous studies were limited to few data sets with a small number of patients. Moreover, each study used different data and evaluation procedures. This makes it difficult to objectively assess the gain in classification performance. Here we introduce the Amsterdam Classification Evaluation Suite (ACES). ACES is a Python package to objectively evaluate classification and feature-selection methods and contains methods for pooling and normalizing Affymetrix microarrays from different studies. It is simple to use and therefore facilitates the comparison of new approaches to best-in-class approaches. In addition to the methods described in our earlier study (Staiger et al., 2012), we have included two prominent prognostic gene signatures specific for breast cancer outcome, one more composite feature selection method and two network-based gene ranking methods. Employing the evaluation pipeline we show that current composite-feature classification methods do not outperform simple single-genes classifiers in predicting outcome in breast cancer. Furthermore, we find that also the stability of features across different data sets is not higher for composite features. Most stunningly, we observe that prediction performances are not affected when extracting features from randomized PPI networks.
Affiliation(s)
- Christine Staiger: Life Sciences, Centrum Wiskunde & Informatica Amsterdam, Netherlands; Computational Cancer Biology, Division of Molecular Carcinogenesis, Netherlands Cancer Institute Amsterdam, Netherlands
- Sidney Cadot: Computational Cancer Biology, Division of Molecular Carcinogenesis, Netherlands Cancer Institute Amsterdam, Netherlands
- Balázs Györffy: Research Laboratory of Pediatrics and Nephrology, Hungarian Academy of Sciences Budapest, Hungary
- Lodewyk F A Wessels: Computational Cancer Biology, Division of Molecular Carcinogenesis, Netherlands Cancer Institute Amsterdam, Netherlands; Cancer Systems Biology Center, Netherlands Cancer Institute Amsterdam, Netherlands; Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, TU Delft Delft, Netherlands
- Gunnar W Klau: Life Sciences, Centrum Wiskunde & Informatica Amsterdam, Netherlands; Operations Research and Bioinformatics, Faculty of Sciences, VU University Amsterdam Amsterdam, Netherlands
|
21
|
Alroobi R, Ahmed S, Salem S. Mining maximal cohesive induced subnetworks and patterns by integrating biological networks with gene profile data. Interdiscip Sci 2013; 5:211-24. [DOI: 10.1007/s12539-013-0168-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/07/2013] [Revised: 03/30/2013] [Accepted: 06/12/2013] [Indexed: 01/28/2023]
|
22
|
Cho DY, Przytycka TM. Dissecting cancer heterogeneity with a probabilistic genotype-phenotype model. Nucleic Acids Res 2013; 41:8011-20. [PMID: 23821670 PMCID: PMC3783162 DOI: 10.1093/nar/gkt577] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 05/02/2013] [Revised: 06/05/2013] [Accepted: 06/07/2013] [Indexed: 12/13/2022]
Abstract
One of the obstacles hindering a better understanding of cancer is its heterogeneity. However, computational approaches to model cancer heterogeneity have lagged behind. To bridge this gap, we have developed a new probabilistic approach that models individual cancer cases as mixtures of subtypes. Our approach can be seen as a meta-model that summarizes the results of a large number of alternative models. It does not assume predefined subtypes, nor does it assume that such subtypes have to be sharply defined. Instead, given a measure of phenotypic similarity between patients and a list of potential explanatory features, such as mutations, copy number variation, microRNA levels, etc., it explains phenotypic similarities with the help of these features. We applied our approach to Glioblastoma Multiforme (GBM). The resulting model, Prob_GBM, not only correctly inferred known relationships but also identified new properties underlying phenotypic similarities. The proposed probabilistic framework can be applied to model relations between similarity of gene expression and a broad spectrum of potential genetic causes.
Affiliation(s)
- Teresa M. Przytycka: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
23
|
Kessler T, Hache H, Wierling C. Integrative analysis of cancer-related signaling pathways. Front Physiol 2013; 4:124. [PMID: 23760067 PMCID: PMC3671203 DOI: 10.3389/fphys.2013.00124] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/30/2012] [Accepted: 05/12/2013] [Indexed: 12/11/2022]
Abstract
Identification and classification of cancer types and subtypes is a major issue in current cancer research. Whole genome expression profiling of cancer tissues is often the basis for such subtype classifications of tumors, and different signatures for individual cancer types have been described. However, the search for the best-performing discriminatory gene-expression signatures covering more than one cancer type remains a relevant topic in cancer research, as such a signature would help in understanding the common changes in signaling networks in these disease types. In this work, we explore the idea of a top-down approach for sample stratification based on a module-based network of cancer-relevant signaling pathways. For assembly of this network, we consider several of the most established cancer pathways. We evaluate our sample stratification approach using expression data of human breast and ovarian cancer signatures. We show that our approach performs as well as previously reported methods, with the added advantage of classifying different cancer types. Furthermore, it allows the identification of common changes in network module activity in those cancer samples.
Affiliation(s)
- Thomas Kessler: Systems Biology Group, Department Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Hendrik Hache: Systems Biology Group, Department Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Christoph Wierling: Systems Biology Group, Department Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
24
|
Kim YA, Przytycka TM. Bridging the Gap between Genotype and Phenotype via Network Approaches. Front Genet 2013; 3:227. [PMID: 23755063 PMCID: PMC3668153 DOI: 10.3389/fgene.2012.00227] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 06/19/2012] [Accepted: 10/10/2012] [Indexed: 11/15/2022]
Abstract
In the last few years we have witnessed tremendous progress in detecting associations between genetic variations and complex traits. While genome-wide association studies have been able to discover genomic regions that may influence many common human diseases, these discoveries created an urgent need for methods that extend the knowledge of genotype-phenotype relationships to the level of the molecular mechanisms behind them. To address this emerging need, computational approaches increasingly utilize a pathway-centric perspective. These new methods often utilize known or predicted interactions between genes and/or gene products. In this review, we survey recently developed network based methods that attempt to bridge the genotype-phenotype gap. We note that although these methods help narrow the gap between genotype and phenotype relationships, these approaches alone cannot provide the precise details of underlying mechanisms and current research is still far from closing the gap.
Affiliation(s)
- Yoo-Ah Kim: National Center for Biotechnology Information, National Institutes of Health, National Library of Medicine, Bethesda, MD, USA
|
25
|
Dand N, Sprengel F, Ahlers V, Schlitt T. BioGranat-IG: a network analysis tool to suggest mechanisms of genetic heterogeneity from exome-sequencing data. Bioinformatics 2013; 29:733-41. [PMID: 23361329 DOI: 10.1093/bioinformatics/btt045] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
MOTIVATION Recent exome-sequencing studies have successfully identified disease-causing sequence variants for several rare monogenic diseases by examining variants common to a group of patients. However, the current data analysis strategies are only insufficiently able to deal with confounding factors such as genetic heterogeneity, incomplete penetrance, individuals lacking data and involvement of several genes. RESULTS We introduce BioGranat-IG, an analysis strategy that incorporates the information contained in biological networks to the analysis of exome-sequencing data. To identify genes that may have a disease-causing role, we label all nodes of the network according to the individuals that are carrying a sequence variant and subsequently identify small subnetworks linked to all or most individuals. Using simulated exome-sequencing data, we demonstrate that BioGranat-IG is able to recover the genes responsible for two diseases known to be caused by variants in an underlying complex. We also examine the performance of BioGranat-IG under various conditions likely to be faced by the user, and show that its network-based approach is more powerful than a set-cover-based approach.
Affiliation(s)
- Nick Dand: Department of Medical and Molecular Genetics, King's College London, London SE1 9RT, UK
|
26
|
Abstract
Complex diseases are caused by a combination of genetic and environmental factors. Uncovering the molecular pathways through which genetic factors affect a phenotype is always difficult, but in the case of complex diseases this is further complicated since genetic factors in affected individuals might be different. In recent years, systems biology approaches and, more specifically, network based approaches emerged as powerful tools for studying complex diseases. These approaches are often built on the knowledge of physical or functional interactions between molecules which are usually represented as an interaction network. An interaction network not only reports the binary relationships between individual nodes but also encodes hidden higher level organization of cellular communication. Computational biologists were challenged with the task of uncovering this organization and utilizing it for the understanding of disease complexity, which prompted rich and diverse algorithmic approaches to be proposed. We start this chapter with a description of the general characteristics of complex diseases followed by a brief introduction to physical and functional networks. Next we will show how these networks are used to leverage genotype, gene expression, and other types of data to identify dysregulated pathways, infer the relationships between genotype and phenotype, and explain disease heterogeneity. We group the methods by common underlying principles and first provide a high level description of the principles followed by more specific examples. We hope that this chapter will give readers an appreciation for the wealth of algorithmic techniques that have been developed for the purpose of studying complex diseases as well as insight into their strengths and limitations.
Affiliation(s)
- Dong-Yeon Cho: National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Yoo-Ah Kim: National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Teresa M. Przytycka: National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
|
27
|
Roy J, Winter C, Isik Z, Schroeder M. Network information improves cancer outcome prediction. Brief Bioinform 2012; 15:612-25. [PMID: 23255167 DOI: 10.1093/bib/bbs083] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 01/02/2023]
Abstract
Disease progression in cancer can vary substantially between patients. Yet, patients often receive the same treatment. Recently, there has been much work on predicting disease progression and patient outcome variables from gene expression in order to personalize treatment options. Despite first diagnostic kits in the market, there are open problems such as the choice of random gene signatures or noisy expression data. One approach to deal with these two problems employs protein-protein interaction networks and ranks genes using the random surfer model of Google's PageRank algorithm. In this work, we created a benchmark dataset collection comprising 25 cancer outcome prediction datasets from literature and systematically evaluated the use of networks and a PageRank derivative, NetRank, for signature identification. We show that the NetRank performs significantly better than classical methods such as fold change or t-test. Despite an order of magnitude difference in network size, a regulatory and protein-protein interaction network perform equally well. Experimental evaluation on cancer outcome prediction in all of the 25 underlying datasets suggests that the network-based methodology identifies highly overlapping signatures over all cancer types, in contrast to classical methods that fail to identify highly common gene sets across the same cancer types. Integration of network information into gene expression analysis allows the identification of more reliable and accurate biomarkers and provides a deeper understanding of processes occurring in cancer development and progression.
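The abstract above describes ranking genes with the random surfer model of PageRank on an interaction network. As a rough illustration of that idea, the sketch below runs a generic personalized PageRank in which the teleport distribution is proportional to a per-gene score. It is not the NetRank implementation: the damping factor of 0.85, the fixed iteration count, the toy star-shaped network, and the score values are all assumptions chosen for illustration, and the graph is assumed to be undirected with no isolated nodes.

```python
def personalized_pagerank(adj, scores, damping=0.85, iters=100):
    """Power iteration for PageRank with a score-based teleport vector.

    adj maps each node to its list of neighbours (undirected graph,
    every node assumed to have at least one neighbour).
    scores maps each node to a non-negative per-gene score.
    """
    nodes = sorted(adj)
    total = sum(scores[n] for n in nodes)
    teleport = {n: scores[n] / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[m] / len(adj[m]) for m in adj[n])
            new[n] = (1 - damping) * teleport[n] + damping * incoming
        rank = new
    return rank


# Hypothetical toy network: hub H interacts with G1, G2 and G3.
adj = {"H": ["G1", "G2", "G3"], "G1": ["H"], "G2": ["H"], "G3": ["H"]}
# Hypothetical expression-derived scores (e.g. association with outcome).
scores = {"H": 0.1, "G1": 0.5, "G2": 0.3, "G3": 0.1}

rank = personalized_pagerank(adj, scores)
ranking = sorted(rank, key=rank.get, reverse=True)
```

In this toy example the hub H ends up ranked first even though its own score is among the lowest, because it accumulates rank from its well-scored neighbours. That is the intuition behind letting network context adjust a purely expression-based ranking.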
|
28
|
Erten S, Chowdhury SA, Guan X, Nibbe RK, Barnholtz-Sloan JS, Chance MR, Koyutürk M. Identifying stage-specific protein subnetworks for colorectal cancer. BMC Proc 2012; 6 Suppl 7:S1. [PMID: 23173715 PMCID: PMC3504924 DOI: 10.1186/1753-6561-6-s7-s1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/02/2022]
Abstract
Background In recent years, many algorithms have been developed for network-based analysis of differential gene expression in complex diseases. These algorithms use protein-protein interaction (PPI) networks as an integrative framework and identify subnetworks that are coordinately dysregulated in the phenotype of interest. Motivation While such dysregulated subnetworks have demonstrated significant improvement over individual gene markers for classifying phenotype, the current state-of-the-art in dysregulated subnetwork discovery is almost exclusively limited to binary phenotype classes. However, many clinical applications require identification of molecular markers for multiple classes. Approach We consider the problem of discovering groups of genes whose expression signatures can discriminate multiple phenotype classes. We consider two alternate formulations of this problem (i) an all-vs-all approach that aims to discover subnetworks distinguishing all classes, (ii) a one-vs-all approach that aims to discover subnetworks distinguishing each class from the rest of the classes. For the one-vs-all formulation, we develop a set-cover based algorithm, which aims to identify groups of genes such that at least one gene in the group exhibits differential expression in the target class. Results We test the proposed algorithms in the context of predicting stages of colorectal cancer. Our results show that the set-cover based algorithm identifying "stage-specific" subnetworks outperforms the all-vs-all approaches in classification. We also investigate the merits of utilizing PPI networks in the search for multiple markers, and show that, with correct parameter settings, network-guided search improves performance. Furthermore, we show that assessing statistical significance when selecting features greatly improves classification performance.
Affiliation(s)
- Sinan Erten: Department of Electrical Engineering & Computer Science, Case Western Reserve University, Cleveland, OH, USA
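The one-vs-all formulation in the abstract above relies on a set-cover based algorithm. The sketch below shows the classic greedy set-cover heuristic under one possible reading of that description: each gene "covers" the target-class samples in which it is differentially expressed, and genes are added until every sample is covered by at least one gene. The gene and sample names, the coverage sets, and the tie-breaking rule are hypothetical, and the authors' actual algorithm (including its use of the PPI network and significance testing) is not reproduced here.

```python
def greedy_set_cover(universe, covers):
    """Greedily pick keys of `covers` until their sets cover `universe`.

    Returns the chosen keys in selection order and any elements that
    no available set could cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the gene covering the most still-uncovered samples;
        # ties are broken by name order (first maximum in sorted order).
        best = max(sorted(covers), key=lambda g: len(covers[g] & uncovered))
        if not covers[best] & uncovered:
            break  # remaining samples cannot be covered by any gene
        chosen.append(best)
        uncovered -= covers[best]
    return chosen, uncovered


# Hypothetical target-class samples and per-gene coverage sets.
samples = {"s1", "s2", "s3", "s4", "s5"}
covers = {
    "g1": {"s1", "s2", "s3"},
    "g2": {"s3", "s4"},
    "g3": {"s4", "s5"},
    "g4": {"s1"},
}

chosen, uncovered = greedy_set_cover(samples, covers)
```

Here the heuristic first selects `g1` (three samples) and then `g3` (the two remaining samples), leaving nothing uncovered. Greedy selection gives no guarantee of a minimum-size cover in general, which is one reason to treat this as a sketch rather than a reference method.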
|
29
|
Gao S, Jia S, Hessner MJ, Wang X. Predicting disease-related subnetworks for type 1 diabetes using a new network activity score. OMICS: A Journal of Integrative Biology 2012; 16:566-78. [PMID: 22917479 DOI: 10.1089/omi.2012.0029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
Abstract
In this study we investigated the advantage of including network information in prioritizing disease genes of type 1 diabetes (T1D). First, a naïve Bayesian network (NBN) model was developed to integrate information from multiple data sources and to define a T1D-involvement probability score (PS) for each individual gene. The algorithm was validated using known functional candidate genes as a benchmark. Genes with higher PS were found to be more likely to appear in T1D-related publications. Next a new network activity metric was proposed to evaluate the T1D relevance of protein-protein interaction (PPI) subnetworks. The metric considered the contribution both from individual genes and from network topological characteristics. The predictions were confirmed by several independent datasets, including a genome wide association study (GWAS), and two large-scale human gene expression studies. We found that novel candidate genes in the T1D subnetworks showed more significant associations with T1D than genes predicted using PS alone. Interestingly, most novel candidates were not encoded within the human leukocyte antigen (HLA) region, and their expression levels showed correlation with disease only in cohorts with low-risk HLA genotypes. The results suggested the importance of mapping disease gene networks in dissecting the genetics of complex diseases, and offered a general approach to network-based disease gene prioritization from multiple data sources.
Affiliation(s)
- Shouguo Gao: Department of Physics, the University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
|
30
|
Bebek G, Koyutürk M, Price ND, Chance MR. Network biology methods integrating biological data for translational science. Brief Bioinform 2012; 13:446-59. [PMID: 22390873 PMCID: PMC3404396 DOI: 10.1093/bib/bbr075] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 09/10/2011] [Revised: 11/29/2011] [Indexed: 12/29/2022]
Abstract
The explosion of biomedical data, both on the genomic and proteomic side as well as clinical data, will require complex integration and analysis to provide new molecular variables to better understand the molecular basis of phenotype. Currently, much data exist in silos and are not analyzed in frameworks where all data are brought to bear in the development of biomarkers and novel functional targets. This is beginning to change. Network biology approaches, which emphasize the interactions between genes, proteins and metabolites, provide a framework for data integration such that genome, proteome, metabolome and other -omics data can be jointly analyzed to understand and predict disease phenotypes. In this review, recent advances in network biology approaches and results are identified. A common theme is the potential for network analysis to provide multiplexed and functionally connected biomarkers for analyzing the molecular basis of disease, thus changing our approaches to analyzing and modeling genome- and proteome-wide data.
|
31
|
Staiger C, Cadot S, Kooter R, Dittrich M, Müller T, Klau GW, Wessels LFA. A critical evaluation of network and pathway-based classifiers for outcome prediction in breast cancer. PLoS One 2012; 7:e34796. [PMID: 22558100 PMCID: PMC3338754 DOI: 10.1371/journal.pone.0034796] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 10/14/2011] [Accepted: 03/09/2012] [Indexed: 12/19/2022] Open
Abstract
Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single-gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single-gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set; and (4) the heterogeneity of the data set on the performance of composite feature and single-gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single-gene sets is similar to the stability of composite feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single-gene classifiers for predicting outcome in breast cancer.
Affiliation(s)
- Christine Staiger
- Centrum Wiskunde & Informatica, Life Sciences Group, The Netherlands
- Bioinformatics and Statistics, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- * E-mail: (CS); (GWK); (LFAW)
- Sidney Cadot
- Bioinformatics and Statistics, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Raul Kooter
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft, The Netherlands
- Marcus Dittrich
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Tobias Müller
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Gunnar W. Klau
- Centrum Wiskunde & Informatica, Life Sciences Group, The Netherlands
- Netherlands Institute for Systems Biology, Amsterdam, The Netherlands
- Lodewyk F. A. Wessels
- Bioinformatics and Statistics, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft, The Netherlands
- Cancer Systems Biology Center, The Netherlands Cancer Institute, Amsterdam, The Netherlands
|
32
|
Rivera CG, Tyler BM, Murali TM. Sensitive detection of pathway perturbations in cancers. BMC Bioinformatics 2012; 13 Suppl 3:S9. [PMID: 22536907 PMCID: PMC3471354 DOI: 10.1186/1471-2105-13-s3-s9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022] Open
Abstract
Background: The normal functioning of a living cell is characterized by complex interaction networks involving many different types of molecules. Associations detected between diseases and perturbations in well-defined pathways within such interaction networks have the potential to illuminate the molecular mechanisms underlying disease progression and response to treatment. Results: In this paper, we present a computational method that compares expression profiles of genes in cancer samples to samples from normal tissues in order to detect perturbations of pre-defined pathways in the cancer. In contrast to many previous methods, our scoring function approach explicitly takes into account the interactions between the gene products in a pathway. Moreover, we compute the sub-pathway that has the highest score, as opposed to merely computing the score for the entire pathway. We use a permutation test to assess the statistical significance of the most perturbed sub-pathway. We apply our method to 20 pathways in the Netpath database and to the Global Cancer Map of gene expression in 18 cancers. We demonstrate that our method yields more sensitive results than alternatives that do not consider interactions or measure the perturbation of a pathway as a whole. We perform a sensitivity analysis to show that our approach is robust to modest changes in the input data. Our method confirms numerous well-known connections between pathways and cancers. Conclusions: Our results indicate that integrating differential gene expression with the interaction structure in a pathway is a powerful approach for detecting links between a cancer and the pathways perturbed in it. Our results also suggest that even well-studied pathways may be perturbed only partially in any given cancer. Further analysis of cancer-specific sub-pathways may shed new light on the similarities and differences between cancers.
Affiliation(s)
- Corban G Rivera
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
|
33
|
Cun Y, Fröhlich H. Biomarker gene signature discovery integrating network knowledge. BIOLOGY 2012; 1:5-17. [PMID: 24832044 PMCID: PMC4011032 DOI: 10.3390/biology1010005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 01/29/2012] [Revised: 02/18/2012] [Accepted: 02/21/2012] [Indexed: 12/17/2022]
Abstract
Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards better personalized medicine. During the last decade, various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinical diagnosis is the typically low reproducibility of these signatures, combined with the difficulty of achieving a clear biological interpretation. For this reason, in recent years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview of the approaches proposed so far.
Affiliation(s)
- Yupeng Cun
- Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany.
- Holger Fröhlich
- Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany.
|
34
|
Dao P, Wang K, Collins C, Ester M, Lapuk A, Sahinalp SC. Optimally discriminative subnetwork markers predict response to chemotherapy. Bioinformatics 2011; 27:i205-13. [PMID: 21685072 PMCID: PMC3117373 DOI: 10.1093/bioinformatics/btr245] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Indexed: 12/31/2022] Open
Abstract
Motivation: Molecular profiles of tumor samples have been widely and successfully used for classification problems. A number of algorithms have been proposed to predict classes of tumor samples based on expression profiles with relatively high performance. However, prediction of response to cancer treatment has proved to be more challenging, and novel approaches with improved generalizability are still highly needed. Recent studies have clearly demonstrated the advantages of integrating protein–protein interaction (PPI) data with gene expression profiles for the development of subnetwork markers in classification problems. Results: We describe a novel network-based classification algorithm (OptDis) that uses the color-coding technique to identify optimally discriminative subnetwork markers. Focusing on PPI networks, we apply our algorithm to drug response studies: we evaluate our algorithm using published cohorts of breast cancer patients treated with combination chemotherapy. We show that our OptDis method improves over previously published subnetwork methods and provides better and more stable performance compared with other subnetwork and single gene methods. We also show that our subnetwork method produces predictive markers that are more reproducible across independent cohorts and offer valuable insight into biological processes underlying response to therapy. Availability: The implementation is available at: http://www.cs.sfu.ca/~pdao/personal/OptDis.html Contact: cenk@cs.sfu.ca; alapuk@prostatecentre.com; ccollins@prostatecentre.com
Affiliation(s)
- Phuong Dao
- School of Computing Science, Simon Fraser University
|