1
Bhuva DD, Cursons J, Smyth GK, Davis MJ. Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer. Genome Biol 2019; 20:236. [PMID: 31727119 PMCID: PMC6857226 DOI: 10.1186/s13059-019-1851-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/04/2019] [Accepted: 10/02/2019] [Indexed: 01/05/2023]
Abstract
BACKGROUND Elucidation of regulatory networks, including identification of regulatory mechanisms specific to a given biological context, is a key aim in systems biology. This has motivated the move from co-expression to differential co-expression analysis, and numerous methods have since been developed to address this task; however, evaluation of methods and interpretation of the resulting networks have been hindered by the lack of known context-specific regulatory interactions. RESULTS In this study, we develop a simulator based on dynamical systems modelling capable of simulating differential co-expression patterns. With the simulator and an evaluation framework, we benchmark and characterise the performance of inference methods. Defining three different levels of "true" networks for each simulation, we show that accurate inference of causation is difficult for all methods, compared to inference of associations. We show that a z-score-based method has the best general performance. Further, analysis of simulation parameters reveals five network and simulation properties that explain the performance of the methods. The evaluation framework and inference methods used in this study are available in the dcanr R/Bioconductor package. CONCLUSIONS Our analysis of networks inferred from simulated data shows that hub nodes are more likely to be differentially regulated targets than transcription factors. Based on this observation, we propose an interpretation of the inferred differential network that can reconstruct a putative causal network.
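The abstract does not define the z-score-based method. A common way to score differential co-expression for one gene pair is the Fisher z-test comparing Pearson correlations between two conditions; the sketch below assumes that formulation. The function name and the simulated data are hypothetical, and this is not code from dcanr.

```python
import math
import numpy as np

def diff_coexpression_z(x1, y1, x2, y2):
    """Fisher z-test for a change in Pearson correlation of one gene pair.

    Condition 1 is (x1, y1) and condition 2 is (x2, y2); each needs > 3 samples.
    Returns the z-score and a two-sided normal p-value.
    """
    r1 = np.corrcoef(x1, y1)[0, 1]
    r2 = np.corrcoef(x2, y2)[0, 1]
    # Standard error of the difference between Fisher-transformed correlations.
    se = math.sqrt(1.0 / (len(x1) - 3) + 1.0 / (len(x2) - 3))
    z = float((np.arctanh(r1) - np.arctanh(r2)) / se)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Simulated pair: co-expressed in condition 1, unrelated in condition 2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
y1 = x1 + 0.1 * rng.normal(size=200)
x2 = rng.normal(size=200)
y2 = rng.normal(size=200)
z, p = diff_coexpression_z(x1, y1, x2, y2)
```

A large positive z flags a pair whose association is lost in condition 2; swapping the conditions flips the sign.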
Affiliation(s)
- Dharmesh D Bhuva
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia; School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, VIC, 3010, Australia
- Joseph Cursons
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia; Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
- Gordon K Smyth
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia; School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, VIC, 3010, Australia
- Melissa J Davis
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia; Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia; Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
2
Yan J, Risacher SL, Shen L, Saykin AJ. Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 2018; 19:1370-1381. [PMID: 28679163 DOI: 10.1093/bib/bbx066] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Received: 02/23/2017] [Indexed: 11/14/2022]
Abstract
In the past decade, significant progress has been made in complex disease research across multiple omics layers, from genome, transcriptome and proteome to metabolome. There is an increasing awareness of the importance of biological interconnections, and much success has been achieved using systems biology approaches. However, because of the typical focus on a single omics layer at a time, existing systems biology findings explain only a modest portion of complex disease. Recent advances in multi-omics data collection and sharing present us with new opportunities for studying complex diseases in a more comprehensive fashion, and yet simultaneously create new challenges given the unprecedented data dimensionality and diversity. Here, our goal is to review extant and emerging network approaches that can be applied across multiple biological layers to facilitate a more comprehensive and integrative multilayered omics analysis of complex diseases.
Affiliation(s)
- Jingwen Yan
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, USA
- Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Li Shen
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
3
Oulas A, Minadakis G, Zachariou M, Sokratous K, Bourdakou MM, Spyrou GM. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches. Brief Bioinform 2019; 20:806-824. [PMID: 29186305 PMCID: PMC6585387 DOI: 10.1093/bib/bbx151] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 02/01/2023]
Abstract
Systems Bioinformatics is a relatively new approach, which lies at the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels by combining a bottom-up approach, as in systems biology, with a data-driven top-down approach, as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence of how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine.
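The abstract mentions integrating per-omics networks into a layered network that exchanges information within and between layers. A common representation, assumed here rather than taken from the review, is the supra-adjacency matrix of a multiplex network. The sketch builds one from two toy layers and takes one random-walk step that can move across layers; all names and weights are hypothetical.

```python
import numpy as np

def supra_adjacency(layers, inter_weight=1.0):
    """Stack per-layer adjacency matrices into one supra-adjacency matrix.

    Assumes every layer is an n x n matrix over the same n entities. Diagonal
    blocks keep within-layer edges; off-diagonal blocks couple each node to its
    own copies in the other layers with weight inter_weight.
    """
    n_layers = len(layers)
    n = layers[0].shape[0]
    supra = np.zeros((n_layers * n, n_layers * n))
    for a in range(n_layers):
        for b in range(n_layers):
            block = layers[a] if a == b else inter_weight * np.eye(n)
            supra[a * n:(a + 1) * n, b * n:(b + 1) * n] = block
    return supra

def random_walk_step(supra, p):
    """One step of a random walk that can move within and between layers."""
    transition = supra / supra.sum(axis=1, keepdims=True)
    return p @ transition

# Two toy layers over three hypothetical entities.
layer_rna = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
layer_protein = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)
S = supra_adjacency([layer_rna, layer_protein], inter_weight=0.5)
p0 = np.zeros(6)
p0[0] = 1.0                      # start on entity 0 in the RNA layer
p1 = random_walk_step(S, p0)     # probability mass now spans both layers
```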
Affiliation(s)
- Anastasis Oulas
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- George Minadakis
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Margarita Zachariou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Kleitos Sokratous
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Marilena M Bourdakou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- George M Spyrou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
4
Henriques D, Villaverde AF, Rocha M, Saez-Rodriguez J, Banga JR. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput Biol 2017; 13:e1005379. [PMID: 28166222 PMCID: PMC5319798 DOI: 10.1371/journal.pcbi.1005379] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/11/2016] [Revised: 02/21/2017] [Accepted: 01/24/2017] [Indexed: 11/19/2022]
Abstract
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them on experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive with state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic models based on ordinary differential equations, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training).
For this task, SELDOM's ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
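The abstract describes logic-based dynamic models whose simulations are combined into an ensemble prediction. One common way to turn logic rules into ODEs, used in tools such as Odefy and CellNOpt's CNORode, relies on normalised Hill functions; the sketch assumes that form for a single node with two inputs and averages trajectories across a hypothetical ensemble. It omits SELDOM's mutual-information scaffolds, parameter fitting and model reduction, and all parameter values are illustrative.

```python
import numpy as np

def norm_hill(x, n=3.0, k=0.5):
    """Normalised Hill function: maps 0 to 0 and 1 to 1 for inputs in [0, 1]."""
    return (x**n / (x**n + k**n)) * (1.0 + k**n)

def gate(u1, u2, kind, n=3.0, k=0.5):
    """Continuous AND (product) or OR of two regulator activities in [0, 1]."""
    a, b = norm_hill(u1, n, k), norm_hill(u2, n, k)
    return a * b if kind == "AND" else 1.0 - (1.0 - a) * (1.0 - b)

def simulate(tau, u1, u2, kind, x0=0.0, t_end=10.0, dt=0.01):
    """Euler integration of dx/dt = tau * (gate(u1, u2) - x) with constant inputs."""
    x = x0
    traj = [x]
    for _ in range(int(round(t_end / dt))):
        x += dt * tau * (gate(u1, u2, kind) - x)
        traj.append(x)
    return np.array(traj)

def ensemble_predict(taus, u1, u2, kind):
    """Average the trajectories of models that differ only in their rate tau."""
    return np.mean([simulate(t, u1, u2, kind) for t in taus], axis=0)

# Hypothetical ensemble: three models with different fitted rates.
taus = [0.5, 1.0, 2.0]
pred_and = ensemble_predict(taus, 1.0, 0.0, "AND")   # one input off: node stays off
pred_or = ensemble_predict(taus, 1.0, 0.0, "OR")     # one input on: node turns on
```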
Affiliation(s)
- David Henriques
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Alejandro F. Villaverde
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Julio Saez-Rodriguez
- Joint Research Center for Computational Biomedicine, RWTH-Aachen University, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Julio R. Banga
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
5
Andrews MC, Cursons J, Hurley DG, Anaka M, Cebon JS, Behren A, Crampin EJ. Systems analysis identifies miR-29b regulation of invasiveness in melanoma. Mol Cancer 2016; 15:72. [PMID: 27852308 PMCID: PMC5112703 DOI: 10.1186/s12943-016-0554-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/02/2016] [Accepted: 10/31/2016] [Indexed: 02/08/2023]
Abstract
Background In many cancers, microRNAs (miRs) contribute to metastatic progression by modulating phenotypic reprogramming processes such as epithelial-mesenchymal plasticity. This can be driven by miRs targeting multiple mRNA transcripts, inducing regulated changes across large sets of genes. The miR-target databases TargetScan and DIANA-microT predict putative relationships by examining sequence complementarity between miRs and mRNAs. However, it remains a challenge to identify which miR-mRNA interactions are active at endogenous expression levels and of biological consequence. Methods We developed a workflow to integrate TargetScan and DIANA-microT predictions into the analysis of data-driven associations calculated from transcript abundance (RNA-seq) data, specifically the mutual information and Pearson’s correlation metrics. We used this workflow to identify putative relationships of miR-mediated mRNA repression with strong support from both lines of evidence. Applying this approach systematically to a large, published collection of unique melanoma cell lines, the Ludwig Melbourne melanoma (LM-MEL) cell line panel, we identified putative miR-mRNA interactions that may contribute to invasiveness. This guided the selection of interactions of interest for further in vitro validation studies. Results Several miR-mRNA regulatory relationships supported by TargetScan and DIANA-microT demonstrated differential activity across cell lines of varying Matrigel invasiveness. Strong negative statistical associations for these putative regulatory relationships were consistent with target mRNA inhibition by the miR, and suggest that differential activity of such miR-mRNA relationships contributes to differences in melanoma invasiveness. Many of these relationships were reflected across the skin cutaneous melanoma TCGA dataset, indicating that these observations also show graded activity across clinical samples.
Several of these miRs are implicated in cancer progression (miR-211, -340, -125b, -221 and -29b). The specific role of miR-29b-3p in melanoma has not been well studied. We experimentally validated the predicted miR-29b-3p regulation of LAMC1, PPIC and LASP1, and show that dysregulation of miR-29b-3p or these mRNA targets can influence cellular invasiveness in vitro. Conclusions This analytic strategy provides a comprehensive, systems-level approach to identify miR-mRNA regulation in high-throughput cancer data, identifies novel putative interactions with functional phenotypic relevance, and can be used to direct resources towards subsequent experimental validation. Computational scripts are available: http://github.com/uomsystemsbiology/LMMEL-miR-miner. Electronic supplementary material: The online version of this article (doi:10.1186/s12943-016-0554-y) contains supplementary material, which is available to authorized users.
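The workflow combines two target-prediction resources with expression-based associations. A minimal sketch of that filtering step follows, assuming Pearson correlation only (the study also used mutual information) and a hypothetical cutoff of r ≤ -0.5. The miR and gene names are placeholders and the data are synthetic, not from the LM-MEL panel.

```python
import numpy as np

def candidate_repression_pairs(mir_expr, mrna_expr, predicted_a, predicted_b, r_cutoff=-0.5):
    """Return miR-mRNA pairs supported by both predictors and strongly anti-correlated.

    mir_expr / mrna_expr: dict of name -> expression vector over the same samples.
    predicted_a / predicted_b: sets of (miR, mRNA) pairs from two target predictors.
    """
    hits = []
    for mir, mrna in sorted(predicted_a & predicted_b):
        r = np.corrcoef(mir_expr[mir], mrna_expr[mrna])[0, 1]
        if r <= r_cutoff:              # repression should give a negative association
            hits.append((mir, mrna, float(r)))
    return sorted(hits, key=lambda hit: hit[2])   # most negative first

# Synthetic example with hypothetical names.
rng = np.random.default_rng(1)
mir = rng.normal(size=60)
mir_expr = {"miR-A": mir}
mrna_expr = {
    "GENE1": -mir + 0.3 * rng.normal(size=60),   # repressed, predicted by both tools
    "GENE2": rng.normal(size=60),                # predicted by both, not associated
    "GENE3": -mir + 0.3 * rng.normal(size=60),   # repressed, predicted by one tool only
}
pred_a = {("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-A", "GENE3")}
pred_b = {("miR-A", "GENE1"), ("miR-A", "GENE2")}
hits = candidate_repression_pairs(mir_expr, mrna_expr, pred_a, pred_b)
```

Requiring support from both lines of evidence discards GENE3 despite its strong association, mirroring the intent described in the abstract.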
Affiliation(s)
- Miles C Andrews
- Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia
- Joseph Cursons
- Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
- Daniel G Hurley
- Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
- Matthew Anaka
- Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Toronto, Toronto, ON, Canada
- Jonathan S Cebon
- Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia
- Andreas Behren
- Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia
- Edmund J Crampin
- Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia; Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
6
Budden DM, Crampin EJ. Distributed gene expression modelling for exploring variability in epigenetic function. BMC Bioinformatics 2016; 17:446. [PMID: 27816056 PMCID: PMC5097851 DOI: 10.1186/s12859-016-1313-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2016] [Accepted: 10/25/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Predictive gene expression modelling is an important tool in computational biology due to the volume of high-throughput sequencing data generated by recent consortia. However, the scope of previous studies has been restricted to a small set of cell lines or experimental conditions due to an inability to leverage distributed processing architectures for large, sharded datasets. RESULTS We present a distributed implementation of gene expression modelling using the MapReduce paradigm and prove that performance improves as a linear function of available processor cores. We then leverage the computational efficiency of this framework to explore the variability of epigenetic function across fifty histone modification datasets from a variety of cancerous and non-cancerous cell lines. CONCLUSIONS We demonstrate that the genome-wide relationships between histone modifications and mRNA transcription are lineage-, tissue- and karyotype-invariant, and that models trained on matched -omics data from non-cancerous cell lines are able to predict cancerous expression with equivalent genome-wide fidelity.
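The abstract does not specify the expression model. To illustrate the MapReduce pattern only, the sketch assumes an ordinary least-squares model of expression on histone-mark signals: each shard emits the sufficient statistics XᵀX and Xᵀy, these simply add across shards, and the combined fit matches a fit on the full data. The data, dimensions and function names are hypothetical.

```python
from functools import reduce
import numpy as np

def map_shard(shard):
    """Map step: least-squares sufficient statistics for one shard of genes."""
    X, y = shard
    return X.T @ X, X.T @ y

def combine(a, b):
    """Reduce step: statistics from different shards add up."""
    return a[0] + b[0], a[1] + b[1]

def fit_distributed(shards, mapper=map):
    """Solve the normal equations from the reduced statistics.

    `mapper` can be replaced by a parallel map (e.g. multiprocessing.Pool.map)
    so that shards are processed on separate cores.
    """
    xtx, xty = reduce(combine, mapper(map_shard, shards))
    return np.linalg.solve(xtx, xty)

# Hypothetical data: 1000 genes x 4 histone-mark signals, split into 4 shards.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + 0.1 * rng.normal(size=1000)
shards = [(X[idx], y[idx]) for idx in np.array_split(np.arange(1000), 4)]
beta_distributed = fit_distributed(shards)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the per-shard work is independent and the reduce step is a sum, adding workers shortens the map phase without changing the result.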
Affiliation(s)
- David M Budden
- Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Cambridge, 02139, USA; Systems Biology Laboratory, Melbourne School of Engineering, the University of Melbourne, Parkville, 3010, Australia
- Edmund J Crampin
- Systems Biology Laboratory, Melbourne School of Engineering, the University of Melbourne, Parkville, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, Parkville, 3010, Australia; Department of Mathematics and Statistics, the University of Melbourne, Parkville, 3010, Australia; School of Medicine, the University of Melbourne, Parkville, 3010, Australia
7
Abstract
Modeling biology as classical problems in computer science allows researchers to leverage the wealth of theoretical advancements in this field. Despite countless studies presenting heuristics that report improvement on specific benchmarking data, there has been comparatively little focus on exploring the theoretical bounds on the performance of practical (polynomial-time) algorithms. Conversely, theoretical studies tend to overstate the generalizability of their conclusions to physical biological processes. In this article we provide a fresh perspective on the concepts of NP-hardness and inapproximability in the computational biology domain, using popular sequence assembly and alignment (mapping) algorithms as illustrative examples. These algorithms exemplify how computer science theory can both (a) lead to substantial improvement in practical performance and (b) highlight areas ripe for future innovation. Importantly, we discuss caveats that seemingly allow the performance of heuristics to exceed their provable bounds.
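As a concrete instance of the theory-versus-heuristic tension, consider shortest common superstring (SCS), a standard idealised formulation of sequence assembly that is NP-hard. Greedy pairwise merging is a classic polynomial-time heuristic for it with a known constant-factor approximation bound. The sketch below is our own illustration with made-up reads, not code from the article.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(reads):
    """Repeatedly merge the pair of strings with the largest overlap."""
    reads = list(dict.fromkeys(reads))   # drop exact duplicates, keep order
    # Reads contained in other reads add no information.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""

reads = ["ATTAGAC", "AGACCTG", "CCTGCCG", "GCCGGAA"]
superstring = greedy_superstring(reads)   # "ATTAGACCTGCCGGAA"
```

Each merge keeps both strings as substrings, so the output is always a valid superstring; what greedy gives up is the guarantee of minimal length.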
Affiliation(s)
- David Budden
- Google, Inc., Pyrmont, Australia; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Mitchell Jones
- Google, Inc., Pyrmont, Australia; Department of Computer Science, University of Illinois at Urbana-Champaign
8
Information theoretic approaches for inference of biological networks from continuous-valued data. BMC Syst Biol 2016; 10:89. [PMID: 27599566 PMCID: PMC5013667 DOI: 10.1186/s12918-016-0331-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/25/2016] [Accepted: 08/23/2016] [Indexed: 01/30/2023]
Abstract
Background Characterising programs of gene regulation by studying individual protein-DNA and protein-protein interactions would require a large volume of high-resolution proteomics data, and such data are not yet available. Instead, many gene regulatory network (GRN) techniques have been developed, which leverage the wealth of transcriptomic data generated by recent consortia to study indirect, gene-level relationships between transcriptional regulators. Despite their popularity, existing GRN inference methods exhibit limitations that we highlight and address through the lens of information theory. Results We introduce new model-free and non-linear information theoretic measures for the inference of GRNs and other biological networks from continuous-valued data. Although previous tools have implemented mutual information as a means of inferring pairwise associations, they either introduce statistical bias through discretisation or are limited to modelling undirected relationships. Our approach overcomes both of these limitations, as demonstrated by a substantial improvement in empirical performance for a set of 160 GRNs of varying size and topology. Conclusions The information theoretic measures described in this study yield substantial improvements over previous approaches (e.g. ARACNE) and have been implemented in the latest release of NAIL (Network Analysis and Inference Library). However, despite the theoretical and empirical advantages of these new measures, they do not circumvent the fundamental limitation of indeterminacy exhibited across this class of biological networks. These methods have so far found value in computational neurobiology, and will likely gain traction for GRN analysis as the volume and quality of temporal transcriptomics data continue to improve.
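The abstract does not give the form of NAIL's estimators. To illustrate estimating mutual information from continuous values without discretisation, the sketch swaps in the well-known Kraskov-Stögbauer-Grassberger (KSG) k-nearest-neighbour estimator. It is symmetric, so it addresses only the discretisation issue, not directionality; the data are simulated.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(m):
    """Digamma at a positive integer m: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + np.sum(1.0 / np.arange(1, m))

def ksg_mutual_information(x, y, k=3):
    """KSG estimate (their algorithm 1) of I(X;Y) in nats.

    Uses nearest-neighbour distances on the raw continuous values, so no binning
    is needed. Brute-force O(N^2) memory; assumes no duplicate points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)                  # max-norm distance in joint (x, y) space
    np.fill_diagonal(dz, np.inf)             # a point is not its own neighbour
    eps = np.sort(dz, axis=1)[:, k - 1]      # distance to the k-th joint neighbour
    # Marginal neighbours strictly inside eps, excluding the point itself.
    nx = np.sum(dx < eps[:, None], axis=1) - 1
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    psi = np.vectorize(digamma_int)
    return float(digamma_int(k) + digamma_int(n) - np.mean(psi(nx + 1) + psi(ny + 1)))

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
y_lin = 0.8 * x + 0.6 * rng.normal(size=n)   # Gaussian pair, correlation 0.8
y_sq = x**2 + 0.1 * rng.normal(size=n)       # non-linear dependence, Pearson r near 0
y_ind = rng.normal(size=n)                   # independent
mi_lin = ksg_mutual_information(x, y_lin)    # exact value: -0.5 * ln(1 - 0.64) ≈ 0.511
mi_sq = ksg_mutual_information(x, y_sq)
mi_ind = ksg_mutual_information(x, y_ind)
```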
9
Abstract
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
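The iterative selection of reliable negatives from unlabelled data can be sketched as follows. This rests on several assumptions: plain logistic regression stands in for the SVM and RF used in the article, the 2-D features and all parameter values are hypothetical, and the loop is a generic version of the idea rather than the authors' inductive or transductive procedure.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Gradient-descent logistic regression, a stand-in for SVM or RF."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def reliable_negatives(X_pos, X_unlab, n_iter=5, keep_frac=0.5):
    """Iteratively select unlabelled examples that look least like known positives.

    Round 1 treats every unlabelled example as negative; each later round retrains
    on the positives plus the keep_frac lowest-scoring unlabelled examples.
    """
    n_keep = max(1, int(keep_frac * len(X_unlab)))
    neg_idx = np.arange(len(X_unlab))
    for _ in range(n_iter):
        X = np.vstack([X_pos, X_unlab[neg_idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(neg_idx))])
        w = fit_logistic(X, y)
        neg_idx = np.argsort(predict_proba(w, X_unlab))[:n_keep]
    return neg_idx, w

# Synthetic features for gene pairs (hypothetical; not E. coli or yeast data).
rng = np.random.default_rng(4)
X_pos = rng.normal(2.0, 1.0, size=(50, 2))
hidden_pos = rng.normal(2.0, 1.0, size=(50, 2))   # unlabelled true interactions
true_neg = rng.normal(-2.0, 1.0, size=(150, 2))
X_unlab = np.vstack([hidden_pos, true_neg])       # indices >= 50 are true negatives
neg_idx, w = reliable_negatives(X_pos, X_unlab)
```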
10
Abstract
Behaviours of complex biomolecular systems are often irreducible to the elementary properties of their individual components. Explanatory and predictive mathematical models are therefore useful for fully understanding and precisely engineering cellular functions. The development and analyses of these models require their adaptation to the problems that need to be solved and the type and amount of available genetic or molecular data. Quantitative and logic modelling are among the main methods currently used to model molecular and gene networks. Each approach comes with inherent advantages and weaknesses. Recent developments show that hybrid approaches will become essential for further progress in synthetic biology and in the development of virtual organisms.
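To contrast the two modelling styles, the sketch encodes one hypothetical motif, a two-gene mutual-repression toggle, both as a synchronous Boolean network and as Hill-type ODEs integrated with Euler steps. The motif and all parameter values are illustrative assumptions, not taken from the article.

```python
def boolean_step(state):
    """Synchronous Boolean update for mutual repression: A' = not B, B' = not A."""
    a, b = state
    return (not b, not a)

def ode_toggle(a0, b0, alpha=4.0, n=2, dt=0.01, steps=5000):
    """Euler integration of da/dt = alpha/(1 + b^n) - a and db/dt = alpha/(1 + a^n) - b."""
    a, b = a0, b0
    for _ in range(steps):
        da = alpha / (1.0 + b**n) - a
        db = alpha / (1.0 + a**n) - b
        a, b = a + dt * da, b + dt * db
    return a, b

# Logic view: (A on, B off) is a fixed point, while the synchronous update scheme
# makes (on, on) flip to (off, off) and back again.
fixed = boolean_step((True, False))
cycle = boolean_step(boolean_step((True, True)))
# Quantitative view: graded concentrations settle near (3.73, 0.27) from this start.
a_ss, b_ss = ode_toggle(3.0, 0.1)
```

The logic model needs no kinetic parameters, whereas the ODE gives graded levels at the cost of choosing alpha and n, which is the kind of trade-off the abstract describes.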
Affiliation(s)
- Nicolas Le Novère
- Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK