1
Roth C, Venu V, Job V, Lubbers N, Sanbonmatsu KY, Steadman CR, Starkenburg SR. Improved quality metrics for association and reproducibility in chromatin accessibility data using mutual information. BMC Bioinformatics 2023; 24:441. [PMID: 37990143 PMCID: PMC10664258 DOI: 10.1186/s12859-023-05553-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 10/30/2023] [Indexed: 11/23/2023] Open
Abstract
BACKGROUND Correlation metrics are widely used in genomics analysis and are often implemented with little regard to assumptions of normality, homoscedasticity, and independence of values. This is especially true when comparing values between replicated sequencing experiments that probe chromatin accessibility, such as the assay for transposase-accessible chromatin via sequencing (ATAC-seq). Such data can contain several regions across the human genome with little to no sequencing depth and are thus non-normal, with a large portion of zero values. Despite widespread use in the epigenomics field, few studies have evaluated and benchmarked how correlation and association statistics behave across ATAC-seq experiments with known differences, or the effects of removing specific outliers from the data. Here, we developed a computational simulation of ATAC-seq data to elucidate the behavior of correlation statistics and to compare their accuracy under set conditions of reproducibility.
RESULTS Using these simulations, we monitored the behavior of several correlation statistics, including Pearson's R and Spearman's ρ coefficients as well as Kendall's τ and Top-Down correlation. We also tested the behavior of association measures, including the coefficient of determination R², Kendall's W, and normalized mutual information. Our experiments reveal an insensitivity of most statistics, including Spearman's ρ, Kendall's τ, and Kendall's W, to increasing differences between simulated ATAC-seq replicates. The removal of co-zeros (regions lacking mapped sequenced reads) between simulated experiments greatly improves the estimates of correlation and association. After removing co-zeros, the R² coefficient and normalized mutual information display the best performance, having a closer one-to-one relationship with the known portion of shared, enhanced loci between simulated replicates. When comparing values between experimental ATAC-seq data using a random forest model, mutual information best predicts ATAC-seq replicate relationships.
CONCLUSIONS Collectively, this study demonstrates how measures of correlation and association can behave in epigenomics experiments. We provide improved strategies for quantifying relationships in these increasingly prevalent and important chromatin accessibility assays.
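The co-zero removal step and the normalized mutual information mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names are hypothetical, raw counts are treated as discrete labels without any binning, and dividing by the mean of the two marginal entropies is one assumed choice among several common normalizations.

```python
import math
from collections import Counter


def drop_co_zeros(x, y):
    """Keep only positions where at least one replicate has a nonzero count."""
    kept = [(a, b) for a, b in zip(x, y) if a != 0 or b != 0]
    return [a for a, _ in kept], [b for _, b in kept]


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(x, y):
    """MI(x, y) divided by the mean marginal entropy (assumed normalization)."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy of the paired labels
    mi = hx + hy - hxy
    denom = (hx + hy) / 2
    return mi / denom if denom > 0 else 0.0


# Toy read counts per region, invented purely for illustration.
rep_a = [0, 0, 0, 0, 5, 3, 0, 8]
rep_b = [0, 0, 0, 0, 4, 3, 1, 9]
kept_a, kept_b = drop_co_zeros(rep_a, rep_b)
nmi_after = normalized_mutual_information(kept_a, kept_b)
```

In this toy case the four shared zero regions are dropped before the score is computed, so they no longer contribute agreement that says nothing about the accessible loci.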
Affiliation(s)
- Cullen Roth
- Los Alamos National Laboratory, Genomics and Bioanalytics, Los Alamos, NM, USA.
- Vrinda Venu
- Los Alamos National Laboratory, Climate, Ecosystems, and Environmental Science, Los Alamos, NM, USA
- Vanessa Job
- Los Alamos National Laboratory, High Performance Computing and Design, Los Alamos, NM, USA
- Nicholas Lubbers
- Los Alamos National Laboratory, Information Sciences, Los Alamos, NM, USA
- Karissa Y Sanbonmatsu
- Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, NM, USA
- Christina R Steadman
- Los Alamos National Laboratory, Climate, Ecosystems, and Environmental Science, Los Alamos, NM, USA
- Shawn R Starkenburg
- Los Alamos National Laboratory, Genomics and Bioanalytics, Los Alamos, NM, USA
2
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform 2023; 24:bbad414. [PMID: 37985457 PMCID: PMC10661972 DOI: 10.1093/bib/bbad414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/22/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes at the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we frame GRN inference as a graph link prediction task and propose a novel framework, GNNLink, which leverages known GRNs to deduce potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder to refine gene features by capturing interdependencies between nodes in the network. Finally, the GRN is inferred by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink in our GitHub repository: https://github.com/sdesignates/GNNLink.
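The encoder-then-score idea in this abstract can be illustrated with a very small, untrained example. The sketch below is not GNNLink itself: it assumes the standard symmetric-normalized graph convolution, uses fixed toy weights instead of learned ones, and stands in for the matrix completion step with a sigmoid of embedding inner products. All function names and numbers are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with added self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in h]


def link_scores(embeddings):
    """Score each gene pair with a sigmoid of the embedding inner product."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    return [[sigmoid(sum(p * q for p, q in zip(u, v))) for v in embeddings]
            for u in embeddings]


# Toy graph: genes 0 and 1 share a known link, gene 2 is isolated.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy expression features
weights = [[1.0, 0.0], [0.0, 1.0]]                # identity, not trained
adj_norm = normalize_adjacency(adj)
embeddings = gcn_layer(adj_norm, features, weights)
scores = link_scores(embeddings)
```

In a trained model the weights would be fitted so that scores for known regulatory links are high and scores for non-links are low; here they are fixed only to keep the arithmetic easy to follow.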
Affiliation(s)
- Guo Mao
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Zhengbin Pang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Ke Zuo
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Qinglin Wang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xiangdong Pei
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xinhai Chen
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Jie Liu
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China
3
Kendall transformation brings a robust categorical representation of ordinal data. Sci Rep 2022; 12:8341. [PMID: 35585217 PMCID: PMC9117319 DOI: 10.1038/s41598-022-12224-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Accepted: 04/29/2022] [Indexed: 11/17/2022] Open
Abstract
Kendall transformation is a conversion of an ordered feature into a vector of pairwise order relations between individual values. In this way, it preserves the ranking of observations and represents it in a categorical form. Such a transformation allows for generalisation of methods requiring strictly categorical input, especially in the limit of a small number of observations, when quantisation becomes problematic. In particular, many approaches of information theory can be directly applied to Kendall-transformed continuous data without relying on differential entropy or any additional parameters. Moreover, by filtering information to that contained in ranking, Kendall transformation leads to better robustness at the reasonable cost of dropping sophisticated interactions, which are in any case unlikely to be correctly estimated. In bivariate analysis, Kendall transformation can be related to popular non-parametric methods, showing the soundness of the approach. The paper also demonstrates its efficiency in multivariate problems and provides an example analysis of real-world data.
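A short sketch may make the transformation concrete. It follows the description in the abstract (one categorical order relation per pair of observations); enumerating ordered pairs with `i != j` and the three symbols used for the relations are assumptions of this example, as is the function name.

```python
from itertools import permutations


def kendall_transform(values):
    """Encode each ordered pair (i, j), i != j, as '<', '>' or '='."""
    def relation(a, b):
        if a < b:
            return "<"
        if a > b:
            return ">"
        return "="
    return [relation(values[i], values[j])
            for i, j in permutations(range(len(values)), 2)]


original = kendall_transform([1, 3, 2])
rescaled = kendall_transform([10, 300, 20])  # same ranking, different scale
```

Because only order relations are kept, any strictly increasing rescaling of the input yields the same categorical vector, which is the robustness property the abstract refers to.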
4
Novelli L, Lizier JT. Inferring network properties from time series using transfer entropy and mutual information: Validation of multivariate versus bivariate approaches. Netw Neurosci 2021; 5:373-404. [PMID: 34189370 PMCID: PMC8233116 DOI: 10.1162/netn_a_00178] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/22/2020] [Accepted: 12/03/2020] [Indexed: 02/02/2023] Open
Abstract
Functional and effective networks inferred from time series are at the core of network neuroscience. Interpreting properties of these networks requires inferred network models to reflect key underlying structural features. However, even a few spurious links can severely distort network measures, posing a challenge for functional connectomes. We study the extent to which micro- and macroscopic properties of underlying networks can be inferred by algorithms based on mutual information and bivariate/multivariate transfer entropy. The validation is performed on two macaque connectomes and on synthetic networks with various topologies (regular lattice, small-world, random, scale-free, modular). Simulations are based on a neural mass model and on autoregressive dynamics (employing Gaussian estimators for direct comparison to functional connectivity and Granger causality). We find that multivariate transfer entropy captures key properties of all network structures for longer time series. Bivariate methods can achieve higher recall (sensitivity) for shorter time series but are unable to control false positives (lower specificity) as available data increases. This leads to overestimated clustering, small-world, and rich-club coefficients, underestimated shortest path lengths and hub centrality, and fattened degree distribution tails. Caution should therefore be used when interpreting network properties of functional connectomes obtained via correlation or pairwise statistical dependence measures, rather than more holistic (yet data-hungry) multivariate models.
We compare bivariate and multivariate methods for inferring networks from time series, which are generated using a neural mass model and autoregressive dynamics. We assess their ability to reproduce key properties of the underlying structural network. Validation is performed on two macaque connectomes and on synthetic networks with various topologies (regular lattice, small-world, random, scale-free, modular). Even a few spurious links can severely bias key network properties. Multivariate transfer entropy performs best on all topologies for longer time series.
Affiliation(s)
- Leonardo Novelli
- Centre for Complex Systems, Faculty of Engineering, University of Sydney, Sydney, Australia
- Joseph T Lizier
- Centre for Complex Systems, Faculty of Engineering, University of Sydney, Sydney, Australia
5
Seweryn MT, Pietrzak M, Ma Q. Application of information theoretical approaches to assess diversity and similarity in single-cell transcriptomics. Comput Struct Biotechnol J 2020; 18:1830-1837. [PMID: 32728406 PMCID: PMC7371753 DOI: 10.1016/j.csbj.2020.05.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/11/2019] [Revised: 04/24/2020] [Accepted: 05/06/2020] [Indexed: 02/09/2023] Open
Abstract
Single-cell transcriptomics offers a powerful way to reveal the heterogeneity of individual cells. To date, many information theoretical approaches have been proposed to assess diversity and similarity, and to characterize the latent heterogeneity in transcriptome data. Diversity reflects gene expression variation and can facilitate the identification of signature genes, while similarity reveals co-expression patterns for cell type clustering. In this review, we summarize 16 measures of information theory used for evaluating diversity and similarity in single-cell transcriptomic data, provide references, and shed light on selected theoretical properties to guide the choice of appropriate measures in general cases. We further provide an R package assembling the discussed approaches to support researchers' own single-cell transcriptome studies. Finally, we discuss prospective applications of diversity and similarity measures for depicting heterogeneity in single-cell multi-omics data.
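To make the distinction between diversity and similarity concrete, the sketch below computes one standard measure of each kind: Shannon entropy for diversity within a profile and Jensen-Shannon divergence for dissimilarity between two profiles. These are common textbook quantities chosen for illustration; whether and how they appear among the 16 measures in the review is not stated in the abstract, and the function names are hypothetical.

```python
import math


def normalize(counts):
    """Turn nonnegative counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(p):
    """Diversity of one expression profile, in bits."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric dissimilarity between two profiles, in bits (0 = identical)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy expression counts over four genes, invented for illustration.
even_cell = normalize([5, 5, 5, 5])
skewed_cell = normalize([17, 1, 1, 1])
diversity_even = shannon_entropy(even_cell)
diversity_skewed = shannon_entropy(skewed_cell)
distance = jensen_shannon_divergence(even_cell, skewed_cell)
```

The evenly expressed profile reaches the maximum entropy for four genes, while the skewed one scores lower; the divergence between them lies between 0 (identical profiles) and 1 bit (non-overlapping profiles).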
Affiliation(s)
- Michal T. Seweryn
- Center for Medical Genomics, Jagiellonian University, Cracow, Poland
- Maciej Pietrzak
- Department of Biomedical Informatics, The Ohio State University, Columbus OH, United States
- Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus OH, United States
6
Rosenfeld CS, Hekman JP, Johnson JL, Lyu Z, Ortega MT, Joshi T, Mao J, Vladimirova AV, Gulevich RG, Kharlamova AV, Acland GM, Hecht EE, Wang X, Clark AG, Trut LN, Behura SK, Kukekova AV. Hypothalamic transcriptome of tame and aggressive silver foxes (Vulpes vulpes) identifies gene expression differences shared across brain regions. Genes Brain and Behavior 2019; 19:e12614. [PMID: 31605445 DOI: 10.1111/gbb.12614] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/30/2019] [Revised: 10/02/2019] [Accepted: 10/03/2019] [Indexed: 12/15/2022]
Abstract
The underlying neurological events accompanying dog domestication remain elusive. To reconstruct the domestication process in an experimental setting, silver foxes (Vulpes vulpes) have been deliberately bred for tame vs aggressive behaviors for more than 50 generations at the Institute for Cytology and Genetics in Novosibirsk, Russia. The hypothalamus is an essential part of the hypothalamic-pituitary-adrenal axis and regulates the fight-or-flight response, and thus, we hypothesized that selective breeding for tameness/aggressiveness has shaped the hypothalamic transcriptomic profile. RNA-seq analysis identified 70 differentially expressed genes (DEGs). Seven of these genes, DKKL1, FBLN7, NPL, PRIMPOL, PTGRN, SHCBP1L and SKIV2L, showed the same direction expression differences in the hypothalamus, basal forebrain and prefrontal cortex. The genes differentially expressed across the three tissues are involved in cell division, differentiation, adhesion and carbohydrate processing, suggesting an association of these processes with selective breeding. Additionally, 159 transcripts from the hypothalamus demonstrated differences in the abundance of alternative spliced forms between the tame and aggressive foxes. Weighted gene coexpression network analyses also suggested that gene modules in hypothalamus were significantly associated with tame vs aggressive behavior. Pathways associated with these modules include signal transduction, interleukin signaling, cytokine-cytokine receptor interaction and peptide ligand-binding receptors (eg, G-protein coupled receptor [GPCR] ligand binding). Current studies show the selection for tameness vs aggressiveness in foxes is associated with unique hypothalamic gene profiles partly shared with other brain regions and highlight DEGs involved in biological processes such as development, differentiation and immunological responses. The role of these processes in fox and dog domestication remains to be determined.
Affiliation(s)
- Cheryl S Rosenfeld
- Bond Life Sciences Center, University of Missouri, Columbia, Missouri; Biomedical Sciences, University of Missouri, Columbia, Missouri; Thompson Center for Autism and Neurobehavioral Disorders, University of Missouri, Columbia, Missouri; MU Informatics Institute, University of Missouri, Columbia, Missouri
- Jessica P Hekman
- Department of Animal Sciences, College of Agricultural, Consumer, and Environmental Sciences, University of Illinois, Urbana, Illinois; The Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Jennifer L Johnson
- Department of Animal Sciences, College of Agricultural, Consumer, and Environmental Sciences, University of Illinois, Urbana, Illinois
- Zhen Lyu
- Department of Computer Science, University of Missouri, Columbia, Missouri
- Madison T Ortega
- Bond Life Sciences Center, University of Missouri, Columbia, Missouri; Biomedical Sciences, University of Missouri, Columbia, Missouri
- Trupti Joshi
- Bond Life Sciences Center, University of Missouri, Columbia, Missouri; MU Informatics Institute, University of Missouri, Columbia, Missouri; Department of Computer Science, University of Missouri, Columbia, Missouri; Department of Health Management and Informatics, University of Missouri, Columbia, Missouri
- Jiude Mao
- Bond Life Sciences Center, University of Missouri, Columbia, Missouri; Biomedical Sciences, University of Missouri, Columbia, Missouri
- Anastasiya V Vladimirova
- The Laboratory of Evolutionary Genetics, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Rimma G Gulevich
- The Laboratory of Evolutionary Genetics, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anastasiya V Kharlamova
- The Laboratory of Evolutionary Genetics, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Gregory M Acland
- Baker Institute for Animal Health, Cornell University, College of Veterinary Medicine, Ithaca, New York
- Erin E Hecht
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Xu Wang
- Department of Pathobiology, Auburn University, College of Veterinary Medicine, Auburn, Alabama
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York
- Lyudmila N Trut
- The Laboratory of Evolutionary Genetics, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Susanta K Behura
- MU Informatics Institute, University of Missouri, Columbia, Missouri; Division of Animal Sciences, University of Missouri, Columbia, Missouri
- Anna V Kukekova
- Department of Animal Sciences, College of Agricultural, Consumer, and Environmental Sciences, University of Illinois, Urbana, Illinois
7
Behura SK, Kelleher AM, Spencer TE. Evidence for functional interactions between the placenta and brain in pregnant mice. FASEB J 2019; 33:4261-4272. [PMID: 30521381 PMCID: PMC6404589 DOI: 10.1096/fj.201802037r] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/24/2018] [Accepted: 11/12/2018] [Indexed: 12/19/2022]
Abstract
The placenta plays a pivotal role in the development of the fetal brain and also influences maternal brain function, but our understanding of communication between the placenta and brain remains limited. Using a gene expression and network analysis approach, we provide evidence that the placenta transcriptome is tightly interconnected with the maternal brain and fetal brain in day 15 pregnant C57BL/6J mice. Activation of serotonergic synapse signaling and inhibition of neurotrophin signaling were identified as potential mediators of crosstalk between the placenta and maternal brain and fetal brain, respectively. Genes encoding specific receptors and ligands were predicted to affect functional interactions between the placenta and brain. Paralogous genes, such as sex comb on midleg homolog 1/scm-like with 4 mbt domains 2 and polycomb group ring finger (Pcgf) 2/Pcgf5, displayed antagonistic regulation between the placenta and brain. Additionally, conditional ablation of forkhead box a2 (Foxa2) in the glands of the uterus altered the transcriptome of the day 15 placenta, which provides novel evidence of crosstalk between the uterine glands and placenta. Furthermore, expression of cathepsin 6 and monocyte to macrophage differentiation associated 2 was significantly different in the fetal brain of Foxa2 conditional knockout mice compared with control mice. These findings provide a better understanding of the intricacies of uterus-placenta-brain interactions during pregnancy and provide a foundation and model system for their exploration.
Affiliation(s)
- Susanta K. Behura
- Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Informatics Institute, University of Missouri, Columbia, Missouri, USA
- Andrew M. Kelleher
- Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Thomas E. Spencer
- Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Department of Obstetrics, Gynecology, and Women’s Health, University of Missouri, Columbia, Missouri, USA
8
Villaverde AF, Becker K, Banga JR. PREMER: A Tool to Infer Biological Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1193-1202. [PMID: 28981423 DOI: 10.1109/tcbb.2017.2758786] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/07/2023]
Abstract
Inferring the structure of unknown cellular networks is a major challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features, such as distinguishing between direct and indirect interactions or determining the direction of a causal link, requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end, we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux, and OSX (https://sites.google.com/site/premertoolbox/).
9
Darmon D, Rapp PE. Specific transfer entropy and other state-dependent transfer entropies for continuous-state input-output systems. Phys Rev E 2017; 96:022121. [PMID: 28950488 DOI: 10.1103/physreve.96.022121] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/28/2017] [Indexed: 11/07/2022]
Abstract
Since its original formulation in 2000, transfer entropy has become an invaluable tool in the toolbox of nonlinear dynamicists working with empirical data. Transfer entropy and its generalizations provide a precise definition of uncertainty and information transfer that are central to the coupled systems studied in nonlinear science. However, a canonical definition of state-dependent transfer entropy has yet to be introduced. We introduce a candidate measure, the specific transfer entropy, and compare its properties to both total and local transfer entropy. Specific transfer entropy makes possible both state- and time-resolved analysis of the predictive impact of a candidate input system on a candidate output system. We also present principled methods for estimating total, local, and specific transfer entropies from empirical data. We demonstrate the utility of specific transfer entropy and our proposed estimation procedures with two model systems, and find that specific transfer entropy provides more, and more easily interpretable, information about an input-output system compared to currently existing methods.
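For orientation, the total and local transfer entropies that the abstract compares against can be written in their usual textbook form. The notation below is an assumption of this note rather than the paper's own: \(f\) denotes a conditional density, and \(k\) and \(l\) are the history lengths of the output \(Y\) and input \(X\). The paper's definition of specific transfer entropy is not given in the abstract and is not reproduced here.

```latex
% Total transfer entropy from input X to output Y, where Y_t^{(k)} denotes
% the k most recent values of Y up to time t (and similarly X_t^{(l)}):
\[
T_{X \to Y}
  = \mathbb{E}\!\left[
      \log \frac{f\big(Y_{t+1} \mid Y_t^{(k)}, X_t^{(l)}\big)}
                {f\big(Y_{t+1} \mid Y_t^{(k)}\big)}
    \right]
\]

% Local transfer entropy: the same log-ratio evaluated at a single observed
% realization, so that the total transfer entropy is its expectation.
\[
t_{X \to Y}(t+1)
  = \log \frac{f\big(y_{t+1} \mid y_t^{(k)}, x_t^{(l)}\big)}
              {f\big(y_{t+1} \mid y_t^{(k)}\big)}
\]
```

The total quantity averages over all states and times, while the local quantity is resolved in time but can be negative for individual observations.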
Affiliation(s)
- David Darmon
- Department of Military and Emergency Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814, USA and The Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland 20817, USA
- Paul E Rapp
- Department of Military and Emergency Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814, USA
10
Andrews MC, Cursons J, Hurley DG, Anaka M, Cebon JS, Behren A, Crampin EJ. Systems analysis identifies miR-29b regulation of invasiveness in melanoma. Mol Cancer 2016; 15:72. [PMID: 27852308 PMCID: PMC5112703 DOI: 10.1186/s12943-016-0554-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/02/2016] [Accepted: 10/31/2016] [Indexed: 02/08/2023] Open
Abstract
Background In many cancers, microRNAs (miRs) contribute to metastatic progression by modulating phenotypic reprogramming processes such as epithelial-mesenchymal plasticity. This can be driven by miRs targeting multiple mRNA transcripts, inducing regulated changes across large sets of genes. The miR-target databases TargetScan and DIANA-microT predict putative relationships by examining sequence complementarity between miRs and mRNAs. However, it remains a challenge to identify which miR-mRNA interactions are active at endogenous expression levels, and of biological consequence.
Methods We developed a workflow to integrate TargetScan and DIANA-microT predictions into the analysis of data-driven associations calculated from transcript abundance (RNASeq) data, specifically the mutual information and Pearson’s correlation metrics. We use this workflow to identify putative relationships of miR-mediated mRNA repression with strong support from both lines of evidence. Applying this approach systematically to a large, published collection of unique melanoma cell lines, the Ludwig Melbourne melanoma (LM-MEL) cell line panel, we identified putative miR-mRNA interactions that may contribute to invasiveness. This guided the selection of interactions of interest for further in vitro validation studies.
Results Several miR-mRNA regulatory relationships supported by TargetScan and DIANA-microT demonstrated differential activity across cell lines of varying matrigel invasiveness. Strong negative statistical associations for these putative regulatory relationships were consistent with target mRNA inhibition by the miR, and suggest that differential activity of such miR-mRNA relationships contributes to differences in melanoma invasiveness. Many of these relationships were reflected across the skin cutaneous melanoma TCGA dataset, indicating that these observations also show graded activity across clinical samples. Several of these miRs are implicated in cancer progression (miR-211, -340, -125b, -221, and -29b). The specific role for miR-29b-3p in melanoma has not been well studied. We experimentally validated the predicted miR-29b-3p regulation of LAMC1, PPIC and LASP1, and show that dysregulation of miR-29b-3p or these mRNA targets can influence cellular invasiveness in vitro.
Conclusions This analytic strategy provides a comprehensive, systems-level approach to identify miR-mRNA regulation in high-throughput cancer data, identifies novel putative interactions with functional phenotypic relevance, and can be used to direct experimental resources for subsequent experimental validation. Computational scripts are available: http://github.com/uomsystemsbiology/LMMEL-miR-miner
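The idea of requiring support from both lines of evidence (sequence-based predictions and a negative expression association) can be sketched as below. This is a simplified illustration, not the published LMMEL-miR-miner scripts: it assumes predictions are combined by intersecting the two databases, uses only Pearson's correlation (the mutual information step is omitted), and the threshold, identifiers, and expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def supported_repression(pairs_db_a, pairs_db_b, mir_expr, mrna_expr, max_r=-0.5):
    """Keep miR-mRNA pairs predicted by both databases with correlation <= max_r."""
    hits = {}
    for mir, gene in sorted(set(pairs_db_a) & set(pairs_db_b)):
        r = pearson(mir_expr[mir], mrna_expr[gene])
        if r <= max_r:  # a negative association is consistent with repression
            hits[(mir, gene)] = r
    return hits


# Placeholder identifiers and toy abundances across four cell lines.
db_a = {("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-A", "GENE3")}
db_b = {("miR-A", "GENE1"), ("miR-A", "GENE2")}
mir_expr = {"miR-A": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {
    "GENE1": [8.0, 6.0, 4.0, 2.0],   # falls as the miR rises
    "GENE2": [1.0, 2.0, 3.0, 4.0],   # rises with the miR
    "GENE3": [9.0, 7.0, 5.0, 3.0],   # anticorrelated, but predicted by one database only
}
hits = supported_repression(db_a, db_b, mir_expr, mrna_expr)
```

Only the pair that is both predicted by the two placeholder databases and negatively correlated survives the filter, which mirrors the "strong support from both lines of evidence" criterion in the abstract.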
Affiliation(s)
- Miles C Andrews
- Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia
- Joseph Cursons
- Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
- Daniel G Hurley
- Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
- Matthew Anaka
- Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Toronto, Toronto, ON, Canada
- Jonathan S Cebon
- Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia
- Andreas Behren
- Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia
- Edmund J Crampin
- Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia; Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
11
Budden DM, Crampin EJ. Distributed gene expression modelling for exploring variability in epigenetic function. BMC Bioinformatics 2016; 17:446. [PMID: 27816056 PMCID: PMC5097851 DOI: 10.1186/s12859-016-1313-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2016] [Accepted: 10/25/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Predictive gene expression modelling is an important tool in computational biology due to the volume of high-throughput sequencing data generated by recent consortia. However, the scope of previous studies has been restricted to a small set of cell-lines or experimental conditions due to an inability to leverage distributed processing architectures for large, sharded data-sets.
RESULTS We present a distributed implementation of gene expression modelling using the MapReduce paradigm and prove that performance improves as a linear function of available processor cores. We then leverage the computational efficiency of this framework to explore the variability of epigenetic function across fifty histone modification data-sets from a variety of cancerous and non-cancerous cell-lines.
CONCLUSIONS We demonstrate that the genome-wide relationships between histone modifications and mRNA transcription are lineage, tissue and karyotype-invariant, and that models trained on matched -omics data from non-cancerous cell-lines are able to predict cancerous expression with equivalent genome-wide fidelity.
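The reason a regression-style expression model fits the MapReduce paradigm can be shown with a small sketch. It is not the authors' implementation: it assumes a single-predictor least-squares model (one histone signal predicting expression), runs the map and reduce steps in one process, and uses hypothetical function names and toy data.

```python
from functools import reduce


def map_shard(shard):
    """Map step: summarize one shard of (signal, expression) pairs."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: sufficient statistics from two shards add element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Closed-form least-squares slope and intercept from the pooled statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data following expression = 2 * signal + 1, split across two shards.
shards = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
slope, intercept = fit_line(reduce(combine, map(map_shard, shards)))
```

Each shard is reduced to five numbers independently, so the map step can run on separate workers and only those small summaries need to be combined, which is what makes this kind of model straightforward to distribute.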
Affiliation(s)
- David M Budden
- Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Cambridge, 02139, USA; Systems Biology Laboratory, Melbourne School of Engineering, the University of Melbourne, Parkville, 3010, Australia
- Edmund J Crampin
- Systems Biology Laboratory, Melbourne School of Engineering, the University of Melbourne, Parkville, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, Parkville, 3010, Australia; Department of Mathematics and Statistics, the University of Melbourne, Parkville, 3010, Australia; School of Medicine, the University of Melbourne, Parkville, 3010, Australia