Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hurley DG, Budden DM, Crampin EJ. Virtual Reference Environments: a simple way to make research reproducible. Brief Bioinform 2014;16:901-3. [PMID: 25433467 PMCID: PMC4570198 DOI: 10.1093/bib/bbu043] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/02/2014] [Indexed: 11/18/2022] Open

Cited by Other Article(s)

Ziemann M, Poulain P, Bora A. The five pillars of computational reproducibility: bioinformatics and beyond. Brief Bioinform 2023;24:bbad375. [PMID: 37870287 PMCID: PMC10591307 DOI: 10.1093/bib/bbad375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/26/2023] [Accepted: 09/30/2023] [Indexed: 10/24/2023] Open

Deng ZL, Zhou DZ, Cao SJ, Li Q, Zhang JF, Xie H. Development and Validation of an Inflammatory Response-Related Gene Signature for Predicting the Prognosis of Pancreatic Adenocarcinoma. Inflammation 2022;45:1732-1751. [PMID: 35322324 DOI: 10.1007/s10753-022-01657-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Revised: 02/27/2022] [Accepted: 03/01/2022] [Indexed: 11/05/2022]

Abstract

Pancreatic adenocarcinoma (PAAD) is a highly dangerous malignant tumor of the digestive tract that is difficult to diagnose and treat, and whose prognosis is difficult to predict. Tumors and inflammation are known to affect each other, and thus the inflammatory response in the microenvironment can affect the prognosis. So far, the prognostic value of inflammatory response-related genes in PAAD is still unclear. Therefore, this study aimed to explore inflammatory response-related genes for predicting the prognosis of PAAD. The mRNA expression profiles of PAAD patients and the corresponding clinical characteristics data were downloaded from the public database. The least absolute shrinkage and selection operator (LASSO) Cox analysis model was used to identify and construct the prognostic gene signature in The Cancer Genome Atlas (TCGA) cohort. The PAAD patients used for verification were from the International Cancer Genome Consortium (ICGC) cohort. The Kaplan-Meier method was used to compare the overall survival (OS) between the high- and low-risk groups. Univariate and multivariate Cox analyses were performed to identify the independent predictors of OS. Gene set enrichment analysis (GSEA) was performed to obtain gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the correlation between gene expression and immune infiltrates was investigated via single sample gene set enrichment analysis (ssGSEA). The GEPIA database was used to examine prognostic genes in PAAD. LASSO Cox regression analysis was used to construct a model of the inflammatory response-related gene signature. Compared with the low-risk group, patients in the high-risk group had significantly lower OS. The receiver operating characteristic (ROC) curve analysis confirmed the signature's predictive capacity. Multivariate Cox analysis showed that the risk score is an independent predictor of OS. Functional analysis showed that the immune status of the two risk groups was significantly different, and that cancer-related pathways were abundant in the high-risk group. Moreover, the risk score was significantly related to tumor grade, stage, and immune infiltration types. The expression level of prognostic genes was also found to be significantly correlated with the sensitivity of cancer cells to anti-tumor drugs. In addition, there were significant differences in expression between PAAD tissues and adjacent non-tumor tissues. The novel signature constructed from five inflammatory response-related genes can be used to predict prognosis and affect the immune status of PAAD. In addition, suppressing these genes may be a treatment option.

Cudmore P, Pan M, Gawthrop PJ, Crampin EJ. Analysing and simulating energy-based models in biology using BondGraphTools. The European Physical Journal. E, Soft Matter 2021;44:148. [PMID: 34904197 DOI: 10.1140/epje/s10189-021-00152-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/25/2021] [Accepted: 11/18/2021] [Indexed: 06/14/2023]

Shahidi N, Pan M, Safaei S, Tran K, Crampin EJ, Nickerson DP. Hierarchical semantic composition of biosimulation models using bond graphs. PLoS Comput Biol 2021;17:e1008859. [PMID: 33983945 PMCID: PMC8148364 DOI: 10.1371/journal.pcbi.1008859] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/11/2021] [Revised: 05/25/2021] [Accepted: 04/27/2021] [Indexed: 11/19/2022] Open

Schaduangrat N, Lampa S, Simeon S, Gleeson MP, Spjuth O, Nantasenamat C. Towards reproducible computational drug discovery. J Cheminform 2020;12:9. [PMID: 33430992 PMCID: PMC6988305 DOI: 10.1186/s13321-020-0408-x] [Citation(s) in RCA: 85] [Impact Index Per Article: 21.3] [Received: 07/17/2019] [Accepted: 01/02/2020] [Indexed: 12/11/2022] Open

Pan M, Gawthrop PJ, Tran K, Cursons J, Crampin EJ. A thermodynamic framework for modelling membrane transporters. J Theor Biol 2018;481:10-23. [PMID: 30273576 DOI: 10.1016/j.jtbi.2018.09.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/12/2018] [Revised: 09/24/2018] [Accepted: 09/27/2018] [Indexed: 12/18/2022]

Gawthrop PJ, Siekmann I, Kameneva T, Saha S, Ibbotson MR, Crampin EJ. Bond graph modelling of chemoelectrical energy transduction. IET Syst Biol 2017;11:127-138. [PMCID: PMC8687425 DOI: 10.1049/iet-syb.2017.0006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/17/2017] [Revised: 04/25/2017] [Accepted: 05/23/2017] [Indexed: 07/20/2023] Open

Gawthrop PJ, Crampin EJ. Energy-based analysis of biomolecular pathways. Proc Math Phys Eng Sci 2017;473:20160825. [PMID: 28690404 DOI: 10.1098/rspa.2016.0825] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 11/08/2016] [Accepted: 05/26/2017] [Indexed: 01/03/2023] Open

Gawthrop PJ. Bond Graph Modeling of Chemiosmotic Biomolecular Energy Transduction. IEEE Trans Nanobioscience 2017;16:177-188. [PMID: 28252411 DOI: 10.1109/tnb.2017.2674683] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]

Denaxas S, Direk K, Gonzalez-Izquierdo A, Pikoula M, Cakiroglu A, Moore J, Hemingway H, Smeeth L. Methods for enhancing the reproducibility of biomedical research findings using electronic health records. BioData Min 2017;10:31. [PMID: 28912836 PMCID: PMC5594436 DOI: 10.1186/s13040-017-0151-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 04/10/2017] [Accepted: 08/28/2017] [Indexed: 01/07/2023] Open

Abstract

BACKGROUND

The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion of health research using electronic health records (EHR), data collected and generated during clinical care, is potentially not reproducible mainly due to the fact that the implementation details of most data preprocessing, cleaning, phenotyping and analysis approaches are not systematically made available or shared. With the complexity, volume and variety of electronic health record data sources made available for research steadily increasing, it is critical to ensure that scientific findings from EHR data are reproducible and replicable by researchers. Reporting guidelines, such as RECORD and STROBE, have set a solid foundation by recommending a series of items for researchers to include in their research outputs. Researchers however often lack the technical tools and methodological approaches to actuate such recommendations in an efficient and sustainable manner.

RESULTS

In this paper, we review and propose a series of methods and tools utilized in adjunct scientific disciplines that can be used to enhance the reproducibility of research using electronic health records and enable researchers to report analytical approaches in a transparent manner. Specifically, we discuss the adoption of scientific software engineering principles and best-practices such as test-driven development, source code revision control systems, literate programming and the standardization and re-use of common data management and analytical approaches.

CONCLUSION

The adoption of such approaches will enable scientists to systematically document and share EHR analytical workflows and increase the reproducibility of biomedical research using such complex data sources.

Cui J, Faria M, Björnmalm M, Ju Y, Suma T, Gunawan ST, Richardson JJ, Heidari H, Bals S, Crampin EJ, Caruso F. A Framework to Account for Sedimentation and Diffusion in Particle-Cell Interactions. Langmuir: the ACS Journal of Surfaces and Colloids 2016;32:12394-12402. [PMID: 27384770 DOI: 10.1021/acs.langmuir.6b01634] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 05/18/2023]

Affiliation(s)

Jiwei Cui ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Matthew Faria ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Systems Biology Laboratory, Melbourne School of Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Mattias Björnmalm ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Yi Ju ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Tomoya Suma ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Sylvia T Gunawan ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Joseph J Richardson ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Hamed Heidari Electron Microscopy for Materials Research (EMAT), University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp, Belgium
Sara Bals Electron Microscopy for Materials Research (EMAT), University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp, Belgium
Edmund J Crampin ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Systems Biology Laboratory, Melbourne School of Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia
Frank Caruso ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, and the Department of Chemical and Biomolecular Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia

Andrews MC, Cursons J, Hurley DG, Anaka M, Cebon JS, Behren A, Crampin EJ. Systems analysis identifies miR-29b regulation of invasiveness in melanoma. Mol Cancer 2016;15:72. [PMID: 27852308 PMCID: PMC5112703 DOI: 10.1186/s12943-016-0554-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/02/2016] [Accepted: 10/31/2016] [Indexed: 02/08/2023] Open

Abstract

Background

In many cancers, microRNAs (miRs) contribute to metastatic progression by modulating phenotypic reprogramming processes such as epithelial-mesenchymal plasticity. This can be driven by miRs targeting multiple mRNA transcripts, inducing regulated changes across large sets of genes. The miR-target databases TargetScan and DIANA-microT predict putative relationships by examining sequence complementarity between miRs and mRNAs. However, it remains a challenge to identify which miR-mRNA interactions are active at endogenous expression levels, and of biological consequence.

Methods

We developed a workflow to integrate TargetScan and DIANA-microT predictions into the analysis of data-driven associations calculated from transcript abundance (RNASeq) data, specifically the mutual information and Pearson’s correlation metrics. We use this workflow to identify putative relationships of miR-mediated mRNA repression with strong support from both lines of evidence. Applying this approach systematically to a large, published collection of unique melanoma cell lines – the Ludwig Melbourne melanoma (LM-MEL) cell line panel – we identified putative miR-mRNA interactions that may contribute to invasiveness. This guided the selection of interactions of interest for further in vitro validation studies.

Results

Several miR-mRNA regulatory relationships supported by TargetScan and DIANA-microT demonstrated differential activity across cell lines of varying matrigel invasiveness. Strong negative statistical associations for these putative regulatory relationships were consistent with target mRNA inhibition by the miR, and suggest that differential activity of such miR-mRNA relationships contributes to differences in melanoma invasiveness. Many of these relationships were reflected across the skin cutaneous melanoma TCGA dataset, indicating that these observations also show graded activity across clinical samples. Several of these miRs are implicated in cancer progression (miR-211, -340, -125b, -221, and -29b). The specific role for miR-29b-3p in melanoma has not been well studied. We experimentally validated the predicted miR-29b-3p regulation of LAMC1, PPIC and LASP1, and show that dysregulation of miR-29b-3p or these mRNA targets can influence cellular invasiveness in vitro.

Conclusions

This analytic strategy provides a comprehensive, systems-level approach to identify miR-mRNA regulation in high-throughput cancer data, identifies novel putative interactions with functional phenotypic relevance, and can be used to direct experimental resources for subsequent experimental validation.

Computational scripts are available: http://github.com/uomsystemsbiology/LMMEL-miR-miner

Electronic supplementary material

The online version of this article (doi:10.1186/s12943-016-0554-y) contains supplementary material, which is available to authorized users.

Affiliation(s)

Miles C Andrews Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia
Joseph Cursons Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
Daniel G Hurley Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
Matthew Anaka Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Toronto, Toronto, ON, Canada
Jonathan S Cebon Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia; Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia
Andreas Behren Olivia Newton-John Cancer Research Institute, Heidelberg, VIC, 3084, Australia; Ludwig Institute for Cancer Research, Melbourne-Austin Branch, Cancer Immunobiology Laboratory, Heidelberg, VIC, 3084, Australia; School of Cancer Medicine, La Trobe University, Heidelberg, VIC, 3084, Australia
Edmund J Crampin Department of Medicine, University of Melbourne, Parkville, VIC, 3010, Australia; Systems Biology Laboratory, University of Melbourne, Parkville, VIC, 3010, Australia; ARC Centre of Excellence in Convergent Bio-Nano Science, University of Melbourne, Parkville, VIC, 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC, 3010, Australia; Centre for Systems Genomics, University of Melbourne, Parkville, VIC, 3010, Australia

Piccolo SR, Frampton MB. Tools and techniques for computational reproducibility. Gigascience 2016;5:30. [PMID: 27401684 PMCID: PMC4940747 DOI: 10.1186/s13742-016-0135-4] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Indexed: 01/26/2023] Open

Leipzig J. A review of bioinformatic pipeline frameworks. Brief Bioinform 2016;18:530-536. [PMID: 27013646 PMCID: PMC5429012 DOI: 10.1093/bib/bbw020] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 12/10/2015] [Indexed: 11/24/2022] Open

Cursons J, Angel CE, Hurley DG, Print CG, Dunbar PR, Jacobs MD, Crampin EJ. Spatially transformed fluorescence image data for ERK-MAPK and selected proteins within human epidermis. Gigascience 2015;4:63. [PMID: 26675891 PMCID: PMC4678632 DOI: 10.1186/s13742-015-0102-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/01/2015] [Accepted: 12/03/2015] [Indexed: 12/20/2022] Open

Abstract

Background

Phosphoprotein signalling pathways have been intensively studied in vitro, yet their role in regulating tissue homeostasis is not fully understood. In the skin, interfollicular keratinocytes differentiate over approximately 2 weeks as they traverse the epidermis. The extracellular signal-regulated kinase (ERK) branch of the mitogen-activated protein kinase (MAPK) pathway has been implicated in this process. Therefore, we examined ERK-MAPK activity within human epidermal keratinocytes in situ.

Findings

We used confocal microscopy and immunofluorescence labelling to measure the relative abundances of Raf-1, MEK1/2 and ERK1/2, and their phosphorylated (active) forms within three human skin samples. Additionally, we measured the abundance of selected proteins thought to modulate ERK-MAPK activity, including calmodulin, β1 integrin and stratifin (14-3-3σ); and of transcription factors known to act as effectors of ERK1/2, including the AP-1 components Jun-B, Fra2 and c-Fos. Imaging was performed with sufficient resolution to identify the plasma membrane, cytoplasm and nucleus as distinct domains within cells across the epidermis. The image field of view was also sufficiently large to capture the entire epidermis in cross-section, and thus the full range of keratinocyte differentiation in a single observation. Image processing methods were developed to quantify image data for mathematical and statistical analysis. Here, we provide raw image data and processed outputs.

Conclusions

These data indicate coordinated changes in ERK-MAPK signalling activity throughout the depth of the epidermis, with changes in relative phosphorylation-mediated signalling activity occurring along the gradient of cellular differentiation. We believe these data provide unique information about intracellular signalling as they are obtained from a homeostatic human tissue, and they might be useful for investigating intercellular heterogeneity.

Electronic supplementary material

The online version of this article (doi:10.1186/s13742-015-0102-5) contains supplementary material, which is available to authorized users.

Gawthrop PJ, Cursons J, Crampin EJ. Hierarchical bond graph modelling of biochemical networks. Proc Math Phys Eng Sci 2015. [DOI: 10.1098/rspa.2015.0642] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 12/11/2022] Open

FlexDM: Simple, parallel and fault-tolerant data mining using WEKA. Source Code for Biology and Medicine 2015;10:13. [PMID: 26579209 PMCID: PMC4647584 DOI: 10.1186/s13029-015-0045-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/20/2015] [Accepted: 11/09/2015] [Indexed: 12/03/2022]

Abstract

Background

With the continued exponential growth in data volume, large-scale data mining and machine learning experiments have become a necessity for many researchers without programming or statistics backgrounds. WEKA (Waikato Environment for Knowledge Analysis) is a gold standard framework that facilitates and simplifies this task by allowing specification of algorithms, hyper-parameters and test strategies from a streamlined Experimenter GUI. Despite its popularity, the WEKA Experimenter exhibits several limitations that we address in our new FlexDM software.

Results

FlexDM addresses four fundamental limitations with the WEKA Experimenter: reliance on a verbose and difficult-to-modify XML schema; inability to meta-optimise experiments over a large number of algorithm hyper-parameters; inability to recover from software or hardware failure during a large experiment; and failing to leverage modern multicore processor architectures. Direct comparisons between the FlexDM and default WEKA XML schemas demonstrate a 10-fold improvement in brevity for a specification that allows finer control of experimental procedures. The stability of FlexDM has been tested on a large biological dataset (approximately 450 k attributes by 150 samples), and automatic parallelisation of tasks yields a quasi-linear reduction in execution time when distributed across multiple processor cores.

Conclusion

FlexDM is a powerful and easy-to-use extension to the WEKA package, which better handles the increased volume and complexity of data that has emerged during the 20 years since WEKA’s original development. FlexDM has been tested on Windows, OSX and Linux operating systems and is provided as a pre-configured virtual reference environment for trivial usage and extensibility. This software can substantially improve the productivity of any research group conducting large-scale data mining or machine learning tasks, in addition to providing non-programmers with improved control over specific aspects of their data analysis pipeline via a succinct and simplified XML schema.

Electronic supplementary material

The online version of this article (doi:10.1186/s13029-015-0045-3) contains supplementary material, which is available to authorized users.

Budden DM, Hurley DG, Crampin EJ. Modelling the conditional regulatory activity of methylated and bivalent promoters. Epigenetics Chromatin 2015;8:21. [PMID: 26097508 PMCID: PMC4474576 DOI: 10.1186/s13072-015-0013-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/10/2015] [Accepted: 06/10/2015] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional interactions that emerge when genes are subject to different regulatory mechanisms. Although chromatin immunoprecipitation-based histone modification data are often used as proxies for chromatin accessibility, the association between these variables and expression often depends upon the presence of other epigenetic markers (e.g. DNA methylation or histone variants). These conditional interactions are poorly handled by previous predictive models and reduce the reliability of downstream biological inference.

RESULTS

We have previously demonstrated that integrating both transcription factor and histone modification data within a single predictive model is rendered ineffective by their statistical redundancy. In this study, we evaluate four proposed methods for quantifying gene-level DNA methylation levels and demonstrate that inclusion of these data in predictive modelling frameworks is also subject to this critical limitation in data integration. Based on the hypothesis that statistical redundancy in epigenetic data is caused by conditional regulatory interactions within a dynamic chromatin context, we construct a new gene expression model which is the first to improve prediction accuracy by unsupervised identification of latent regulatory classes. We show that DNA methylation and H2A.Z histone variant data can be interpreted in this way to identify and explore the signatures of silenced and bivalent promoters, substantially improving genome-wide predictions of mRNA transcript abundance and downstream biological inference across multiple cell lines.

CONCLUSIONS

Previous models of gene expression have been applied successfully to several important problems in molecular biology, including the discovery of transcription factor roles, identification of regulatory elements responsible for differential expression patterns and comparative analysis of the transcriptome across distant species. Our analysis supports our hypothesis that statistical redundancy in epigenetic data is partially due to conditional relationships between these regulators and gene expression levels. This analysis provides insight into the heterogeneous roles of H3K4me3 and H3K27me3 in the presence of the H2A.Z histone variant (implicated in cancer progression) and how these signatures change during lineage commitment and carcinogenesis.
