For: Budden DM, Hurley DG, Crampin EJ. Predictive modelling of gene expression from transcriptional regulatory elements. Brief Bioinform 2014;16:616-28. [PMID: 25231769 DOI: 10.1093/bib/bbu034] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 07/02/2014] [Accepted: 08/20/2014] [Indexed: 12/15/2022] Open

Cited by Other Article(s)

Patel N, Bush WS. Modeling transcriptional regulation using gene regulatory networks based on multi-omics data sources. BMC Bioinformatics 2021;22:200. [PMID: 33874910 PMCID: PMC8056605 DOI: 10.1186/s12859-021-04126-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/20/2020] [Accepted: 04/09/2021] [Indexed: 11/17/2022] Open

Abstract

Background

Transcriptional regulation is complex, requiring multiple cis (local) and trans acting mechanisms working in concert to drive gene expression, with disruption of these processes linked to multiple diseases. Previous computational attempts to understand the influence of regulatory mechanisms on gene expression have used prediction models containing input features derived from cis regulatory factors. However, local chromatin looping and trans-acting mechanisms are known to also influence transcriptional regulation, and their inclusion may improve model accuracy and interpretation. In this study, we create a general model of transcription factor influence on gene expression by incorporating both cis and trans gene regulatory features.

Results

We describe a computational framework to model gene expression for the GM12878 and K562 cell lines. This framework weights the impact of transcription factor-based regulatory data using multi-omics gene regulatory networks to account for both cis- and trans-acting mechanisms, and measures of the local chromatin context. These prediction models perform significantly better than models containing cis-regulatory features alone. Models that additionally integrate long-distance chromatin interactions (or chromatin looping) between distal transcription factor binding regions and gene promoters also show improved accuracy. As a demonstration of their utility, effect estimates from these models were used to weight cis-regulatory rare variants for sequence kernel association test analyses of gene expression.

Conclusions

Our models generate refined effect estimates for the influence of individual transcription factors on gene expression, allowing characterization of their roles across the genome. This work also provides a framework for integrating multiple data types into a single model of transcriptional regulation.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04126-3.

Zhang LQ, Fan GL, Liu JJ, Liu L, Li QZ, Lin H. Identification of Key Histone Modifications and Their Regulatory Regions on Gene Expression Level Changes in Chronic Myelogenous Leukemia. Front Cell Dev Biol 2021;8:621578. [PMID: 33511133 PMCID: PMC7835480 DOI: 10.3389/fcell.2020.621578] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/26/2020] [Accepted: 12/09/2020] [Indexed: 12/12/2022] Open

Schmidt F, Kern F, Schulz MH. Integrative prediction of gene expression with chromatin accessibility and conformation data. Epigenetics Chromatin 2020;13:4. [PMID: 32029002 PMCID: PMC7003490 DOI: 10.1186/s13072-020-0327-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/13/2019] [Accepted: 01/06/2020] [Indexed: 02/06/2023] Open

Ren J, Lee J, Na D. Recent advances in genetic engineering tools based on synthetic biology. J Microbiol 2020;58:1-10. [PMID: 31898252 DOI: 10.1007/s12275-020-9334-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/03/2019] [Revised: 08/19/2019] [Accepted: 11/05/2019] [Indexed: 12/26/2022]

Schmidt F, Schulz MH. On the problem of confounders in modeling gene expression. Bioinformatics 2019;35:711-719. [PMID: 30084962 PMCID: PMC6530814 DOI: 10.1093/bioinformatics/bty674] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Revised: 06/21/2018] [Accepted: 08/02/2018] [Indexed: 01/01/2023] Open

The spatial binding model of the pioneer factor Oct4 with its target genes during cell reprogramming. Comput Struct Biotechnol J 2019;17:1226-1233. [PMID: 31921389 PMCID: PMC6944736 DOI: 10.1016/j.csbj.2019.09.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/19/2019] [Revised: 09/05/2019] [Accepted: 09/07/2019] [Indexed: 12/18/2022] Open

Feng ZX, Li QZ, Meng JJ. Modeling the relationship of diverse genomic signatures to gene expression levels with the regulation of long-range enhancer-promoter interactions. Biophys Rep 2019. [DOI: 10.1007/s41048-019-0089-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022] Open

Pliner HA, Packer JS, McFaline-Figueroa JL, Cusanovich DA, Daza RM, Aghamirzaie D, Srivatsan S, Qiu X, Jackson D, Minkina A, Adey AC, Steemers FJ, Shendure J, Trapnell C. Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data. Mol Cell 2018;71:858-871.e8. [PMID: 30078726 PMCID: PMC6582963 DOI: 10.1016/j.molcel.2018.06.044] [Citation(s) in RCA: 409] [Impact Index Per Article: 68.2] [Received: 02/07/2018] [Revised: 05/08/2018] [Accepted: 06/29/2018] [Indexed: 12/13/2022]

Genome-wide analysis of H3K36me3 and its regulations to cancer-related genes expression in human cell lines. Biosystems 2018;171:59-65. [DOI: 10.1016/j.biosystems.2018.07.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/06/2018] [Revised: 07/01/2018] [Accepted: 07/09/2018] [Indexed: 01/11/2023]

Zhang LQ, Li QZ. Estimating the effects of transcription factors binding and histone modifications on gene expression levels in human cells. Oncotarget 2018;8:40090-40103. [PMID: 28454114 PMCID: PMC5522221 DOI: 10.18632/oncotarget.16988] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/15/2016] [Accepted: 03/11/2017] [Indexed: 12/22/2022] Open

Dang LT, Tondl M, Chiu MHH, Revote J, Paten B, Tano V, Tokolyi A, Besse F, Quaife-Ryan G, Cumming H, Drvodelic MJ, Eichenlaub MP, Hallab JC, Stolper JS, Rossello FJ, Bogoyevitch MA, Jans DA, Nim HT, Porrello ER, Hudson JE, Ramialison M. TrawlerWeb: an online de novo motif discovery tool for next-generation sequencing datasets. BMC Genomics 2018;19:238. [PMID: 29621972 PMCID: PMC5887194 DOI: 10.1186/s12864-018-4630-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/30/2018] [Accepted: 03/27/2018] [Indexed: 12/14/2022] Open

Abstract

Background

A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57–74, 2012; Nat 507:462–70, 2014; Nat 507:455–61, 2014; Nat 518:317–30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users.

Results

We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563–5, 2007; Nat Protoc 5:323–34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy.

Conclusions

TrawlerWeb provides users with a fast, simple and easy-to-use web interface for de novo motif discovery. This will assist in rapidly analysing NGS datasets that are now being routinely generated. TrawlerWeb is freely available and accessible at: http://trawler.erc.monash.edu.au.

Electronic supplementary material

The online version of this article (10.1186/s12864-018-4630-0) contains supplementary material, which is available to authorized users.

Affiliation(s)

Louis T Dang Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Markus Tondl Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Man Ho H Chiu Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Jerico Revote eResearch, Monash University, Clayton, VIC, Australia
Benedict Paten UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
Vincent Tano Department of Biochemistry and Molecular Biology, Bio21 Institute and Cell Signalling Research Laboratories, The University of Melbourne, Melbourne, VIC, Australia
Alex Tokolyi Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Florence Besse CNRS, Inserm, Institute of Biology Valrose, Université Côte d'Azur, Parc Valrose, Nice, France
Greg Quaife-Ryan School of Biomedical Sciences, The University of Queensland, QLD, Brisbane, Australia
Helen Cumming Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Monash University, Clayton, VIC, Australia
Mark J Drvodelic Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Michael P Eichenlaub Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Jeannette C Hallab Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Julian S Stolper Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Fernando J Rossello Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia
Marie A Bogoyevitch Department of Biochemistry and Molecular Biology, Bio21 Institute and Cell Signalling Research Laboratories, The University of Melbourne, Melbourne, VIC, Australia
David A Jans Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
Hieu T Nim Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia; Faculty of Information Technology, Monash University, Clayton, VIC, Australia
Enzo R Porrello Murdoch Children's Research Institute, The Royal Children's Hospital, Parkville, VIC, Australia; Department of Physiology, School of Biomedical Sciences, The University of Melbourne, Parkville, VIC, Australia
James E Hudson School of Biomedical Sciences, The University of Queensland, QLD, Brisbane, Australia
Mirana Ramialison Australian Regenerative Medicine Institute, Systems Biology Institute Australia, Monash University, Clayton, VIC, Australia

Li Y, Zhang J, Huo C, Ding N, Li J, Xiao J, Lin X, Cai B, Zhang Y, Xu J. Dynamic Organization of lncRNA and Circular RNA Regulators Collectively Controlled Cardiac Differentiation in Humans. EBioMedicine 2017;24:137-146. [PMID: 29037607 PMCID: PMC5652025 DOI: 10.1016/j.ebiom.2017.09.015] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 08/14/2017] [Revised: 09/13/2017] [Accepted: 09/13/2017] [Indexed: 02/08/2023] Open

Abstract

Advances in developmental cardiology have increased our understanding of the early aspects of heart differentiation. However, noncoding RNA (ncRNA) transcription and regulation during this process remain poorly understood. Here, we constructed transcriptomes for both long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) in four important developmental stages, ranging from early embryonic to cardiomyocyte, based on high-throughput sequencing datasets; these reveal highly stage-specific expression patterns for both ncRNA types. Additionally, samples within each stage were more similar to one another, highlighting the divergence of samples collected from distinct cardiac developmental stages. Next, we developed a method to identify numerous lncRNA and circRNA regulators whose expression was significantly stage-specific and shifted gradually and continuously during heart differentiation. We inferred that these ncRNAs are important for the stages of cardiac differentiation. Moreover, transcriptional regulation analysis revealed that the expression of stage-specific lncRNAs is controlled by known key stage-specific transcription factors (TFs). In addition, circRNAs exhibited dynamic expression patterns independent of their host genes. Functional enrichment analysis revealed that lncRNAs and circRNAs play critical roles in pathways that are activated specifically during heart differentiation. We further identified candidate TF-ncRNA-gene network modules for each differentiation stage, suggesting that the dynamic organization of lncRNAs and circRNAs collectively controls cardiac differentiation and may cause heart-related diseases when defective. Our study provides a foundation for understanding the dynamic regulation of ncRNA transcriptomes during heart differentiation and identifies novel key lncRNAs and circRNAs whose dynamic organization collectively controls cardiac differentiation.

Integrated analysis and transcript abundance modelling of H3K4me3 and H3K27me3 in developing secondary xylem. Sci Rep 2017;7:3370. [PMID: 28611454 PMCID: PMC5469831 DOI: 10.1038/s41598-017-03665-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/03/2017] [Accepted: 05/02/2017] [Indexed: 01/10/2023] Open

Schmidt F, Gasparoni N, Gasparoni G, Gianmoena K, Cadenas C, Polansky JK, Ebert P, Nordström K, Barann M, Sinha A, Fröhler S, Xiong J, Dehghani Amirabad A, Behjati Ardakani F, Hutter B, Zipprich G, Felder B, Eils J, Brors B, Chen W, Hengstler JG, Hamann A, Lengauer T, Rosenstiel P, Walter J, Schulz MH. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Res 2017;45:54-66. [PMID: 27899623 PMCID: PMC5224477 DOI: 10.1093/nar/gkw1061] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 05/27/2016] [Revised: 10/18/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022] Open

Affiliation(s)

Florian Schmidt Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Nina Gasparoni Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Gilles Gasparoni Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Kathrin Gianmoena Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Cristina Cadenas Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Julia K Polansky Experimental Rheumatology, German Rheumatism Research Centre, Berlin, 10117, Germany
Peter Ebert Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany; International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Karl Nordström Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Matthias Barann Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Anupam Sinha Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Sebastian Fröhler Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Jieyi Xiong Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Azim Dehghani Amirabad Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany; International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Fatemeh Behjati Ardakani Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Barbara Hutter Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Gideon Zipprich Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Bärbel Felder Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Jürgen Eils Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Benedikt Brors Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Wei Chen Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Jan G Hengstler Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Alf Hamann International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Thomas Lengauer Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Philip Rosenstiel Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Jörn Walter Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Marcel H Schulz Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany

Budden DM, Crampin EJ. Distributed gene expression modelling for exploring variability in epigenetic function. BMC Bioinformatics 2016;17:446. [PMID: 27816056 PMCID: PMC5097851 DOI: 10.1186/s12859-016-1313-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2016] [Accepted: 10/25/2016] [Indexed: 11/10/2022] Open

Budden D, Jones M. Cautionary Tales of Inapproximability. J Comput Biol 2016;24:213-216. [PMID: 27608300 DOI: 10.1089/cmb.2016.0097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open

Information theoretic approaches for inference of biological networks from continuous-valued data. BMC Syst Biol 2016;10:89. [PMID: 27599566 PMCID: PMC5013667 DOI: 10.1186/s12918-016-0331-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/25/2016] [Accepted: 08/23/2016] [Indexed: 01/30/2023]

Chen X, Jung JG, Shajahan-Haq AN, Clarke R, Shih IM, Wang Y, Magnani L, Wang TL, Xuan J. ChIP-BIT: Bayesian inference of target genes using a novel joint probabilistic model of ChIP-seq profiles. Nucleic Acids Res 2016;44:e65. [PMID: 26704972 PMCID: PMC4838354 DOI: 10.1093/nar/gkv1491] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/21/2015] [Revised: 11/16/2015] [Accepted: 12/09/2015] [Indexed: 11/16/2022] Open

Kleftogiannis D, Kalnis P, Bajic VB. Progress and challenges in bioinformatics approaches for enhancer identification. Brief Bioinform 2015;17:967-979. [PMID: 26634919 PMCID: PMC5142011 DOI: 10.1093/bib/bbv101] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 08/27/2015] [Revised: 10/22/2015] [Indexed: 12/20/2022] Open

Grassi E, Zapparoli E, Molineris I, Provero P. Total Binding Affinity Profiles of Regulatory Regions Predict Transcription Factor Binding and Gene Expression in Human Cells. PLoS One 2015;10:e0143627. [PMID: 26599758 PMCID: PMC4658012 DOI: 10.1371/journal.pone.0143627] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/12/2015] [Accepted: 11/07/2015] [Indexed: 11/29/2022] Open

FlexDM: Simple, parallel and fault-tolerant data mining using WEKA. Source Code Biol Med 2015;10:13. [PMID: 26579209 PMCID: PMC4647584 DOI: 10.1186/s13029-015-0045-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/20/2015] [Accepted: 11/09/2015] [Indexed: 12/03/2022]

Abstract

Background

With the continued exponential growth in data volume, large-scale data mining and machine learning experiments have become a necessity for many researchers without programming or statistics backgrounds. WEKA (Waikato Environment for Knowledge Analysis) is a gold standard framework that facilitates and simplifies this task by allowing specification of algorithms, hyper-parameters and test strategies from a streamlined Experimenter GUI. Despite its popularity, the WEKA Experimenter exhibits several limitations that we address in our new FlexDM software.

Results

FlexDM addresses four fundamental limitations with the WEKA Experimenter: reliance on a verbose and difficult-to-modify XML schema; inability to meta-optimise experiments over a large number of algorithm hyper-parameters; inability to recover from software or hardware failure during a large experiment; and failing to leverage modern multicore processor architectures. Direct comparisons between the FlexDM and default WEKA XML schemas demonstrate a 10-fold improvement in brevity for a specification that allows finer control of experimental procedures. The stability of FlexDM has been tested on a large biological dataset (approximately 450 k attributes by 150 samples), and automatic parallelisation of tasks yields a quasi-linear reduction in execution time when distributed across multiple processor cores.

Conclusion

FlexDM is a powerful and easy-to-use extension to the WEKA package, which better handles the increased volume and complexity of data that has emerged during the 20 years since WEKA’s original development. FlexDM has been tested on Windows, OSX and Linux operating systems and is provided as a pre-configured virtual reference environment for trivial usage and extensibility. This software can substantially improve the productivity of any research group conducting large-scale data mining or machine learning tasks, in addition to providing non-programmers with improved control over specific aspects of their data analysis pipeline via a succinct and simplified XML schema.

Electronic supplementary material

The online version of this article (doi:10.1186/s13029-015-0045-3) contains supplementary material, which is available to authorized users.
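
The combination of features named in the FlexDM abstract above (a hyper-parameter grid, recovery after failure, and parallel execution) can be illustrated with a minimal Python sketch. This is not FlexDM's code or its XML schema: the function names, the JSON-lines checkpoint file and the `evaluate` callback are all hypothetical, and threads stand in for the multicore execution the abstract describes (CPU-bound work in Python would normally use processes instead).

```python
import itertools
import json
import os
from concurrent.futures import ThreadPoolExecutor


def expand_grid(grid):
    """Return one dict per combination of hyper-parameter values."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[n] for n in names))]


def load_done(checkpoint_path):
    """Read finished tasks from a JSON-lines checkpoint (empty if absent)."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            for line in fh:
                record = json.loads(line)
                key = json.dumps(record["params"], sort_keys=True)
                done[key] = record["score"]
    return done


def run_experiments(grid, evaluate, checkpoint_path, workers=4):
    """Evaluate every combination that is not already in the checkpoint.

    `evaluate` is a hypothetical user-supplied function: params dict -> score.
    Each result is appended to the checkpoint as soon as it is available, so
    a rerun after a crash skips the work that had already been recorded.
    """
    done = load_done(checkpoint_path)
    pending = [p for p in expand_grid(grid)
               if json.dumps(p, sort_keys=True) not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool, \
            open(checkpoint_path, "a") as fh:
        # pool.map yields results in the same order as `pending`.
        for params, score in zip(pending, pool.map(evaluate, pending)):
            fh.write(json.dumps({"params": params, "score": score}) + "\n")
            fh.flush()
            done[json.dumps(params, sort_keys=True)] = score
    return done
```

Calling `run_experiments` a second time with the same grid and checkpoint path evaluates nothing new, which is the recovery behaviour the sketch is meant to show.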

Narang V, Ramli MA, Singhal A, Kumar P, de Libero G, Poidinger M, Monterola C. Automated Identification of Core Regulatory Genes in Human Gene Regulatory Networks. PLoS Comput Biol 2015;11:e1004504. [PMID: 26393364 PMCID: PMC4578944 DOI: 10.1371/journal.pcbi.1004504] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 04/10/2015] [Accepted: 08/11/2015] [Indexed: 12/20/2022] Open

Abstract

Human gene regulatory networks (GRNs) can be difficult to interpret due to a tangle of edges interconnecting thousands of genes. We constructed a general human GRN from extensive transcription factor and microRNA target data obtained from public databases. In a subnetwork of this GRN that is active during estrogen stimulation of MCF-7 breast cancer cells, we benchmarked automated algorithms for identifying core regulatory genes (transcription factors and microRNAs). Among these algorithms, we identified K-core decomposition, PageRank and betweenness centrality as the most effective for discovering core regulatory genes in the network, evaluated on the previously known roles of these genes in MCF-7 biology and on their ability to explain the up or down expression status of up to 70% of the remaining genes. Finally, we validated the use of the K-core algorithm for organizing the GRN into an easier-to-interpret layered hierarchy in which more influential regulatory genes percolate towards the inner layers. The integrated human gene and miRNA network and software used in this study are provided as supplementary materials (S1 Data) accompanying this manuscript.

A gene regulatory network (GRN) represents how some genes encoding regulatory molecules, such as transcription factors or microRNAs, regulate the expression of other genes. Researchers commonly study GRNs involved in a specific biological process with the aim of identifying a few important regulatory genes. In higher organisms such as humans, a regulatory gene regulates multiple target genes and, correspondingly, any gene is regulated by multiple regulatory genes. Due to this multiplicity of interactions, a GRN usually resembles a tangled hairball in which it is difficult to identify the few most influential regulatory genes. In this study, we show that network analysis algorithms such as K-core, PageRank and betweenness centrality are useful for identifying a few important or core regulatory genes in a GRN, and that the K-core algorithm is also useful for organizing regulatory genes in a hierarchical layered structure where the most influential genes in a GRN are found within the innermost layer or core. These few core regulatory genes determine to a large extent the expression status of the remaining genes in the network. We illustrate a pragmatic application of this technique to GRNs reconstructed from genome-wide gene expression measurements in the MCF-7 human breast cancer cell line.
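
The layered hierarchy described in the abstract above rests on K-core decomposition, which can be sketched in a few lines of Python. This is a generic textbook peeling procedure, not the authors' software: it treats the network as undirected (an assumption, since regulatory edges are directed), and the gene names in the example are made-up placeholders.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as edge pairs.

    A node's core number is the largest k such that the node belongs to a
    subgraph in which every node has at least k neighbours.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in neighbours.items()}
    remaining = set(degree)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in neighbours[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Hypothetical toy network: three mutually connected regulators plus a chain.
toy_edges = [("TF1", "TF2"), ("TF2", "TF3"), ("TF1", "TF3"),
             ("TF1", "geneA"), ("geneA", "geneB")]
print(sorted(core_numbers(toy_edges).items()))
```

In the toy network the three mutually connected nodes form the innermost layer (core number 2), while the two chain nodes sit in the outer layer (core number 1). The linear scan for the minimum-degree node keeps the sketch short; bucket-based implementations run in time linear in the number of edges.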

Stegmayer G, Pividori M, Milone DH. A very simple and fast way to access and validate algorithms in reproducible research. Brief Bioinform 2015. [DOI: 10.1093/bib/bbv054] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/28/2023] Open

Budden DM, Hurley DG, Crampin EJ. Modelling the conditional regulatory activity of methylated and bivalent promoters. Epigenetics Chromatin 2015;8:21. [PMID: 26097508 PMCID: PMC4474576 DOI: 10.1186/s13072-015-0013-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/10/2015] [Accepted: 06/10/2015] [Indexed: 11/10/2022] Open

Abstract

Background

Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional interactions that emerge when genes are subject to different regulatory mechanisms. Although chromatin immunoprecipitation-based histone modification data are often used as proxies for chromatin accessibility, the association between these variables and expression often depends upon the presence of other epigenetic markers (e.g. DNA methylation or histone variants). These conditional interactions are poorly handled by previous predictive models and reduce the reliability of downstream biological inference.

RESULTS

We have previously demonstrated that integrating both transcription factor and histone modification data within a single predictive model is rendered ineffective by their statistical redundancy. In this study, we evaluate four proposed methods for quantifying gene-level DNA methylation levels and demonstrate that inclusion of these data in predictive modelling frameworks is also subject to this critical limitation in data integration. Based on the hypothesis that statistical redundancy in epigenetic data is caused by conditional regulatory interactions within a dynamic chromatin context, we construct a new gene expression model which is the first to improve prediction accuracy by unsupervised identification of latent regulatory classes. We show that DNA methylation and H2A.Z histone variant data can be interpreted in this way to identify and explore the signatures of silenced and bivalent promoters, substantially improving genome-wide predictions of mRNA transcript abundance and downstream biological inference across multiple cell lines.

CONCLUSIONS

Previous models of gene expression have been applied successfully to several important problems in molecular biology, including the discovery of transcription factor roles, identification of regulatory elements responsible for differential expression patterns and comparative analysis of the transcriptome across distant species. Our analysis supports our hypothesis that statistical redundancy in epigenetic data is partially due to conditional relationships between these regulators and gene expression levels. This analysis provides insight into the heterogeneous roles of H3K4me3 and H3K27me3 in the presence of the H2A.Z histone variant (implicated in cancer progression) and how these signatures change during lineage commitment and carcinogenesis.

Hurley DG, Budden DM, Crampin EJ. Virtual Reference Environments: a simple way to make research reproducible. Brief Bioinform 2014;16:901-3. [PMID: 25433467 PMCID: PMC4570198 DOI: 10.1093/bib/bbu043] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/02/2014] [Indexed: 11/18/2022] Open

Budden DM, Hurley DG, Cursons J, Markham JF, Davis MJ, Crampin EJ. Predicting expression: the complementary power of histone modification and transcription factor binding data. Epigenetics Chromatin 2014;7:36. [PMID: 25489339 PMCID: PMC4258808 DOI: 10.1186/1756-8935-7-36] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 07/25/2014] [Accepted: 11/05/2014] [Indexed: 01/01/2023] Open

Abstract

Background

Transcription factors (TFs) and histone modifications (HMs) play critical roles in gene expression by regulating mRNA transcription. Modelling frameworks have been developed to integrate high-throughput omics data, with the aim of elucidating the regulatory logic that results from the interactions of DNA, TFs and HMs. These models have yielded an unexpected and poorly understood result: that TFs and HMs are statistically redundant in explaining mRNA transcript abundance at a genome-wide level.

Results

We constructed predictive models of gene expression by integrating RNA-sequencing, TF and HM chromatin immunoprecipitation sequencing and DNase I hypersensitivity data for two mammalian cell types. All models identified genome-wide statistical redundancy both within and between TFs and HMs, as previously reported. To investigate potential explanations, groups of genes were constructed for ontology-classified biological processes. Predictive models were constructed for each process to explore the distribution of statistical redundancy. We found significant variation in the predictive capacity of TFs and HMs across these processes and demonstrated the predictive power of HMs to be inversely proportional to process enrichment for housekeeping genes.

Conclusions

It is well established that the roles played by TFs and HMs are not functionally redundant. Instead, we attribute the statistical redundancy reported in this and previous genome-wide modelling studies to the heterogeneous distribution of HMs across chromatin domains. Furthermore, we conclude that statistical redundancy between individual TFs can be readily explained by nucleosome-mediated cooperative binding. This could possibly help the cell confer regulatory robustness by rejecting signalling noise and allowing control via multiple pathways.

Electronic supplementary material

The online version of this article (doi:10.1186/1756-8935-7-36) contains supplementary material, which is available to authorized users.
