Reference Citation Analysis

Cited by Other Article(s)

Yang T, Henao R. TAMC: A deep-learning approach to predict motif-centric transcriptional factor binding activity based on ATAC-seq profile. PLoS Comput Biol 2022;18:e1009921. [PMID: 36094959 PMCID: PMC9499209 DOI: 10.1371/journal.pcbi.1009921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 09/22/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022]

Luo K, Zhong J, Safi A, Hong LK, Tewari AK, Song L, Reddy TE, Ma L, Crawford GE, Hartemink AJ. Profiling the quantitative occupancy of myriad transcription factors across conditions by modeling chromatin accessibility data. Genome Res 2022;32:1183-1198. [PMID: 35609992 PMCID: PMC9248881 DOI: 10.1101/gr.272203.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2020] [Accepted: 05/06/2022] [Indexed: 11/24/2022]

Affiliation(s)

Kaixuan Luo Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Computer Science, Duke University, Durham, North Carolina 27708, USA; Department of Human Genetics, The University of Chicago, Chicago, Illinois 60637, USA
Jianling Zhong Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
Alexias Safi Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
Linda K Hong Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
Alok K Tewari Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
Lingyun Song Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
Timothy E Reddy Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Biostatistics and Bioinformatics, Durham, North Carolina 27710, USA; Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA; Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA
Li Ma Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA; Department of Statistical Science, Duke University, Durham, North Carolina 27708, USA
Gregory E Crawford Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
Alexander J Hartemink Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA; Department of Computer Science, Duke University, Durham, North Carolina 27708, USA; Department of Biology, Duke University, Durham, North Carolina 27708, USA

Zhang Y, Wang Z, Zeng Y, Liu Y, Xiong S, Wang M, Zhou J, Zou Q. A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape. Brief Bioinform 2021;23:6470969. [PMID: 34929739 DOI: 10.1093/bib/bbab525] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/14/2021] [Revised: 10/28/2021] [Accepted: 11/13/2021] [Indexed: 12/17/2022]

Constructing gene regulatory networks using epigenetic data. NPJ Syst Biol Appl 2021;7:45. [PMID: 34887443 PMCID: PMC8660777 DOI: 10.1038/s41540-021-00208-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/30/2020] [Accepted: 11/01/2021] [Indexed: 12/24/2022]

Morrow A, Hughes J, Singh J, Joseph A, Yosef N. Epitome: predicting epigenetic events in novel cell types with multi-cell deep ensemble learning. Nucleic Acids Res 2021;49:e110. [PMID: 34379786 PMCID: PMC8565335 DOI: 10.1093/nar/gkab676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2021] [Revised: 07/19/2021] [Accepted: 07/25/2021] [Indexed: 01/04/2023]

Zhang Y, Wang Z, Zeng Y, Zhou J, Zou Q. High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method. Brief Bioinform 2021;22:6322761. [PMID: 34272562 DOI: 10.1093/bib/bbab273] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/30/2021] [Revised: 06/19/2021] [Accepted: 06/25/2021] [Indexed: 11/14/2022]

Jing F, Zhang SW, Cao Z, Zhang S. An Integrative Framework for Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning. IEEE/ACM Trans Comput Biol Bioinform 2021;18:355-364. [PMID: 30835229 DOI: 10.1109/tcbb.2019.2901789] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 06/09/2023]

Funk CC, Casella AM, Jung S, Richards MA, Rodriguez A, Shannon P, Donovan-Maiye R, Heavner B, Chard K, Xiao Y, Glusman G, Ertekin-Taner N, Golde TE, Toga A, Hood L, Van Horn JD, Kesselman C, Foster I, Madduri R, Price ND, Ament SA. Atlas of Transcription Factor Binding Sites from ENCODE DNase Hypersensitivity Data across 27 Tissue Types. Cell Rep 2020;32:108029. [PMID: 32814038 PMCID: PMC7462736 DOI: 10.1016/j.celrep.2020.108029] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/04/2018] [Revised: 05/07/2020] [Accepted: 07/22/2020] [Indexed: 12/27/2022]

Affiliation(s)

Cory C Funk Institute for Systems Biology, Seattle, WA 98109, USA
Alex M Casella Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Medical Scientist Training Program, University of Maryland School of Medicine, Baltimore, MD 21201, USA
Segun Jung Globus, University of Chicago, Chicago, IL 60637, USA
Matthew A Richards Institute for Systems Biology, Seattle, WA 98109, USA
Alex Rodriguez Globus, University of Chicago, Chicago, IL 60637, USA
Paul Shannon Institute for Systems Biology, Seattle, WA 98109, USA
Rory Donovan-Maiye Institute for Systems Biology, Seattle, WA 98109, USA
Ben Heavner Institute for Systems Biology, Seattle, WA 98109, USA
Kyle Chard Globus, University of Chicago, Chicago, IL 60637, USA
Yukai Xiao Globus, University of Chicago, Chicago, IL 60637, USA
Gustavo Glusman Institute for Systems Biology, Seattle, WA 98109, USA
Nilufer Ertekin-Taner Mayo Clinic, Department of Neuroscience, Jacksonville, FL 32224, USA
Todd E Golde Mayo Clinic, Department of Neuroscience, Jacksonville, FL 32224, USA
Arthur Toga Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA 90033, USA
Leroy Hood Institute for Systems Biology, Seattle, WA 98109, USA
John D Van Horn Department of Psychology, University of Southern California, Los Angeles, CA 90007, USA
Carl Kesselman Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
Ian Foster Globus, University of Chicago, Chicago, IL 60637, USA; Data Science and Learning Division, Argonne National Laboratory, Argonne, IL 60439, USA
Ravi Madduri Globus, University of Chicago, Chicago, IL 60637, USA; Data Science and Learning Division, Argonne National Laboratory, Argonne, IL 60439, USA
Nathan D Price Institute for Systems Biology, Seattle, WA 98109, USA
Seth A Ament Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD 21201, USA

Liu Y, Fu L, Kaufmann K, Chen D, Chen M. A practical guide for DNase-seq data analysis: from data management to common applications. Brief Bioinform 2020;20:1865-1877. [PMID: 30010713 DOI: 10.1093/bib/bby057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/08/2018] [Revised: 06/06/2018] [Accepted: 06/10/2018] [Indexed: 01/01/2023]

Ouyang N, Boyle AP. TRACE: transcription factor footprinting using chromatin accessibility data and DNA sequence. Genome Res 2020;30:1040-1046. [PMID: 32660981 PMCID: PMC7397869 DOI: 10.1101/gr.258228.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/10/2019] [Accepted: 06/26/2020] [Indexed: 02/06/2023]

Smith JP, Sheffield NC. Analytical Approaches for ATAC-seq Data Analysis. Curr Protoc Hum Genet 2020;106:e101. [PMID: 32543102 PMCID: PMC8191135 DOI: 10.1002/cphg.101] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]

Yevshin I, Sharipov R, Kolmykov S, Kondrakhin Y, Kolpakov F. GTRD: a database on gene transcription regulation-2019 update. Nucleic Acids Res 2020;47:D100-D105. [PMID: 30445619 PMCID: PMC6323985 DOI: 10.1093/nar/gky1128] [Citation(s) in RCA: 143] [Impact Index Per Article: 35.8] [Received: 09/15/2018] [Accepted: 10/26/2018] [Indexed: 01/16/2023]

Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker's guide to ATAC-seq data analysis. Genome Biol 2020;21:22. [PMID: 32014034 PMCID: PMC6996192 DOI: 10.1186/s13059-020-1929-3] [Citation(s) in RCA: 204] [Impact Index Per Article: 51.0] [Received: 07/22/2019] [Accepted: 01/08/2020] [Indexed: 12/16/2022]

Behjati Ardakani F, Schmidt F, Schulz MH. Predicting transcription factor binding using ensemble random forest models. F1000Res 2019;7:1603. [PMID: 31723409 PMCID: PMC6823902 DOI: 10.12688/f1000research.16200.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 08/15/2019] [Indexed: 12/03/2022]

Youn A, Marquez EJ, Lawlor N, Stitzel ML, Ucar D. BiFET: sequencing Bias-free transcription factor Footprint Enrichment Test. Nucleic Acids Res 2019;47:e11. [PMID: 30428075 PMCID: PMC6344870 DOI: 10.1093/nar/gky1117] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/19/2018] [Accepted: 10/23/2018] [Indexed: 01/15/2023]

Kuang Z, Ji Z, Boeke JD, Ji H. Dynamic motif occupancy (DynaMO) analysis identifies transcription factors and their binding sites driving dynamic biological processes. Nucleic Acids Res 2019;46:e2. [PMID: 29325176 PMCID: PMC5758894 DOI: 10.1093/nar/gkx905] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/26/2016] [Accepted: 09/26/2017] [Indexed: 01/02/2023]

Oh KS, Ha J, Baek S, Sung MH. XL-DNase-seq: improved footprinting of dynamic transcription factors. Epigenetics Chromatin 2019;12:30. [PMID: 31164146 PMCID: PMC6547507 DOI: 10.1186/s13072-019-0277-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/01/2019] [Accepted: 05/17/2019] [Indexed: 02/08/2023]

Karabacak Calviello A, Hirsekorn A, Wurmus R, Yusuf D, Ohler U. Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling. Genome Biol 2019;20:42. [PMID: 30791920 PMCID: PMC6385462 DOI: 10.1186/s13059-019-1654-y] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 04/18/2018] [Accepted: 02/13/2019] [Indexed: 01/01/2023]

Li H, Quang D, Guan Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res 2019;29:281-292. [PMID: 30567711 PMCID: PMC6360811 DOI: 10.1101/gr.237156.118] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 03/15/2018] [Accepted: 12/13/2018] [Indexed: 12/16/2022]

Keilwagen J, Posch S, Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol 2019;20:9. [PMID: 30630522 PMCID: PMC6327544 DOI: 10.1186/s13059-018-1614-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 07/25/2018] [Accepted: 12/18/2018] [Indexed: 01/11/2023]

Guo WL, Huang DS. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency. Mol Biosyst 2018;13:1827-1837. [PMID: 28718849 DOI: 10.1039/c7mb00155j] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 12/20/2022]

Madsen JGS, Rauch A, Van Hauwaert EL, Schmidt SF, Winnefeld M, Mandrup S. Integrated analysis of motif activity and gene expression changes of transcription factors. Genome Res 2018;28:243-255. [PMID: 29233921 PMCID: PMC5793788 DOI: 10.1101/gr.227231.117] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 07/06/2017] [Accepted: 12/01/2017] [Indexed: 01/01/2023]

Martins AL, Walavalkar NM, Anderson WD, Zang C, Guertin MJ. Universal correction of enzymatic sequence bias reveals molecular signatures of protein/DNA interactions. Nucleic Acids Res 2018;46:e9. [PMID: 29126307 PMCID: PMC5778497 DOI: 10.1093/nar/gkx1053] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 02/10/2017] [Revised: 09/19/2017] [Accepted: 10/18/2017] [Indexed: 12/04/2022]

Kakumanu A, Velasco S, Mazzoni E, Mahony S. Deconvolving sequence features that discriminate between overlapping regulatory annotations. PLoS Comput Biol 2017;13:e1005795. [PMID: 29049320 PMCID: PMC5663517 DOI: 10.1371/journal.pcbi.1005795] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/09/2017] [Revised: 10/31/2017] [Accepted: 09/26/2017] [Indexed: 11/19/2022]

Abstract

Genomic loci with regulatory potential can be annotated with various properties. For example, genomic sites bound by a given transcription factor (TF) can be divided according to whether they are proximal or distal to known promoters. Sites can be further labeled according to the cell types and conditions in which they are active. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between the labels; e.g., if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or with promoters. To meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. First, SeqUnwinder is able to disentangle sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Second, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder by discovering cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines.
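The label-overlap confound described above can be checked before any motif analysis by tabulating how annotation labels co-occur. The sketch below is a hypothetical diagnostic, not part of SeqUnwinder: it assumes each site is represented as a Python set of label strings (the labels `cellA`, `cellB`, and `promoter` are invented for illustration), counts pairwise label co-occurrence, and reports the fraction of sites carrying one label that also carry another.

```python
from collections import Counter
from itertools import combinations


def label_overlap(sites):
    """Count how often each pair of annotation labels occurs on the same site.

    `sites` is a list of label sets, e.g. [{"cellA", "promoter"}, {"cellB"}].
    Pairs are stored in sorted order so (a, b) and (b, a) are merged.
    """
    pair_counts = Counter()
    for labels in sites:
        for a, b in combinations(sorted(labels), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def conditional_fraction(sites, given, target):
    """Fraction of sites carrying label `given` that also carry label `target`."""
    with_given = [labels for labels in sites if given in labels]
    if not with_given:
        return 0.0
    return sum(target in labels for labels in with_given) / len(with_given)


# Hypothetical toy annotation in which cellA sites are enriched for promoters.
sites = [
    {"cellA", "promoter"},
    {"cellA", "promoter"},
    {"cellA"},
    {"cellB"},
    {"cellB", "promoter"},
]
print(label_overlap(sites))
print(conditional_fraction(sites, "cellA", "promoter"))  # 2 of 3 cellA sites
print(conditional_fraction(sites, "cellB", "promoter"))  # 1 of 2 cellB sites
```

A clear gap between such fractions suggests that motifs enriched in one label's sites may partly reflect the co-occurring label, which is the situation SeqUnwinder is designed to handle.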

Transcription factor proteins control gene expression by recognizing and interacting with short DNA sequence patterns in regulatory regions of the genome. Current genomics experiments allow us to find regulatory regions associated with a particular biochemical activity across the entire genome; for example, all regions where a particular transcription factor interacts with the genome in a given cell type. Given a collection of regulatory regions, we often aim to discover short DNA sequence patterns that are more common in the collection than in other regions. Performing such “DNA motif-finding” analysis can give us hints about the patterns that determine gene regulation in the analyzed cell type.
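The idea of finding short patterns that are more common in a collection than in other regions can be illustrated with a minimal k-mer enrichment sketch. It assumes plain-string DNA sequences, counts k-mers on the given strand only (reverse complements are ignored for brevity), and scores each k-mer by the log2 ratio of its pseudocount-smoothed frequency in the foreground collection versus a background set. This is a generic illustration, not SeqUnwinder's algorithm; the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count all k-mers on the forward strand, skipping windows with non-ACGT bases."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Return (k-mer, log2 enrichment) pairs sorted from most to least enriched."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_possible = 4 ** k  # number of distinct DNA k-mers, used to normalize pseudocounts
    scores = {}
    for kmer in set(fg) | set(bg):
        p_fg = (fg[kmer] + pseudocount) / (fg_total + pseudocount * n_possible)
        p_bg = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_possible)
        scores[kmer] = math.log2(p_fg / p_bg)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: both foreground sequences contain ACGTG, which never occurs in the background.
foreground = ["ACGTGACGTG", "TTACGTGAA"]
background = ["AAAAAAAAAA", "TTTTTTTTT"]
for kmer, score in kmer_enrichment(foreground, background, k=3)[:3]:
    print(kmer, round(score, 2))
```

Real motif finders typically count both strands, model background composition more carefully, and assess significance; the pseudocount here only keeps unseen k-mers from producing infinite log ratios.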

Here we describe a new method for DNA motif-finding called SeqUnwinder. Our approach analyzes collections of regulatory regions where each has been labeled according to various biological properties. For example, the labels could correspond to various cell types in which the regulatory region is active. SeqUnwinder then performs machine-learning analysis to unravel DNA sequence features that are characteristic of each label (e.g. features that distinguish regulatory regions in each cell type from other cell types). SeqUnwinder is the first method to enable analysis of regulatory region collections that contain several overlapping labels.

Liu S, Zibetti C, Wan J, Wang G, Blackshaw S, Qian J. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility. BMC Bioinformatics 2017;18:355. [PMID: 28750606 PMCID: PMC5530957 DOI: 10.1186/s12859-017-1769-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/03/2017] [Accepted: 07/19/2017] [Indexed: 12/04/2022]

Abstract

Background

Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments allow us to determine genome-wide chromatin accessibility in various cellular and developmental contexts. Chromatin accessibility profiles provide useful information for predicting TF binding events under various physiological conditions. Furthermore, ChIP-Seq has been used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integrating these two types of genomic information can improve the prediction of TF binding events.

Results

We assessed to what extent a model built on other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features, such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types.
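The model above combines cell type-independent features (TF sequence specificity, conservation) with cell type-specific accessibility features. A minimal sketch of building such a feature vector for one candidate site follows. It assumes a position frequency matrix (PFM), per-base conservation scores, and per-base cut counts are already available; the specific features and helper names are hypothetical and are not the exact BPAC feature set. Vectors like these could then be passed to any random forest implementation.

```python
import math

BASES = "ACGT"


def log_odds_pwm(pfm, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.01):
    """Convert a position frequency matrix (rows = positions; columns = A, C, G, T)
    into a log2-odds position weight matrix against a fixed background."""
    pwm = []
    for row in pfm:
        total = sum(row) + pseudocount * len(BASES)
        pwm.append([
            math.log2(((count + pseudocount) / total) / bg)
            for count, bg in zip(row, background)
        ])
    return pwm


def best_motif_score(seq, pwm):
    """Highest PWM score over all windows on the forward strand (reverse strand omitted)."""
    width = len(pwm)
    best = float("-inf")
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[j][BASES.index(base)] for j, base in enumerate(window))
        best = max(best, score)
    return best


def site_features(seq, pwm, conservation, cut_counts):
    """Hypothetical feature vector for one candidate site."""
    return [
        best_motif_score(seq, pwm),             # cell type-independent: motif match
        sum(conservation) / len(conservation),  # cell type-independent: mean conservation
        math.log1p(sum(cut_counts)),            # cell type-specific: log accessibility
    ]


pfm = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]]  # toy motif "ACG"
pwm = log_odds_pwm(pfm)
print(site_features("TTACGTT", pwm, conservation=[0.2, 0.8], cut_counts=[3, 0, 5]))
```

Because the same feature definitions apply to any TF and any cell type with accessibility data, a model trained on such vectors can in principle be applied to a new TF or cell line, which is the transferability question the study examines.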

Conclusion

Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on these universal “rules”.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1769-7) contains supplementary material, which is available to authorized users.

Kehl T, Schneider L, Schmidt F, Stöckel D, Gerstner N, Backes C, Meese E, Keller A, Schulz MH, Lenhof HP. RegulatorTrail: a web service for the identification of key transcriptional regulators. Nucleic Acids Res 2017;45:W146-W153. [PMID: 28472408 PMCID: PMC5570139 DOI: 10.1093/nar/gkx350] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 02/10/2017] [Revised: 04/07/2017] [Accepted: 04/20/2017] [Indexed: 12/14/2022]

Quach B, Furey TS. DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter. Bioinformatics 2017;33:956-963. [PMID: 27993786 DOI: 10.1093/bioinformatics/btw740] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/03/2016] [Accepted: 11/18/2016] [Indexed: 11/13/2022]

Abstract

Motivation

Identifying the locations of transcription factor binding sites is critical for understanding how gene transcription is regulated across different cell types and conditions. Chromatin accessibility experiments such as DNase I sequencing (DNase-seq) and Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq) produce genome-wide data that include distinct 'footprint' patterns at binding sites. Nearly all existing computational methods for detecting footprints in these data assume that footprint signals are highly homogeneous across footprint sites. Additionally, no comprehensive and systematic comparison of footprinting methods has been performed for the specific task of identifying which motif sites for a given factor are bound.

Results

Using DNase-seq data from the ENCODE project, we show that a large degree of previously uncharacterized site-to-site variability exists in footprint signal across motif sites for a given transcription factor. To model this heterogeneity in the data, we introduce a novel, supervised learning footprinter called Detecting Footprints Containing Motifs (DeFCoM). We compare DeFCoM to nine existing methods using evaluation sets from four human cell lines and eighteen transcription factors and show that DeFCoM outperforms current methods in distinguishing bound from unbound motif sites. We also analyze the impact of several biological and technical factors on the quality of footprint predictions to highlight important considerations when conducting footprint analyses and assessing the performance of footprint prediction methods. Finally, we show that DeFCoM can detect footprints using ATAC-seq data with accuracy similar to that obtained with DNase-seq data.
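For contrast with DeFCoM's supervised approach, the sketch below computes a single unsupervised summary for one motif instance: the ratio of mean per-base cut density in the flanks to that over the motif. The function name, the 25 bp flank width, and the pseudocount of 1 are illustrative assumptions, not DeFCoM's features. Because it compares every site against one fixed footprint shape, a score like this cannot capture the site-to-site heterogeneity described above, which is what motivates a supervised footprinter.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def footprint_depletion(cuts, motif_start, motif_end, flank=25):
    """Flank-to-center cut-density ratio around one motif instance.

    `cuts` holds per-base cleavage counts for a window containing the motif;
    the motif occupies cuts[motif_start:motif_end]. Values above 1 indicate
    fewer cuts over the motif than in its flanks, i.e. a protected footprint.
    A pseudocount of 1 avoids division by zero in low-coverage windows.
    """
    left = cuts[max(0, motif_start - flank):motif_start]
    center = cuts[motif_start:motif_end]
    right = cuts[motif_end:motif_end + flank]
    if not (left and center and right):
        raise ValueError("window must contain both flanks and the motif")
    flank_mean = (mean(left) + mean(right)) / 2
    return (flank_mean + 1) / (mean(center) + 1)


protected = [5] * 25 + [0] * 10 + [5] * 25  # depleted motif, accessible flanks
unprotected = [5] * 60                      # flat accessibility across the window
print(footprint_depletion(protected, 25, 35))    # (5 + 1) / (0 + 1)
print(footprint_depletion(unprotected, 25, 35))  # (5 + 1) / (5 + 1)
```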

Availability and Implementation

Python code available at https://bitbucket.org/bryancquach/defcom.

Contact

bquach@email.unc.edu or tsfurey@email.unc.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Chen X, Yu B, Carriero N, Silva C, Bonneau R. Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility. Nucleic Acids Res 2017;45:4315-4329. [PMID: 28334916 PMCID: PMC5416775 DOI: 10.1093/nar/gkx174] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 06/20/2016] [Revised: 02/28/2017] [Accepted: 03/06/2017] [Indexed: 12/21/2022]

Schmidt F, Gasparoni N, Gasparoni G, Gianmoena K, Cadenas C, Polansky JK, Ebert P, Nordström K, Barann M, Sinha A, Fröhler S, Xiong J, Dehghani Amirabad A, Behjati Ardakani F, Hutter B, Zipprich G, Felder B, Eils J, Brors B, Chen W, Hengstler JG, Hamann A, Lengauer T, Rosenstiel P, Walter J, Schulz MH. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Res 2017;45:54-66. [PMID: 27899623 PMCID: PMC5224477 DOI: 10.1093/nar/gkw1061] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 05/27/2016] [Revised: 10/18/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]

Affiliation(s)

Florian Schmidt Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Nina Gasparoni Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Gilles Gasparoni Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Kathrin Gianmoena Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Cristina Cadenas Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Julia K Polansky Experimental Rheumatology, German Rheumatism Research Centre, Berlin, 10117, Germany
Peter Ebert Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany; International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Karl Nordström Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Matthias Barann Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Anupam Sinha Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Sebastian Fröhler Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Jieyi Xiong Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Azim Dehghani Amirabad Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany; International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Fatemeh Behjati Ardakani Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Barbara Hutter Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Gideon Zipprich Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Bärbel Felder Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Jürgen Eils Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Benedikt Brors Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Wei Chen Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Jan G Hengstler Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Alf Hamann International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Thomas Lengauer Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Philip Rosenstiel Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Jörn Walter Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Marcel H Schulz Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany

Liu B, Wang S, Dong Q, Li S, Liu X. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning. IEEE Trans Nanobioscience 2016;15:328-334. [PMID: 28113908 DOI: 10.1109/tnb.2016.2555951] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Indexed: 11/10/2022]

Jankowski A, Tiuryn J, Prabhakar S. Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data. Bioinformatics 2016;32:2419-2426. [PMID: 27153645 PMCID: PMC4978937 DOI: 10.1093/bioinformatics/btw209] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/18/2015] [Accepted: 04/12/2016] [Indexed: 12/24/2022]

Gusmao EG, Allhoff M, Zenke M, Costa IG. Analysis of computational footprinting methods for DNase sequencing experiments. Nat Methods 2016;13:303-309. [PMID: 26901649 DOI: 10.1038/nmeth.3772] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Received: 07/31/2015] [Accepted: 01/27/2016] [Indexed: 12/26/2022]

Vierstra J, Stamatoyannopoulos JA. Genomic footprinting. Nat Methods 2016;13:213-221. [DOI: 10.1038/nmeth.3768] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Received: 09/14/2015] [Accepted: 01/13/2016] [Indexed: 01/08/2023]

Kumar S, Bucher P. Predicting transcription factor site occupancy using DNA sequence intrinsic and cell-type specific chromatin features. BMC Bioinformatics 2016;17 Suppl 1:4. [PMID: 26818008 PMCID: PMC4895346 DOI: 10.1186/s12859-015-0846-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]

Abstract

Background

Understanding the mechanisms by which transcription factors (TFs) are recruited to their physiological target sites is crucial for understanding gene regulation. DNA sequence-intrinsic features, such as predicted binding affinity, are often poor predictors of in vivo site occupancy and, in any case, cannot explain cell-type specific binding events. Recent reports show that chromatin accessibility, nucleosome occupancy and specific histone post-translational modifications strongly influence TF site occupancy in vivo. In this work, we use machine-learning methods to build predictive models and assess the relative importance of different sequence-intrinsic and chromatin features in the recruitment of TFs to their target sites.

Methods

Our study relies primarily on recent data published by the ENCODE consortium. Five dissimilar TFs assayed in multiple cell types were selected as examples: CTCF, JunD, REST, GABP and USF2. We used two types of candidate target sites: (a) predicted sites obtained by scanning the whole genome with a position weight matrix (PWM), and (b) cell-type specific peak lists provided by ENCODE. Quantitative in vivo occupancy levels in the different cell types were derived from ChIP-seq data for the corresponding TFs. In parallel, we computed a number of associated sequence-intrinsic and experimental features (histone modifications, DNase I hypersensitivity, etc.) for each site. Machine-learning algorithms were then applied in binary classification and regression frameworks to predict site occupancy and binding strength, respectively, in order to assess the relative importance of the different contextual features.
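Candidate sites of type (a) come from scanning sequence with a position weight matrix. The sketch below is an illustration only, not the authors' pipeline: it assumes a hypothetical 4-bp toy count matrix, a 0.5 pseudocount and a uniform background, converts the counts into log2-odds weights, and reports every window on either strand whose summed score reaches a chosen threshold.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical toy count matrix for a 4-bp motif (10 sequences per column).
# A real scan would load a curated matrix for CTCF, REST, etc.; these numbers are illustrative only.
TOY_COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def log_odds_pwm(counts, pseudocount=0.5, background=None):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def window_score(window, pwm):
    """Sum the per-position log-odds weights of one sequence window."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def scan(sequence, pwm, threshold):
    """Report (start, strand, score) for windows on either strand scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, site in (("+", window), ("-", reverse)):
            score = window_score(site, pwm)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


pwm = log_odds_pwm(TOY_COUNTS)
for start, strand, score in scan("ttACGTtt", pwm, threshold=4.0):
    print(start, strand, round(score, 2))
```

Because the toy consensus ACGT is its own reverse complement, it is reported once per strand at the same start position. In a genome-wide setting the threshold (here an arbitrary 4.0) controls how many weak candidate sites are kept for the downstream occupancy models.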

Results

We observed striking differences in the feature-importance rankings among the five factors tested. PWM scores were among the most important features only for CTCF and REST, and were of little value for JunD and USF2. Chromatin accessibility and active histone marks are potent predictors for all factors except REST. Structural DNA parameters and repressive or gene-body-associated histone marks are generally of little or no predictive value.
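Comparing feature importance across factors requires some per-feature measure of predictive value. The abstract states that multivariate machine-learning models were used; the sketch below deliberately swaps in a simpler univariate stand-in, ranking each feature by how far its ROC AUC (bound versus unbound candidate sites) lies from 0.5. The feature names, labels and values are hypothetical toy placeholders, not ENCODE measurements or results from the study.

```python
def roc_auc(values, labels):
    """Mann-Whitney estimate of ROC AUC: chance a bound site outscores an unbound one (ties = 0.5)."""
    bound = [v for v, y in zip(values, labels) if y == 1]
    unbound = [v for v, y in zip(values, labels) if y == 0]
    if not bound or not unbound:
        raise ValueError("need at least one bound and one unbound site")
    wins = 0.0
    for b in bound:
        for u in unbound:
            if b > u:
                wins += 1.0
            elif b == u:
                wins += 0.5
    return wins / (len(bound) * len(unbound))


def rank_features(features, labels):
    """Univariate stand-in: rank features by |AUC - 0.5|, i.e. distance from a coin flip."""
    ranking = [(name, roc_auc(values, labels)) for name, values in features.items()]
    ranking.sort(key=lambda item: abs(item[1] - 0.5), reverse=True)
    return ranking


# Hypothetical toy candidate sites: 1 = bound in ChIP-seq, 0 = unbound.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "pwm_score": [6.1, 3.0, 5.5, 5.8, 2.9, 4.0],
    "dnase_signal": [9.0, 7.5, 8.2, 1.0, 2.5, 0.5],
    "h3k27me3": [0.2, 0.3, 0.1, 0.2, 0.3, 0.1],
}

for name, auc in rank_features(features, labels):
    print(f"{name}: AUC = {auc:.2f}")
```

Using |AUC - 0.5| rather than AUC itself treats a feature that is consistently depleted at bound sites as just as informative as one that is enriched. Unlike a multivariate model, this ranking ignores correlations between features, such as accessibility and active histone marks.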

Conclusions

We define a general and extensible computational framework for analyzing the importance of various DNA-intrinsic and chromatin-associated features in determining cell-type specific TF binding to target sites. Applying our methodology to ENCODE data has yielded new insights into transcriptional regulatory processes and may serve as an example for future studies encompassing even larger datasets.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0846-z) contains supplementary material, which is available to authorized users.

Madrigal P. On Accounting for Sequence-Specific Bias in Genome-Wide Chromatin Accessibility Experiments: Recent Advances and Contradictions. Front Bioeng Biotechnol 2015;3:144. [PMID: 26442258 PMCID: PMC4585268 DOI: 10.3389/fbioe.2015.00144] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/14/2015] [Accepted: 09/07/2015] [Indexed: 11/13/2022]