1
Yang J, Ma A, Hoppe AD, Wang C, Li Y, Zhang C, Wang Y, Liu B, Ma Q. Prediction of regulatory motifs from human Chip-sequencing data using a deep learning framework. Nucleic Acids Res 2019; 47:7809-7824. [PMID: 31372637 PMCID: PMC6735894 DOI: 10.1093/nar/gkz672] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/12/2019] [Accepted: 07/23/2019] [Indexed: 11/24/2022] [Open Access]
Abstract
The identification of transcription factor binding sites and cis-regulatory motifs is a frontier where the rules governing protein–DNA binding are being revealed. Here, we developed a new method (DEep Sequence and Shape mOtif or DESSO) for cis-regulatory motif prediction using deep neural networks and the binomial distribution model. DESSO outperformed existing tools, including DeepBind, in predicting motifs in 690 human ENCODE ChIP-sequencing datasets. Furthermore, the deep-learning framework of DESSO expanded motif discovery beyond the state-of-the-art by allowing the identification of known and new protein–protein–DNA tethering interactions in human transcription factors (TFs). Specifically, 61 putative tethering interactions were identified among the 100 TFs expressed in the K562 cell line. In this work, the power of DESSO was further expanded by integrating the detection of DNA shape features. We found that shape information has strong predictive power for TF–DNA binding and provides new putative shape motif information for human TFs. Thus, DESSO improves the identification and structural analysis of TF binding sites by integrating the complexities of DNA binding into a deep-learning framework.
Affiliation(s)
- Jinyu Yang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX 76010, USA
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Adam D Hoppe
  - Department of Chemistry and Biochemistry, South Dakota State University, Brookings, SD 57007, USA
  - BioSNTR, Brookings, SD 57007, USA
- Cankun Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Yang Li
  - School of Mathematics, Shandong University, Jinan 250100, China
- Chi Zhang
  - Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Yan Wang
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan 250100, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
2
Azad RN, Zafiropoulos D, Ober D, Jiang Y, Chiu TP, Sagendorf JM, Rohs R, Tullius TD. Experimental maps of DNA structure at nucleotide resolution distinguish intrinsic from protein-induced DNA deformations. Nucleic Acids Res 2018; 46:2636-2647. [PMID: 29390080 PMCID: PMC5946862 DOI: 10.1093/nar/gky033] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/24/2017] [Accepted: 01/15/2018] [Indexed: 12/22/2022] [Open Access]
Abstract
Recognition of DNA by proteins depends on DNA sequence and structure. Often unanswered is whether the structure of naked DNA persists in a protein–DNA complex, or whether protein binding changes DNA shape. While X-ray structures of protein–DNA complexes are numerous, the structure of naked cognate DNA is seldom available experimentally. We present here an experimental and computational analysis pipeline that uses hydroxyl radical cleavage to map, at single-nucleotide resolution, DNA minor groove width, a recognition feature widely exploited by proteins. For 11 protein–DNA complexes, we compared experimental maps of naked DNA minor groove width with minor groove width measured from X-ray co-crystal structures. Seven sites had similar minor groove widths as naked DNA and when bound to protein. For four sites, part of the DNA in the complex had the same structure as naked DNA, and part changed structure upon protein binding. We compared the experimental map with minor groove patterns of DNA predicted by two computational approaches, DNAshape and ORChID2, and found good but not perfect concordance with both. This experimental approach will be useful in mapping structures of DNA sequences for which high-resolution structural data are unavailable. This approach allows probing of protein family-dependent readout mechanisms.
Affiliation(s)
- Robert N Azad
  - Department of Chemistry, Boston University, Boston, MA 02215, USA
- Douglas Ober
  - Department of Chemistry, Boston University, Boston, MA 02215, USA
- Yining Jiang
  - Department of Chemistry, Boston University, Boston, MA 02215, USA
- Tsu-Pei Chiu
  - Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Jared M Sagendorf
  - Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
  - Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Thomas D Tullius
  - Department of Chemistry, Boston University, Boston, MA 02215, USA
  - Program in Bioinformatics, Boston University, Boston, MA 02215, USA
3
Khabiri M, Freddolino PL. Deficiencies in Molecular Dynamics Simulation-Based Prediction of Protein-DNA Binding Free Energy Landscapes. J Phys Chem B 2017; 121:5151-5161. [PMID: 28471184 PMCID: PMC5817055 DOI: 10.1021/acs.jpcb.6b12450] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]
Abstract
Transcriptional regulation allows cells to match their gene expression profiles to their current requirements based on environment, cellular physiological state, and extracellular signals. DNA binding transcription factors are major agents of transcriptional regulation, and bind to DNA with a factor-specific sequence preference to exert regulatory effects. A crucial step in unraveling the logic of a regulatory network is determining the sequence-specific binding affinity landscapes for the transcription factors in it. While such landscapes can be measured experimentally, the ability to predict them computationally would both reduce the effort required to obtain the needed data and provide additional insight into the key interactions shaping protein-DNA interactions. Here we apply free energy calculations based on all-atom molecular dynamics simulations to predict the changes in binding free energy for all single base pair perturbations of the binding sites for four eukaryotic transcription factors for which high-quality experimental data exist. We find that the simulated results both vastly overestimate the magnitude of changes in binding free energy, and frequently predict the incorrect signs. These simulations will nevertheless serve as a jumping-off point for refining our current representation of protein-DNA interactions to allow quantitative reproduction of experimental data on such systems in the future.
Affiliation(s)
- Morteza Khabiri
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, USA
- Peter L. Freddolino
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA