1
Fornes O, Meseguer A, Aguirre-Plans J, Gohl P, Bota PM, Molina-Fernández R, Bonet J, Chinchilla-Hernandez A, Pegenaute F, Gallego O, Fernandez-Fuentes N, Oliva B. Structure-based learning to predict and model protein-DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genom Bioinform 2024; 6:lqae068. [PMID: 38867914 PMCID: PMC11167492 DOI: 10.1093/nargab/lqae068] [Received: 12/20/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/14/2024]
Abstract
Transcription factor (TF) binding is a key component of genomic regulation. There are numerous high-throughput experimental methods to characterize TF-DNA binding specificities. Their application, however, is both laborious and expensive, which makes profiling all TFs challenging. For instance, the binding preferences of ∼25% of human TFs remain unknown; they have neither been determined experimentally nor inferred computationally. We introduce a structure-based learning approach to predict the binding preferences of TFs and to automate the modelling of TF regulatory complexes. We show the advantage of our approach over classical nearest-neighbor prediction at the limits of remote homology. Starting from a TF sequence or structure, we predict binding preferences in the form of motifs, which are then used to scan a DNA sequence for occurrences. The best matches are either profiled with a binding score or collected for subsequent modeling into a higher-order regulatory complex with DNA. Co-operativity is modelled by: (i) the co-localization of TFs and (ii) the structural modeling of protein-protein interactions between TFs and with co-factors. We have applied our approach to automatically model the interferon-β enhanceosome and the pioneering complexes of OCT4, SOX2 (or SOX11) and KLF4 with a nucleosome, and we compare these models with the experimentally known structures.
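The abstract describes predicting motifs and then scanning a DNA sequence for occurrences. The sketch below shows that generic scanning step. It assumes the motif is a position frequency matrix that is converted to log2-odds scores against a uniform background. The toy matrix, pseudocount and threshold are hypothetical, and this is not the authors' pipeline.

```python
import math

BASES = "ACGT"

# Hypothetical position frequency matrix (counts for A, C, G, T at each position).
# It is a toy motif with consensus "ACGT", NOT a motif from the paper.
TOY_PFM = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert a count matrix into a log2-odds PWM against a uniform background (assumption)."""
    pwm = []
    for counts in pfm:
        total = sum(counts) + pseudocount * len(counts)
        pwm.append({
            base: math.log2(((count + pseudocount) / total) / background)
            for base, count in zip(BASES, counts)
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Score every forward-strand window and return (start, score) pairs at or above threshold.

    Only uppercase A/C/G/T are handled, and the reverse strand is not scanned in this sketch.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_pwm(TOY_PFM)
    for start, score in scan("TTACGTGGACGT", pwm, threshold=4.0):
        print(start, round(score, 2))
```

With the toy motif, the two "ACGT" windows (starting at positions 2 and 8) are the only hits. A real pipeline would also scan the reverse complement and would usually calibrate the threshold statistically rather than fixing it by hand.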
Affiliation(s)
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics. BC Children's Hospital Research Institute. Department of Medical Genetics. University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Alberto Meseguer
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Patrick Gohl
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Patricia M Bota
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Ruben Molina-Fernández
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Jaume Bonet
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Laboratory of Protein Design & Immunoengineering. School of Engineering. Ecole Polytechnique Federale de Lausanne. Lausanne 1015, Vaud, Switzerland
- Altair Chinchilla-Hernandez
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Ferran Pegenaute
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Oriol Gallego
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Narcis Fernandez-Fuentes
- Institute of Biological, Environmental and Rural Science. Aberystwyth University, SY23 3DA Aberystwyth, UK
- Baldo Oliva
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
2
Pandey U, Behara SM, Sharma S, Patil RS, Nambiar S, Koner D, Bhukya H. DeePNAP: A Deep Learning Method to Predict Protein-Nucleic Acid Binding Affinity from Their Sequences. J Chem Inf Model 2024; 64:1806-1815. [PMID: 38458968 DOI: 10.1021/acs.jcim.3c01151] [Indexed: 03/10/2024]
Abstract
Predicting protein-nucleic acid (PNA) binding affinity solely from sequence is of paramount importance for the experimental design and analysis of PNA interactions (PNAIs). Many current models for binding affinity prediction are limited to specific PNAIs and rely on both the sequence and the structural information of PNA complexes as inputs for training and testing. Because few PNA complex structures are available, the resulting small training data sets significantly limit the diversity and generalizability of these models. Additionally, most tools predict a single parameter, such as binding affinity or free energy changes upon mutation, which makes them less versatile. Hence, we propose DeePNAP, a machine learning-based model built from a vast and heterogeneous data set of 14,401 entries (from both eukaryotes and prokaryotes) from the ProNAB database, consisting of binding parameters of wild-type and mutant PNA complexes. Our model precisely predicts the binding affinity and the free energy changes due to mutations of PNAIs exclusively from their sequences. While other similar tools extract features from both sequence and structure, DeePNAP employs only sequence-based features and yields high correlation coefficients and low root mean squared errors between predicted and experimental values of KD and ΔΔG for PNA complexes, implying the generalizability of DeePNAP. Additionally, we have developed a web interface hosting DeePNAP that can serve as a powerful tool to rapidly predict binding affinities for a myriad of PNAIs with high precision, toward a deeper understanding of their implications in various biological systems. Web interface: http://14.139.174.41:8080/.
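DeePNAP reports predictions of KD and ΔΔG and evaluates them with correlation coefficients and root mean squared errors. The sketch below relates these quantities using the standard thermodynamic conversion ΔG = RT ln KD (1 M standard state), with ΔΔG = ΔG(mutant) − ΔG(wild type), and implements the two metrics. It assumes a temperature of 298.15 K and KD values in mol/L. It is not DeePNAP's code, and the function names are illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L (1 M standard state)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def ddg_from_kds(kd_mutant, kd_wild_type, temperature_k=298.15):
    """ddG = dG(mutant) - dG(wild type) = RT ln(KD_mut / KD_wt); positive means weaker binding."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_mutant / kd_wild_type)


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between predicted and experimental values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (sd_p * sd_o)


def rmse(predicted, observed):
    """Root mean squared error between predicted and experimental values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))


if __name__ == "__main__":
    print(round(delta_g_from_kd(1e-9), 2))     # a 1 nM binder
    print(round(ddg_from_kds(1e-8, 1e-9), 2))  # a mutation that raises KD tenfold
```

At 298.15 K a 1 nM binder corresponds to roughly −12.3 kcal/mol. Each tenfold increase in KD costs about +1.36 kcal/mol, which is why KD predictions are usually compared on a logarithmic scale.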
Affiliation(s)
- Uddeshya Pandey
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Sasi M Behara
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Siddhant Sharma
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Rachit S Patil
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Souparnika Nambiar
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Debasish Koner
- Department of Chemistry, Indian Institute of Technology Hyderabad, Kandi 502284, India
- Hussain Bhukya
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
3
Roldán-Piñero C, Luengo-Márquez J, Assenza S, Pérez R. Systematic Comparison of Atomistic Force Fields for the Mechanical Properties of Double-Stranded DNA. J Chem Theory Comput 2024; 20:2261-2272. [PMID: 38411091 PMCID: PMC10938644 DOI: 10.1021/acs.jctc.3c01089] [Received: 10/02/2023] [Revised: 02/14/2024] [Accepted: 02/14/2024] [Indexed: 02/28/2024]
Abstract
The response of double-stranded DNA to external mechanical stress plays a central role in its interactions with the protein machinery in the cell. Modern atomistic force fields have been shown to provide highly accurate predictions for the fine structural features of the duplex. In contrast, and despite their pivotal function, less attention has been devoted to the accuracy of their predictions of the elastic parameters. Several reports have addressed the flexibility of double-stranded DNA via all-atom molecular dynamics, yet the collected information is insufficient for a clear understanding of the relative performance of the various force fields. In this work, we fill this gap by performing a systematic study in which several systems, characterized by different sequence contexts, are simulated with the most popular force fields within the AMBER family, bsc1 and OL15, as well as with CHARMM36. Analysis of our results, together with their comparison with previous work focused on bsc0, allows us to unveil the differences in the predicted rigidity between the newest force fields and suggests a roadmap to test their performance against experiments. In the case of the stretch modulus, we reconcile these differences, showing that a single mapping between sequence-dependent conformation and elasticity via the crookedness parameter simultaneously captures the results of all force fields, supporting the key role of crookedness in the mechanical response of double-stranded DNA.
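The elastic parameters discussed above, such as the stretch modulus, are extracted from length fluctuations in the MD trajectories. The sketch below shows one common estimator. It assumes harmonic stretching, E = S/(2L0)(L − L0)², so equipartition gives Var(L) = kBT·L0/S, that is, S = kBT·⟨L⟩/Var(L). The thermal energy value (about 298 K) and the toy length series are assumptions. The paper's exact protocol, including how length is measured and the crookedness mapping, is not reproduced here.

```python
import statistics

KBT_PN_NM = 4.11  # thermal energy k_B*T in pN*nm at roughly 298 K (assumed temperature)


def stretch_modulus(lengths_nm, kbt=KBT_PN_NM):
    """Stretch modulus S (pN) from sampled lengths (nm) of a DNA segment.

    Assumes harmonic stretching, E = S / (2 * L0) * (L - L0)**2, so that
    Var(L) = kBT * L0 / S and therefore S = kBT * <L> / Var(L).
    """
    mean_length = statistics.fmean(lengths_nm)
    variance = statistics.pvariance(lengths_nm, mu=mean_length)
    return kbt * mean_length / variance


if __name__ == "__main__":
    # Toy series (not simulation data): a 10 nm segment fluctuating by +/- 0.2 nm.
    toy_lengths = [9.8, 10.2] * 50
    print(round(stretch_modulus(toy_lengths), 1))
```

Because S is inversely proportional to the variance, small systematic differences in how each force field samples length fluctuations translate directly into different predicted rigidities.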
Affiliation(s)
- Carlos Roldán-Piñero
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Juan Luengo-Márquez
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Instituto Nicolás Cabrera, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Salvatore Assenza
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Instituto Nicolás Cabrera, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Condensed Matter Physics Center (IFIMAC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Rubén Pérez
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Condensed Matter Physics Center (IFIMAC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
4
Kabir A, Bhattarai M, Rasmussen KØ, Shehu A, Bishop AR, Alexandrov B, Usheva A. Advancing Transcription Factor Binding Site Prediction Using DNA Breathing Dynamics and Sequence Transformers via Cross Attention. bioRxiv: The Preprint Server for Biology 2024:2024.01.16.575935. [PMID: 38293094 PMCID: PMC10827174 DOI: 10.1101/2024.01.16.575935] [Indexed: 02/01/2024]
Abstract
Understanding the impact of genomic variants on transcription factor binding and gene regulation remains a key area of research, with implications for unraveling the complex mechanisms underlying various functional effects. Our study delves into the role of DNA's biophysical properties, including thermodynamic stability, shape, and flexibility, in transcription factor (TF) binding. We developed a multi-modal deep learning model integrating these properties with DNA sequence data. Trained on in vivo ChIP-Seq (chromatin immunoprecipitation sequencing) data involving 690 TF-DNA binding events in the human genome, our model significantly improves prediction performance in over 660 binding events, with up to a 9.6% increase in the AUROC metric compared with a baseline model that does not explicitly use DNA biophysical properties. Further, we expanded our analysis to in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (SELEX) and Protein Binding Microarray (PBM) datasets, comparing our model with established frameworks. The inclusion of DNA breathing features consistently improved TF binding predictions across different cell lines in these datasets. Notably, for complex ChIP-Seq datasets, integrating DNABERT2 with a cross-attention mechanism provided greater predictive capabilities and insights into the mechanisms of disease-related non-coding variants found in genome-wide association studies. This work highlights the importance of DNA biophysical characteristics in TF binding and the effectiveness of multi-modal deep learning models in gene regulation studies.
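The comparison with the baseline model above is reported in AUROC. The sketch below shows how AUROC is computed from binary labels (bound = 1, unbound = 0) and model scores, using the Mann-Whitney formulation. In this formulation, AUROC is the probability that a randomly chosen bound sequence scores higher than a randomly chosen unbound one, with ties counting one half. It is a generic metric implementation, not the authors' evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count as 0.5).

    labels: iterable of 0/1 (1 = bound), scores: model outputs of the same length.
    This O(n_pos * n_neg) loop is fine for a sketch; large datasets call for a rank-based version.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(auroc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))  # one of four pairs is mis-ordered
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of bound and unbound sequences.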
Affiliation(s)
- Anowarul Kabir
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA
- Department of Computer Science, George Mason University, 4400 University Dr, 22030, VA, USA
- Manish Bhattarai
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA
- Kim Ø Rasmussen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, 4400 University Dr, 22030, VA, USA
- Alan R Bishop
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA
- Boian Alexandrov
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA
- Anny Usheva
- Department of Surgery, Brown University, 69 Brown St Box 1822, 02912, RI, USA
5
Li J, Chiu TP, Rohs R. Predicting DNA structure using a deep learning method. Nat Commun 2024; 15:1243. [PMID: 38336958 PMCID: PMC10858265 DOI: 10.1038/s41467-024-45191-5] [Received: 04/25/2023] [Accepted: 01/17/2024] [Indexed: 02/12/2024]
Abstract
Understanding the mechanisms of protein-DNA binding is critical for comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. Using the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
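The abstract contrasts Deep DNAshape with the k-mer based approach it replaces. As a reference point, the sketch below illustrates that k-mer idea. Each nucleotide receives the value looked up for the pentamer centred on it, so context beyond ±2 bp is ignored; according to the abstract, this is the limitation Deep DNAshape addresses. The lookup table and its values are hypothetical, and the function is an illustration rather than the DNAshape or Deep DNAshape API.

```python
from typing import Dict, List, Optional


def kmer_shape_profile(sequence: str, table: Dict[str, float], k: int = 5) -> List[Optional[float]]:
    """Per-nucleotide shape values from a k-mer lookup table (sliding window).

    Each position gets the value stored for the k-mer centred on it. Positions within
    k // 2 of either end, or whose k-mer is missing from the table, get None.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central nucleotide")
    half = k // 2
    profile: List[Optional[float]] = [None] * len(sequence)
    for centre in range(half, len(sequence) - half):
        kmer = sequence[centre - half:centre + half + 1]
        profile[centre] = table.get(kmer)
    return profile


# Hypothetical shape values (e.g. a minor-groove-width-like feature), made up for illustration.
TOY_TABLE = {"AAAAA": 3.4, "AAAAT": 3.6}

if __name__ == "__main__":
    print(kmer_shape_profile("AAAAAT", TOY_TABLE))  # [None, None, 3.4, 3.6, None, None]
```

Because each value depends only on five nucleotides, two identical pentamers in different flanking contexts always receive the same value. A model that reads a longer window can distinguish them.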
Affiliation(s)
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA, 90089, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
6
Harini K, Sekijima M, Gromiha MM. PRA-Pred: Structure-based prediction of protein-RNA binding affinity. Int J Biol Macromol 2024; 259:129490. [PMID: 38224813 DOI: 10.1016/j.ijbiomac.2024.129490] [Received: 12/10/2023] [Revised: 01/10/2024] [Accepted: 01/12/2024] [Indexed: 01/17/2024]
Abstract
Understanding the crucial factors that affect the binding affinity of protein-RNA complexes is vital for comprehending their recognition mechanisms. This study involved compiling experimentally measured binding affinity (ΔG) values of 217 protein-RNA complexes and extracting numerous structure-based features, considering the RNA, the protein, and the interactions between protein and RNA. Our findings indicate the significance of RNA base-step parameters, interaction energies, the number of atomic contacts in the complex, hydrogen bonds, and contact potentials in understanding the binding affinity. Further, we observed that these factors are influenced by the type of RNA strand and the function of the protein in a protein-RNA complex. Multiple regression equations were developed for different classes of complexes to predict the binding affinity between the protein and RNA. We evaluated the models using the jack-knife test and achieved an overall correlation of 0.77 between the experimental and predicted binding affinities, with a mean absolute error of 1.02 kcal/mol. Furthermore, we introduced a web server, PRA-Pred, for the prediction of protein-RNA binding affinity, which is freely accessible at https://web.iitm.ac.in/bioinfo2/prapred/. We propose that our approach could serve as a resource for investigating protein-RNA recognition and developing therapeutic strategies.
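The evaluation above combines multiple linear regression with a jack-knife (leave-one-out) test, reported as a correlation and a mean absolute error. The sketch below implements that generic scheme with NumPy. It assumes the structure-based features have already been computed into a matrix and fits a single class of complexes, whereas the paper fits separate equations per class. Feature selection is not shown, and the function name is hypothetical.

```python
import numpy as np


def jackknife_linear_regression(features, targets):
    """Leave-one-out predictions from ordinary least-squares multiple regression.

    features: (n_samples, n_features) matrix; targets: (n_samples,) values, e.g. dG in kcal/mol.
    Returns the held-out predictions, the Pearson correlation with the targets, and the MAE.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept column + features
    predictions = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # leave sample i out
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        predictions[i] = design[i] @ coef  # predict the held-out sample
    r = np.corrcoef(y, predictions)[0, 1]
    mae = np.mean(np.abs(y - predictions))
    return predictions, r, mae
```

Each prediction comes from a model that never saw that complex. This is why the jack-knife correlation and MAE are more honest than fit statistics on the full set, although with small classes the regressions can still overfit if too many features are used.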
Affiliation(s)
- K Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- M Sekijima
- Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, 226-8501, Japan; Department of Computer Science, National University of Singapore, Singapore
7
Li J, Chiu TP, Rohs R. Deep DNAshape: Predicting DNA shape considering extended flanking regions using a deep learning method. bioRxiv: The Preprint Server for Biology 2023:2023.10.22.563383. [PMID: 37961633 PMCID: PMC10634709 DOI: 10.1101/2023.10.22.563383] [Indexed: 11/15/2023]
Abstract
Understanding the mechanisms of protein-DNA binding is critical for comprehending gene regulation. Three-dimensional DNA shape plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. Using the Deep DNAshape method, refined DNA shape features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing a deeper understanding of the effects of flanking regions on DNA shape in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
8
Battistini F, Sala A, Hospital A, Orozco M. Sequence-Dependent Properties of the RNA Duplex. J Chem Inf Model 2023; 63:5259-5271. [PMID: 37577978 DOI: 10.1021/acs.jcim.3c00741] [Indexed: 08/15/2023]
Abstract
Sequence-dependent properties of the DNA duplex have been accurately described using extensive molecular dynamics simulations. The RNA duplex, meanwhile, which is typically represented as a sequence-averaged rigid rod, has not benefited from equivalent molecular dynamics simulations. In this paper, we present a massive simulation effort using a set of ABC-optimized duplexes, from which we derived tetramer-resolution properties of the RNA duplex and a simple mesoscopic model that can represent the elastic properties of long RNA duplexes. Despite the extreme chemical similarity between DNA and RNA, the local and global elastic properties of their duplexes are very different. For instance, DNA duplexes show a complex and nonelastic pattern of flexibility, whereas RNA duplexes behave as an elastic system whose deformations can be represented by simple harmonic potentials. In RNA duplexes (RNA2), not only are the intra- and inter-base-pair parameters (equilibrium and mechanical) different from those in the equivalent DNA duplex sequences (DNA2), but the correlations between movements also differ. Simple statements on the relative flexibility or stability of both polymers are meaningless and should be replaced by a more detailed description that depends on the sequence and the type of deformation considered.
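The abstract states that RNA duplex deformations can be represented by simple harmonic potentials. A standard way to parametrize such a potential from MD samples is to invert the covariance of the helical parameters. For a Boltzmann-distributed harmonic system, Cov(q) = kBT·K⁻¹, so K = kBT·Cov⁻¹. The sketch below assumes kBT ≈ 0.593 kcal/mol (about 298 K) and a generic (frames × parameters) array. It is not the authors' code or parameter set.

```python
import numpy as np

KBT_KCAL_PER_MOL = 0.593  # k_B*T at about 298 K in kcal/mol (assumed temperature)


def harmonic_stiffness(samples, kbt=KBT_KCAL_PER_MOL):
    """Fit a harmonic model E = 1/2 * dq^T K dq to sampled helical parameters.

    samples: (n_frames, n_params) array, e.g. the six inter-base-pair parameters
    (shift, slide, rise, tilt, roll, twist) of one step taken from an MD trajectory.
    Uses Cov(q) = kBT * K^-1, hence K = kBT * Cov^-1.
    Returns the equilibrium values (means) and the stiffness matrix K.
    """
    samples = np.asarray(samples, dtype=float)
    equilibrium = samples.mean(axis=0)
    covariance = np.atleast_2d(np.cov(samples, rowvar=False))  # columns are parameters
    stiffness = kbt * np.linalg.inv(covariance)
    return equilibrium, stiffness
```

The off-diagonal terms of K encode the coupling between movements that the abstract reports as differing between RNA2 and DNA2. Units follow the inputs, so mixing angstroms and degrees gives mixed-unit stiffness constants.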
Affiliation(s)
- Federica Battistini
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona 08028, Spain
- Departament de Bioquímica i Biomedicina. Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 647, Barcelona 08028, Spain
- Alba Sala
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona 08028, Spain
- Adam Hospital
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona 08028, Spain
- Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona 08028, Spain
- Departament de Bioquímica i Biomedicina. Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 647, Barcelona 08028, Spain
9
Harini K, Kihara D, Gromiha MM. PDA-Pred: Predicting the binding affinity of protein-DNA complexes using machine learning techniques and structural features. Methods 2023; 213:10-17. [PMID: 36924867 PMCID: PMC10563387 DOI: 10.1016/j.ymeth.2023.03.002] [Received: 11/10/2022] [Revised: 02/17/2023] [Accepted: 03/11/2023] [Indexed: 03/17/2023]
Abstract
Protein-DNA interactions play an important role in various biological processes such as gene expression, replication, and transcription. Understanding the important features that dictate the binding affinity of protein-DNA complexes and predicting their affinities is important for elucidating their recognition mechanisms. In this work, we have collected the experimental binding free energy (ΔG) for a set of 391 protein-DNA complexes and derived several structure-based features such as interaction energy, contact potentials, volume and surface area of binding site residues, base step parameters of the DNA and contacts between different types of atoms. Our analysis of the relationship between binding affinity and structural features revealed that the important factors mainly depend on the number of DNA strands as well as the functional and structural classes of proteins. Specifically, binding site properties, such as the number of atom contacts between the DNA and protein and the volume of protein binding sites, and interaction-based features, such as interaction energies and contact potentials, are important for understanding the binding affinity. Further, we developed multiple regression equations for predicting the binding affinity of protein-DNA complexes belonging to different structural and functional classes. In a jack-knife test, our method showed an average correlation of 0.78 and a mean absolute error of 0.98 kcal/mol between the experimental and predicted binding affinities. We have developed a webserver, PDA-PreD (Protein-DNA Binding affinity predictor), for predicting the affinity of protein-DNA complexes, and it is freely available at https://web.iitm.ac.in/bioinfo2/pdapred/.
Affiliation(s)
- K Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States; Department of Computer Science, Purdue University, West Lafayette, IN, United States
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan
10
Esmaeeli R, Bauzá A, Perez A. Structural predictions of protein-DNA binding: MELD-DNA. Nucleic Acids Res 2023; 51:1625-1636. [PMID: 36727436 PMCID: PMC9976882 DOI: 10.1093/nar/gkad013] [Received: 11/11/2022] [Revised: 12/27/2022] [Accepted: 01/30/2023] [Indexed: 02/03/2023]
Abstract
Structural, regulatory and enzymatic proteins interact with DNA to maintain a healthy and functional genome. Yet, our structural understanding of how proteins interact with DNA is limited. We present MELD-DNA, a novel computational approach to predict the structures of protein-DNA complexes. The method combines molecular dynamics simulations with general knowledge or experimental information through Bayesian inference. The physical model is sensitive to sequence-dependent properties and conformational changes required for binding, while information accelerates sampling of bound conformations. MELD-DNA can: (i) sample multiple binding modes; (ii) identify the preferred binding mode from the ensembles; and (iii) provide qualitative binding preferences between DNA sequences. We first assess performance on a dataset of 15 protein-DNA complexes and compare it with state-of-the-art methodologies. Furthermore, for three selected complexes, we show sequence dependence effects of binding in MELD predictions. We expect that the results presented herein, together with the freely available software, will impact structural biology (by complementing DNA structural databases) and molecular recognition (by bringing new insights into aspects governing protein-DNA interactions).
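MELD-DNA combines molecular dynamics with general knowledge or experimental information through Bayesian inference. Beyond what this abstract states, the MELD framework is known for tolerating unreliable information by enforcing only a chosen number of restraints in a group: those best satisfied by the current structure. The sketch below illustrates that selection idea only. It assumes simple flat-bottom harmonic distance restraints and a single restraint group. The function names are hypothetical and not the MELD API, and MELD's actual restraint functional form and sampling scheme are not reproduced.

```python
from typing import Sequence


def flat_bottom_energy(distance: float, lower: float, upper: float, k_force: float) -> float:
    """Zero inside [lower, upper]; harmonic penalty outside (simplified restraint shape)."""
    if distance < lower:
        return 0.5 * k_force * (lower - distance) ** 2
    if distance > upper:
        return 0.5 * k_force * (distance - upper) ** 2
    return 0.0


def fraction_enforced_energy(distances: Sequence[float], lower: float, upper: float,
                             k_force: float, n_active: int) -> float:
    """Sum only the n_active lowest restraint energies for the current structure.

    Restraints that the structure cannot satisfy (e.g. wrong guesses about contacts)
    drop out of the sum instead of distorting the model.
    """
    energies = sorted(flat_bottom_energy(d, lower, upper, k_force) for d in distances)
    return sum(energies[:n_active])


if __name__ == "__main__":
    # Toy protein-DNA contact distances (nm) for one candidate pose, not real data.
    toy_distances = [0.4, 0.5, 1.2, 2.0]
    print(fraction_enforced_energy(toy_distances, 0.0, 0.6, 10.0, n_active=2))
    print(fraction_enforced_energy(toy_distances, 0.0, 0.6, 10.0, n_active=3))
```

With two active restraints, the pose pays no penalty because two contacts are already satisfied. Requiring three pulls in the closest unsatisfied contact, which is how the number of enforced restraints tunes how much the information is trusted.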
Affiliation(s)
- Reza Esmaeeli
- Department of Chemistry, Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Antonio Bauzá
- Department of Chemistry, Universitat de les Illes Balears, Palma de Mallorca (Baleares), 07122, Spain
- Alberto Perez
- Department of Chemistry, Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA