1
Butt W, Lai B, Chiu TP, Bhattarai M, Qian S, Bishop AR, Duan J, Alexandrov BS, Rohs R, He X. Contribution of DNA breathing to physical interactions with transcription factors. bioRxiv 2025:2025.01.20.633840. [PMID: 39896490 PMCID: PMC11785057 DOI: 10.1101/2025.01.20.633840] [Indexed: 02/04/2025]
Abstract
Interaction between transcription factors (TFs) and DNA plays a key role in regulating gene expression. It is generally believed that these interactions are controlled through recognition of DNA core motifs by TFs. Nevertheless, several studies have pointed out the limitations of this view; in particular, DNA sequence variants that influence TF binding are often located outside of core motifs. One possible explanation is that the physical properties of DNA may play a role in TF-DNA interactions. Recent studies have supported the importance of DNA shape features, especially in the regions flanking core motifs. Another important physical property of DNA is DNA breathing, the spontaneous opening of double-stranded DNA through thermal motion. However, there have been few genomic studies of the role of DNA breathing in TF-DNA interactions. In this work, we analyzed in vitro TF-DNA binding data for three TFs and found that DNA breathing features inside or near core motifs are correlated with binding affinity. This suggests that these TFs may prefer DNA that is locally and transiently melted through breathing. We extended the analysis to 44 TFs with in vivo ChIP-seq binding data. For a large proportion of TFs, breathing features in or near core motifs are associated with binding, but the sign and magnitude of these associations vary substantially across TF families. Altogether, our study supports the hypothesis that DNA breathing features near binding motifs contribute to TF-DNA interactions.
Affiliation(s)
- Waqaas Butt
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Ben Lai
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Manish Bhattarai
- Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Sheng Qian
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Alan R. Bishop
- Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Jubao Duan
- Center for Psychiatric Genetics, NorthShore University HealthSystem Research Institute, Chicago, Illinois, United States of America
- Boian S. Alexandrov
- Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Xin He
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
2
Wu S, Xu J, Guo JT. Accurate prediction of nucleic acid binding proteins using protein language model. Bioinform Adv 2025; 5:vbaf008. [PMID: 39990254 PMCID: PMC11845279 DOI: 10.1093/bioadv/vbaf008] [Received: 09/22/2024] [Revised: 12/20/2024] [Accepted: 01/15/2025] [Indexed: 02/25/2025]
Abstract
Motivation: Nucleic acid binding proteins (NABPs) play critical roles in diverse and essential biological processes. Many machine learning-based methods have been developed to predict different types of NABPs. However, most of these studies have limited applicability for predicting the NABP type of an arbitrary protein with unknown function, owing to factors such as dataset construction, prediction scope, and the features used for training and testing. In addition, the identification of novel single-stranded DNA-binding proteins (SSBs) among proteins with unknown functions has not been extensively investigated.
Results: To improve the prediction accuracy for different types of NABPs for any given protein, we developed hierarchical and multi-class models using machine learning methods and a feature extracted from the protein language model ESM2. Our results show that by combining the ESM2 feature with machine learning methods, we can achieve prediction accuracies of up to 95% at each stage of the hierarchical approach and an overall prediction accuracy of 85% with the multi-class approach. More importantly, in addition to much improved prediction of other types of NABPs, the models can accurately predict SSBs, which are underexplored.
Availability and implementation: The datasets and code can be found at https://figshare.com/projects/Prediction_of_nucleic_acid_binding_proteins_using_protein_language_model/211555.
Affiliation(s)
- Siwen Wu
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, United States
- Jun-tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
3
Mitra R, Cohen AS, Sagendorf JM, Berman HM, Rohs R. DNAproDB: an updated database for the automated and interactive analysis of protein-DNA complexes. Nucleic Acids Res 2025; 53:D396-D402. [PMID: 39494533 PMCID: PMC11701736 DOI: 10.1093/nar/gkae970] [Received: 09/08/2024] [Revised: 10/07/2024] [Accepted: 10/11/2024] [Indexed: 11/05/2024]
Abstract
DNAproDB (https://dnaprodb.usc.edu/) is a database, visualization tool, and processing pipeline for analyzing structural features of protein-DNA interactions. Here, we present a substantially updated version of the database with additional structural annotations, search, and user interface functionalities. The update expands the number of pre-analyzed protein-DNA structures, which are automatically updated weekly. The analysis pipeline identifies water-mediated hydrogen bonds, which are incorporated into the visualizations of protein-DNA complexes. Tertiary structure-aware nucleotide layouts are now available. New file formats and external database annotations are supported. The website has been redesigned, and interacting with graphs and data is more intuitive. We also present a statistical analysis of the updated collection of structures that reveals salient patterns in protein-DNA interactions.
Affiliation(s)
- Raktim Mitra
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Ari S Cohen
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jared M Sagendorf
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Helen M Berman
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
- Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
4
Basu S, Yu J, Kihara D, Kurgan L. Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences. Brief Bioinform 2024; 26:bbaf016. [PMID: 39833102 PMCID: PMC11745544 DOI: 10.1093/bib/bbaf016] [Received: 10/03/2024] [Revised: 12/24/2024] [Accepted: 01/06/2025] [Indexed: 01/22/2025]
Abstract
Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods released in the past two decades. We identify and discuss 87 sequence-based predictors, including dozens of recently published methods that are surveyed here for the first time. We review historical progress and examine multiple practical issues, including the availability and impact of predictors, key features of their predictive models, and important aspects of their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which have contributed to substantial gains in predictive performance. We also highlight advances on vital and challenging issues, including cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues, and the targeting of two distinct sources of binding annotations: structure-based versus intrinsic disorder-based. Methods trained on structure-annotated interactions tend to perform poorly on disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. Cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without an implementation or with implementations that no longer work, motivating the development and long-term maintenance of web servers. We close by discussing future research directions that aim to drive further progress in this area.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Jing Yu
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, 915 Mitch Daniels Boulevard, West Lafayette, IN 47907, United States
- Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
5
Maddocks JH, Dans PD, Cheatham TH, Harris S, Laughton C, Orozco M, Pollack L, Olson WK. Special issue: Multiscale simulations of DNA from electrons to nucleosomes. Biophys Rev 2024; 16:259-262. [PMID: 39099838 PMCID: PMC11296990 DOI: 10.1007/s12551-024-01204-7] [Indexed: 08/06/2024]
Abstract
This editorial for Volume 16, Issue 3 of Biophysical Reviews highlights the three-dimensional structural and dynamic information encoded in DNA sequences and introduces the topics covered in this special issue of the journal on Multiscale Simulations of DNA from Electrons to Nucleosomes. Biophysical Reviews is the official journal of the International Union for Pure and Applied Biophysics (IUPAB 2024). The international scope of the articles in the issue exemplifies the goals of IUPAB to organize worldwide advancements, co-operation, communication, and education in biophysics.
Affiliation(s)
- Wilma K. Olson
- Rutgers, the State University of New Jersey, Piscataway, NJ USA