1
Liu Z, Qiu WR, Liu Y, Yan H, Pei W, Zhu YH, Qiu J. A comprehensive review of computational methods for Protein-DNA binding site prediction. Anal Biochem 2025; 703:115862. [PMID: 40209920 DOI: 10.1016/j.ab.2025.115862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 03/20/2025] [Accepted: 04/06/2025] [Indexed: 04/12/2025]
Abstract
Accurately identifying protein-DNA binding sites is essential for understanding the molecular mechanisms underlying biological processes, which in turn facilitates advancements in drug discovery and design. While biochemical experiments provide the most accurate way to locate DNA-binding sites, they are generally time-consuming, resource-intensive, and expensive. There is a pressing need to develop computational methods that are both efficient and accurate for DNA-binding site prediction. This study thoroughly reviews and categorizes major computational approaches for predicting DNA-binding sites, including template detection, statistical machine learning, and deep learning-based methods. Fourteen state-of-the-art DNA-binding site prediction models are benchmarked on 136 non-redundant proteins; the deep learning-based methods, especially those built on pre-trained large language models, outperform the other two categories. Applications of these DNA-binding site prediction methods are also discussed.
Affiliation(s)
- Zi Liu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Wang-Ren Qiu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Yan Liu
  - Department of Computer Science, Yangzhou University, 196 Huayang West Road, Yangzhou, 225100, China
- He Yan
  - College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, 159 Longpanlu Road, Nanjing, 210037, China
- Wenyi Pei
  - Geriatric Department, Shanghai Baoshan District Wusong Central Hospital, 101 Tongtai North Road, Shanghai, 200940, China
- Yi-Heng Zhu
  - College of Artificial Intelligence, Nanjing Agricultural University, 1 Weigang Road, Nanjing, 210095, China
- Jing Qiu
  - Information Department, The First Affiliated Hospital of Naval Medical University, 168 Changhai Road, Shanghai, 200433, China
2
Yue Y, Li Q, Chen C, Yang J, Song W, Zhou C, Cui Y, Wei Z, He Q, Wang C, Lin H, Li J, Li J, Xi J, Song X, Yang W, Zhang Z, Shu W, Guo L, Wang S. Purine nucleoside phosphorylase dominates Influenza A virus replication and host hyperinflammation through purine salvage. Signal Transduct Target Ther 2025; 10:191. [PMID: 40517177 PMCID: PMC12167387 DOI: 10.1038/s41392-025-02272-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2024] [Revised: 04/21/2025] [Accepted: 05/18/2025] [Indexed: 06/16/2025] [Open Access]
Abstract
Influenza A virus (IAV) poses a significant threat to human health. The outcome of IAV infection results from the virus-host interaction, with the underlying molecular mechanisms largely unknown. By integrating the plasma proteomics data of IAV-infected patients into the viral-inflammation protein-protein interaction (VI-PPI) network created in this study, purine nucleoside phosphorylase (PNP), the critical enzyme in purine salvage, was identified as a potential hub gene that connected the different stages of IAV infection. Extended survival rates and reduced pulmonary inflammatory lesions were observed in alveolar epithelial cell (AEC)-specific PNP conditional knockout mice upon H1N1 infection. Mechanistically, PB1-F2 of IAV was revealed as a novel viral transcription factor that binds to the TATA box of the PNP promoter, leading to enhanced purine salvage in H1N1-challenged AECs. The activation of PNP-mediated purine salvage was verified in IAV-infected patients and A549 cells. PNP knockdown elicited a purine metabolic shift from augmented salvage pathway to de novo synthesis, constraining both viral infection and pro-inflammatory signaling through APRT-AICAR-AMPK activation. Moreover, dihydroartemisinin (DHA), predicted by VI-PPI as a novel PNP inhibitor, exerted beneficial effects on the survival and weight gain of H1N1-challenged mice via its direct binding to PNP. We show for the first time that PNP, activated by IAV, plays a hub role within H1N1-host interaction, simultaneously modulating viral replication and hyperinflammation through purine salvage. Our study sheds new light on a "two-for-one" strategy by targeting purine salvage in combating IAV-related pathology, suggesting PNP as a potential novel anti-influenza host target.
Affiliation(s)
- Yang Yue
  - Bioinformatics Center of AMMS, Beijing, China
- Qingyu Li
  - Bioinformatics Center of AMMS, Beijing, China
- Changguo Chen
  - The Sixth Medical Center of Chinese PLA General Hospital, Beijing, China
- Juntao Yang
  - State Key Laboratory of Common Mechanism Research for Major Diseases, Institute of Basic Medical Sciences, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
- Weian Song
  - The Sixth Medical Center of Chinese PLA General Hospital, Beijing, China
- Yuke Cui
  - Bioinformatics Center of AMMS, Beijing, China
- Qi He
  - Bioinformatics Center of AMMS, Beijing, China
- Hongjun Lin
  - Bioinformatics Center of AMMS, Beijing, China
- Jiangbo Li
  - Bioinformatics Center of AMMS, Beijing, China
- Jian Li
  - Bioinformatics Center of AMMS, Beijing, China
- Ji Xi
  - Bioinformatics Center of AMMS, Beijing, China
- Xiang Song
  - Bioinformatics Center of AMMS, Beijing, China
- Wen Yang
  - Bioinformatics Center of AMMS, Beijing, China
- Ze Zhang
  - The Sixth Medical Center of Chinese PLA General Hospital, Beijing, China
- Wenjie Shu
  - Bioinformatics Center of AMMS, Beijing, China
- Liang Guo
  - Bioinformatics Center of AMMS, Beijing, China
3
Horton JR, Yu M, Zhou J, Tran M, Anakal RR, Lu Y, Blumenthal RM, Zhang X, Huang Y, Zhang X, Cheng X. Multimeric transcription factor BCL11A utilizes two zinc-finger tandem arrays to bind clustered short sequence motifs. Nat Commun 2025; 16:3672. [PMID: 40246927 PMCID: PMC12006351 DOI: 10.1038/s41467-025-58998-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Accepted: 04/08/2025] [Indexed: 04/19/2025] [Open Access]
Abstract
BCL11A, a transcription factor, is vital for hematopoiesis, including B and T cell maturation and the fetal-to-adult hemoglobin switch. Mutations in BCL11A are linked to neurodevelopmental disorders. BCL11A contains two DNA-binding zinc-finger arrays, low-affinity ZF2-3 and high-affinity ZF4-6, separated by a 300-amino-acid linker. ZF2-3 and ZF4-5 share 73% identity, including five out of six DNA base-interacting residues. These arrays bind similar short sequence motifs in clusters, with the linker enabling a broader binding span. Crystallographic structures of ZF4-6, in complex with oligonucleotides from the β-globin locus region, reveal DNA sequence recognition by residues Asn756 (ZF4), Lys784 and Arg787 (ZF5). A Lys784-to-Thr mutation, linked to a neurodevelopmental disorder with persistent fetal globin expression, reduces DNA binding over 10-fold but gains interaction with a variable base pair. BCL11A isoforms may form oligomers, enhancing chromatin occupancy and repressor functions by allowing multiple copies of both low- and high-affinity ZF arrays to bind DNA. These distinctive properties, apparently conserved among vertebrates, provide essential functional flexibility to this crucial regulator.
Affiliation(s)
- John R Horton
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Meigen Yu
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jujun Zhou
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Melody Tran
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Rithvi R Anakal
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Yue Lu
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Robert M Blumenthal
  - Department of Medical Microbiology and Immunology, and Program in Bioinformatics, The University of Toledo College of Medicine and Life Sciences, Toledo, OH, 43614, USA
- Xiaotian Zhang
  - Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center Houston, McGovern Medical School, Houston, TX, 77030, USA
- Yun Huang
  - Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Xing Zhang
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Xiaodong Cheng
  - Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
4
Gerasimavicius L, Teichmann SA, Marsh JA. Leveraging protein structural information to improve variant effect prediction. Curr Opin Struct Biol 2025; 92:103023. [PMID: 39987793 DOI: 10.1016/j.sbi.2025.103023] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/26/2024] [Revised: 12/17/2024] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Despite massive sequencing efforts, understanding the difference between human pathogenic and benign variants remains a challenge. Computational variant effect predictors (VEPs) have emerged as essential tools for assessing the impact of genetic variants, although their performance varies. Initially, sequence-based methods dominated the field, but recent advances, particularly in protein structure prediction technologies like AlphaFold, have led to an increased utilization of structural information by VEPs aimed at scoring human missense variants. This review highlights the progress in integrating structural information into VEPs, showcasing novel models such as AlphaMissense, PrimateAI-3D, and CPT-1 that demonstrate improved variant evaluation. Structural data offers more interpretability, especially for non-loss-of-function variants, and provides insights into complex variant interactions in vivo. As the field advances, utilizing biomolecular complex structures will be pivotal for future VEP development, with recent breakthroughs in protein-ligand and protein-nucleic acid complex prediction offering new avenues.
Affiliation(s)
- Lukas Gerasimavicius
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Sarah A Teichmann
  - Cambridge Stem Cell Institute & Dept Medicine, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, United Kingdom
  - Canadian Institute for Advanced Research, Toronto, Canada
- Joseph A Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
5
Rothwell S, Ng I, Shalchy-Tabrizi S, Kalinowski P, Taha OM, Paris I, Baniqued A, Lin L, Mezei MM, Lehman A, Julian LM, Poburko D. Loss-of-function mitochondrial DNA polymerase gamma variants cause vascular smooth muscle cells to secrete a diffusible mitogenic factor. Front Physiol 2025; 15:1488248. [PMID: 40034369 PMCID: PMC11873068 DOI: 10.3389/fphys.2024.1488248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Accepted: 11/26/2024] [Indexed: 03/05/2025] [Open Access]
Abstract
Introduction: Mitochondrial dysfunction promotes vascular aging and disease through diverse mechanisms beyond metabolic supply, including calcium and radical signaling and inflammation. Mitochondrial DNA (mtDNA) replication by the POLG-encoded mitochondrial DNA polymerase (POLG) is critical for mitochondrial health. Loss-of-function POLG variants are associated with a predisposition to hypertension. We hypothesized that impaired POLG, through reduced mtDNA copy number or other mechanisms, would promote smooth muscle hypertrophy or hyperplasia that drives vascular remodeling associated with hypertension.

Methods: We characterized the effect of over-expressing POLG variants that were previously observed in a cohort of hypertensive patients (p.Tyr955Cys, p.Arg964Cys, p.Asn1098Ile, and p.Arg1138Cys) in A7r5 cells.

Results: AlphaFold modeling of the POLG holoenzyme complexed with DNA predicted changes in the catalytic site in the p.Tyr955Cys and p.Asn1098Ile variants, while p.Arg964Cys and p.Arg1138Cys showed minimal effects. The POLG variants reduced mtDNA copy number, assessed by immunofluorescence and droplet digital PCR, by up to 27% in the order p.Tyr955Cys > p.Arg964Cys > p.Asn1098Ile > p.Arg1138Cys relative to wild-type-transfected cultures. Loss of mtDNA was reduced in cultures grown in low serum and glucose media, but the cell density was increased in the same rank order in both 10% serum and 1% serum. POLG constructs contained a Myc epitope, the counterstaining for which showed that the mtDNA copy number was reduced in both transfected cells and untransfected neighbors. Live-cell imaging of mitochondrial membrane potential with TMRM and radical oxygen species production with MitoSOX showed little effect of the POLG variants. POLG variants had little effect on oxygen consumption, assessed by Seahorse assay. Live-cell imaging growth analyses again showed increased growth in A7r5 cells transfected with p.Tyr955Cys but a decreased growth with p.Arg1138Cys, while p.Tyr955Cys increased growth of HeLa cells. Conditioned media from HeLa cells transfected with POLG variants reduced doubling times in naïve cultures. Pharmacologically, wedelolactone and MitoTEMPOL, but not indomethacin or PD98059, suppressed the mitogenic effects of p.Tyr955Cys and p.Arg964Cys in A7r5 cells.

Discussion: We conclude that POLG dysfunction induces secretion of a mitogenic signal from A7r5 and HeLa cells even when changes in mtDNA copy number are below the limit of detection. Such mitogenic stimulation could stimulate hypertrophic remodeling that could contribute to drug-resistant hypertension in patient populations with loss-of-function POLG variants.
Affiliation(s)
- Samantha Rothwell
  - Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
- Irvin Ng
  - Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
- Pola Kalinowski
  - Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
- Omnia M. Taha
  - Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
- Italia Paris
  - Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
- Angelica Baniqued
  - Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
- Lisa Lin
  - Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
  - Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
  - Centre for Cell Biology Development and Disease, Simon Fraser University, Burnaby, BC, Canada
  - Institute for Neuroscience and Neurotechnology, Simon Fraser University, Burnaby, BC, Canada
- Michelle M. Mezei
  - Adult Metabolic Diseases Unit, Vancouver General Hospital, Vancouver, BC, Canada
  - Division of Neurology, University of British Columbia, Vancouver, BC, Canada
- Anna Lehman
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Lisa M. Julian
  - Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
  - Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
  - Centre for Cell Biology Development and Disease, Simon Fraser University, Burnaby, BC, Canada
  - Institute for Neuroscience and Neurotechnology, Simon Fraser University, Burnaby, BC, Canada
- Damon Poburko
  - Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
  - Centre for Cell Biology Development and Disease, Simon Fraser University, Burnaby, BC, Canada
6
Mitra R, Cohen AS, Tang WY, Hosseini H, Hong Y, Berman HM, Rohs R. RNAproDB: A Webserver and Interactive Database for Analyzing Protein-RNA Interactions. J Mol Biol 2025:169012. [PMID: 40126909 DOI: 10.1016/j.jmb.2025.169012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2024] [Revised: 02/09/2025] [Accepted: 02/12/2025] [Indexed: 03/26/2025]
Abstract
We present RNAproDB (https://rnaprodb.usc.edu/), a new webserver, analysis pipeline, database, and highly interactive visualization tool, designed for protein-RNA complexes, and applicable to all forms of nucleic acid containing structures. RNAproDB computes several mapping schemes to place nucleic acid components and present protein-RNA interactions appropriately. Various structural annotations are computed including non-canonical base-pairing geometries, hydrogen bonds, and protein-RNA and RNA-RNA water-mediated interactions. This information is presented through integrated visualization and data tools. Subgraph selection facilitates studying smaller components of the interface. Molecular surface electrostatic potential can be visualized. RNAproDB enables analyzing and exploring experimentally determined, predicted, and designed protein-nucleic acid complexes. We present a quantitative analysis of pre-analyzed protein-RNA structures in RNAproDB revealing statistical patterns of molecular binding and recognition.
Affiliation(s)
- Raktim Mitra
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Ari S Cohen
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Wei Yu Tang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Hirad Hosseini
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Yongchan Hong
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Helen M Berman
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Medicine, Division of Medical Oncology, University of Southern California, Los Angeles, CA 90033, USA
7
Wang Y, Li J, Chiu TP, Gompel N, Rohs R. DNAdesign: feature-aware in silico design of synthetic DNA through mutation. Bioinformatics 2025; 41:btaf052. [PMID: 39891349 PMCID: PMC11825384 DOI: 10.1093/bioinformatics/btaf052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Accepted: 01/29/2025] [Indexed: 02/03/2025] [Open Access]
Abstract
Motivation: DNA sequence and shape readout represent different modes of protein-DNA recognition. Current tools lack the functionality to simultaneously consider alterations in different readout modes caused by sequence mutations. DNAdesign is a web-based tool to compare and design mutations based on both DNA sequence and shape characteristics. Users input a wild-type sequence, select sites to introduce mutations and choose a set of DNA shape parameters for mutation design.

Results: DNAdesign utilizes Deep DNAshape to provide ultra-fast predictions of DNA shape based on extended k-mers and offers multiple encoding methods for nucleotide sequences, including the physicochemical encoding of DNA through their functional groups in the major and minor groove. DNAdesign provides all mutation candidates along the sequence and shape dimensions, with interactive visualization comparing each candidate with the wild-type DNA molecule. DNAdesign provides an approach to studying gene regulation and applications in synthetic biology, such as the design of synthetic enhancers and transcription factor binding sites.

Availability and implementation: The DNAdesign webserver and documentation are freely accessible at https://dnadesign.usc.edu.
Affiliation(s)
- Yingfei Wang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, United States
- Jinsen Li
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, United States
- Tsu-Pei Chiu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, United States
- Nicolas Gompel
  - Department of Evolutionary Biology and Ecology, Bonn Institute for Organismic Biology, University of Bonn, Bonn 53115, Germany
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, United States
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, United States
  - Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, United States
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, United States
  - Division of Medical Oncology, Department of Medicine, University of Southern California, Los Angeles, CA 90033, United States
8
Schaepe JM, Fries T, Doughty BR, Crocker OJ, Hinks MM, Marklund E, Greenleaf WJ. Thermodynamic principles link in vitro transcription factor affinities to single-molecule chromatin states in cells. bioRxiv 2025:2025.01.27.635162. [PMID: 39975040 PMCID: PMC11838358 DOI: 10.1101/2025.01.27.635162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
The molecular details governing transcription factor (TF) binding and the formation of accessible chromatin are not yet quantitatively understood - including how sequence context modulates affinity, how TFs search DNA, the kinetics of TF occupancy, and how motif grammars coordinate binding. To resolve these questions for a human TF, erythroid Krüppel-like factor (eKLF/KLF1), we quantitatively compare, in high throughput, in vitro TF binding rates and affinities with in vivo single molecule TF and nucleosome occupancies across engineered DNA sequences. We find that 40-fold flanking sequence effects on affinity are consistent with distal flanks tuning TF search parameters and captured by a linear energy model. Motif recognition probability, rather than time in the bound state, drives affinity changes, and in vitro and in nuclei measurements exhibit consistent, minutes-long TF residence times. Finally, pairing in vitro biophysical parameters with thermodynamic models accurately predicts in vivo single-molecule chromatin states for unseen motif grammars.
Affiliation(s)
- Julia M Schaepe
  - Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- Torbjörn Fries
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Olivia J Crocker
  - Genetics Department, Stanford University, Stanford, CA 94305, USA
- Michaela M Hinks
  - Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- William J Greenleaf
  - Genetics Department, Stanford University, Stanford, CA 94305, USA
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
9
Butt W, Lai B, Chiu TP, Bhattarai M, Qian S, Bishop AR, Duan J, Alexandrov BS, Rohs R, He X. Contribution of DNA breathing to physical interactions with transcription factors. bioRxiv 2025:2025.01.20.633840. [PMID: 39896490 PMCID: PMC11785057 DOI: 10.1101/2025.01.20.633840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
Interaction between transcription factors (TFs) and DNA plays a key role in regulating gene expression. It is generally believed that these interactions are controlled through recognition of DNA core motifs by TFs. Nevertheless, several studies have pointed out the limitations of this view; in particular, DNA sequence variants influencing TF binding are often located outside of core motifs. One possible explanation is that the physical properties of DNA may play a role in TF-DNA interactions. Recent studies have supported the importance of DNA shape features, especially in flanking regions of core motifs. Another important physical property of DNA is DNA breathing, the spontaneous opening of double-stranded DNA through thermal motions. However, there have been few genomic studies of the role of DNA breathing in TF-DNA interactions. In this work, we analyzed in vitro TF-DNA binding data of three TFs and found that DNA breathing features inside or near core motifs are correlated with binding affinity. This suggests that these TFs may prefer locally and temporally melted DNA formed through breathing. We extended the analysis to 44 TFs with in vivo ChIP-seq binding data. We found that for a large proportion of TFs, their breathing features in or near core motifs are associated with binding, but the sign and magnitude of these associations vary substantially across TF families. Altogether, our study supports the hypothesis that DNA breathing features near binding motifs contribute to TF-DNA interactions.
Affiliation(s)
- Waqaas Butt
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Ben Lai
  - Toyota Technology Institute of Chicago, Chicago, Illinois, United States of America
- Tsu-Pei Chiu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Manish Bhattarai
  - Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Sheng Qian
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Alan R. Bishop
  - Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Jubao Duan
  - Center for Psychiatric Genetics, NorthShore University HealthSystem Research Institute, Chicago, Illinois, United States of America
- Boian S. Alexandrov
  - Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
  - Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Xin He
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
10
Mitra R, Cohen AS, Sagendorf JM, Berman HM, Rohs R. DNAproDB: an updated database for the automated and interactive analysis of protein-DNA complexes. Nucleic Acids Res 2025; 53:D396-D402. [PMID: 39494533 PMCID: PMC11701736 DOI: 10.1093/nar/gkae970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2024] [Revised: 10/07/2024] [Accepted: 10/11/2024] [Indexed: 11/05/2024] [Open Access]
Abstract
DNAproDB (https://dnaprodb.usc.edu/) is a database, visualization tool, and processing pipeline for analyzing structural features of protein-DNA interactions. Here, we present a substantially updated version of the database through additional structural annotations, search, and user interface functionalities. The update expands the number of pre-analyzed protein-DNA structures, which are automatically updated weekly. The analysis pipeline identifies water-mediated hydrogen bonds that are incorporated into the visualizations of protein-DNA complexes. Tertiary structure-aware nucleotide layouts are now available. New file formats and external database annotations are supported. The website has been redesigned, and interacting with graphs and data is more intuitive. We also present a statistical analysis on the updated collection of structures revealing salient patterns in protein-DNA interactions.
Affiliation(s)
- Raktim Mitra
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Ari S Cohen
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jared M Sagendorf
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Helen M Berman
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
11
Basu S, Yu J, Kihara D, Kurgan L. Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences. Brief Bioinform 2024; 26:bbaf016. [PMID: 39833102 PMCID: PMC11745544 DOI: 10.1093/bib/bbaf016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 12/24/2024] [Accepted: 01/06/2025] [Indexed: 01/22/2025] [Open Access]
Abstract
Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Jing Yu
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, 915 Mitch Daniels Boulevard, West Lafayette, IN 47907, United States
  - Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
12
Li M, Chen Z, Huo YX. Application Evaluation and Performance-Directed Improvement of the Native and Engineered Biosensors. ACS Sens 2024; 9:5002-5024. [PMID: 39392681 DOI: 10.1021/acssensors.4c01072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Transcription factor (TF)-based biosensors (TFBs) have received considerable attention in various fields due to their capability of converting biosignals, such as molecule concentrations, into analyzable signals, thereby bypassing the dependence on time-consuming and laborious detection techniques. Natural TFs are evolutionarily optimized to maintain microbial survival and metabolic balance rather than for laboratory scenarios. As a result, native TFBs often exhibit poor performance, such as low specificity, narrow dynamic range, and limited sensitivity, hindering their application in laboratory and industrial settings. This work analyzes four types of regulatory mechanisms underlying TFBs and outlines strategies for constructing efficient sensing systems. Recent advances in TFBs across various usage scenarios are reviewed with a particular focus on the challenges of commercialization. The systematic improvement of TFB performance by modifying the constituent elements is thoroughly discussed. Additionally, we propose future directions of TFBs for developing rapid-responsive biosensors and addressing the challenge of application isolation. Furthermore, we look to the potential of artificial intelligence (AI) technologies and various models for programming TFB genetic circuits. This review sheds light on technical suggestions and fundamental instructions for constructing and engineering TFBs to promote their broader applications in Industry 4.0, including smart biomanufacturing, environmental and food contaminants detection, and medical science.
Affiliation(s)
- Min Li
  - Department of Gastroenterology, Aerospace Center Hospital, College of Life Science, Beijing Institute of Technology, Haidian District, No. 5 South Zhongguancun Street, Beijing 100081, China
- Zhenya Chen
  - Department of Gastroenterology, Aerospace Center Hospital, College of Life Science, Beijing Institute of Technology, Haidian District, No. 5 South Zhongguancun Street, Beijing 100081, China
  - Center for Future Foods, Muyuan Laboratory, 110 Shangding Road, Zhengzhou, Henan 450016, China
- Yi-Xin Huo
  - Department of Gastroenterology, Aerospace Center Hospital, College of Life Science, Beijing Institute of Technology, Haidian District, No. 5 South Zhongguancun Street, Beijing 100081, China
  - Center for Future Foods, Muyuan Laboratory, 110 Shangding Road, Zhengzhou, Henan 450016, China
|