1
Rosignoli S, Pacelli M, Manganiello F, Paiardini A. An outlook on structural biology after AlphaFold: tools, limits and perspectives. FEBS Open Bio 2024. [PMID: 39313455 DOI: 10.1002/2211-5463.13902] [Received: 03/13/2024] [Revised: 08/19/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024]
Abstract
AlphaFold and similar groundbreaking AI-based tools have revolutionized the field of structural bioinformatics with their remarkable accuracy in ab initio protein structure prediction. This success has catalyzed the development of new software and pipelines that incorporate AlphaFold's predictions, often focusing on the algorithm's remaining challenges. Here, we present the current landscape of structural bioinformatics shaped by AlphaFold and discuss how the field is responding to this revolution with new software, methods, and pipelines. While the excitement around AI-based tools has led to their widespread application, their practical success hinges on integration into established structural bioinformatics protocols, an aspect often neglected in the context of AI-driven advancements. Indeed, user-driven intervention remains as pivotal in the structure prediction process as it is in complementing state-of-the-art algorithms with functional and biological knowledge.
Affiliation(s)
- Serena Rosignoli
- Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Maddalena Pacelli
- Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Francesca Manganiello
- Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Alessandro Paiardini
- Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
2
Mitra R, Li J, Sagendorf JM, Jiang Y, Cohen AS, Chiu TP, Glasscock CJ, Rohs R. Geometric deep learning of protein-DNA binding specificity. Nat Methods 2024; 21:1674-1683. [PMID: 39103447 PMCID: PMC11399107 DOI: 10.1038/s41592-024-02372-w] [Received: 08/13/2023] [Accepted: 06/14/2024] [Indexed: 08/07/2024]
Abstract
Predicting protein-DNA binding specificity is a challenging yet essential task for understanding gene regulation. Protein-DNA complexes usually exhibit binding to a selected DNA target site, whereas a protein binds, with varying degrees of binding specificity, to a wide range of DNA sequences. This information is not directly accessible in a single structure. Here, to access this information, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity from protein-DNA structure. DeepPBS can be applied to experimental or predicted structures. Interpretable protein heavy atom importance scores for interface residues can be extracted. When aggregated at the protein residue level, these scores are validated through mutagenesis experiments. When applied to designed proteins targeting specific DNA sequences, DeepPBS predicted experimentally measured binding specificity. DeepPBS offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental designs and synthetic biology.
Affiliation(s)
- Raktim Mitra
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jared M Sagendorf
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Yibei Jiang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Ari S Cohen
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Cameron J Glasscock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA.
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA.
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA.
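The DeepPBS abstract notes that heavy-atom importance scores become experimentally testable once they are aggregated per residue. The following Python sketch shows one plausible way to perform that aggregation. It assumes the scores are already available as plain tuples and that summing is the aggregation rule; the record layout, `aggregate_by_residue`, `rank_residues`, and all score values are hypothetical and do not reflect DeepPBS's actual output format or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (chain, residue number, residue name, atom name, score).
AtomScore = Tuple[str, int, str, str, float]
ResidueKey = Tuple[str, int, str]


def aggregate_by_residue(atom_scores: Iterable[AtomScore]) -> Dict[ResidueKey, float]:
    """Sum heavy-atom importance scores per residue (summing is an assumed rule)."""
    totals: Dict[ResidueKey, float] = defaultdict(float)
    for chain, resnum, resname, _atom_name, score in atom_scores:
        totals[(chain, resnum, resname)] += score
    return dict(totals)


def rank_residues(totals: Dict[ResidueKey, float]) -> List[ResidueKey]:
    """Order residues from most to least important, e.g. to shortlist mutations."""
    return sorted(totals, key=totals.get, reverse=True)


# Made-up scores for three interface residues (not DeepPBS output).
scores = [
    ("A", 12, "ARG", "NH1", 0.5),
    ("A", 12, "ARG", "NH2", 0.25),
    ("A", 15, "ASN", "ND2", 0.5),
    ("A", 40, "LYS", "NZ", 0.125),
]
totals = aggregate_by_residue(scores)
print(rank_residues(totals))  # [('A', 12, 'ARG'), ('A', 15, 'ASN'), ('A', 40, 'LYS')]
```

Summing rewards residues with many contributing atoms; averaging per atom would be an equally reasonable alternative, and the choice should follow whatever the original method specifies.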
3
Zhang X, Blumenthal RM, Cheng X. Updated understanding of the protein-DNA recognition code used by C2H2 zinc finger proteins. Curr Opin Struct Biol 2024; 87:102836. [PMID: 38754172 DOI: 10.1016/j.sbi.2024.102836] [Received: 02/28/2024] [Revised: 04/21/2024] [Accepted: 04/23/2024] [Indexed: 05/18/2024]
Abstract
C2H2 zinc-finger (ZF) proteins form the largest family of DNA-binding transcription factors coded by mammalian genomes. In a typical DNA-binding ZF module, there are twelve residues (numbered from -1 to -12) between the last zinc-coordinating cysteine and the first zinc-coordinating histidine. The established C2H2-ZF "recognition code" suggests that residues at positions -1, -4, and -7 recognize the 5', central, and 3' bases of a DNA base-pair triplet, respectively. Structural studies have highlighted that additional residues at positions -5 and -8 also play roles in specific DNA recognition. The presence of bulky and either charged or polar residues at these five positions determines specificity for given DNA bases: guanine is recognized by arginine, lysine, or histidine; adenine by asparagine or glutamine; thymine or 5-methylcytosine by glutamate; and unmodified cytosine by aspartate. This review discusses recent structural characterizations of C2H2-ZFs that add to our understanding of the principles underlying the C2H2-ZF recognition code.
Affiliation(s)
- Xing Zhang
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
- Robert M Blumenthal
- Department of Medical Microbiology and Immunology, and Program in Bioinformatics, The University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614, USA.
- Xiaodong Cheng
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
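To make the simplified recognition code in the Zhang et al. abstract concrete, the following Python sketch reads a hypothetical 12-residue spacer and applies only the rules quoted there: positions -1, -4 and -7 read the 5', central and 3' bases, and the residue-to-base table maps Arg/Lys/His to G, Asn/Gln to A, Glu to T or 5mC, and Asp to C. It ignores positions -5 and -8 and all structural context, so it is a teaching toy rather than a specificity predictor; `predict_triplet` and the example spacers are invented for illustration.

```python
from typing import List

# Simplified residue-to-base rules quoted in the abstract.
BASE_FOR_RESIDUE = {
    "R": "G", "K": "G", "H": "G",  # guanine: Arg, Lys, His
    "N": "A", "Q": "A",            # adenine: Asn, Gln
    "E": "T/5mC",                  # thymine or 5-methylcytosine: Glu
    "D": "C",                      # unmodified cytosine: Asp
}

# Spacer positions that read the 5', central and 3' base of the triplet.
TRIPLET_POSITIONS = (-1, -4, -7)


def predict_triplet(spacer: str) -> List[str]:
    """Predict the 5'->3' base triplet for one ZF module.

    `spacer` holds the 12 residues between the last zinc-coordinating Cys and
    the first zinc-coordinating His, written N- to C-terminal, so its last
    character is position -1 and Python's negative indexing matches the
    -1..-12 numbering. 'N' (IUPAC "any base") marks a residue with no rule.
    """
    if len(spacer) != 12:
        raise ValueError("expected exactly 12 spacer residues")
    spacer = spacer.upper()
    return [BASE_FOR_RESIDUE.get(spacer[pos], "N") for pos in TRIPLET_POSITIONS]


# Hypothetical spacer (not a natural protein): Arg at -1, Asn at -4, Lys at -7.
print(predict_triplet("AAAAAKAANAAR"))  # ['G', 'A', 'G']
```

Real C2H2-ZF specificity also depends on the additional contacts at -5 and -8 and on neighbouring fingers, which is precisely the refinement the review discusses.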
4
Yao YM, Miodownik I, O'Hagan MP, Jbara M, Afek A. Deciphering the dynamic code: DNA recognition by transcription factors in the ever-changing genome. Transcription 2024:1-25. [PMID: 39033307 DOI: 10.1080/21541264.2024.2379161] [Received: 03/15/2024] [Accepted: 07/03/2024] [Indexed: 07/23/2024]
Abstract
Transcription factors (TFs) intricately navigate the vast genomic landscape to locate and bind specific DNA sequences for the regulation of gene expression programs. These interactions occur within a dynamic cellular environment, where both DNA and TF proteins experience continual chemical and structural perturbations, including epigenetic modifications, DNA damage, mechanical stress, and post-translational modifications (PTMs). While many of these factors impact TF-DNA binding interactions, understanding their effects remains challenging and incomplete. This review explores the existing literature on these dynamic changes and their potential impact on TF-DNA interactions.
Affiliation(s)
- Yumi Minyi Yao
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Irina Miodownik
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Michael P O'Hagan
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Muhammad Jbara
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel
- Ariel Afek
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
5
Rogoulenko E, Levy Y. Skipping events impose repeated binding attempts: profound kinetic implications of protein-DNA conformational changes. Nucleic Acids Res 2024; 52:6763-6776. [PMID: 38721783 PMCID: PMC11229352 DOI: 10.1093/nar/gkae333] [Received: 01/30/2024] [Revised: 04/09/2024] [Accepted: 04/16/2024] [Indexed: 07/09/2024]
Abstract
The kinetics of protein-DNA recognition, along with its thermodynamic properties, including affinity and specificity, play a central role in shaping biological function. Protein-DNA recognition kinetics are characterized by two key elements: the time taken to locate the target site amid various nonspecific alternatives, and the kinetics of the recognition process itself, which may necessitate overcoming an energetic barrier. In this study, we developed a coarse-grained (CG) model of the interactions between DNA and the transcription factor sex-determining region Y (SRY) to probe how DNA conformational changes affect SRY-DNA recognition and binding kinetics. We find that not only does the requirement for such a conformational DNA transition correspond to a higher energetic barrier for binding, and therefore slower kinetics, but it may also impede recognition by increasing the number of unsuccessful binding events (skipping events), in which the protein partially binds its DNA target site but fails to form the specific protein-DNA complex. Such skipping events impose additional cycles of protein search on nonspecific DNA sites, significantly extending the overall recognition time. Our results highlight a trade-off between the speed with which the protein scans nonspecific DNA and the rate at which it recognizes its specific target site. Finally, we examine molecular strategies that natural systems may adopt to enhance protein-DNA recognition despite its intrinsically slow kinetics.
Affiliation(s)
- Elena Rogoulenko
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Yaakov Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
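The kinetic argument of Rogoulenko and Levy can be illustrated with a deliberately crude Python model that is not the authors' coarse-grained simulation. Assume each binding attempt succeeds independently with probability p, and every failed attempt (a skipping event) costs another search cycle on nonspecific DNA plus another attempt. The number of cycles is then geometric with mean 1/p, so the mean recognition time is (t_search + t_attempt)/p. The parameter names and values are hypothetical; the Monte Carlo function only checks the closed form.

```python
import random


def mean_recognition_time(t_search: float, t_attempt: float, p_success: float) -> float:
    """Closed-form mean time in the toy model.

    Each cycle = one search of nonspecific DNA (t_search) + one binding attempt
    (t_attempt). Attempts succeed independently with probability p_success, so
    the number of cycles is geometric with mean 1 / p_success.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return (t_search + t_attempt) / p_success


def simulate_recognition_time(t_search: float, t_attempt: float, p_success: float,
                              n_trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        cycles = 1
        while rng.random() >= p_success:  # failed attempt, i.e. a skipping event
            cycles += 1
        total += cycles * (t_search + t_attempt)
    return total / n_trials


# Hypothetical times in arbitrary units: search 3.0, attempt 1.0.
for p in (1.0, 0.5, 0.25):
    print(p, mean_recognition_time(3.0, 1.0, p))  # 4.0, then 8.0, then 16.0
```

In this toy model, halving the per-attempt success probability doubles the mean recognition time, which captures qualitatively why skipping events can dominate the overall kinetics.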
6
Li J, Rohs R. Deep DNAshape webserver: prediction and real-time visualization of DNA shape considering extended k-mers. Nucleic Acids Res 2024; 52:W7-W12. [PMID: 38801070 PMCID: PMC11223853 DOI: 10.1093/nar/gkae433] [Received: 02/26/2024] [Revised: 04/30/2024] [Accepted: 05/08/2024] [Indexed: 05/29/2024]
Abstract
Sequence-dependent DNA shape plays an important role in understanding protein-DNA binding mechanisms. High-throughput prediction of DNA shape features has become a valuable tool in the study of protein-DNA recognition, transcription factor-DNA binding specificity, and gene regulation. However, our widely used webserver, DNAshape, relies on statistically summarized pentamer query tables to query DNA shape features. These query tables do not consider flanking regions longer than two base pairs, and acquiring a query table for hexamers or higher-order k-mers is currently still unrealistic because of the difficulty of achieving sufficient statistical coverage in molecular simulations or structural biology experiments. A recent deep-learning method, Deep DNAshape, trained on limited simulation data, can predict DNA shape features at the core of a DNA fragment while considering flanking regions of up to seven base pairs. However, Deep DNAshape is rather complicated to install and, unlike the pentamer-based DNAshape webserver, must be run locally, creating a barrier for users. Here, we present the Deep DNAshape webserver, which combines the benefits of both methods while being accurate, fast, and accessible to all users. Additional improvements of the webserver include real-time detection of user input, interactive visualization tools, and different modes of analysis. URL: https://deepdnashape.usc.edu.
Affiliation(s)
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
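The limitation described in the Deep DNAshape webserver abstract, that a pentamer query table cannot see beyond two flanking base pairs, can be sketched in Python as a sliding 5-mer lookup for a per-base shape feature. The single table entry and its value are invented and are not DNAshape data, and `pentamer_profile` is a hypothetical helper. Two sequences that share the same central pentamer but differ three or more positions away receive identical values at that position, which is the flanking information Deep DNAshape is designed to capture.

```python
from typing import Dict, List, Optional

# Hypothetical one-entry query table mapping a 5-mer to the shape value of its
# central base. A complete table covers every possible pentamer; this value
# is made up for illustration.
TOY_TABLE: Dict[str, float] = {"ACGTA": 5.1}


def pentamer_profile(seq: str, table: Dict[str, float],
                     default: float = 0.0) -> List[Optional[float]]:
    """Assign each base the table value of the 5-mer centred on it.

    Only two flanking base pairs on each side are consulted. The first and
    last two bases have no complete window and get None; 5-mers missing from
    the table fall back to `default`.
    """
    seq = seq.upper()
    profile: List[Optional[float]] = []
    for i in range(len(seq)):
        if i < 2 or i > len(seq) - 3:
            profile.append(None)
        else:
            profile.append(table.get(seq[i - 2:i + 3], default))
    return profile


# Same core pentamer (ACGTA), different distal flanks (GG...GG vs CC...CC).
a = pentamer_profile("GGACGTAGG", TOY_TABLE)
b = pentamer_profile("CCACGTACC", TOY_TABLE)
print(a[4], b[4])  # 5.1 5.1: the lookup cannot distinguish the two contexts
```

Extending the window to hexamers or longer k-mers would multiply the number of table entries by four per added base, which is why the abstract describes building such tables from simulations as unrealistic and turns to a learned model instead.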
7
Roldán-Piñero C, Luengo-Márquez J, Assenza S, Pérez R. Systematic Comparison of Atomistic Force Fields for the Mechanical Properties of Double-Stranded DNA. J Chem Theory Comput 2024; 20:2261-2272. [PMID: 38411091 PMCID: PMC10938644 DOI: 10.1021/acs.jctc.3c01089] [Received: 10/02/2023] [Revised: 02/14/2024] [Accepted: 02/14/2024] [Indexed: 02/28/2024]
Abstract
The response of double-stranded DNA to external mechanical stress plays a central role in its interactions with the protein machinery in the cell. Modern atomistic force fields have been shown to provide highly accurate predictions for the fine structural features of the duplex. In contrast, and despite their pivotal function, less attention has been devoted to the accuracy with which they predict elastic parameters. Several reports have addressed the flexibility of double-stranded DNA via all-atom molecular dynamics, yet the information collected so far is insufficient for a clear understanding of the relative performance of the various force fields. In this work, we fill this gap with a systematic study in which several systems, characterized by different sequence contexts, are simulated with the most popular force fields within the AMBER family, bsc1 and OL15, as well as with CHARMM36. Analysis of our results, together with a comparison with previous work focused on bsc0, reveals differences in the rigidity predicted by the newest force fields and suggests a roadmap for testing their performance against experiments. In the case of the stretch modulus, we reconcile these differences, showing that a single mapping between sequence-dependent conformation and elasticity via the crookedness parameter simultaneously captures the results of all force fields, supporting the key role of crookedness in the mechanical response of double-stranded DNA.
Affiliation(s)
- Carlos Roldán-Piñero
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Juan Luengo-Márquez
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Instituto Nicolás Cabrera, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Salvatore Assenza
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Instituto Nicolás Cabrera, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Condensed Matter Physics Center (IFIMAC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Rubén Pérez
- Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Condensed Matter Physics Center (IFIMAC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
8
Li J, Chiu TP, Rohs R. Predicting DNA structure using a deep learning method. Nat Commun 2024; 15:1243. [PMID: 38336958 PMCID: PMC10858265 DOI: 10.1038/s41467-024-45191-5] [Received: 04/25/2023] [Accepted: 01/17/2024] [Indexed: 02/12/2024]
Abstract
Understanding the mechanisms of protein-DNA binding is critical for comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer-based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. Using Deep DNAshape, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, revealing how flanking regions, including distant ones, affect DNA structure in a target region of a sequence. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated into machine learning models, the features generated by Deep DNAshape improve prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
Affiliation(s)
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA.
- Department of Chemistry, University of Southern California, Los Angeles, CA, 90089, USA.
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, 90089, USA.
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA.
9
Vernon TN, Terrell JR, Albrecht AV, Germann MW, Wilson WD, Poon GMK. Dissection of integrated readout reveals the structural thermodynamics of DNA selection by transcription factors. Structure 2024; 32:83-96.e4. [PMID: 38042148 DOI: 10.1016/j.str.2023.11.003] [Received: 08/08/2023] [Revised: 10/12/2023] [Accepted: 11/07/2023] [Indexed: 12/04/2023]
Abstract
Nucleobases such as inosine have been extensively used to map direct contacts by proteins in the DNA groove. Their deployment as targeted probes of dynamics and hydration, which are dominant thermodynamic drivers of affinity and specificity, has been limited by a paucity of suitable experimental models. We report a joint crystallographic, thermodynamic, and computational study of the bidentate complex of the arginine side chain with a Watson-Crick guanine (Arg×GC), a highly specific configuration adopted by major transcription factors throughout the eukaryotic branches of the Tree of Life. Using the ETS-family factor PU.1 as a high-resolution structural framework, we found that substituting inosine for guanine sharply dissected conformational dynamics and hydration and elucidated their roles in the DNA specificity of PU.1. Our work suggests an under-exploited utility of modified nucleobases in untangling the structural thermodynamics of interactions, such as the Arg×GC motif, in which direct and indirect readout are tightly integrated.
Affiliation(s)
- Tyler N Vernon
- Department of Chemistry, Georgia State University, Atlanta, GA 30302, USA
- J Ross Terrell
- Department of Chemistry, Georgia State University, Atlanta, GA 30302, USA
- Amanda V Albrecht
- Department of Chemistry, Georgia State University, Atlanta, GA 30302, USA
- Markus W Germann
- Department of Chemistry, Georgia State University, Atlanta, GA 30302, USA; Department of Biology, Georgia State University, Atlanta, GA 30302, USA.
- W David Wilson
- Department of Chemistry, Georgia State University, Atlanta, GA 30302, USA; Center for Diagnostics and Therapeutics, Georgia State University, Atlanta, GA 30302, USA.
- Gregory M K Poon
- Department of Chemistry, Georgia State University, Atlanta, GA 30302, USA; Center for Diagnostics and Therapeutics, Georgia State University, Atlanta, GA 30302, USA.
10
Mitra R, Li J, Sagendorf JM, Jiang Y, Chiu TP, Rohs R. DeepPBS: Geometric deep learning for interpretable prediction of protein-DNA binding specificity. bioRxiv 2023:2023.12.15.571942. [PMID: 38293168 PMCID: PMC10827229 DOI: 10.1101/2023.12.15.571942] [Indexed: 02/01/2024]
Abstract
Predicting specificity in protein-DNA interactions is a challenging yet essential task for understanding gene regulation. Here, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity across protein families based on protein-DNA structures. The DeepPBS architecture allows investigation of different family-specific recognition patterns. DeepPBS can be applied to predicted structures and can aid in the modeling of protein-DNA complexes. DeepPBS is interpretable and can be used to calculate importance scores at the level of protein heavy atoms, as demonstrated in a case study on the p53-DNA interface. When aggregated at the protein residue level, these scores agree well with experimental alanine-scanning mutagenesis data. DeepPBS inference is fast enough for analyzing simulation trajectories, as demonstrated on a molecular dynamics simulation of a Drosophila Hox-DNA tertiary complex with its cofactor. DeepPBS and its corresponding data resources offer a foundation for machine-aided protein-DNA interaction studies, guiding experimental choices and complex design, as well as advancing our understanding of molecular interactions.