1. He S, Taher NM, Simard AR, Hvorecny KL, Ragusa MJ, Bahl CD, Hickman AB, Dyda F, Madden DR. Molecular basis for the transcriptional regulation of an epoxide-based virulence circuit in Pseudomonas aeruginosa. Nucleic Acids Res 2024; 52:12727-12747. [PMID: 39413156 DOI: 10.1093/nar/gkae889] [Received: 01/15/2024] [Revised: 08/30/2024] [Accepted: 10/03/2024] [Indexed: 10/18/2024] Open
Abstract
The opportunistic pathogen Pseudomonas aeruginosa infects the airways of people with cystic fibrosis (CF) and produces a virulence factor, Cif, that is associated with worse outcomes. Cif is an epoxide hydrolase that reduces cell-surface abundance of the cystic fibrosis transmembrane conductance regulator (CFTR) and sabotages pro-resolving signals. Its expression is regulated by a divergently transcribed TetR family transcriptional repressor, CifR. CifR represents the first reported epoxide-sensing bacterial transcriptional regulator, but neither its interaction with cognate operator sequences nor the mechanism of activation has been investigated. Using biochemical and structural approaches, we uncovered the molecular mechanisms controlling this complex virulence operon. We present here the first molecular structures of CifR alone and in complex with operator DNA, resolved in a single crystal lattice. Significant conformational changes between these two structures suggest how CifR regulates the expression of the virulence gene cif. Interactions between the N-terminal extension of CifR and the DNA minor groove of the operator play a significant role in operator recognition by CifR. We also determined that cysteine residue Cys107 is critical for epoxide sensing and DNA release. These results offer new insights into the stereochemical regulation of an epoxide-based virulence circuit in a critically important clinical pathogen.
Affiliation(s)
- Susu He
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Noor M Taher
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Adam R Simard
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Kelli L Hvorecny
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Michael J Ragusa
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Department of Chemistry, Dartmouth, Hanover, NH 03755, USA
- Christopher D Bahl
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Alison B Hickman
- Laboratory of Molecular Biology, NIDDK, National Institutes of Health, Bethesda, MD 20892, USA
- Fred Dyda
- Laboratory of Molecular Biology, NIDDK, National Institutes of Health, Bethesda, MD 20892, USA
- Dean R Madden
- Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Department of Chemistry, Dartmouth, Hanover, NH 03755, USA
2. Mukherjee S, Moafinejad SN, Badepally NG, Merdas K, Bujnicki JM. Advances in the field of RNA 3D structure prediction and modeling, with purely theoretical approaches, and with the use of experimental data. Structure 2024; 32:1860-1876. [PMID: 39321802 DOI: 10.1016/j.str.2024.08.015] [Received: 07/14/2024] [Revised: 08/08/2024] [Accepted: 08/22/2024] [Indexed: 09/27/2024]
Abstract
Recent advancements in RNA three-dimensional (3D) structure prediction have provided significant insights into RNA biology, highlighting the essential role of RNA in cellular functions and its therapeutic potential. This review summarizes the latest developments in computational methods, particularly the incorporation of artificial intelligence and machine learning, which have improved the efficiency and accuracy of RNA structure predictions. We also discuss the integration of new experimental data types, including cryoelectron microscopy (cryo-EM) techniques and high-throughput sequencing, which have transformed RNA structure modeling. The combination of experimental advances with computational methods represents a significant leap in RNA structure determination. We review the outcomes of RNA-Puzzles and critical assessment of structure prediction (CASP) challenges, which assess the state of the field and limitations of existing methods. Future perspectives are discussed, focusing on the impact of RNA 3D structure prediction on understanding RNA mechanisms and its implications for drug discovery and RNA-targeted therapies, opening new avenues in molecular biology.
Affiliation(s)
- Sunandan Mukherjee
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- S Naeim Moafinejad
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Nagendar Goud Badepally
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Katarzyna Merdas
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland.
3. Wong F, He D, Krishnan A, Hong L, Wang AZ, Wang J, Hu Z, Omori S, Li A, Rao J, Yu Q, Jin W, Zhang T, Ilia K, Chen JX, Zheng S, King I, Li Y, Collins JJ. Deep generative design of RNA aptamers using structural predictions. Nature Computational Science 2024:10.1038/s43588-024-00720-6. [PMID: 39506080 DOI: 10.1038/s43588-024-00720-6] [Received: 02/22/2024] [Accepted: 10/07/2024] [Indexed: 11/08/2024]
Abstract
RNAs represent a class of programmable biomolecules capable of performing diverse biological functions. Recent studies have developed accurate RNA three-dimensional structure prediction methods, which may enable new RNAs to be designed in a structure-guided manner. Here, we develop a structure-to-sequence deep learning platform for the de novo generative design of RNA aptamers. We show that our approach can design RNA aptamers that are predicted to be structurally similar, yet sequence dissimilar, to known light-up aptamers that fluoresce in the presence of small molecules. We experimentally validate several generated RNA aptamers to have fluorescent activity, show that these aptamers can be optimized for activity in silico, and find that they exhibit a mechanism of fluorescence similar to that of known light-up aptamers. Our results demonstrate how structural predictions can guide the targeted and resource-efficient design of new RNA sequences.
Affiliation(s)
- Felix Wong
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Integrated Biosciences, Redwood City, CA, USA
- Dongchen He
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Aarti Krishnan
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Liang Hong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Alexander Z Wang
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jiuming Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Zhihang Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Satotaka Omori
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Integrated Biosciences, Redwood City, CA, USA
- Alicia Li
- Integrated Biosciences, Redwood City, CA, USA
- Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Qinze Yu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Wengong Jin
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Tianqing Zhang
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Katherine Ilia
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jack X Chen
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, China
- Irwin King
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yu Li
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.
- The CUHK Shenzhen Research Institute, Shenzhen, China.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
- James J Collins
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
4. Mitra R, Cohen AS, Sagendorf JM, Berman HM, Rohs R. DNAproDB: an updated database for the automated and interactive analysis of protein-DNA complexes. Nucleic Acids Res 2024:gkae970. [PMID: 39494533 DOI: 10.1093/nar/gkae970] [Received: 09/08/2024] [Revised: 10/07/2024] [Accepted: 10/11/2024] [Indexed: 11/05/2024] Open
Abstract
DNAproDB (https://dnaprodb.usc.edu/) is a database, visualization tool, and processing pipeline for analyzing structural features of protein-DNA interactions. Here, we present a substantially updated version of the database through additional structural annotations, search, and user interface functionalities. The update expands the number of pre-analyzed protein-DNA structures, which are automatically updated weekly. The analysis pipeline identifies water-mediated hydrogen bonds that are incorporated into the visualizations of protein-DNA complexes. Tertiary structure-aware nucleotide layouts are now available. New file formats and external database annotations are supported. The website has been redesigned, and interacting with graphs and data is more intuitive. We also present a statistical analysis on the updated collection of structures revealing salient patterns in protein-DNA interactions.
Affiliation(s)
- Raktim Mitra
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Ari S Cohen
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jared M Sagendorf
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Helen M Berman
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
- Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
5. Tong Y, Childs-Disney JL, Disney MD. Targeting RNA with small molecules, from RNA structures to precision medicines: IUPHAR review: 40. Br J Pharmacol 2024; 181:4152-4173. [PMID: 39224931 DOI: 10.1111/bph.17308] [Received: 04/30/2024] [Revised: 06/10/2024] [Accepted: 07/09/2024] [Indexed: 09/04/2024] Open
Abstract
RNA plays important roles in regulating both health and disease biology in all kingdoms of life. Notably, RNA can form intricate three-dimensional structures, and their biological functions are dependent on these structures. Targeting the structured regions of RNA with small molecules has gained increasing attention over the past decade, because it provides both chemical probes to study fundamental biology processes and lead medicines for diseases with unmet medical needs. Recent advances in RNA structure prediction and determination and RNA biology have accelerated the rational design and development of RNA-targeted small molecules to modulate disease pathology. However, challenges remain in advancing RNA-targeted small molecules towards clinical applications. This review summarizes strategies to study RNA structures, to identify small molecules recognizing these structures, and to augment the functionality of RNA-binding small molecules. We focus on recent advances in developing RNA-targeted small molecules as potential therapeutics in a variety of diseases, encompassing different modes of actions and targeting strategies. Furthermore, we present the current gaps between early-stage discovery of RNA-binding small molecules and their clinical applications, as well as a roadmap to overcome these challenges in the near future.
Affiliation(s)
- Yuquan Tong
- Department of Chemistry, The Scripps Research Institute, Jupiter, Florida, USA
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
- Jessica L Childs-Disney
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
- Matthew D Disney
- Department of Chemistry, The Scripps Research Institute, Jupiter, Florida, USA
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
6. Shin J, Meinke G, Bohm AA, Bullock PA. A model for polyomavirus helicase activity derived in part from the AlphaFold2 structure of SV40 T-antigen. J Virol 2024; 98:e0111924. [PMID: 39311578 PMCID: PMC11494911 DOI: 10.1128/jvi.01119-24] [Received: 06/26/2024] [Accepted: 08/11/2024] [Indexed: 09/27/2024] Open
Abstract
The mechanism used by polyomavirus and other viral SF3 helicases to unwind DNA at replication forks remains unknown. Using AlphaFold2, we have determined the structure of a representative SF3 helicase, the SV40 T-antigen (T-ag). This model has been analyzed in terms of the features of T-ag required for helicase activity, particularly the proximity of the T-ag origin binding domain (OBD) to the replication fork and the distribution of basic residues on the surface of the OBD that are known to play roles in DNA unwinding. These and related studies provide additional evidence that the T-ag OBDs have a role in the unwinding of DNA at the replication fork. Nuclear magnetic resonance and modeling experiments also indicate that protonated histidines on the surface of the T-ag OBD play an important role in the unwinding process, and additional modeling studies indicate that protonated histidines are essential in other SF3 and SF6 helicases. Finally, a model for T-ag's helicase activity is presented, which is a variant of the "rope climber." According to this model, the hands are the N-terminal OBD domains that interact with the replication fork, while the C-terminal helicase domains contain the feet that bind to single-stranded DNA.
IMPORTANCE: Enzymes termed helicases are essential for the replication of DNA tumor viruses. Unfortunately, much remains to be determined about this class of enzymes, including their structures and the mechanism(s) they employ to unwind DNA. Herein, we present the full-length structure of a model helicase encoded by a DNA tumor virus. Moreover, this AI-based structure has been analyzed in terms of its basic functional properties, such as the orientation of the helicase at replication forks and the relative locations of the amino acid residues that are critical for helicase activity. Obtaining this information is important because it permits proposals regarding how DNA is routed through these model helicases. Also presented is structural evidence that the conclusions drawn from our detailed analyses of one model helicase, encoded by one class of tumor viruses, are likely to apply to other viral and eukaryotic helicases.
Affiliation(s)
- Jong Shin
- Department of Molecular Metabolism, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Gretchen Meinke
- Department of Developmental, Molecular and Chemical Biology, Tufts University School of Medicine, Boston, Massachusetts, USA
- Alex A. Bohm
- Department of Developmental, Molecular and Chemical Biology, Tufts University School of Medicine, Boston, Massachusetts, USA
- Peter A. Bullock
- Department of Developmental, Molecular and Chemical Biology, Tufts University School of Medicine, Boston, Massachusetts, USA
7. Sasse A, Ray D, Laverty KU, Tam CL, Albu M, Zheng H, Lyudovyk O, Dalal T, Nie K, Magis C, Notredame C, Weirauch MT, Hughes TR, Morris Q. Reconstructing the sequence specificities of RNA-binding proteins across eukaryotes. bioRxiv 2024:2024.10.15.618476. [PMID: 39464061 PMCID: PMC11507768 DOI: 10.1101/2024.10.15.618476] [Indexed: 10/29/2024]
Abstract
RNA-binding proteins (RBPs) are key regulators of gene expression. Here, we introduce EuPRI (Eukaryotic Protein-RNA Interactions) - a freely available resource of RNA motifs for 34,736 RBPs from 690 eukaryotes. EuPRI includes in vitro binding data for 504 RBPs, including newly collected RNAcompete data for 174 RBPs, along with thousands of reconstructed motifs. We reconstruct these motifs with a new computational platform - Joint Protein-Ligand Embedding (JPLE) - which can detect distant homology relationships and map specificity-determining peptides. EuPRI quadruples the number of known RBP motifs, expanding the motif repertoire across all major eukaryotic clades, and assigning motifs to the majority of human RBPs. EuPRI drastically improves knowledge of RBP motifs in flowering plants. For example, it increases the number of Arabidopsis thaliana RBP motifs 7-fold, from 14 to 105. EuPRI also has broad utility for inferring post-transcriptional function and evolutionary relationships. We demonstrate this by predicting a role for 12 Arabidopsis thaliana RBPs in RNA stability and identifying rapid and recent evolution of post-transcriptional regulatory networks in worms and plants. In contrast, the vertebrate RNA motif set has remained relatively stable after its drastic expansion between the metazoan and vertebrate ancestors. EuPRI represents a powerful resource for the study of gene regulation across eukaryotes.
Affiliation(s)
- Alexander Sasse
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Department of Computer Science, University of Washington, Seattle, WA, USA
- Vector Institute, Toronto, ON Canada
- Debashish Ray
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Kaitlin U Laverty
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Cyrus L Tam
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Hong Zheng
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Olga Lyudovyk
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Taykhoom Dalal
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Kate Nie
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Cedrik Magis
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Cedric Notredame
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Quaid Morris
- Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
- Donnelly Centre, University of Toronto, Toronto, ON Canada
- Vector Institute, Toronto, ON Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Ontario Institute for Cancer Research, Toronto, ON, Canada
8. Norton T, Bhattacharya D. Sifting through the noise: A survey of diffusion probabilistic models and their applications to biomolecules. J Mol Biol 2024:168818. [PMID: 39389290 DOI: 10.1016/j.jmb.2024.168818] [Received: 05/31/2024] [Revised: 09/20/2024] [Accepted: 10/03/2024] [Indexed: 10/12/2024]
Abstract
Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models in the prediction and design of biomolecular structures and sequences. Their growing ubiquity makes it imperative for researchers in these fields to understand them. This paper serves as a general overview for the theory behind these models and the current state of research. We first introduce diffusion models and discuss common motifs used when applying them to biomolecules. We then present the significant outcomes achieved through the application of these models in generative and predictive tasks. This survey aims to provide readers with a comprehensive understanding of the increasingly critical role of diffusion models.
Affiliation(s)
- Trevor Norton
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
9. Wang J. Deep Learning in Hematology: From Molecules to Patients. Clin Hematol Int 2024; 6:19-42. [PMID: 39417017 PMCID: PMC11477942 DOI: 10.46989/001c.124131] [Received: 05/31/2024] [Accepted: 06/29/2024] [Indexed: 10/19/2024] Open
Abstract
Deep learning (DL), a subfield of machine learning, has made remarkable strides across various aspects of medicine. This review examines DL's applications in hematology, spanning from molecular insights to patient care. The review begins by providing a straightforward introduction to the basics of DL tailored for those without prior knowledge, touching on essential concepts, principal architectures, and prevalent training methods. It then discusses the applications of DL in hematology, concentrating on elucidating the models' architecture, their applications, performance metrics, and inherent limitations. For example, at the molecular level, DL has improved the analysis of multi-omics data and protein structure prediction. For cells and tissues, DL enables the automation of cytomorphology analysis, interpretation of flow cytometry data, and diagnosis from whole slide images. At the patient level, DL's utility extends to analyzing curated clinical data, electronic health records, and clinical notes through large language models. While DL has shown promising results in various hematology applications, challenges remain in model generalizability and explainability. Moreover, the integration of novel DL architectures into hematology has been relatively slow in comparison to that in other medical fields.
Affiliation(s)
- Jiasheng Wang
- Division of Hematology, Department of Medicine, The Ohio State University Comprehensive Cancer Center
10. Joshi CK, Jamasb AR, Viñas R, Harris C, Mathis S, Morehead A, Anand R, Liò P. gRNAde: Geometric Deep Learning for 3D RNA inverse design. bioRxiv 2024:2024.03.31.587283. [PMID: 38826198 PMCID: PMC11142113 DOI: 10.1101/2024.03.31.587283] [Indexed: 06/04/2024]
Abstract
Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. gRNAde uses a multi-state Graph Neural Network and autoregressive decoding to generate candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. (2010), gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent ribozyme. Open source code: github.com/chaitjo/geometric-rna-design.
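The native sequence recovery rates quoted in this abstract can be read as a per-position match rate. The following is a minimal sketch, assuming recovery means the fraction of positions at which a designed sequence reproduces the native base; the function name and the example sequences are hypothetical and are not taken from gRNAde or Rosetta.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of positions where the designed base equals the native base.

    Assumption: both sequences describe the same backbone, so they have equal length.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Compare position by position, ignoring letter case.
    matches = sum(n == d for n, d in zip(native.upper(), designed.upper()))
    return matches / len(native)


# Hypothetical 8-nucleotide example: positions 4 and 8 differ, so 6 of 8 match.
native = "GGCAUCGA"
designed = "GGCUUCGG"
recovery = sequence_recovery(native, designed)
print(f"{recovery:.0%}")  # prints "75%"
```

A benchmark-level figure such as the averages reported above would then be the mean of this value over all structures in the benchmark, under the same assumption.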
Affiliation(s)
- Arian R Jamasb
- University of Cambridge, UK
- Prescient Design, Genentech, Roche
11. Zeng C, Zhuo C, Gao J, Liu H, Zhao Y. Advances and Challenges in Scoring Functions for RNA-Protein Complex Structure Prediction. Biomolecules 2024; 14:1245. [PMID: 39456178 PMCID: PMC11506084 DOI: 10.3390/biom14101245] [Received: 09/04/2024] [Revised: 09/24/2024] [Accepted: 09/30/2024] [Indexed: 10/28/2024] Open
Abstract
RNA-protein complexes play a crucial role in cellular functions, providing insights into cellular mechanisms and potential therapeutic targets. However, experimental determination of these complex structures is often time-consuming and resource-intensive, and it rarely yields high-resolution data. Many computational approaches have been developed to predict RNA-protein complex structures in recent years. Despite these advances, achieving accurate and high-resolution predictions remains a formidable challenge, primarily due to the limitations inherent in current RNA-protein scoring functions. These scoring functions are critical tools for evaluating and interpreting RNA-protein interactions. This review comprehensively explores the latest advancements in scoring functions for RNA-protein docking, delving into the fundamental principles underlying various approaches, including coarse-grained knowledge-based, all-atom knowledge-based, and machine-learning-based methods. We critically evaluate the strengths and limitations of existing scoring functions, providing a detailed performance assessment. Considering the significant progress demonstrated by machine learning techniques, we discuss emerging trends and propose future research directions to enhance the accuracy and efficiency of scoring functions in RNA-protein complex prediction. We aim to inspire the development of more sophisticated and reliable computational tools in this rapidly evolving field.
Affiliation(s)
- Yunjie Zhao
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China; (C.Z.); (C.Z.); (J.G.); (H.L.)
12
Taly A, Verger A. [Prediction of complex biomolecular structures by AlphaFold 3]. Med Sci (Paris) 2024; 40:725-727. [PMID: 39450957 DOI: 10.1051/medsci/2024124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2024] Open
Affiliation(s)
- Antoine Taly
  - Laboratoire de biochimie théorique, UPR 9080 CNRS, Université Paris Cité, Paris, France
- Alexis Verger
  - CNRS EMR 9002 Biologie structurale intégrative, Inserm U1167 - Facteurs de risques et déterminants moléculaires des maladies liées au vieillissement (RID-AGE), Univ. Lille, Centre hospitalo-universitaire de Lille, Institut Pasteur de Lille, Lille, France
13
Jiang H, Xu Y, Tong Y, Zhang D, Zhou R. IsRNAcirc: 3D structure prediction of circular RNAs based on coarse-grained molecular dynamics simulation. PLoS Comput Biol 2024; 20:e1012293. [PMID: 39466881 PMCID: PMC11542809 DOI: 10.1371/journal.pcbi.1012293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 11/07/2024] [Accepted: 10/12/2024] [Indexed: 10/30/2024] Open
Abstract
As an emerging class of RNA molecules, circular RNAs play pivotal roles in various biological processes, so determining their three-dimensional (3D) structures is crucial for a deep understanding of their biological significance. As with linear RNAs, developing computational methods for circular RNA 3D structure prediction is challenging, especially considering the inherent flexibility and potentially long length of circular RNAs. Here, we introduce an extension of our previous IsRNA2 model, named IsRNAcirc, to enable circular RNA 3D structure predictions through coarse-grained molecular dynamics simulations. The workflow of IsRNAcirc consists of four main steps: input preparation, end closure, structure prediction, and model refinement. Our results demonstrate that IsRNAcirc can provide reasonable 3D structure predictions for circular RNAs, which significantly reduce the locally irrational elements contained in the initial input. Moreover, for a validation test set comprising 34 circular RNAs, IsRNAcirc can generate 3D models with better scores than the template-based 3dRNA method. These findings demonstrate that IsRNAcirc is a promising tool for exploring the structural details and intricate interactions of circular RNAs.
Affiliation(s)
- Haolin Jiang
  - College of Life Sciences and Institute of Quantitative Biology, Zhejiang University, Hangzhou, Zhejiang, China
- Yulian Xu
  - College of Life Sciences, China Jiliang University, Hangzhou, China
  - China Jiliang University—Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, China
- Yunguang Tong
  - College of Life Sciences, China Jiliang University, Hangzhou, China
  - Aoming (Hangzhou) Biomedical Co., Ltd., Hangzhou, China
- Dong Zhang
  - College of Life Sciences and Institute of Quantitative Biology, Zhejiang University, Hangzhou, Zhejiang, China
- Ruhong Zhou
  - College of Life Sciences and Institute of Quantitative Biology, Zhejiang University, Hangzhou, Zhejiang, China
  - The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
14
Bonilla SL, Jang K. Challenges, advances, and opportunities in RNA structural biology by Cryo-EM. Curr Opin Struct Biol 2024; 88:102894. [PMID: 39121532 DOI: 10.1016/j.sbi.2024.102894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 07/03/2024] [Accepted: 07/15/2024] [Indexed: 08/12/2024]
Abstract
RNAs are remarkably versatile molecules that can fold into intricate three-dimensional (3D) structures to perform diverse cellular and viral functions. Despite their biological importance, relatively few RNA 3D structures have been solved, and our understanding of RNA structure-function relationships remains in its infancy. This limitation partly arises from challenges posed by RNA's complex conformational landscape, characterized by structural flexibility, formation of multiple states, and a propensity to misfold. Recently, cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for the visualization of conformationally dynamic RNA-only 3D structures. However, RNA's characteristics continue to pose challenges. We discuss experimental methods developed to overcome these hurdles, including the engineering of modular modifications that facilitate the visualization of small RNAs, improve particle alignment, and validate structural models.
Affiliation(s)
- Steve L Bonilla
  - Laboratory of RNA Structural Biology and Biophysics, The Rockefeller University, New York, NY, 10065, USA
- Karen Jang
  - Laboratory of RNA Structural Biology and Biophysics, The Rockefeller University, New York, NY, 10065, USA
15
Cao X, Zhang Y, Ding Y, Wan Y. Identification of RNA structures and their roles in RNA functions. Nat Rev Mol Cell Biol 2024; 25:784-801. [PMID: 38926530 DOI: 10.1038/s41580-024-00748-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2024] [Indexed: 06/28/2024]
Abstract
The development of high-throughput RNA structure profiling methods in the past decade has greatly facilitated our ability to map and characterize different aspects of RNA structures transcriptome-wide in cell populations, single cells and single molecules. The resulting high-resolution data have provided insights into the static and dynamic nature of RNA structures, revealing their complexity as they perform their respective functions in the cell. In this Review, we discuss recent technical advances in the determination of RNA structures, and the roles of RNA structures in RNA biogenesis and functions, including in transcription, processing, translation, degradation, localization and RNA structure-dependent condensates. We also discuss the current understanding of how RNA structures could guide drug design for treating genetic diseases and battling pathogenic viruses, and highlight existing challenges and future directions in RNA structure research.
Affiliation(s)
- Xinang Cao
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
- Yueying Zhang
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
- Yiliang Ding
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
- Yue Wan
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
16
Feiss M, Sippy JA. DNA Packaging Specificity in the λ-Like Phages: Gifsy-1. Mol Microbiol 2024; 122:491-503. [PMID: 39233649 DOI: 10.1111/mmi.15306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 08/07/2024] [Accepted: 08/09/2024] [Indexed: 09/06/2024]
Abstract
DNA viruses recognize viral DNA and package it into virions. Specific recognition is needed to distinguish viral DNA from host cell DNA. The λ-like Escherichia coli phages are useful models for examining genome packaging by large DNA viruses. Gifsy-1 is a λ-like Salmonella phage. Gifsy-1's DNA packaging specificity was compared with those of closely related phages λ, 21, and N15. In vivo packaging studies showed that a Gifsy-1-specific phage packaged λ DNA at ca. 50% efficiency and that λ packaged Gifsy-1-specific DNA at ~30% efficiency. The results indicate that Gifsy-1 and λ share the same DNA packaging specificity. N15 is also shown to package Gifsy-1 DNA. Phage 21 fails to package λ, N15, and Gifsy-1-specific DNAs; the efficiencies are 0.01%, 0.01%, and 1%, respectively. A known incompatibility between the 21 helix-turn-helix motif and cosBλ is proposed to account for the inability of 21 to package Gifsy-1 DNA. A model is proposed to explain the 100-fold difference in packaging efficiency between λ and Gifsy-1-specific DNAs by phage 21. Database sequences of enteric prophages indicate that phages with Gifsy-1's DNA packaging determinants are confined to Salmonella species. Similarly, prophages with λ DNA packaging specificity are rarely found in Salmonella. It is proposed that λ and Gifsy-1 have diverged from a common ancestor phage, and that the differences may reflect adaptation of their packaging systems to host cell differences.
Affiliation(s)
- Michael Feiss
  - Department of Microbiology and Immunology, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
- Jean Arens Sippy
  - Department of Microbiology and Immunology, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
17
Kilgas S, Syed A, Toolan-Kerr P, Swift ML, Roychoudhury S, Sarkar A, Wilkins S, Quigley M, Poetsch AR, Botuyan MV, Cui G, Mer G, Ule J, Drané P, Chowdhury D. NEAT1 modulates the TIRR/53BP1 complex to maintain genome integrity. Nat Commun 2024; 15:8438. [PMID: 39349456 PMCID: PMC11443056 DOI: 10.1038/s41467-024-52862-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 09/20/2024] [Indexed: 10/02/2024] Open
Abstract
Tudor Interacting Repair Regulator (TIRR) is an RNA-binding protein (RBP) that interacts directly with 53BP1, restricting its access to DNA double-strand breaks (DSBs) and its association with p53. We utilized iCLIP to identify RNAs that directly bind to TIRR within cells, identifying the long non-coding RNA NEAT1 as the primary RNA partner. The high affinity of TIRR for NEAT1 is due to prevalent G-rich motifs in the short isoform (NEAT1_1) region of NEAT1. This interaction destabilizes the TIRR/53BP1 complex, promoting 53BP1's function. NEAT1_1 is enriched during the G1 phase of the cell cycle, thereby ensuring that TIRR-dependent inhibition of 53BP1's function is cell cycle-dependent. TDP-43, an RBP that is implicated in neurodegenerative diseases, modulates the TIRR/53BP1 complex by promoting the production of the NEAT1 short isoform, NEAT1_1. Together, we infer that NEAT1_1, and factors regulating NEAT1_1, may impact 53BP1-dependent DNA repair processes, with implications for a spectrum of diseases.
Affiliation(s)
- Susan Kilgas
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Aleem Syed
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Patrick Toolan-Kerr
  - The Francis Crick Institute, 1 Midland Road, London, UK
  - UK Dementia Research Institute at King's College London, 5 Cutcombe Rd, London, UK
- Michelle L Swift
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Shrabasti Roychoudhury
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Aniruddha Sarkar
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Sarah Wilkins
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Yale School of Medicine, 333 Cedar St, New Haven, CT, USA
- Mikayla Quigley
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Boston Children's Hospital, 300 Longwood Ave, Boston, MA, USA
- Anna R Poetsch
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47-49, Dresden, Germany
- Gaofeng Cui
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, USA
- Georges Mer
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, USA
- Jernej Ule
  - The Francis Crick Institute, 1 Midland Road, London, UK
  - UK Dementia Research Institute at King's College London, 5 Cutcombe Rd, London, UK
- Pascal Drané
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Dipanjan Chowdhury
  - Division of Radiation and Genome Stability, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
18
Stohr AM, Ma D, Chen W, Blenner M. Engineering conditional protein-protein interactions for dynamic cellular control. Biotechnol Adv 2024; 77:108457. [PMID: 39343083 DOI: 10.1016/j.biotechadv.2024.108457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 08/28/2024] [Accepted: 09/26/2024] [Indexed: 10/01/2024]
Abstract
Conditional protein-protein interactions enable dynamic regulation of cellular activity and are an attractive approach to probe native protein interactions, improve metabolic engineering of microbial factories, and develop smart therapeutics. Conditional protein-protein interactions have been engineered to respond to various chemical, light, and nucleic acid-based stimuli. These interactions have been applied to assemble protein fragments, build protein scaffolds, and spatially organize proteins in many microbial and higher-order hosts. To foster the development of novel conditional protein-protein interactions that respond to new inputs or can be utilized in alternative settings, we provide an overview of the process of designing new engineered protein interactions while showcasing many recently developed computational tools that may accelerate protein engineering in this space.
Affiliation(s)
- Anthony M Stohr
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA
- Derron Ma
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA
- Wilfred Chen
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA
- Mark Blenner
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA
19
Dialpuri J, Agirre J, Cowtan K, Bond P. NucleoFind: a deep-learning network for interpreting nucleic acid electron density. Nucleic Acids Res 2024; 52:e84. [PMID: 39162213 PMCID: PMC11417358 DOI: 10.1093/nar/gkae715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 07/31/2024] [Accepted: 08/06/2024] [Indexed: 08/21/2024] Open
Abstract
Nucleic acid electron density interpretation after phasing by molecular replacement or other methods remains a difficult problem for computer programs to deal with. Programs tend to rely on time-consuming and computationally exhaustive searches to recognise characteristic features. We present NucleoFind, a deep-learning-based approach to interpreting and segmenting electron density. Using an electron density map from X-ray crystallography obtained after molecular replacement, the positions of the phosphate group, sugar ring and nitrogenous base group can be predicted with high accuracy. On average, 78% of phosphate atoms, 85% of sugar atoms and 83% of base atoms are positioned in predicted density after giving NucleoFind maps produced following successful molecular replacement. NucleoFind can use the wealth of context these predicted maps provide to build more accurate and complete nucleic acid models automatically.
Affiliation(s)
- Jordan S Dialpuri
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Jon Agirre
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Kathryn D Cowtan
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Paul S Bond
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
20
Rosignoli S, Pacelli M, Manganiello F, Paiardini A. An outlook on structural biology after AlphaFold: tools, limits and perspectives. FEBS Open Bio 2024. [PMID: 39313455 DOI: 10.1002/2211-5463.13902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 08/19/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024] Open
Abstract
AlphaFold and similar groundbreaking AI-based tools have revolutionized the field of structural bioinformatics with their remarkable accuracy in ab initio protein structure prediction. This success has catalyzed the development of new software and pipelines aimed at incorporating AlphaFold's predictions, often focusing on addressing the algorithm's remaining challenges. Here, we present the current landscape of structural bioinformatics shaped by AlphaFold, and discuss how the field is dynamically responding to this revolution with new software, methods, and pipelines. While the excitement around AI-based tools has led to their widespread application, it is essential to acknowledge that their practical success hinges on their integration into established protocols within structural bioinformatics, which are often neglected in the context of AI-driven advancements. Indeed, user-driven intervention is still as pivotal in the structure prediction process as in complementing state-of-the-art algorithms with functional and biological knowledge.
Affiliation(s)
- Serena Rosignoli
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Maddalena Pacelli
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Francesca Manganiello
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Alessandro Paiardini
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
21
Zeng W, Dou Y, Pan L, Xu L, Peng S. Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein. Nat Commun 2024; 15:7838. [PMID: 39244557 PMCID: PMC11380688 DOI: 10.1038/s41467-024-52293-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 08/29/2024] [Indexed: 09/09/2024] Open
Abstract
DNA-protein interactions underpin many pivotal biological processes, such as DNA replication, transcription, and gene regulation. However, accurate and efficient computational methods for identifying these interactions are still lacking. In this study, we propose a method, ESM-DBP, that refines the DNA-binding protein sequence repertory and applies domain-adaptive pretraining based on a general protein language model. Because general language models have not fully explored the domain-specific knowledge of DNA-binding proteins, we screen out 170,264 DNA-binding protein sequences to construct the domain-adaptive language model. Experimental results on four downstream tasks show that ESM-DBP provides a better feature representation of DNA-binding proteins than the original language model, resulting in improved prediction performance and outperforming state-of-the-art methods. Moreover, ESM-DBP still performs well even for sequences with only a few homologous sequences. ChIP-seq on two predicted cases further supports the validity of the proposed method.
Affiliation(s)
- Wenwu Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Yutao Dou
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Liangrui Pan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Liwen Xu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
22
Bussi G, Bonomi M, Gkeka P, Sattler M, Al-Hashimi HM, Auffinger P, Duca M, Foricher Y, Incarnato D, Jones AN, Kirmizialtin S, Krepl M, Orozco M, Palermo G, Pasquali S, Salmon L, Schwalbe H, Westhof E, Zacharias M. RNA dynamics from experimental and computational approaches. Structure 2024; 32:1281-1287. [PMID: 39241758 DOI: 10.1016/j.str.2024.07.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 06/21/2024] [Accepted: 07/29/2024] [Indexed: 09/09/2024]
Abstract
Conformational dynamics is crucial for the biological function of RNA molecules and for their potential as therapeutic targets. This meeting report outlines key "take-home" messages that emerged from the presentations and discussions during the CECAM workshop "RNA dynamics from experimental and computational approaches" in Paris, June 26-28, 2023.
Affiliation(s)
- Giovanni Bussi
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), Via Bonomea 265, 34136 Trieste, Italy
- Massimiliano Bonomi
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3528, Computational Structural Biology Unit, Paris, France
- Paraskevi Gkeka
  - Integrated Drug Discovery, Molecular Design Sciences, Sanofi, Vitry-sur-Seine, France
- Michael Sattler
  - Technical University of Munich, Munich, Germany; Helmholtz Munich, Munich, Germany
- Hashim M Al-Hashimi
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Pascal Auffinger
  - Université de Strasbourg, Architecture et Réactivité de l'ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, 2 Allée Konrad Roentgen, 67084 Strasbourg, France
- Maria Duca
  - Université Côte d'Azur, CNRS, Institute of Chemistry of Nice, Nice, France
- Yann Foricher
  - Integrated Drug Discovery, Small Molecules Medicinal Chemistry, Sanofi, Vitry-sur-Seine, France
- Danny Incarnato
  - Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Groningen, the Netherlands
- Alisha N Jones
  - Department of Chemistry, New York University, New York, NY, USA
- Serdal Kirmizialtin
  - Department of Chemistry, New York University, New York, NY, USA; Chemistry Program, Science Division, New York University, Abu Dhabi, United Arab Emirates
- Miroslav Krepl
  - Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, Brno 612 00, Czech Republic
- Modesto Orozco
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, and Department of Biochemistry and Biomedicine, University of Barcelona, Barcelona, Spain
- Giulia Palermo
  - Department of Bioengineering and Department of Chemistry, The University of California, Riverside, Riverside, CA, USA
- Samuela Pasquali
  - Laboratoire Biologie Fonctionnelle et Adaptative, CNRS UMR 8251 INSERM ERL 1133, Université Paris Cité, 35 rue Hélène Brion, 75013 Paris, France
- Loïc Salmon
  - Centre de RMN à Très Hauts Champs, UMR 5082 (CNRS, École Normale Supérieure de Lyon, Université Claude Bernard Lyon 1), University of Lyon, 69100 Villeurbanne, France
- Harald Schwalbe
  - Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance, Goethe-University Frankfurt, 60438 Frankfurt/Main, Germany
- Eric Westhof
  - Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 67084 Strasbourg, France
- Martin Zacharias
  - Physics Department and Center of Protein Assemblies, Technical University of Munich, Munich, Germany
23
Mitra R, Li J, Sagendorf JM, Jiang Y, Cohen AS, Chiu TP, Glasscock CJ, Rohs R. Geometric deep learning of protein-DNA binding specificity. Nat Methods 2024; 21:1674-1683. [PMID: 39103447 PMCID: PMC11399107 DOI: 10.1038/s41592-024-02372-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 06/14/2024] [Indexed: 08/07/2024]
Abstract
Predicting protein-DNA binding specificity is a challenging yet essential task for understanding gene regulation. Protein-DNA complexes usually exhibit binding to a selected DNA target site, whereas a protein binds, with varying degrees of binding specificity, to a wide range of DNA sequences. This information is not directly accessible in a single structure. Here, to access this information, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity from protein-DNA structure. DeepPBS can be applied to experimental or predicted structures. Interpretable protein heavy atom importance scores for interface residues can be extracted. When aggregated at the protein residue level, these scores are validated through mutagenesis experiments. Applied to designed proteins targeting specific DNA sequences, DeepPBS was demonstrated to predict experimentally measured binding specificity. DeepPBS offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental designs and synthetic biology.
Affiliation(s)
- Raktim Mitra
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jinsen Li
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jared M Sagendorf
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Yibei Jiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Ari S Cohen
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Tsu-Pei Chiu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Cameron J Glasscock
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA, USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA
24
Ai H, Pan M, Liu L. Chemical Synthesis of Human Proteoforms and Application in Biomedicine. ACS Cent Sci 2024; 10:1442-1459. [PMID: 39220697 PMCID: PMC11363345 DOI: 10.1021/acscentsci.4c00642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 09/04/2024]
Abstract
Limited understanding of human proteoforms with complex posttranslational modifications and the underlying mechanisms poses a major obstacle to research on human health and disease. This Outlook discusses opportunities and challenges of de novo chemical protein synthesis in human proteoform studies. Our analysis suggests that to develop a comprehensive, robust, and cost-effective methodology for chemical synthesis of various human proteoforms, new chemistries of the following types need to be developed: (1) easy-to-use peptide ligation chemistries allowing more efficient de novo synthesis of protein structural domains, (2) robust temporary structural support strategies for ligation and folding of challenging targets, and (3) efficient transpeptidative protein domain-domain ligation methods for multidomain proteins. Our analysis also indicates that accurate chemical synthesis of human proteoforms can be applied to the following aspects of biomedical research: (1) dissection and reconstitution of the proteoform interaction networks, (2) structural mechanism elucidation and functional analysis of human proteoform complexes, and (3) development and evaluation of drugs targeting human proteoforms. Overall, we suggest that through integrating chemical protein synthesis with in vivo functional analysis, mechanistic biochemistry, and drug development, synthetic chemistry would play a pivotal role in human proteoform research and facilitate the development of precision diagnostics and therapeutics.
Affiliation(s)
- Huasong Ai
  - New Cornerstone Science Laboratory, Tsinghua-Peking Joint Center for Life Sciences, MOE Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Center for Synthetic and Systems Biology, Department of Chemistry, Tsinghua University, Beijing 100084, China
  - Institute of Translational Medicine, School of Pharmacy, School of Chemistry and Chemical Engineering, National Center for Translational Medicine (Shanghai), Shanghai Jiao Tong University, Shanghai 200240, China
- Man Pan
  - Institute of Translational Medicine, School of Pharmacy, School of Chemistry and Chemical Engineering, National Center for Translational Medicine (Shanghai), Shanghai Jiao Tong University, Shanghai 200240, China
- Lei Liu
  - New Cornerstone Science Laboratory, Tsinghua-Peking Joint Center for Life Sciences, MOE Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Center for Synthetic and Systems Biology, Department of Chemistry, Tsinghua University, Beijing 100084, China
25
Szabla R, Li M, Warner V, Song Y, Junop M. DdrC, a unique DNA repair factor from D. radiodurans, senses and stabilizes DNA breaks through a novel lesion-recognition mechanism. Nucleic Acids Res 2024; 52:9282-9302. [PMID: 39036966 PMCID: PMC11347143 DOI: 10.1093/nar/gkae635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 07/03/2024] [Accepted: 07/11/2024] [Indexed: 07/23/2024] Open
Abstract
The bacterium Deinococcus radiodurans is known to survive high doses of DNA damaging agents. This resistance is the result of robust antioxidant systems which protect efficient DNA repair mechanisms that are unique to Deinococcus species. The protein DdrC has been identified as an important component of this repair machinery. DdrC is known to bind to DNA in vitro and has been shown to circularize and compact DNA fragments. The mechanism and biological relevance of this activity are poorly understood. Here, we show that the DdrC homodimer is a lesion-sensing protein that binds to two single-strand (ss) or double-strand (ds) breaks. The immobilization of DNA breaks in pairs consequently leads to the circularization of linear DNA and the compaction of nicked DNA. The degree of compaction is directly proportional to the number of available nicks. Previously, the structure of the DdrC homodimer was solved in an unusual asymmetric conformation. Here, we solve the structure of DdrC under different crystallographic environments and confirm that the asymmetry is an endogenous feature of DdrC. We propose a dynamic structural mechanism where the asymmetry is necessary to trap a pair of lesions. We support this model with mutant disruption and computational modeling experiments.
Affiliation(s)
- Robert Szabla
  - Department of Biochemistry, Western University, London, Ontario N6A 3K7, Canada
- Mingyi Li
  - Department of Biochemistry, Western University, London, Ontario N6A 3K7, Canada
- Victoria Warner
  - Department of Biochemistry, Western University, London, Ontario N6A 3K7, Canada
- Yifeng Song
  - Department of Biochemistry, Western University, London, Ontario N6A 3K7, Canada
- Murray Junop
  - Department of Biochemistry, Western University, London, Ontario N6A 3K7, Canada
26
Huzar J, Coreas R, Landry MP, Tikhomirov G. AI-based Prediction of Protein Corona Composition on DNA Nanostructures. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.25.609594. [PMID: 39253427 PMCID: PMC11383312 DOI: 10.1101/2024.08.25.609594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
DNA nanotechnology has emerged as a powerful approach to engineering biophysical tools, therapeutics, and diagnostics because it enables the construction of designer nanoscale structures with high programmability. Based on DNA base pairing rules, nanostructure size, shape, surface functionality, and structural reconfiguration can be programmed with a degree of spatial, temporal, and energetic precision that is difficult to achieve with other methods. However, the properties and structure of DNA constructs are greatly altered in vivo due to spontaneous protein adsorption from biofluids. These adsorbed proteins, referred to as the protein corona, remain challenging to control or predict, and subsequently, their functionality and fate in vivo are difficult to engineer. To address these challenges, we prepared a library of diverse DNA nanostructures and investigated the relationship between their design features and the composition of their protein corona. We identified protein characteristics important for their adsorption to DNA nanostructures and developed a machine-learning model that predicts which proteins will be enriched on a DNA nanostructure based on the DNA structures' design features and protein properties. Our work will help to understand and program the function of DNA nanostructures in vivo for biophysical and biomedical applications.
Affiliation(s)
- Jared Huzar
  - Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA
- Roxana Coreas
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA
- Markita P. Landry
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA
  - Innovative Genomics Institute, Berkeley, CA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA
  - Chan Zuckerberg Biohub, San Francisco, CA
- Grigory Tikhomirov
  - Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA
27
Wang B, Li W. Advances in the Application of Protein Language Modeling for Nucleic Acid Protein Binding Site Prediction. Genes (Basel) 2024; 15:1090. [PMID: 39202449 PMCID: PMC11353971 DOI: 10.3390/genes15081090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024] Open
Abstract
Protein and nucleic acid binding site prediction is a critical computational task that benefits a wide range of biological processes. Previous studies have shown that feature selection holds particular significance for this prediction task, making the generation of more discriminative features a key area of interest for many researchers. Recent progress has shown the power of protein language models in handling protein sequences, in leveraging the strengths of attention networks, and in successful applications to tasks such as protein structure prediction. This naturally raises the question of the applicability of protein language models in predicting protein and nucleic acid binding sites. Various approaches have explored this potential. This paper first describes the development of protein language models. Then, a systematic review of the latest methods for predicting protein and nucleic acid binding sites is conducted by covering benchmark sets, feature generation methods, performance comparisons, and feature ablation studies. These comparisons demonstrate the importance of protein language models for the prediction task. Finally, the paper discusses the challenges of protein and nucleic acid binding site prediction and proposes possible research directions and future trends. The purpose of this survey is to furnish researchers with actionable suggestions for comprehending the methodologies used in predicting protein-nucleic acid binding sites, fostering the creation of protein-centric language models, and tackling real-world obstacles encountered in this field.
Affiliation(s)
- Wenjin Li
  - Institute for Advanced Study, Shenzhen University, Shenzhen 518061, China
28
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. BIOINFORMATICS ADVANCES 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology.
Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
  - Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
  - Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
  - Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
  - Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
  - Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
  - Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
  - Department of Computer Science, University College London, London, WC1E 6BT, England
  - ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
  - Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
  - School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
  - Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
  - Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
29
Yancoskie M, Khaleghi R, Gururajan A, Raghunathan A, Gupta A, Diethelm S, Maritz C, Sturla S, Krishnan M, Naegeli H. ASH1L guards cis-regulatory elements against cyclobutane pyrimidine dimer induction. Nucleic Acids Res 2024; 52:8254-8270. [PMID: 38884271 PMCID: PMC11317172 DOI: 10.1093/nar/gkae517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/29/2024] [Accepted: 06/04/2024] [Indexed: 06/18/2024] Open
Abstract
The histone methyltransferase ASH1L, first discovered for its role in transcription, has been shown to accelerate the removal of ultraviolet (UV) light-induced cyclobutane pyrimidine dimers (CPDs) by nucleotide excision repair. Previous reports demonstrated that CPD excision is most efficient at transcriptional regulatory elements, including enhancers, relative to other genomic sites. Therefore, we analyzed DNA damage maps in ASH1L-proficient and ASH1L-deficient cells to understand how ASH1L controls enhancer stability. This comparison showed that ASH1L protects enhancer sequences against the induction of CPDs besides stimulating repair activity. ASH1L reduces CPD formation at C-containing but not at TT dinucleotides, and no protection occurs against pyrimidine-(6,4)-pyrimidone photoproducts or cisplatin crosslinks. The diminished CPD induction extends to gene promoters but excludes retrotransposons. This guardian role against CPDs in regulatory elements is associated with the presence of H3K4me3 and H3K27ac histone marks, which are known to interact with the PHD and BRD motifs of ASH1L, respectively. Molecular dynamics simulations identified a DNA-binding AT hook of ASH1L that alters the distance and dihedral angle between neighboring C nucleotides to disfavor dimerization. The loss of this protection results in a higher frequency of C->T transitions at enhancers of skin cancers carrying ASH1L mutations compared to ASH1L-intact counterparts.
Affiliation(s)
- Michelle N Yancoskie
  - Institute of Pharmacology and Toxicology, University of Zurich-Vetsuisse, Zurich 8057, Switzerland
- Reihaneh Khaleghi
  - Institute of Pharmacology and Toxicology, University of Zurich-Vetsuisse, Zurich 8057, Switzerland
- Anirvinya Gururajan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Aadarsh Raghunathan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Aryan Gupta
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Sarah Diethelm
  - Institute of Pharmacology and Toxicology, University of Zurich-Vetsuisse, Zurich 8057, Switzerland
- Corina Maritz
  - Institute of Pharmacology and Toxicology, University of Zurich-Vetsuisse, Zurich 8057, Switzerland
- Shana J Sturla
  - Department of Health Sciences and Technology, ETH Zurich, Zurich 8092, Switzerland
- Marimuthu Krishnan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Hanspeter Naegeli
  - Institute of Pharmacology and Toxicology, University of Zurich-Vetsuisse, Zurich 8057, Switzerland
30
Guo M, Yang F, Zhu L, Wang L, Li Z, Qi Z, Fotopoulos V, Yu J, Zhou J. Loss of cold tolerance is conferred by absence of the WRKY34 promoter fragment during tomato evolution. Nat Commun 2024; 15:6667. [PMID: 39107290 PMCID: PMC11303406 DOI: 10.1038/s41467-024-51036-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 07/28/2024] [Indexed: 08/10/2024] Open
Abstract
Natural evolution has resulted in reduced cold tolerance in cultivated tomato (Solanum lycopersicum). Herein, we perform a combined analysis of ATAC-Seq and RNA-Seq in cold-sensitive cultivated tomato and cold-tolerant wild tomato (S. habrochaites). We identify that WRKY34 has the most significant association with differential chromatin accessibility and expression patterns under cold stress. We find that a 60 bp InDel in the WRKY34 promoter causes differences in its transcription and cold tolerance among 376 tomato accessions. This 60 bp fragment contains a GATA cis-regulatory element that binds to SWIBs and GATA29, which synergistically suppress WRKY34 expression under cold stress. Moreover, WRKY34 interferes with the CBF cold response pathway through regulating transcription and protein levels. Our findings emphasize the importance of polymorphisms in cis-regulatory regions and their effects on chromatin structure and gene expression during crop evolution.
Affiliation(s)
- Mingyue Guo
  - Department of Horticulture, Zhejiang Provincial Key Laboratory of Horticultural Crop Quality Regulation, Zhejiang University, Yuhangtang Road 866, Hangzhou, 310058, China
- Fengjun Yang
  - Department of Horticulture, Zhejiang Provincial Key Laboratory of Horticultural Crop Quality Regulation, Zhejiang University, Yuhangtang Road 866, Hangzhou, 310058, China
- Lijuan Zhu
  - Department of Horticulture, Zhejiang Provincial Key Laboratory of Horticultural Crop Quality Regulation, Zhejiang University, Yuhangtang Road 866, Hangzhou, 310058, China
- Leilei Wang
  - Department of Horticulture, Zhejiang Provincial Key Laboratory of Horticultural Crop Quality Regulation, Zhejiang University, Yuhangtang Road 866, Hangzhou, 310058, China
- Zhichao Li
  - Department of Horticulture, Zhejiang Provincial Key Laboratory of Horticultural Crop Quality Regulation, Zhejiang University, Yuhangtang Road 866, Hangzhou, 310058, China
- Zhenyu Qi
  - Hainan Institute, Zhejiang University, Sanya, 572000, China
  - Agricultural Experiment Station, Zhejiang University, Hangzhou, 310058, China
- Vasileios Fotopoulos
  - Cyprus University of Technology, Department of Agricultural Sciences, Biotechnology and Food Science, Lemesos, 3036, Cyprus
- Jingquan Yu
  - Department of Horticulture, Zhejiang Provincial Key Laboratory of Horticultural Crop Quality Regulation, Zhejiang University, Yuhangtang Road 866, Hangzhou, 310058, China
  - Hainan Institute, Zhejiang University, Sanya, 572000, China
  - Key Laboratory of Horticultural Plants Growth, Development and Quality Improvement, Ministry of Agriculture and Rural Affairs of China, Yuhangtang Road 866, Hangzhou, 310058, China
- Jie Zhou
  - Department of Horticulture, Zhejiang Provincial Key Laboratory of Horticultural Crop Quality Regulation, Zhejiang University, Yuhangtang Road 866, Hangzhou, 310058, China.
  - Hainan Institute, Zhejiang University, Sanya, 572000, China.
  - Key Laboratory of Horticultural Plants Growth, Development and Quality Improvement, Ministry of Agriculture and Rural Affairs of China, Yuhangtang Road 866, Hangzhou, 310058, China.
31
Kyrilis FL, Low JKK, Mackay JP, Kastritis PL. Structural biology in cellulo: Minding the gap between conceptualization and realization. Curr Opin Struct Biol 2024; 87:102843. [PMID: 38788606 DOI: 10.1016/j.sbi.2024.102843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 04/30/2024] [Accepted: 05/01/2024] [Indexed: 05/26/2024]
Abstract
Recent technological advances have deepened our perception of cellular structure. However, most structural data doesn't originate from intact cells, limiting our understanding of cellular processes. Here, we discuss current and future developments that will bring us towards a structural picture of the cell. Electron cryotomography is the standard bearer, with its ability to provide in cellulo snapshots. Single-particle electron microscopy (of purified biomolecules and of complex mixtures) and covalent crosslinking combined with mass spectrometry also have significant roles to play, as do artificial intelligence algorithms in their many guises. To integrate these multiple approaches, data curation and standardisation will be critical - as is the need to expand efforts beyond our current protein-centric view to the other (macro)molecules that sustain life.
Affiliation(s)
- Fotis L Kyrilis
  - Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece. https://twitter.com/Fotansky_16
- Jason K K Low
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Joel P Mackay
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia.
- Panagiotis L Kastritis
  - Institute of Chemical Biology, National Hellenic Research Foundation, Athens, Greece; Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Straße 3a, Halle/Saale, Germany; Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Straße 3, Halle/Saale, Germany; Biozentrum, Martin Luther University Halle-Wittenberg, Weinbergweg 22, Halle/Saale, Germany.
32
Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, O'Donnell TJ, Berenberg D, Fisk I, Zanichelli N, Zhang B, Nowaczynski A, Wang B, Stepniewska-Dziubinska MM, Zhang S, Ojewole A, Guney ME, Biderman S, Watkins AM, Ra S, Lorenzo PR, Nivon L, Weitzner B, Ban YEA, Chen S, Zhang M, Li C, Song SL, He Y, Sorger PK, Mostaque E, Zhang Z, Bonneau R, AlQuraishi M. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods 2024; 21:1514-1524. [PMID: 38744917 DOI: 10.1038/s41592-024-02272-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 04/03/2024] [Indexed: 05/16/2024]
Abstract
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set is deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
Affiliation(s)
- Gustaf Ahdritz
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Harvard University, Cambridge, MA, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA.
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Qinghui Xia
  - Department of Systems Biology, Columbia University, New York, NY, USA
- William Gerecke
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Daniel Berenberg
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Ian Fisk
  - Flatiron Institute, New York, NY, USA
- Bo Zhang
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
- Stella Biderman
  - EleutherAI, New York, NY, USA
  - Booz Allen Hamilton, McLean, VA, USA
- Stephen Ra
  - Prescient Design, Genentech, New York, NY, USA
- Minjia Zhang
  - University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Peter K Sorger
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Zhao Zhang
  - Rutgers University, New Brunswick, NJ, USA
33
Yang XY, Shen Z, Xie J, Greenwald J, Marathe I, Lin Q, Xie WJ, Wysocki VH, Fu TM. Molecular basis of Gabija anti-phage supramolecular assemblies. Nat Struct Mol Biol 2024; 31:1243-1250. [PMID: 38627580 PMCID: PMC11418746 DOI: 10.1038/s41594-024-01283-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 03/22/2024] [Indexed: 05/15/2024]
Abstract
As one of the most prevalent anti-phage defense systems in prokaryotes, Gabija consists of a Gabija protein A (GajA) and a Gabija protein B (GajB). The assembly and function of the Gabija system remain unclear. Here we present cryo-EM structures of Bacillus cereus GajA and GajAB complex, revealing tetrameric and octameric assemblies, respectively. In the center of the complex, GajA assembles into a tetramer, which recruits two sets of GajB dimer at opposite sides of the complex, resulting in a 4:4 GajAB supramolecular complex for anti-phage defense. Further biochemical analysis showed that GajA alone is sufficient to cut double-stranded DNA and plasmid DNA, which can be inhibited by ATP. Unexpectedly, the GajAB displays enhanced activity for plasmid DNA, suggesting a role of substrate selection by GajB. Together, our study defines a framework for understanding anti-phage immune defense by the GajAB complex.
Affiliation(s)
- Xiao-Yuan Yang
  - Department of Biological Chemistry and Pharmacology, Center for RNA Biology, The Ohio State University, Columbus, OH, USA
  - The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
  - Program of OSBP, The Ohio State University, Columbus, OH, USA
- Zhangfei Shen
  - Department of Biological Chemistry and Pharmacology, Center for RNA Biology, The Ohio State University, Columbus, OH, USA
  - The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
- Jiale Xie
  - Department of Biological Chemistry and Pharmacology, Center for RNA Biology, The Ohio State University, Columbus, OH, USA
  - The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
  - Program of OSBP, The Ohio State University, Columbus, OH, USA
- Jacelyn Greenwald
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, USA
- Ila Marathe
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, USA
- Qingpeng Lin
  - Department of Biological Chemistry and Pharmacology, Center for RNA Biology, The Ohio State University, Columbus, OH, USA
  - The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
- Wen Jun Xie
  - Department of Medicinal Chemistry, University of Florida, Gainesville, FL, USA
- Vicki H Wysocki
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, USA
- Tian-Min Fu
  - Department of Biological Chemistry and Pharmacology, Center for RNA Biology, The Ohio State University, Columbus, OH, USA.
  - The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA.
  - Program of OSBP, The Ohio State University, Columbus, OH, USA.
34
Troisi R, Sica F. Structural overview of DNA and RNA G-quadruplexes in their interaction with proteins. Curr Opin Struct Biol 2024; 87:102846. [PMID: 38848656 DOI: 10.1016/j.sbi.2024.102846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 05/03/2024] [Accepted: 05/07/2024] [Indexed: 06/09/2024]
Abstract
Since the discovery of G-quadruplex (G4) participation in vital cellular processes, the regulation of the interaction of naturally occurring G4s with the relative target proteins has emerged as a promising approach for therapeutic development. Additionally, a synthetic strategy has produced several oligonucleotide aptamers, embodying a G4 module, which exhibit relevant biological activity by binding selectively to a target protein. In this context, the G4-protein structures available in the Protein Data Bank represent a valuable molecular view of the different G4 topologies involved in protein interaction. Interestingly, recent results have shown the co-existence of G4s with other structural domains such as duplexes. Overall, these findings allow a better understanding of the mechanisms that regulate intricate biological functions and suggest new design for innovative medical treatments.
Affiliation(s)
- Romualdo Troisi
- Department of Chemical Sciences, University of Naples Federico II, Complesso Universitario di Monte Sant'Angelo, via Cintia, 80126 Naples, Italy; Institute of Biostructures and Bioimaging, CNR, via Pietro Castellino 111, 80131 Naples, Italy. https://twitter.com/TroRom
- Filomena Sica
- Department of Chemical Sciences, University of Naples Federico II, Complesso Universitario di Monte Sant'Angelo, via Cintia, 80126 Naples, Italy.

35
Baek M. Towards the prediction of general biomolecular interactions with AI. Nat Methods 2024; 21:1382-1383. [PMID: 39122945 DOI: 10.1038/s41592-024-02350-2] [Indexed: 08/12/2024]
Affiliation(s)
- Minkyung Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.

36
Nakamura A, Yamamoto H, Yano T, Hasegawa R, Makino Y, Mitsuda N, Terakawa T, Ito S, Sugano SS. Expanding the Genome-Editing Toolbox with Abyssicoccus albus Cas9 Using a Unique Protospacer Adjacent Motif Sequence. CRISPR J 2024; 7:197-209. [PMID: 39111827 DOI: 10.1089/crispr.2024.0013] [Indexed: 08/17/2024] Open
Abstract
The genome-editing efficiency of the CRISPR-Cas9 system hinges on the recognition of the protospacer adjacent motif (PAM) sequence, which is essential for Cas9 binding to DNA. The commonly used Streptococcus pyogenes Cas9 (SpyCas9) targets the 5'-NGG-3' PAM sequence, which does not cover all the potential genome-editing sites. To expand the toolbox for genome editing, SpyCas9 has been engineered to recognize flexible PAM sequences and Cas9 orthologs have been used to recognize novel PAM sequences. In this study, Abyssicoccus albus Cas9 (AalCas9, 1059 aa), which is smaller than SpyCas9, was found to recognize a unique 5'-NNACR-3' PAM sequence. Modification of the guide RNA sequence improved the efficiency of AalCas9-mediated genome editing in both plant and human cells. Predicted structure-assisted introduction of a point mutation in the putative PAM recognition site shifted the sequence preference of AalCas9. These results provide insights into Cas9 diversity and novel tools for genome editing.
Affiliation(s)
- Akiyoshi Nakamura
- Bioproduction Research Institute, The National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Hiroshi Yamamoto
- Bioproduction Research Institute, The National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Nobutaka Mitsuda
- Bioproduction Research Institute, The National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Japan
- Shigeo S Sugano
- Bioproduction Research Institute, The National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan

37
Roche R, Tarafder S, Bhattacharya D. Single-sequence protein-RNA complex structure prediction by geometric attention-enabled pairing of biological language models. bioRxiv 2024:2024.07.27.605468. [PMID: 39091736 PMCID: PMC11291176 DOI: 10.1101/2024.07.27.605468] [Indexed: 08/04/2024]
Abstract
Ground-breaking progress has been made in structure prediction of biomolecular assemblies, including the recent breakthrough of AlphaFold 3. However, it remains challenging for AlphaFold 3 and other state-of-the-art deep learning-based methods to accurately predict protein-RNA complex structures, in part due to the limited availability of evolutionary and structural information related to protein-RNA interactions that are used as inputs to the existing approaches. Here, we introduce ProRNA3D-single, a new deep-learning framework for protein-RNA complex structure prediction with only single-sequence input. Using a novel geometric attention-enabled pairing of biological language models of protein and RNA, a previously unexplored avenue, ProRNA3D-single enables the prediction of interatomic protein-RNA interaction maps, which are then transformed into multi-scale geometric restraints for modeling 3D structures of protein-RNA complexes via geometry optimization. Benchmark tests show that ProRNA3D-single convincingly outperforms current state-of-the-art methods including AlphaFold 3, particularly when evolutionary information is limited; and exhibits remarkable robustness and performance resilience by attaining better accuracy with only single-sequence input than what most methods can achieve even with explicit evolutionary information. Freely available at https://github.com/Bhattacharya-Lab/ProRNA3D-single, ProRNA3D-single should be broadly useful for modeling 3D structures of protein-RNA complexes at scale, regardless of the availability of evolutionary information.
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Sumit Tarafder
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Debswapna Bhattacharya
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America

38
Kim HJ, Szurgot MR, van Eeuwen T, Ricketts MD, Basnet P, Zhang AL, Vogt A, Sharmin S, Kaplan CD, Garcia BA, Marmorstein R, Murakami K. Structure of the Hir histone chaperone complex. Mol Cell 2024; 84:2601-2617.e12. [PMID: 38925115 PMCID: PMC11338637 DOI: 10.1016/j.molcel.2024.05.031] [Received: 09/23/2023] [Revised: 04/24/2024] [Accepted: 05/31/2024] [Indexed: 06/28/2024]
Abstract
The evolutionarily conserved HIRA/Hir histone chaperone complex and ASF1a/Asf1 co-chaperone cooperate to deposit histone (H3/H4)2 tetramers on DNA for replication-independent chromatin assembly. The molecular architecture of the HIRA/Hir complex and its mode of histone deposition have remained unknown. Here, we report the cryo-EM structure of the S. cerevisiae Hir complex with Asf1/H3/H4 at 2.9-6.8 Å resolution. We find that the Hir complex forms an arc-shaped dimer with a Hir1/Hir2/Hir3/Hpc2 stoichiometry of 2/4/2/4. The core of the complex containing two Hir1/Hir2/Hir2 trimers and N-terminal segments of Hir3 forms a central cavity containing two copies of Hpc2, with one engaged by Asf1/H3/H4, in a suitable position to accommodate a histone (H3/H4)2 tetramer, while the C-terminal segments of Hir3 harbor nucleic acid binding activity to wrap DNA around the Hpc2-assisted histone tetramer. The structure suggests a model for how the Hir/Asf1 complex promotes the formation of histone tetramers for their subsequent deposition onto DNA.
Affiliation(s)
- Hee Jong Kim
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Mary R Szurgot
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Trevor van Eeuwen
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- M Daniel Ricketts
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Pratik Basnet
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Athena L Zhang
- Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Austin Vogt
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Samah Sharmin
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Craig D Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Benjamin A Garcia
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ronen Marmorstein
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Kenji Murakami
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

39
Nithin C, Kmiecik S, Błaszczyk R, Nowicka J, Tuszyńska I. Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA-ligand interactions. Nucleic Acids Res 2024; 52:7465-7486. [PMID: 38917327 PMCID: PMC11260495 DOI: 10.1093/nar/gkae541] [Received: 04/04/2024] [Revised: 05/23/2024] [Accepted: 06/16/2024] [Indexed: 06/27/2024] Open
Abstract
Accurate RNA structure models are crucial for designing small molecule ligands that modulate their functions. This study assesses six standalone RNA 3D structure prediction methods (DeepFoldRNA, RhoFold, BRiQ, FARFAR2, SimRNA and Vfold2), excluding web-based tools due to intellectual property concerns. We focus on reproducing the RNA structure existing in RNA-small molecule complexes, particularly on the ability to model ligand binding sites. Using a comprehensive set of RNA structures from the PDB, which includes diverse structural elements, we found that machine learning (ML)-based methods effectively predict global RNA folds but are less accurate with local interactions. Conversely, non-ML-based methods demonstrate higher precision in modeling intramolecular interactions, particularly with secondary structure restraints. Importantly, ligand-binding site accuracy can remain sufficiently high for practical use, even if the overall model quality is not optimal. With the recent release of AlphaFold 3, we included this advanced method in our tests. Benchmark subsets containing new structures, not used in the training of the tested ML methods, show that AlphaFold 3's performance was comparable to other ML-based methods, albeit with some challenges in accurately modeling ligand binding sites. This study underscores the importance of enhancing binding site prediction accuracy and the challenges in modeling RNA-ligand interactions accurately.
Affiliation(s)
- Chandran Nithin
- Molecure SA, 02-089 Warsaw, Poland
- Laboratory of Computational Biology, Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, 02-089 Warsaw, Poland
- Sebastian Kmiecik
- Laboratory of Computational Biology, Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, 02-089 Warsaw, Poland

40
Peixoto ML, Madan E. Unraveling the complexity: Advanced methods in analyzing DNA, RNA, and protein interactions. Adv Cancer Res 2024; 163:251-302. [PMID: 39271265 DOI: 10.1016/bs.acr.2024.06.010] [Indexed: 09/15/2024]
Abstract
Exploring the intricate interplay within and between nucleic acids, as well as their interactions with proteins, holds pivotal significance in unraveling the molecular complexities steering cancer initiation and progression. To investigate these interactions, a diverse array of highly specific and sensitive molecular techniques has been developed. The selection of a particular technique depends on the specific nature of the interactions. Typically, researchers employ an amalgamation of these different techniques to obtain a comprehensive and holistic understanding of inter- and intramolecular interactions involving DNA-DNA, RNA-RNA, DNA-RNA, or protein-DNA/RNA. Examining nucleic acid conformation reveals alternative secondary structures beyond conventional ones that have implications for cancer pathways. Mutational hotspots in cancer often lie within sequences prone to adopting these alternative structures, highlighting the importance of investigating intra-genomic and intra-transcriptomic interactions, especially in the context of mutations, to deepen our understanding of oncology. Beyond these intramolecular interactions, the interplay between DNA and RNA leads to formations like DNA:RNA hybrids (known as R-loops) or even DNA:DNA:RNA triplex structures, both influencing biological processes that ultimately impact cancer. Protein-nucleic acid interactions are intrinsic cellular phenomena crucial in both normal and pathological conditions. In particular, genetic mutations or single amino acid variations can alter a protein's structure, function, and binding affinity, thus influencing cancer progression. It is thus imperative to understand the differences between wild-type (WT) and mutated (MT) genes, transcripts, and proteins. The review aims to summarize the frequently employed methods and techniques for investigating interactions involving nucleic acids and proteins, highlighting recent advancements and diverse adaptations of each technique.
Affiliation(s)
- Maria Leonor Peixoto
- Champalimaud Center for the Unknown, Lisbon, Portugal; Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Esha Madan
- Department of Surgery, Virginia Commonwealth University, School of Medicine, Richmond, VA, United States; Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, United States; VCU Institute of Molecular Medicine, Department of Human and Molecular Genetics, Virginia Commonwealth University, School of Medicine, Richmond, VA, United States.

41
He A, Wan L, Zhang Y, Yan Z, Guo P, Han D, Tan W. Structure-based investigation of a DNA aptamer targeting PTK7 reveals an intricate 3D fold guiding functional optimization. Proc Natl Acad Sci U S A 2024; 121:e2404060121. [PMID: 38985770 PMCID: PMC11260122 DOI: 10.1073/pnas.2404060121] [Received: 02/26/2024] [Accepted: 06/13/2024] [Indexed: 07/12/2024] Open
Abstract
DNA aptamers have emerged as novel molecular tools in disease theranostics owing to their high binding affinity and specificity for protein targets, which rely on their ability to fold into distinctive three-dimensional (3D) structures. However, delicate atomic interactions that shape the 3D structures are often ignored when designing and modeling aptamers, leading to inefficient functional optimization. Challenges persist in determining high-resolution aptamer-protein complex structures. Moreover, the experimentally determined 3D structures of DNA molecules with exquisite functions remain scarce. These factors impede our comprehension and optimization of some important DNA aptamers. Here, we performed a streamlined solution NMR-based structural investigation on the 41-nt sgc8c, a prominent DNA aptamer used to target membrane protein tyrosine kinase 7, for cancer theranostics. We show that sgc8c prefolds into an intricate three-way junction (3WJ) structure stabilized by long-range tertiary interactions and extensive base-base stackings. Delineated by NMR chemical shift perturbations, site-directed mutagenesis, and 3D structural information, we identified essential nucleotides constituting the key functional elements of sgc8c that are centralized at the core of 3WJ. Leveraging the well-established structure-function relationship, we efficiently engineered two sgc8c variants by modifying the apical loop and introducing L-DNA base pairs to simultaneously enhance thermostability, biostability, and binding affinity for both protein and cell targets, a feat not previously attained despite extensive efforts. This work showcases a simplified NMR-based approach to comprehend and optimize sgc8c without acquiring the complex structure, and offers principles for the sophisticated structure-function organization of DNA molecules.
Affiliation(s)
- Axin He
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Liqi Wan
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Yuchao Zhang
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Zhenzhen Yan
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Pei Guo
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Da Han
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Weihong Tan
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China

42
Li J, Rohs R. Deep DNAshape webserver: prediction and real-time visualization of DNA shape considering extended k-mers. Nucleic Acids Res 2024; 52:W7-W12. [PMID: 38801070 PMCID: PMC11223853 DOI: 10.1093/nar/gkae433] [Received: 02/26/2024] [Revised: 04/30/2024] [Accepted: 05/08/2024] [Indexed: 05/29/2024] Open
Abstract
Sequence-dependent DNA shape plays an important role in understanding protein-DNA binding mechanisms. High-throughput prediction of DNA shape features has become a valuable tool in the field of protein-DNA recognition, transcription factor-DNA binding specificity, and gene regulation. However, our widely used webserver, DNAshape, relies on statistically summarized pentamer query tables to query DNA shape features. These query tables do not consider flanking regions longer than two base pairs, and acquiring a query table for hexamers or higher-order k-mers is currently still unrealistic due to limitations in achieving sufficient statistical coverage in molecular simulations or structural biology experiments. A recent deep-learning method, Deep DNAshape, can predict DNA shape features at the core of a DNA fragment considering flanking regions of up to seven base pairs, trained on limited simulation data. However, Deep DNAshape is rather complicated to install, and it must run locally, unlike the pentamer-based DNAshape webserver, creating a barrier for users. Here, we present the Deep DNAshape webserver, which has the benefits of both methods while being accurate, fast, and accessible to all users. Additional improvements of the webserver include the detection of user input in real time, interactive visualization tools, and different modes of analysis. URL: https://deepdnashape.usc.edu.
Affiliation(s)
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA

43
Dapkūnas J, Timinskas A, Olechnovič K, Tomkuvienė M, Venclovas Č. PPI3D: a web server for searching, analyzing and modeling protein-protein, protein-peptide and protein-nucleic acid interactions. Nucleic Acids Res 2024; 52:W264-W271. [PMID: 38619046 PMCID: PMC11223826 DOI: 10.1093/nar/gkae278] [Received: 02/03/2024] [Revised: 03/19/2024] [Accepted: 04/03/2024] [Indexed: 04/16/2024] Open
Abstract
Structure-resolved protein interactions with other proteins, peptides and nucleic acids are key for understanding molecular mechanisms. The PPI3D web server enables researchers to query preprocessed and clustered structural data, analyze the results and make homology-based inferences for protein interactions. PPI3D offers three interaction exploration modes: (i) all interactions for proteins homologous to the query, (ii) interactions between two proteins or their homologs and (iii) interactions within a specific PDB entry. The server allows interactive analysis of the identified interactions in both summarized and detailed manner. This includes protein annotations, structures, the interface residues and the corresponding contact surface areas. In addition, users can make inferences about residues at the interaction interface for the query protein(s) from the sequence alignments and homology models. The weekly updated PPI3D database includes all the interaction interfaces and binding sites from PDB, clustered based on both protein sequence and structural similarity, yielding non-redundant datasets without loss of alternative interaction modes. Consequently, the PPI3D users avoid being flooded with redundant information, a typical situation for intensely studied proteins. Furthermore, PPI3D provides a possibility to download user-defined sets of interaction interfaces and analyze them locally. The PPI3D web server is available at https://bioinformatics.lt/ppi3d.
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Albertas Timinskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Miglė Tomkuvienė
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania

44
Wu F, Huang Y, Yang G, Ye S, Mukamel S, Jiang J. Unraveling dynamic protein structures by two-dimensional infrared spectra with a pretrained machine learning model. Proc Natl Acad Sci U S A 2024; 121:e2409257121. [PMID: 38917009 PMCID: PMC11228460 DOI: 10.1073/pnas.2409257121] [Received: 05/09/2024] [Accepted: 05/28/2024] [Indexed: 06/27/2024] Open
Abstract
Dynamic protein structures are crucial for deciphering their diverse biological functions. Two-dimensional infrared (2DIR) spectroscopy stands as an ideal tool for tracing rapid conformational evolutions in proteins. However, linking spectral characteristics to dynamic structures poses a formidable challenge. Here, we present a pretrained machine learning model based on 2DIR spectra analysis. This model has learned signal features from approximately 204,300 spectra to establish a "spectrum-structure" correlation, thereby tracing the dynamic conformations of proteins. It excels in accurately predicting the dynamic content changes of various secondary structures and demonstrates universal transferability on real folding trajectories spanning timescales from microseconds to milliseconds. Beyond exceptional predictive performance, the model offers attention-based spectral explanations of dynamic conformational changes. Our 2DIR-based pretrained model is anticipated to provide unique insights into the dynamic structural information of proteins in their native environments.
Affiliation(s)
- Fan Wu
- Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Yan Huang
- Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Guokun Yang
- Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Sheng Ye
- Anhui Provincial Engineering Research Center for Unmanned System and Intelligent Technology, School of Artificial Intelligence, Anhui University, Hefei 230601, Anhui, China
- Shaul Mukamel
- Department of Chemistry and of Physics & Astronomy, University of California, Irvine, CA 92697
- Jun Jiang
- Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China

45
Tarafder S, Roche R, Bhattacharya D. The landscape of RNA 3D structure modeling with transformer networks. Biol Methods Protoc 2024; 9:bpae047. [PMID: 39006460 PMCID: PMC11244692 DOI: 10.1093/biomethods/bpae047] [Received: 05/08/2024] [Revised: 06/22/2024] [Accepted: 07/01/2024] [Indexed: 07/16/2024] Open
Abstract
Transformers are a powerful subclass of neural networks catalyzing the development of a growing number of computational methods for RNA structure modeling. Here, we conduct an objective and empirical study of the predictive modeling accuracy of the emerging transformer-based methods for RNA structure prediction. Our study reveals multi-faceted complementarity between the methods and underscores some key aspects that affect the prediction accuracy.
Affiliation(s)
- Sumit Tarafder
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States

46
Saima, Khan A, Ali S, Jiang J, Miao Z, Kamil A, Khan SN, Arold ST. Clinical genomics expands the link between erroneous cell division, primary microcephaly and intellectual disability. Neurogenetics 2024; 25:179-191. [PMID: 38795246 DOI: 10.1007/s10048-024-00759-7] [Received: 01/06/2024] [Accepted: 04/09/2024] [Indexed: 05/27/2024]
Abstract
Primary microcephaly is a rare neurogenic and genetically heterogeneous disorder characterized by significant brain size reduction that results in numerous neurodevelopmental disorder (NDD) problems, including mild to severe intellectual disability (ID), global developmental delay (GDD), seizures and other congenital malformations. This disorder can arise from a mutation in genes involved in various biological pathways, including those within the brain. We characterized a recessive neurological disorder observed in nine young adults from five independent consanguineous Pakistani families. The disorder is characterized by microcephaly, ID, developmental delay (DD), early-onset epilepsy, recurrent infection, hearing loss, growth retardation, skeletal and limb defects. Through exome sequencing, we identified novel homozygous variants in five genes that were previously associated with brain diseases, namely CENPJ (NM_018451.5: c.1856A>G; p.Lys619Arg), STIL (NM_001048166.1: c.1235C>A; p.Pro412Gln), CDK5RAP2 (NM_018249.6: c.3935T>G; p.Leu1312Trp), RBBP8 (NM_203291.2: c.1843C>T; p.Gln615*) and CEP135 (NM_025009.5: c.1469A>G; p.Glu490Gly). These variants were validated by Sanger sequencing across all family members, and by in silico structural analysis. Protein 3D homology modeling of wild-type and mutated proteins revealed substantial changes in the structure, suggesting a potential impact on function. Importantly, all identified genes play crucial roles in maintaining genomic integrity during cell division, with CENPJ, STIL, CDK5RAP2, and CEP135 being involved in centrosomal function. Collectively, our findings underscore the link between erroneous cell division, particularly centrosomal function, primary microcephaly and ID.
Affiliation(s)
- Saima
- Department of Biotechnology, Abdul Wali Khan University, Mardan, 23200, Khyber Pakhtunkhwa, Pakistan
| | - Amjad Khan
- Department of Zoology, University of Lakki Marwat, Lakki, 28420, Khyber Pakhtunkhwa, Pakistan.
- Institute for Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany.
- Alexander Von Humboldt Fellowship Foundation, Berlin, Germany.
| | - Sajid Ali
- Department of Biotechnology, Abdul Wali Khan University, Mardan, 23200, Khyber Pakhtunkhwa, Pakistan
| | - Jiuhong Jiang
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Guangzhou National Laboratory, Guangzhou International Bio Island, Guangzhou, China
| | - Zhichao Miao
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University, Guangzhou, China
- Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai, China
- Atif Kamil
- Department of Biotechnology, Abdul Wali Khan University, Mardan, 23200, Khyber Pakhtunkhwa, Pakistan
- Department of Internal Medicine, Brody Medicine School, East Carolina University, Greenville, NC, USA
- Shahid Niaz Khan
- Department of Zoology, Kohat University of Science & Technology, Kohat, 26000, Khyber Pakhtunkhwa, Pakistan
- Stefan T Arold
- Biological and Environmental Science and Engineering Division, Computational Biology Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
47
Si Y, Zou J, Gao Y, Chuai G, Liu Q, Chen L. Foundation models in molecular biology. Biophysics Reports 2024; 10:135-151. [PMID: 39027316 PMCID: PMC11252241 DOI: 10.52601/bpr.2024.240006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 03/04/2024] [Indexed: 07/20/2024] Open
Abstract
Determining correlations between molecules at various levels is an important topic in molecular biology. Large language models have demonstrated a remarkable ability to capture correlations from large amounts of data in natural language processing as well as image generation. Because the correlations captured in this way can be applied to a wide range of specific tasks, large language models are also referred to as foundation models. The massive amount of data available in molecular biology provides an excellent basis for developing foundation models, and their recent emergence has pushed the entire field forward. We summarize the foundation models developed from RNA sequence data, DNA sequence data, protein sequence data, single-cell transcriptome data, and spatial transcriptome data, and further discuss research directions for the development of foundation models in molecular biology.
Affiliation(s)
- Yunda Si
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Jiawei Zou
- Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
- Yicheng Gao
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Guohui Chuai
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Qi Liu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Luonan Chen
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
48
Huang H, Lin Z, He D, Hong L, Li Y. RiboDiffusion: tertiary structure-based RNA inverse folding with generative diffusion models. Bioinformatics 2024; 40:i347-i356. [PMID: 38940178 PMCID: PMC11211841 DOI: 10.1093/bioinformatics/btae259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the nonunique structure-sequence mapping, and the flexibility of RNA conformation.
RESULTS: In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/ml4bio/RiboDiffusion.
Affiliation(s)
- Han Huang
- Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China
- School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
- Ziqian Lin
- Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China
- School of Artificial Intelligence, Nanjing University, Nanjing, 210023, China
- Dongchen He
- Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China
- Liang Hong
- Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China
- Yu Li
- Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China
49
He S, Huang R, Townley J, Kretsch RC, Karagianes TG, Cox DBT, Blair H, Penzar D, Vyaltsev V, Aristova E, Zinkevich A, Bakulin A, Sohn H, Krstevski D, Fukui T, Tatematsu F, Uchida Y, Jang D, Lee JS, Shieh R, Ma T, Martynov E, Shugaev MV, Bukhari HST, Fujikawa K, Onodera K, Henkel C, Ron S, Romano J, Nicol JJ, Nye GP, Wu Y, Choe C, Reade W, Das R. Ribonanza: deep learning of RNA structure through dual crowdsourcing. bioRxiv 2024:2024.02.24.581671. [PMID: 38464325 PMCID: PMC10925082 DOI: 10.1101/2024.02.24.581671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.
Affiliation(s)
- Shujun He
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Rui Huang
- Department of Biochemistry, Stanford CA, USA
- David B T Cox
- Department of Biochemistry, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Dmitry Penzar
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Valeriy Vyaltsev
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Elizaveta Aristova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Arsenii Zinkevich
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Artemy Bakulin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Hoyeol Sohn
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Daniel Krstevski
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Donghoon Jang
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- Roger Shieh
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Tom Ma
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Eduard Martynov
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Maxim V Shugaev
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Shlomo Ron
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Jonathan Romano
- Eterna Massive Open Laboratory
- Howard Hughes Medical Institute
- Grace P Nye
- Department of Biochemistry, Stanford CA, USA
- Yuan Wu
- Department of Biochemistry, Stanford CA, USA
- Howard Hughes Medical Institute
- Rhiju Das
- Department of Biochemistry, Stanford CA, USA
- Biophysics Program, Stanford CA, USA
- Howard Hughes Medical Institute
50
Zhang B, Hou Z, Yang Y, Wong KC, Zhu H, Li X. SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues. Commun Biol 2024; 7:679. [PMID: 38830995 PMCID: PMC11148103 DOI: 10.1038/s42003-024-06332-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 05/15/2024] [Indexed: 06/05/2024] Open
Abstract
Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic-acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between protein sequence information and obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic-acid-binding sites in proteins. Here, we implement SOFB, an ensemble deep learning method for identifying nucleic-acid-binding residues on proteins. SOFB characterizes protein sequences by learning the semantics of biological dynamics contexts, and then uses an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. Within this framework, the language learning model, which is constructed from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble deep learning-based sequence network, consisting of different convolutional layers together with Bi-LSTM, refines various features for optimal performance. Meanwhile, to address the data imbalance issue, we adopt ensemble learning to train multiple models and then combine them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic-acid-binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic-acid-binding residues.
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452.
Affiliation(s)
- Bin Zhang
- School of Artificial Intelligence, Jilin University, Changchun, China
- Zilong Hou
- School of Artificial Intelligence, Jilin University, Changchun, China
- Yuning Yang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
- Haoran Zhu
- School of Artificial Intelligence, Jilin University, Changchun, China.
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun, China.