1
Mahmoudi I, Quignot C, Martins C, Andreani J. Structural comparison of homologous protein-RNA interfaces reveals widespread overall conservation contrasted with versatility in polar contacts. PLoS Comput Biol 2024; 20:e1012650. [PMID: 39625988 PMCID: PMC11642956 DOI: 10.1371/journal.pcbi.1012650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 12/13/2024] [Accepted: 11/18/2024] [Indexed: 12/14/2024]
Abstract
Protein-RNA interactions play a critical role in many cellular processes and pathologies. However, experimental determination of protein-RNA structures is still challenging; therefore, computational tools are needed for the prediction of protein-RNA interfaces. Although evolutionary pressures can be exploited for structural prediction of protein-protein interfaces, and recent deep learning methods using protein multiple sequence alignments have radically improved the performance of protein-protein interface structural prediction, protein-RNA structural prediction is lagging behind, due to the scarcity of structural data and the flexibility involved in these complexes. To study the evolution of protein-RNA interface structures, we first identified a large and diverse dataset of 2,022 pairs of structurally homologous interfaces (termed structural interologs). We leveraged this unique dataset to analyze the conservation of interface contacts among structural interologs based on the properties of involved amino acids and nucleotides. We uncovered that 73% of distance-based contacts and 68% of apolar contacts are conserved on average, and the strong conservation of these contacts occurs even in distant homologs with sequence identity below 20%. Distance-based contacts are also much more conserved than what we had found in a previous study of homologous protein-protein interfaces. In contrast, hydrogen bonds, salt bridges, and π-stacking interactions are very versatile in pairs of protein-RNA interologs, even for close homologs with high interface sequence identity. We found that almost half of the non-conserved distance-based contacts are linked to a small proportion of interface residues that no longer make interface contacts in the interolog, a phenomenon we term "interface switching out". We also examined possible recovery mechanisms for non-conserved hydrogen bonds and salt bridges, uncovering diverse scenarios of switching out, change in amino acid chemical nature, intermolecular and intramolecular compensations. Our findings provide insights for integrating evolutionary signals into predictive protein-RNA structural modeling methods.
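Below is a minimal Python sketch of what "conserved distance-based contact" means. It makes several assumptions. Contacts are (protein residue, nucleotide) pairs called with a single atom-distance cutoff; the 5 Å default is a hypothetical value. Residue and nucleotide correspondences between the two interologs are already known from an alignment. The helper names are illustrative only. The paper's exact definitions may differ, for example in the cutoff, the atom selection, or whether the fraction is averaged over both directions.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int]  # (protein residue index, nucleotide index)


def distance_contacts(protein: Dict[int, List[Coord]],
                      rna: Dict[int, List[Coord]],
                      cutoff: float = 5.0) -> Set[Contact]:
    """Residue-nucleotide pairs with any atom pair closer than `cutoff` (hypothetical default)."""
    contacts = set()
    for res, res_atoms in protein.items():
        for nuc, nuc_atoms in rna.items():
            if any(math.dist(a, b) < cutoff for a in res_atoms for b in nuc_atoms):
                contacts.add((res, nuc))
    return contacts


def conserved_fraction(contacts_a: Set[Contact], contacts_b: Set[Contact],
                       prot_map: Dict[int, int], rna_map: Dict[int, int]) -> float:
    """Share of interface A's contacts whose aligned counterparts are also in contact in B."""
    if not contacts_a:
        return 0.0
    conserved = sum(
        1
        for res, nuc in contacts_a
        # A contact with no aligned residue or nucleotide counts as not conserved.
        if res in prot_map and nuc in rna_map
        and (prot_map[res], rna_map[nuc]) in contacts_b
    )
    return conserved / len(contacts_a)


if __name__ == "__main__":
    contacts_a = {(10, 1), (11, 2), (12, 3), (15, 4)}
    contacts_b = {(20, 1), (21, 2), (30, 5)}
    prot_map = {10: 20, 11: 21, 12: 22}   # hypothetical residue alignment
    rna_map = {1: 1, 2: 2, 3: 3, 4: 4}    # hypothetical nucleotide alignment
    print(conserved_fraction(contacts_a, contacts_b, prot_map, rna_map))  # 0.5
```

In the toy example, two of the four contacts in interface A map onto contacts in interface B. One contact maps onto a pair that is not in contact, and one involves an unaligned residue, giving 0.5.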
Affiliation(s)
- Ikram Mahmoudi
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Chloé Quignot
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Carla Martins
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Jessica Andreani
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
2
Zhang C, Wang Q, Li Y, Teng A, Hu G, Wuyun Q, Zheng W. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction. Biomolecules 2024; 14:1531. [PMID: 39766238 PMCID: PMC11673352 DOI: 10.3390/biom14121531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 01/11/2025]
Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research.
Affiliation(s)
- Chenyue Zhang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Qinxin Wang
- Suzhou New & High-Tech Innovation Service Center, Suzhou 215011, China
- Yiyang Li
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Anqi Teng
- Bioscience and Biomedical Engineering Thrust, Systems Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Wei Zheng
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
3
Yu LE, White E, Woodson S. Optimized periphery-core interface increases fitness of the Bacillus subtilis glmS ribozyme. Nucleic Acids Res 2024; 52:13340-13350. [PMID: 39319588 PMCID: PMC11602151 DOI: 10.1093/nar/gkae830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 09/04/2024] [Accepted: 09/11/2024] [Indexed: 09/26/2024]
Abstract
Like other functional RNAs, ribozymes encode a conserved catalytic center supported by peripheral domains that vary among ribozyme sub-families. To understand how core-periphery interactions contribute to ribozyme fitness, we compared the cleavage kinetics of all single base substitutions at 152 sites across the Bacillus subtilis glmS ribozyme by high-throughput sequencing (k-seq). The in vitro activity map mirrored phylogenetic sequence conservation in glmS ribozymes, indicating that biological fitness reports all biochemically important positions. The k-seq results and folding assays showed that most deleterious mutations lower activity by impairing ribozyme self-assembly. All-atom molecular dynamics simulations of the complete ribozyme revealed how individual mutations in the core or the IL4 peripheral loop introduce a non-native tertiary interface that rewires the catalytic center, eliminating activity. We conclude that the need to avoid non-native helix packing powerfully constrains the evolution of tertiary structure motifs in RNA.
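The bookkeeping behind such a mutational scan can be sketched in Python. The sketch assumes a simple first-order model in which the fraction cleaved is F(t) = 1 - exp(-k t) with full amplitude. This generic kinetic model is chosen only for illustration and is not necessarily the fitting procedure used in the study. The sketch does two things:
- It enumerates every single base substitution of a sequence. All substitutions at 152 sites give 152 × 3 = 456 variants.
- It estimates k for each variant by a least-squares fit, through the origin, of -ln(1 - F) against t. Relative activity is then k_mutant / k_wild-type.

The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def single_substitutions(seq: str, alphabet: str = "ACGU") -> Dict[str, str]:
    """Map labels such as 'A3G' (1-based position) to every single-base variant of `seq`."""
    variants = {}
    for i, wt in enumerate(seq):
        for base in alphabet:
            if base != wt:
                variants[f"{wt}{i + 1}{base}"] = seq[:i] + base + seq[i + 1:]
    return variants


def first_order_k(times: Sequence[float], fraction_cleaved: Sequence[float]) -> float:
    """Least-squares slope through the origin of -ln(1 - F) versus t, assuming F(t) = 1 - exp(-k t)."""
    ys = [-math.log(1.0 - f) for f in fraction_cleaved]
    return sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)


if __name__ == "__main__":
    # Toy time courses generated with made-up rate constants (0.2 and 0.02 per minute).
    times = [1.0, 2.0, 5.0]
    k_wt = first_order_k(times, [1 - math.exp(-0.2 * t) for t in times])
    k_mut = first_order_k(times, [1 - math.exp(-0.02 * t) for t in times])
    print(f"relative activity = {k_mut / k_wt:.3f}")  # 0.100
    print(len(single_substitutions("A" * 152)))       # 456
```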
Affiliation(s)
- Li-Eng D Yu
- Program in Cell, Molecular and Developmental Biology and Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
- Elise N White
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
- Sarah A Woodson
- T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
4
Tarafder S, Bhattacharya D. lociPARSE: A Locality-aware Invariant Point Attention Model for Scoring RNA 3D Structures. J Chem Inf Model 2024; 64:8655-8664. [PMID: 39523843 PMCID: PMC11600500 DOI: 10.1021/acs.jcim.4c01621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 10/17/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
A scoring function that can reliably assess the accuracy of a 3D RNA structural model in the absence of experimental structure is not only important for model evaluation and selection but also useful for scoring-guided conformational sampling. However, high-fidelity RNA scoring has proven to be difficult using conventional knowledge-based statistical potentials and currently available machine learning-based approaches. Here, we present lociPARSE, a locality-aware invariant point attention architecture for scoring RNA 3D structures. Unlike existing machine learning methods that estimate superposition-based root-mean-square deviation (RMSD), lociPARSE estimates Local Distance Difference Test (lDDT) scores capturing the accuracy of each nucleotide and its surrounding local atomic environment in a superposition-free manner, before aggregating information to predict global structural accuracy. Tested on multiple datasets including CASP15, lociPARSE significantly outperforms existing statistical potentials (rsRNASP, cgRNASP, DFIRE-RNA, and RASP) and machine learning methods (ARES and RNA3DCNN) across complementary assessment metrics. lociPARSE is freely available at https://github.com/Bhattacharya-Lab/lociPARSE.
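The superposition-free quantity that lociPARSE learns to predict can be illustrated with a simplified per-nucleotide lDDT, sketched in Python. The sketch rests on three assumptions. First, each nucleotide is represented by one point (for example a C4' atom) instead of all atoms. Second, it uses the commonly cited 15 Å inclusion radius and 0.5/1/2/4 Å tolerance thresholds. Third, the global score is a plain average of per-nucleotide scores. lociPARSE predicts these scores with a neural network and has no reference structure to work from. The code therefore shows only the target quantity, not the method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def per_nucleotide_lddt(ref: Sequence[Point], model: Sequence[Point],
                        radius: float = 15.0,
                        thresholds: Tuple[float, ...] = (0.5, 1.0, 2.0, 4.0)) -> List[float]:
    """Simplified lDDT with one point per nucleotide; no superposition is performed."""
    scores = []
    for i in range(len(ref)):
        preserved, pairs = 0.0, 0
        for j in range(len(ref)):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only distances that are local in the reference are scored
            pairs += 1
            diff = abs(d_ref - math.dist(model[i], model[j]))
            # fraction of tolerance thresholds that this distance stays within
            preserved += sum(diff < t for t in thresholds) / len(thresholds)
        scores.append(preserved / pairs if pairs else float("nan"))
    return scores


def global_lddt(scores: Sequence[float]) -> float:
    """Mean over nucleotides with at least one local neighbour (a simplifying assumption)."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    shifted = [(x + 100.0, y, z) for x, y, z in ref]  # rigid translation only
    distorted = ref[:3] + [(12.0, 0.0, 0.0)]           # last nucleotide moved by 3 Å
    print(per_nucleotide_lddt(ref, shifted))           # [1.0, 1.0, 1.0, 1.0]
    print(global_lddt(per_nucleotide_lddt(ref, distorted)))
```

A rigidly translated model still scores 1.0 everywhere, because only internal distances are compared. Moving a single nucleotide lowers its own score most, and lowers the scores of its neighbours to a lesser degree.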
Affiliation(s)
- Sumit Tarafder
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Debswapna Bhattacharya
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
5
Kinshuk S, Li L, Meckes B, Chan CTY. Sequence-Based Protein Design: A Review of Using Statistical Models to Characterize Coevolutionary Traits for Developing Hybrid Proteins as Genetic Sensors. Int J Mol Sci 2024; 25:8320. [PMID: 39125888 PMCID: PMC11312098 DOI: 10.3390/ijms25158320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/23/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024]
Abstract
Statistical analyses of homologous protein sequences can identify amino acid residue positions that co-evolve to generate family members with different properties. Based on the hypothesis that the coevolution of residue positions is necessary for maintaining protein structure, coevolutionary traits revealed by statistical models provide insight into residue-residue interactions that are important for understanding protein mechanisms at the molecular level. With the rapid expansion of genome sequencing databases that facilitate statistical analyses, this sequence-based approach has been used to study a broad range of protein families. An emerging application of this approach is to design hybrid transcriptional regulators as modular genetic sensors for novel wiring between input signals and genetic elements to control outputs. Among many allosterically regulated regulator families, the members contain structurally conserved and functionally independent protein domains, including a DNA-binding module (DBM) for interacting with a specific genetic element and a ligand-binding module (LBM) for sensing an input signal. By hybridizing a DBM and an LBM from two different family members, a hybrid regulator can be created with a new combination of signal-detection and DNA-recognition properties not present in natural systems. In this review, we present recent advances in the development of hybrid regulators and their applications in cellular engineering, especially focusing on the use of statistical analyses for characterizing DBM-LBM interactions and hybrid regulator design. Based on these studies, we then discuss the current limitations and potential directions for enhancing the impact of this sequence-based design approach.
Affiliation(s)
- Sahaj Kinshuk
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Lin Li
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Brian Meckes
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
- Clement T. Y. Chan
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
6
Tarafder S, Bhattacharya D. lociPARSE: a locality-aware invariant point attention model for scoring RNA 3D structures. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.04.565599. [PMID: 37961488 PMCID: PMC10635153 DOI: 10.1101/2023.11.04.565599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
A scoring function that can reliably assess the accuracy of a 3D RNA structural model in the absence of experimental structure is not only important for model evaluation and selection but also useful for scoring-guided conformational sampling. However, high-fidelity RNA scoring has proven to be difficult using conventional knowledge-based statistical potentials and currently available machine learning-based approaches. Here we present lociPARSE, a locality-aware invariant point attention architecture for scoring RNA 3D structures. Unlike existing machine learning methods that estimate superposition-based root mean square deviation (RMSD), lociPARSE estimates Local Distance Difference Test (lDDT) scores capturing the accuracy of each nucleotide and its surrounding local atomic environment in a superposition-free manner, before aggregating information to predict global structural accuracy. Tested on multiple datasets including CASP15, lociPARSE significantly outperforms existing statistical potentials (rsRNASP, cgRNASP, DFIRE-RNA, and RASP) and machine learning methods (ARES and RNA3DCNN) across complementary assessment metrics. lociPARSE is freely available at https://github.com/Bhattacharya-Lab/lociPARSE.
Affiliation(s)
- Sumit Tarafder
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, 24061, USA
7
Calvanese F, Lambert CN, Nghe P, Zamponi F, Weigt M. Towards parsimonious generative modeling of RNA families. Nucleic Acids Res 2024; 52:5465-5477. [PMID: 38661206 PMCID: PMC11162787 DOI: 10.1093/nar/gkae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 03/05/2024] [Accepted: 04/05/2024] [Indexed: 04/26/2024]
Abstract
Generative probabilistic models emerge as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families, achieving performance levels comparable to more complex methods while utilizing a significantly lower number of parameters. Our approach demonstrates efficiency in generating artificial RNA sequences that closely resemble their natural counterparts in both statistical analyses and SHAPE-MaP experiments, and in predicting the effect of mutations. Notably, eaDCA provides a unique feature: estimating the number of potential functional sequences within a given RNA family. For example, in the case of cyclic di-AMP riboswitches (RF00379), our analysis suggests the existence of approximately 10^39 functional nucleotide sequences. While huge compared to the known <4000 natural sequences, this number represents only a tiny fraction of the vast pool of nearly 10^82 possible nucleotide sequences of the same length (136 nucleotides). These results underscore the promise of sparse and interpretable generative models, such as eaDCA, in enhancing our understanding of the expansive RNA sequence space.
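The sequence-space figures above follow from simple arithmetic. With 4 nucleotides and a length of 136 there are 4^136 possible sequences, which is about 10^81.88. A short Python check follows. It takes the 10^39 estimate from the abstract, ignores gaps and insertions, and uses a hypothetical helper name. By this arithmetic, the estimated functional sequences make up roughly 1 in 10^43 of the full space.

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 4) -> float:
    """log10 of the number of sequences of a given length over an alphabet."""
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    space = log10_sequence_space(136)   # about 81.88, i.e. nearly 10^82
    functional = 39                     # eaDCA estimate quoted in the abstract (10^39)
    print(f"possible sequences: 10^{space:.2f}")                    # 10^81.88
    print(f"functional fraction: 10^{functional - space:.2f}")      # 10^-42.88
    print(len(str(4 ** 136)))           # 82 decimal digits
```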
Affiliation(s)
- Francesco Calvanese
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Camille N Lambert
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Philippe Nghe
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Francesco Zamponi
- Dipartimento di Fisica, Sapienza Università di Roma, Rome, Italy
- Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
8
Pucci F, Zerihun MB, Rooman M, Schug A. pycofitness-Evaluating the fitness landscape of RNA and protein sequences. Bioinformatics 2024; 40:btae074. [PMID: 38335928 PMCID: PMC10881095 DOI: 10.1093/bioinformatics/btae074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 01/25/2024] [Accepted: 02/06/2024] [Indexed: 02/12/2024]
Abstract
MOTIVATION: The accurate prediction of how mutations change the biophysical properties of proteins or RNA is a major goal in computational biology with tremendous impacts on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution can help solve this issue.
RESULTS: We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command line interface make it an easy-to-use tool even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability.
AVAILABILITY AND IMPLEMENTATION: https://github.com/KIT-MBS/pycofitness
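Coevolution-based mutational scoring of this kind typically rests on a Potts model. The model has single-site fields h_i(a) and pairwise couplings J_ij(a, b) inferred from a multiple sequence alignment. The sketch below is a generic illustration with made-up parameters written by hand. It is not pycofitness's code, parameter format or exact scoring convention, and pycofitness itself infers the parameters by pseudo-likelihood maximization.

The sketch scores a sequence as the sum of fields and couplings, which is a log-probability up to a constant. It then reports each single mutant's score change relative to wild type, which is the basic operation behind in silico mutagenesis. A positive value means the mutant is more probable under the model.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical parameter layout: fields keyed by (position, residue),
# couplings keyed by (i, j, residue_i, residue_j) with i < j.
Fields = Dict[Tuple[int, str], float]
Couplings = Dict[Tuple[int, int, str, str], float]


def potts_score(seq: str, h: Fields, J: Couplings) -> float:
    """Sum of fields and pairwise couplings; missing parameters count as zero."""
    score = sum(h.get((i, a), 0.0) for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        score += J.get((i, j, seq[i], seq[j]), 0.0)
    return score


def single_mutant_scan(wt: str, h: Fields, J: Couplings,
                       alphabet: str = "ACGU") -> Dict[str, float]:
    """Score change (mutant minus wild type) for every single substitution."""
    base = potts_score(wt, h, J)
    effects = {}
    for i, wt_res in enumerate(wt):
        for b in alphabet:
            if b != wt_res:
                mutant = wt[:i] + b + wt[i + 1:]
                effects[f"{wt_res}{i + 1}{b}"] = potts_score(mutant, h, J) - base
    return effects


if __name__ == "__main__":
    h = {(1, "C"): 0.2, (1, "U"): 0.5}   # made-up fields
    J = {(0, 1, "A", "U"): 1.0}          # made-up coupling favoring A-U at sites 1 and 2
    for label, delta in sorted(single_mutant_scan("ACG", h, J).items(), key=lambda kv: -kv[1]):
        print(label, round(delta, 3))
```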
Affiliation(s)
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Mehari B Zerihun
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
- Department of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
9
Zhu WS, Litterman AJ, Sekhon HS, Kageyama R, Arce MM, Taylor KE, Zhao W, Criswell LA, Zaitlen N, Erle DJ, Ansel KM. GCLiPP: global crosslinking and protein purification method for constructing high-resolution occupancy maps for RNA binding proteins. Genome Biol 2023; 24:281. [PMID: 38062486 PMCID: PMC10701951 DOI: 10.1186/s13059-023-03125-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 11/27/2023] [Indexed: 12/18/2023]
Abstract
GCLiPP is a global RNA interactome capture method that detects RNA-binding protein (RBP) occupancy transcriptome-wide. GCLiPP maps RBP-occupied sites at a higher resolution than phase separation-based techniques. GCLiPP sequence tags correspond with known RBP binding sites and are enriched for sites detected by RBP-specific crosslinking immunoprecipitation (CLIP) for abundant cytosolic RBPs. Comparison of human Jurkat T cells and mouse primary T cells uncovers shared peaks of GCLiPP signal across homologous regions of human and mouse 3' UTRs, including a conserved mRNA-destabilizing cis-regulatory element. GCLiPP signal overlapping with immune-related SNPs uncovers stabilizing cis-regulatory regions in CD5, STAT6, and IKZF1.
Affiliation(s)
- Wandi S Zhu
- Department of Microbiology & Immunology and Sandler Asthma Basic Research Center, University of California San Francisco, San Francisco, CA, USA
- Adam J Litterman
- Department of Microbiology & Immunology and Sandler Asthma Basic Research Center, University of California San Francisco, San Francisco, CA, USA
- Harshaan S Sekhon
- Department of Microbiology & Immunology and Sandler Asthma Basic Research Center, University of California San Francisco, San Francisco, CA, USA
- University of California Berkeley, Berkeley, CA, USA
- Robin Kageyama
- Department of Microbiology & Immunology and Sandler Asthma Basic Research Center, University of California San Francisco, San Francisco, CA, USA
- Maya M Arce
- Department of Microbiology & Immunology and Sandler Asthma Basic Research Center, University of California San Francisco, San Francisco, CA, USA
- Kimberly E Taylor
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Russell/Engleman Rheumatology Research Center, University of California San Francisco, San Francisco, USA
- Wenxue Zhao
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Lung Biology Center, University of California San Francisco, San Francisco, USA
- School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, People's Republic of China
- Lindsey A Criswell
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Russell/Engleman Rheumatology Research Center, University of California San Francisco, San Francisco, USA
- Noah Zaitlen
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Lung Biology Center, University of California San Francisco, San Francisco, USA
- David J Erle
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Lung Biology Center, University of California San Francisco, San Francisco, USA
- K Mark Ansel
- Department of Microbiology & Immunology and Sandler Asthma Basic Research Center, University of California San Francisco, San Francisco, CA, USA
10
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 PMCID: PMC11554572 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This approach is grounded in the post-transcriptional regulation of disease-associated genes by miRNAs. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based only on molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have been accumulated in established studies. However, only a limited number of comprehensive reviews cover both computational and experimental drug discovery processes. In this review, we provide a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Miaoer Xu
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
- Anna James-Bott
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Xia Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Adam P Cribbs
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
11
Liu X, Duan Y, Hong X, Xie J, Liu S. Challenges in structural modeling of RNA-protein interactions. Curr Opin Struct Biol 2023; 81:102623. [PMID: 37301066 DOI: 10.1016/j.sbi.2023.102623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 05/14/2023] [Accepted: 05/16/2023] [Indexed: 06/12/2023]
Abstract
In the past few years, the number of known RNA-binding proteins (RBPs) and RNA-RBP interactions has increased significantly. Here, we review recent developments in the methodology for protein-RNA and protein-protein complex structure modeling with deep learning and co-evolution, and discuss the challenges and opportunities for building a reliable approach for protein-RNA complex structure modeling. Protein Data Bank (PDB) and crosslinking immunoprecipitation (CLIP) data could be combined and used to infer the 2D geometry of protein-RNA interactions by deep learning.
Affiliation(s)
- Xudong Liu
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Yingtian Duan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Xu Hong
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Juan Xie
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Shiyong Liu
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
12
Gao W, Yang A, Rivas E. Thirteen dubious ways to detect conserved structural RNAs. IUBMB Life 2023; 75:471-492. [PMID: 36495545 PMCID: PMC11234323 DOI: 10.1002/iub.2694] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Accepted: 10/24/2022] [Indexed: 12/14/2022]
Abstract
Covariation induced by compensatory base substitutions in RNA alignments is a great way to deduce conserved RNA structure, in principle. In practice, success depends on many factors, importantly the quality and depth of the alignment and the choice of covariation statistic. Measuring covariation between pairs of aligned positions is easy. However, using covariation to infer evolutionarily conserved RNA structure is complicated by other extraneous sources of covariation such as that resulting from homologous sequences having evolved from a common ancestor. In order to provide evidence of evolutionarily conserved RNA structure, a method to distinguish covariation due to sources other than RNA structure is necessary. Moreover, there are several sorts of artifactually generated covariation signals that can further confound the analysis. Additionally, some covariation signal is difficult to detect due to incomplete comparative data. Here, we investigate and critically discuss the practice of inferring conserved RNA structure by comparative sequence analysis. We provide new methods on how to approach and decide which of the numerous long non-coding RNAs (lncRNAs) have biologically relevant structures.
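The abstract notes that measuring covariation is easy while interpreting it is not. A raw covariation statistic needs very little code, as this minimal Python sketch of mutual information (in bits) between two alignment columns shows. Sequences with a gap in either column are skipped, which is an assumption of the sketch. The sketch applies none of the corrections the authors argue are necessary, such as separating phylogenetic covariation from structural covariation. A high value here is therefore not, by itself, evidence of conserved structure.

```python
import math
from collections import Counter
from typing import Sequence


def column_mi(alignment: Sequence[str], i: int, j: int) -> float:
    """Mutual information (bits) between alignment columns i and j, ignoring gapped rows."""
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 3 change together as Watson-Crick pairs.
    aln = ["GAAC", "CAAG", "AAAU", "UAAA"]
    print(column_mi(aln, 0, 3))  # 2.0
    print(column_mi(aln, 1, 2))  # 0.0 (invariant columns carry no covariation)
```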
Affiliation(s)
- William Gao
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ann Yang
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
13
Alfonso-Gonzalez C, Legnini I, Holec S, Arrigoni L, Ozbulut HC, Mateos F, Koppstein D, Rybak-Wolf A, Bönisch U, Rajewsky N, Hilgers V. Sites of transcription initiation drive mRNA isoform selection. Cell 2023; 186:2438-2455.e22. [PMID: 37178687 PMCID: PMC10228280 DOI: 10.1016/j.cell.2023.04.012] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 05/25/2022] [Revised: 12/16/2022] [Accepted: 04/06/2023] [Indexed: 05/15/2023]
Abstract
The generation of distinct messenger RNA isoforms through alternative RNA processing modulates the expression and function of genes, often in a cell-type-specific manner. Here, we assess the regulatory relationships between transcription initiation, alternative splicing, and 3' end site selection. Applying long-read sequencing to accurately represent even the longest transcripts from end to end, we quantify mRNA isoforms in Drosophila tissues, including the transcriptionally complex nervous system. We find that in Drosophila heads, as well as in human cerebral organoids, 3' end site choice is globally influenced by the site of transcription initiation (TSS). "Dominant promoters," characterized by specific epigenetic signatures including p300/CBP binding, impose a transcriptional constraint to define splice and polyadenylation variants. In vivo deletion or overexpression of dominant promoters as well as p300/CBP loss disrupted the 3' end expression landscape. Our study demonstrates the crucial impact of TSS choice on the regulation of transcript diversity and tissue identity.
Affiliation(s)
- Carlos Alfonso-Gonzalez
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany; Faculty of Biology, Albert Ludwig University, 79104 Freiburg, Germany; International Max Planck Research School for Molecular and Cellular Biology (IMPRS-MCB), 79108 Freiburg, Germany
- Ivano Legnini
- Laboratory for Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany
- Sarah Holec
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
- Laura Arrigoni
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
- Hasan Can Ozbulut
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany; Faculty of Biology, Albert Ludwig University, 79104 Freiburg, Germany
- Fernando Mateos
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
- David Koppstein
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
- Agnieszka Rybak-Wolf
- Organoid Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany
- Ulrike Bönisch
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
- Nikolaus Rajewsky
- Laboratory for Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany; Charité - Universitätsmedizin, Charitépl. 1, 10117 Berlin, Germany; German Center for Cardiovascular Research (DZHK), Site Berlin, Berlin, Germany; NeuroCure Cluster of Excellence, Berlin, Germany; German Cancer Consortium (DKTK); National Center for Tumor Diseases (NCT), Site Berlin, Berlin, Germany
- Valérie Hilgers
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany; Signalling Research Centre CIBSS, University of Freiburg, Schänzlestraße 18, 79104 Freiburg, Germany
14
Xie J, Zhang W, Zhu X, Deng M, Lai L. Coevolution-based prediction of key allosteric residues for protein function regulation. eLife 2023; 12:81850. [PMID: 36799896 PMCID: PMC9981151 DOI: 10.7554/elife.81850] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/13/2022] [Accepted: 02/16/2023] [Indexed: 02/18/2023]
Abstract
Allostery is fundamental to many biological processes. Because allosteric regulation acts at a distance, the impact of allosteric mutations, modifications, and effector binding on protein function is difficult to forecast. In protein engineering, remote mutations cannot be rationally designed without large-scale experimental screening. Allosteric drugs have attracted much attention because of their high specificity and their potential to overcome existing drug-resistance mutations. However, optimization of allosteric compounds remains challenging. Here, we developed a novel computational method, KeyAlloSite, to predict allosteric sites and to identify key allosteric residues (allo-residues) based on the evolutionary coupling model. We found that protein allosteric sites are more strongly coupled to the orthosteric site than non-functional sites are. We further inferred key allo-residues by pairwise comparison of the differences in evolutionary coupling scores between each residue in the allosteric pocket and the functional site. Our predicted key allo-residues are in accordance with previous experimental studies of typical allosteric proteins such as BCR-ABL1, Tar, and PDZ3, as well as with key cancer mutations. We also showed that KeyAlloSite can be used to predict key allosteric residues distant from the catalytic site that are important for enzyme catalysis. Our study demonstrates that weak coevolutionary couplings contain important information about protein allosteric regulation. KeyAlloSite can be applied to studying the evolution of protein allosteric regulation, designing and optimizing allosteric drugs, and performing functional protein design and enzyme engineering.
Affiliation(s)
- Juan Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Weilin Zhang
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
- Minghua Deng
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
15
Wang K, Zhou R, Wu Y, Li M. RLBind: a deep learning method to predict RNA-ligand binding sites. Brief Bioinform 2023; 24:6832814. [PMID: 36398911 DOI: 10.1093/bib/bbac486] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/15/2022] [Revised: 09/28/2022] [Accepted: 10/14/2022] [Indexed: 11/19/2022]
Abstract
Identification of RNA-small molecule binding sites plays an essential role in RNA-targeted drug discovery and development. These small molecules are expected to serve as lead compounds guiding the development of new types of RNA-targeted therapeutics, as an alternative to conventional therapeutics that target proteins. RNAs can provide many potential drug targets with diverse structures and functions. However, only a few methods have been proposed so far, and predicting RNA-small molecule binding sites remains a major challenge. New computational models are required to extract features better and predict RNA-small molecule binding sites more accurately. In this paper, a deep learning model, RLBind, is proposed to predict RNA-small molecule binding sites from sequence-dependent and structure-dependent properties by combining a global RNA sequence channel and a local neighbor-nucleotide channel. To the best of our knowledge, this research is the first to develop a convolutional neural network for RNA-small molecule binding site prediction. Furthermore, RLBind can also be used as a potential tool when the experimental tertiary structure of the RNA is not available. The experimental results show that RLBind outperforms other state-of-the-art methods in predicting binding sites. Our study therefore demonstrates that combining global information from full-length sequences with local information from a limited number of neighboring nucleotides can improve the model's predictive performance for binding site prediction. All datasets and source code are available at https://github.com/KailiWang1/RLBind.
Affiliation(s)
- Kaili Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Renyi Zhou
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yifan Wu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
16
Abstract
RNA molecules carry out various cellular functions, and understanding the mechanisms behind their functions requires the knowledge of their 3D structures. Different types of computational methods have been developed to model RNA 3D structures over the past decade. These methods were widely used by researchers although their performance needs to be further improved. Recently, along with these traditional methods, machine-learning techniques have been increasingly applied to RNA 3D structure prediction and show significant improvement in performance. Here we shall give a brief review of the traditional methods and recent related advances in machine-learning approaches for RNA 3D structure prediction.
Affiliation(s)
- Xiujuan Ou
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yi Zhang
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yiduo Xiong
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yi Xiao
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
17
Paloncýová M, Pykal M, Kührová P, Banáš P, Šponer J, Otyepka M. Computer Aided Development of Nucleic Acid Applications in Nanotechnologies. Small (Weinheim an der Bergstrasse, Germany) 2022; 18:e2204408. [PMID: 36216589 DOI: 10.1002/smll.202204408] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/23/2022] [Revised: 09/12/2022] [Indexed: 06/16/2023]
Abstract
Utilization of nucleic acids (NAs) in nanotechnologies and nanotechnology-related applications is a growing field with broad application potential, ranging from biosensing up to targeted cell delivery. Computer simulations are useful techniques that can aid design and speed up development in this field. This review focuses on computer simulations of hybrid nanomaterials composed of NAs and other components. Current state-of-the-art molecular dynamics simulations, empirical force fields (FFs), and coarse-grained approaches for the description of deoxyribonucleic acid and ribonucleic acid are critically discussed. Challenges in combining biomacromolecular and nanomaterial FFs are emphasized. Recent applications of simulations for modeling NAs and their interactions with nano- and biomaterials are overviewed in the fields of sensing applications, targeted delivery, and NA templated materials. Future perspectives of development are also highlighted.
Affiliation(s)
- Markéta Paloncýová
- Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Martin Pykal
- Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Petra Kührová
- Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Pavel Banáš
- Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Jiří Šponer
- Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Institute of Biophysics of the Czech Academy of Sciences, v. v. i., Královopolská 135, Brno, 612 65, Czech Republic
- Michal Otyepka
- Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- IT4Innovations, VŠB - Technical University of Ostrava, 17. listopadu 2172/15, Ostrava-Poruba, 708 00, Czech Republic
18
rMSA: a sequence search and alignment algorithm to improve RNA structure modeling. J Mol Biol 2022. [DOI: 10.1016/j.jmb.2022.167904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
19
Beck JD, Roberts JM, Kitzhaber JM, Trapp A, Serra E, Spezzano F, Hayden EJ. Predicting higher-order mutational effects in an RNA enzyme by machine learning of high-throughput experimental data. Front Mol Biosci 2022; 9:893864. [PMID: 36046603 PMCID: PMC9421044 DOI: 10.3389/fmolb.2022.893864] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/10/2022] [Accepted: 06/28/2022] [Indexed: 11/13/2022]
Abstract
Ribozymes are RNA molecules that catalyze biochemical reactions. Self-cleaving ribozymes are a common naturally occurring class of ribozymes that catalyze site-specific cleavage of their own phosphodiester backbone. In addition to their natural functions, self-cleaving ribozymes have been used to engineer control of gene expression because they can be designed to alter RNA processing and stability. However, the rational design of ribozyme activity remains challenging, and many ribozyme-based systems are engineered or improved by random mutagenesis and selection (in vitro evolution). Improving a ribozyme-based system often requires several mutations to achieve the desired function, but extensive pairwise and higher-order epistasis prevent a simple prediction of the effect of multiple mutations that is needed for rational design. Recently, high-throughput sequencing-based approaches have produced data sets on the effects of numerous mutations in different ribozymes (RNA fitness landscapes). Here we used such high-throughput experimental data from variants of the CPEB3 self-cleaving ribozyme to train a predictive model through machine learning approaches. We trained models using either a random forest or long short-term memory (LSTM) recurrent neural network approach. We found that models trained on a comprehensive set of pairwise mutant data could predict active sequences at higher mutational distances, but the correlation between predicted and experimentally observed self-cleavage activity decreased with increasing mutational distance. Adding sequences with increasingly higher numbers of mutations to the training data improved the correlation at increasing mutational distances. Systematically reducing the size of the training data set suggests that a wide distribution of ribozyme activity may be the key to accurate predictions. 
Because the model predictions are based only on sequence and activity data, the results demonstrate that this machine learning approach allows readily obtainable experimental data to be used for RNA design efforts even for RNA molecules with unknown structures. The accurate prediction of RNA functions will enable a more comprehensive understanding of RNA fitness landscapes for studying evolution and for guiding RNA-based engineering efforts.
Affiliation(s)
- Jessica M. Roberts
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
- Joey M. Kitzhaber
- Department of Computer Science, Boise State University, Boise, ID, United States
- Ashlyn Trapp
- Department of Biological Sciences, Boise State University, Boise, ID, United States
- Eric J. Hayden
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
- Department of Computer Science, Boise State University, Boise, ID, United States
- *Correspondence: Eric J. Hayden
20
Singh J, Paliwal K, Litfin T, Singh J, Zhou Y. Predicting RNA distance-based contact maps by integrated deep learning on physics-inferred secondary structure and evolutionary-derived mutational coupling. Bioinformatics 2022; 38:3900-3910. [PMID: 35751593 PMCID: PMC9364379 DOI: 10.1093/bioinformatics/btac421] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/24/2021] [Revised: 04/30/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Recently, AlphaFold2 achieved near-experimental accuracy for the majority of proteins in the 14th Critical Assessment of Structure Prediction (CASP14). This raises the hope that one day we may achieve the same feat for structured RNAs, whose structure prediction is as fundamentally and practically important as protein structure prediction. One major factor in the recent advancement of protein structure prediction is the highly accurate prediction of distance-based contact maps of proteins. RESULTS Here, we showed that by integrating deep learning with physics-inferred secondary structures, co-evolutionary information and multiple sequence-alignment sampling, we can achieve RNA contact-map prediction at a level of accuracy similar to that of protein contact-map prediction. More importantly, highly accurate prediction of the top L long-range contacts (L being the sequence length) can be assured for RNAs with a high effective number of homologous sequences (Neff > 50). The initial use of the predicted contact map as distance-based restraints confirmed its usefulness in 3D structure prediction. AVAILABILITY AND IMPLEMENTATION SPOT-RNA-2D is available as a web server at https://sparks-lab.org/server/spot-rna-2d/ and as a standalone program at https://github.com/jaswindersingh2/SPOT-RNA-2D. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
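The abstract reports that accuracy is highest for RNAs with a high effective number of homologous sequences (Neff > 50). A minimal sketch of one common way to compute such a quantity is shown below: each sequence is down-weighted by the number of sequences at least `threshold` identical to it, and the weights are summed. The 0.8 identity cutoff, the gapless toy alignment, and the function names are assumptions for illustration; SPOT-RNA-2D may define Neff differently.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length aligned sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_number(msa, threshold=0.8):
    """Effective number of sequences (Neff) via identity-based reweighting.

    Each sequence gets weight 1 / m, where m counts the sequences (itself
    included) whose identity to it is at least `threshold`; Neff sums them.
    """
    neff = 0.0
    for s in msa:
        m = sum(identity(s, t) >= threshold for t in msa)
        neff += 1.0 / m
    return neff


# Hypothetical gapless alignment of four 12-nt sequences.
msa = [
    "GGACUUCGGUCC",
    "GGACUUCGGUCC",  # duplicate of the first sequence
    "GGACUACGGUCC",  # one substitution relative to the first two
    "AGUCGAAAGACU",  # unrelated sequence
]
print(effective_number(msa))
```

Here the three near-identical sequences together contribute one effective sequence, so Neff for the toy alignment is 2 rather than 4. Redundant homologs therefore add little to Neff, which is the motivation for reweighting.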
Affiliation(s)
- Thomas Litfin
- Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
- Jaspreet Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Yaoqi Zhou
- To whom correspondence should be addressed.
21
van Keulen SC, Martin J, Colizzi F, Frezza E, Trpevski D, Diaz NC, Vidossich P, Rothlisberger U, Hellgren Kotaleski J, Wade RC, Carloni P. Multiscale molecular simulations to investigate adenylyl cyclase-based signaling in the brain. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Affiliation(s)
- Siri C. van Keulen
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Science for Life, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Juliette Martin
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry, University of Lyon, Lyon, France
- Francesco Colizzi
- Molecular Ocean Laboratory, Department of Marine Biology and Oceanography, Institute of Marine Sciences, ICM-CSIC, Barcelona, Spain
- Elisa Frezza
- Université Paris Cité, CiTCoM, CNRS, Paris, France
- Daniel Trpevski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm
- Nuria Cirauqui Diaz
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry, University of Lyon, Lyon, France
- Pietro Vidossich
- Molecular Modeling and Drug Discovery Lab, Istituto Italiano di Tecnologia, Genoa, Italy
- Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne
- Jeanette Hellgren Kotaleski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm
- Department of Neuroscience, Karolinska Institute, Stockholm
- Rebecca C. Wade
- Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
- Paolo Carloni
- Institute for Neuroscience and Medicine (INM-9) and Institute for Advanced Simulations (IAS-5) "Computational biomedicine", Forschungszentrum Jülich, Jülich, Germany
- INM-11 JARA-Institute: Molecular Neuroscience and Neuroimaging, Forschungszentrum Jülich, Jülich, Germany
22
Ngampruetikorn V, Sachdeva V, Torrence J, Humplik J, Schwab DJ, Palmer SE. Inferring couplings in networks across order-disorder phase transitions. Physical Review Research 2022; 4:023240. [PMID: 37576946 PMCID: PMC10421637 DOI: 10.1103/physrevresearch.4.023240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 08/15/2023]
Abstract
Statistical inference is central to many scientific endeavors, yet how it works remains unresolved. Answering this requires a quantitative understanding of the intrinsic interplay between statistical models, inference methods, and the structure in the data. To this end, we characterize the efficacy of direct coupling analysis (DCA), a highly successful method for analyzing amino acid sequence data, in inferring pairwise interactions from samples of ferromagnetic Ising models on random graphs. Our approach allows for physically motivated exploration of qualitatively distinct data regimes separated by phase transitions. We show that inference quality depends strongly on the nature of data-generating distributions: optimal accuracy occurs at an intermediate temperature where the detrimental effects from macroscopic order and thermal noise are minimal. Importantly, our results indicate that DCA does not always outperform its local-statistics-based predecessors; while DCA excels at low temperatures, it becomes inferior to simple correlation thresholding at virtually all temperatures when data are limited. Our findings offer insights into the regime in which DCA operates so successfully and, more broadly, into how inference interacts with the structure in the data.
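The abstract compares DCA with simple correlation thresholding. The sketch below implements only that baseline: it computes Pearson correlations between ±1 spin variables across samples and predicts a coupling wherever the absolute correlation exceeds a cutoff `tau`. The toy samples, the cutoff, and the function name are hypothetical, and no DCA inference is attempted. Because thresholding looks at raw correlations, it cannot separate direct couplings from correlations transmitted through intermediate spins, which is the problem DCA is designed to address.

```python
import numpy as np


def correlation_threshold_edges(samples, tau):
    """Predict couplings by thresholding absolute pairwise correlations.

    samples: array of shape (n_samples, n_spins) with entries +1/-1.
    Returns index pairs (i, j), i < j, whose |Pearson correlation| > tau.
    """
    corr = np.corrcoef(samples, rowvar=False)  # spins are the columns
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > tau]


# Hypothetical samples of three spins: spin 1 mostly copies spin 0,
# spin 2 is uncorrelated with spin 0.
samples = np.array([
    [1, 1, 1],
    [1, 1, -1],
    [-1, -1, 1],
    [-1, -1, -1],
    [1, 1, 1],
    [-1, -1, -1],
    [1, 1, -1],
    [-1, 1, 1],
])
print(correlation_threshold_edges(samples, tau=0.5))  # [(0, 1)]
```

With `tau=0.5`, only the strongly correlated pair (0, 1) is reported; lowering `tau` to 0.2 would also admit the weaker (1, 2) correlation. How well such a cutoff works depends on how much data is available, which is the regime the abstract identifies as favoring thresholding over DCA.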
Affiliation(s)
- Vudtiwat Ngampruetikorn
- Initiative for the Theoretical Sciences, The Graduate Center, CUNY, New York, New York 10016, USA
- Vedant Sachdeva
- Department of Organismal Biology and Anatomy and Department of Physics, University of Chicago, Chicago, Illinois 60637, USA
- Johanna Torrence
- Department of Organismal Biology and Anatomy and Department of Physics, University of Chicago, Chicago, Illinois 60637, USA
- Jan Humplik
- Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
- David J Schwab
- Initiative for the Theoretical Sciences, The Graduate Center, CUNY, New York, New York 10016, USA
- Stephanie E Palmer
- Department of Organismal Biology and Anatomy and Department of Physics, University of Chicago, Chicago, Illinois 60637, USA
23
Zhang M, Hwang IT, Li K, Bai J, Chen JF, Weissman T, Zou JY, Lu Z. Classification and clustering of RNA crosslink-ligation data reveal complex structures and homodimers. Genome Res 2022; 32:968-985. [PMID: 35332099 PMCID: PMC9104705 DOI: 10.1101/gr.275979.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2021] [Accepted: 01/11/2022] [Indexed: 12/04/2022]
Abstract
The recent development and application of methods based on the general principle of "crosslinking and proximity ligation" (crosslink-ligation) are revolutionizing RNA structure studies in living cells. However, extracting structure information from such data presents unique challenges. Here, we introduce a set of computational tools for the systematic analysis of data from a wide variety of crosslink-ligation methods, specifically focusing on read mapping, alignment classification, and clustering. We design a new strategy to map short reads with irregular gaps at high sensitivity and specificity. Analysis of previously published data reveals distinct properties and bias caused by the crosslinking reactions. We perform rigorous and exhaustive classification of alignments and discover eight types of arrangements that provide distinct information on RNA structures and interactions. To deconvolve the dense and intertwined gapped alignments, we develop a network/graph-based tool Crosslinked RNA Secondary Structure Analysis using Network Techniques (CRSSANT), which enables clustering of gapped alignments and discovery of new alternative and dynamic conformations. We discover that multiple crosslinking and ligation events can occur on the same RNA, generating multisegment alignments to report complex high-level RNA structures and multi-RNA interactions. We find that alignments with overlapped segments are produced from potential homodimers and develop a new method for their de novo identification. Analysis of overlapping alignments revealed potential new homodimers in cellular noncoding RNAs and RNA virus genomes in the Picornaviridae family. Together, this suite of computational tools enables rapid and efficient analysis of RNA structure and interaction data in living cells.
Affiliation(s)
- Minjie Zhang
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, California 90089, USA
- Irena T Hwang
- Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
- Kongpan Li
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, California 90089, USA
- Jianhui Bai
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, California 90089, USA
- Jian-Fu Chen
- Center for Craniofacial Molecular Biology, University of Southern California (USC), Los Angeles, California 90033, USA
- Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
- James Y Zou
- Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
- Department of Biomedical Data Science and Chan-Zuckerberg Biohub, Stanford University, Palo Alto, California 94305, USA
- Zhipeng Lu
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, California 90089, USA
24
Labes S, Stupp D, Wagner N, Bloch I, Lotem M, L Lahad E, Polak P, Pupko T, Tabach Y. Machine-learning of complex evolutionary signals improves classification of SNVs. NAR Genom Bioinform 2022; 4:lqac025. [PMID: 35402908 PMCID: PMC8988715 DOI: 10.1093/nargab/lqac025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/24/2021] [Revised: 02/08/2022] [Accepted: 03/28/2022] [Indexed: 12/12/2022]
Abstract
Conservation is a strong predictor of the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution, since its accuracy depends strongly on the analyzed set of species and genes. For example, pairwise comparisons between human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants in different genes using conservation. Furthermore, certain genes were less amenable to conservation-based variant prediction, while for others, specific species optimized prediction. These insights led to the development of EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning-based methods and most ensemble tools in every prediction task, highlighting the strength of optimizing conservation analysis per species and per gene. Overall, we suggest a new, more biologically relevant approach to analyzing conservation, which improves prediction of variant pathogenicity.
Affiliation(s)
- Sapir Labes
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Doron Stupp
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Idit Bloch
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Michal Lotem
- Sharett Institute of Oncology, Hadassah University Medical Center, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ephrat L Lahad
- Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem 9103102, Israel
- Paz Polak
- Oncological Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Yuval Tabach
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
25
Terai G, Asai K. QRNAstruct: a method for extracting secondary structural features of RNA via regression with biological activity. Nucleic Acids Res 2022; 50:e73. [PMID: 35390152 PMCID: PMC9303433 DOI: 10.1093/nar/gkac220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Revised: 02/15/2022] [Accepted: 03/24/2022] [Indexed: 12/04/2022]
Abstract
Recent technological advances have enabled the generation of large amounts of data consisting of RNA sequences and their functional activity. Here, we propose a method for extracting secondary structure features that affect the functional activity of RNA from sequence–activity data. Given pairs of RNA sequences and their corresponding bioactivity values, our method calculates position-specific structural features of the input RNA sequences, considering every possible secondary structure of each RNA. A Ridge regression model is trained using the structural features as feature vectors and the bioactivity values as response variables. Optimized model parameters indicate how secondary structure features affect bioactivity. We used our method to extract intramolecular structural features of bacterial translation initiation sites and self-cleaving ribozymes, and the intermolecular features between rRNAs and Shine–Dalgarno sequences and between U1 RNAs and splicing sites. We not only identified known structural features but also revealed more detailed insights into structure–activity relationships than previously reported. Importantly, the datasets we analyzed here were obtained from different experimental systems and differed in size, sequence length and similarity, and number of RNA molecules involved, demonstrating that our method is applicable to various types of data consisting of RNA sequences and bioactivity values.
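The regression step described in the QRNAstruct abstract can be illustrated with a closed-form ridge regression. This is a minimal sketch, not the authors' implementation: the feature matrix stands in for the position-specific structural features (for example, per-position unpaired probabilities computed over the structure ensemble), which are assumed to be precomputed here, and `alpha` is a hypothetical regularization strength. No intercept term is fitted.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(features: Matrix, activity: List[float], alpha: float = 1.0) -> List[float]:
    """Weights w minimizing ||X w - y||^2 + alpha * ||w||^2.

    Closed form: w = (X^T X + alpha I)^{-1} X^T y.
    """
    p = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(p)] for i in range(p)]
    for i in range(p):
        xtx[i][i] += alpha  # the L2 penalty adds alpha to the diagonal
    xty = [sum(row[i] * y for row, y in zip(features, activity)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical features for three positions in five RNAs, with activity
# generated as 2 * f0 - 1 * f1 (position 2 has no effect).
FEATURES = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.4],
    [0.6, 0.3, 0.9],
    [0.1, 0.7, 0.2],
    [0.8, 0.9, 0.1],
]
ACTIVITY = [1.7, -0.4, 0.9, -0.5, 0.7]
```

With a tiny `alpha`, `ridge_fit(FEATURES, ACTIVITY, 1e-6)` recovers weights close to `[2, -1, 0]`; larger values shrink the weights toward zero, which is what keeps many correlated positional features from producing unstable coefficients. The sign and size of each weight are then read as the effect of that position's structural state on activity.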
Affiliation(s)
- Goro Terai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8561, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8561, Japan
26
Van Damme R, Li K, Zhang M, Bai J, Lee WH, Yesselman JD, Lu Z, Velema WA. Chemical reversible crosslinking enables measurement of RNA 3D distances and alternative conformations in cells. Nat Commun 2022; 13:911. [PMID: 35177610 PMCID: PMC8854666 DOI: 10.1038/s41467-022-28602-3] [Received: 05/13/2021] [Accepted: 01/19/2022]
Abstract
Three-dimensional (3D) structures dictate the functions of RNA molecules in a wide variety of biological processes. However, direct determination of RNA 3D structures in vivo is difficult due to their large sizes, conformational heterogeneity, and dynamics. Here we present a method, Spatial 2'-Hydroxyl Acylation Reversible Crosslinking (SHARC), which uses chemical crosslinkers of defined lengths to measure distances between nucleotides in cellular RNA. Integrating crosslinking, exonuclease (exo) trimming, proximity ligation, and high-throughput sequencing, SHARC enables transcriptome-wide tertiary structure contact maps at high accuracy and precision, revealing heterogeneous RNA structures and interactions. SHARC data provide constraints that improve Rosetta-based RNA 3D structure modeling at near-nanometer resolution. Integrating SHARC-exo with other crosslinking-based methods, we discover compact folding of the 7SK RNA, a critical regulator of transcriptional elongation. These results establish a strategy for measuring RNA 3D distances and alternative conformations in their native cellular context.
Affiliation(s)
- Ryan Van Damme
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, 90033, USA
- Kongpan Li
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, 90033, USA
- Minjie Zhang
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, 90033, USA
- Jianhui Bai
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, 90033, USA
- Wilson H Lee
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, 90033, USA
- Joseph D Yesselman
- Department of Chemistry, University of Nebraska-Lincoln, 832A Hamilton Hall, Lincoln, NE, 68588, USA
- Zhipeng Lu
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, 90033, USA
- Willem A Velema
- Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, The Netherlands
27
Fang X, Gallego J, Wang YX. Deriving RNA topological structure from SAXS. Methods Enzymol 2022; 677:479-529. [DOI: 10.1016/bs.mie.2022.08.037]
28
Zerihun MB, Pucci F, Schug A. CoCoNet-boosting RNA contact prediction by convolutional neural networks. Nucleic Acids Res 2021; 49:12661-12672. [PMID: 34871451 PMCID: PMC8682773 DOI: 10.1093/nar/gkab1144] [Received: 07/23/2020] [Revised: 10/27/2021] [Accepted: 11/05/2021]
Abstract
Co-evolutionary models such as direct coupling analysis (DCA) in combination with machine learning (ML) techniques based on deep neural networks are able to predict accurate protein contact or distance maps. Such information can be used as constraints in structure prediction and massively increase prediction accuracy. Unfortunately, the same ML methods cannot readily be applied to RNA as they rely on large structural datasets only available for proteins. Here, we demonstrate how the smaller datasets available for RNA can be used to improve prediction of RNA contact maps. We introduce an algorithm called CoCoNet that is based on a combination of a Coevolutionary model and a shallow Convolutional Neural Network. Despite its simplicity and the small number of trained parameters, the method boosts the positive predictive value (PPV) of predicted contacts by about 70% with respect to DCA, as tested by cross-validation on about eighty RNA structures. However, the direct inclusion of the CoCoNet contacts in 3D modeling tools does not result in a proportional increase in 3D RNA structure prediction accuracy. Therefore, we suggest that the field develop, in addition to contact PPV, metrics that better estimate the expected impact on 3D structure modeling tools. CoCoNet is freely available and can be found at https://github.com/KIT-MBS/coconet.
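As a rough illustration of the idea behind CoCoNet, the sketch below applies a single 3×3 convolution filter to a matrix of DCA coupling scores and passes the result through a logistic function, so each pair's refined score also reflects the scores of neighboring pairs. The kernel weights and bias are hypothetical placeholders chosen by hand, not trained CoCoNet parameters; the actual architecture and training procedure are documented in the CoCoNet repository.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_same(scores: Matrix, kernel: Matrix, bias: float) -> Matrix:
    """2D cross-correlation (as in most CNN libraries) with zero padding."""
    n = len(scores)
    k = len(kernel)
    half = k // 2
    out = [[bias] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - half, j + dj - half
                    if 0 <= r < n and 0 <= c < n:  # zero padding outside the matrix
                        out[i][j] += kernel[di][dj] * scores[r][c]
    return out


def refine_contacts(dca_scores: Matrix, kernel: Matrix, bias: float) -> Matrix:
    """Map DCA scores to contact probabilities with one conv layer + logistic output."""
    logits = conv2d_same(dca_scores, kernel, bias)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]


# Hypothetical kernel: the center plus the anti-diagonal neighbors
# (i - 1, j + 1) and (i + 1, j - 1), where stacked base pairs line up.
KERNEL = [[0.0, 0.0, 0.5],
          [0.0, 1.0, 0.0],
          [0.5, 0.0, 0.0]]
BIAS = -1.0
```

Because this kernel is symmetric under transposition, a symmetric score matrix yields a symmetric probability matrix. A pair that is part of a run of stacked high scores (a helix) ends up with a higher probability than an isolated pair with the same raw DCA score, which is the kind of local pattern a shallow network can learn from few training structures.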
Affiliation(s)
- Mehari B Zerihun
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
- Fabrizio Pucci
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Computational Biology and Bioinformatics, Université Libre de Bruxelles 1050, Brussels, Belgium
- Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Faculty of Biology, University of Duisburg-Essen, 45117 Essen, Germany
29
Manigrasso J, Marcia M, De Vivo M. Computer-aided design of RNA-targeted small molecules: A growing need in drug discovery. Chem 2021. [DOI: 10.1016/j.chempr.2021.05.021]
30
Colizzi F, Orozco M. Probing allosteric regulations with coevolution-driven molecular simulations. Sci Adv 2021; 7:eabj0786. [PMID: 34516882 PMCID: PMC8442858 DOI: 10.1126/sciadv.abj0786] [Received: 04/19/2021] [Accepted: 07/19/2021]
Abstract
Protein-mediated allosteric regulations are essential in biology, but their quantitative characterization continues to pose formidable challenges for both experiments and computations. Here, we combine coevolutionary information, multiscale molecular simulations, and free-energy methods to interrogate and quantify the allosteric regulation of functional changes in protein complexes. We apply this approach to investigate the regulation of adenylyl cyclase (AC) by stimulatory and inhibitory G proteins, a prototypical allosteric system that has long escaped in-depth molecular characterization. We reveal a surprisingly simple ON/OFF regulation of AC functional dynamics through multiple pathways of information transfer. The binding of G proteins reshapes the free-energy landscape of AC following the classical population-shift paradigm. The model agrees with structural and biochemical data and reveals previously unknown, experimentally consistent intermediates. Our approach showcases a general strategy to explore uncharted functional space in complex biomolecular regulations.
Affiliation(s)
- Francesco Colizzi
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac 10, Barcelona 08028, Spain
- Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac 10, Barcelona 08028, Spain
- Departament de Bioquímica i Biomedicina, Facultat de Biologia, Universitat de Barcelona, Avinguda Diagonal 647, Barcelona 08028, Spain
31
Sun S, Wang W, Peng Z, Yang J. RNA inter-nucleotide 3D closeness prediction by deep residual neural networks. Bioinformatics 2021; 37:1093-1098. [PMID: 33135062 PMCID: PMC8150135 DOI: 10.1093/bioinformatics/btaa932] [Received: 09/25/2019] [Revised: 10/01/2020] [Accepted: 10/22/2020]
Abstract
MOTIVATION In recent years, deep neural networks have been shown to predict inter-residue contacts/distances in proteins accurately, significantly improving the accuracy of predicted protein structure models. In contrast, fewer studies have addressed the prediction of RNA inter-nucleotide 3D closeness. RESULTS We proposed a new algorithm named RNAcontact for the prediction of RNA inter-nucleotide 3D closeness. RNAcontact is built on deep residual neural networks. The covariance information from multiple sequence alignments and the predicted secondary structure were used as the input features of the networks. Experiments show that RNAcontact achieves precisions of 0.8 and 0.6 for the top L/10 and top L predictions (where L is the length of an RNA), respectively, on an independent test set, significantly higher than other evolutionary coupling methods. Analysis shows that about 1/3 of the correctly predicted 3D closenesses are not secondary-structure base pairings, and such contacts are critical to the determination of RNA structure. In addition, we demonstrated that the predicted 3D closeness could be used as distance restraints to guide RNA structure folding with the 3dRNA package; models built using the predicted 3D closeness were more accurate than those built without it. AVAILABILITY AND IMPLEMENTATION The webserver and a standalone package are available at: http://yanglab.nankai.edu.cn/RNAcontact/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
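The top-L/k precision quoted for RNAcontact can be computed as follows: rank candidate pairs by predicted score, keep the top L/k, and report the fraction that are true contacts. This is a generic sketch; the minimum sequence separation `min_sep` and the rounding of L/k are assumptions that vary between benchmarks, and the paper's exact evaluation protocol may differ.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_lk_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    k: float,
    min_sep: int = 4,
) -> float:
    """Precision of the top L/k predicted pairs (i < j, j - i >= min_sep)."""
    # Keep each unordered pair once and drop pairs too close in sequence.
    candidates = [
        (s, (i, j)) for (i, j), s in scores.items() if i < j and j - i >= min_sep
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, int(length / k))  # e.g. L/10 for k = 10, L for k = 1
    top = candidates[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)


# Hypothetical predictions for an RNA of length 20.
SCORES = {(0, 10): 0.9, (1, 9): 0.8, (2, 15): 0.7, (3, 5): 0.95}
TRUE_CONTACTS = {(0, 10), (2, 15)}
```

Here `(3, 5)` is excluded by the separation filter, so with L = 20 the top L/10 (two pairs) gives a precision of 0.5, while the top L, which contains all three remaining candidates, gives 2/3.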
Affiliation(s)
- Saisai Sun
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Wenkai Wang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
32
Zhang T, Singh J, Litfin T, Zhan J, Paliwal K, Zhou Y. RNAcmap: A Fully Automatic Pipeline for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis. Bioinformatics 2021; 37:3494-3500. [PMID: 34021744 DOI: 10.1093/bioinformatics/btab391] [Received: 02/17/2020] [Revised: 03/27/2021] [Accepted: 05/18/2021]
Abstract
MOTIVATION The accuracy of RNA secondary and tertiary structure prediction can be significantly improved by using structural restraints derived from evolutionary coupling or direct coupling analysis. Currently, these coupling analyses rely on manually curated multiple sequence alignments collected in the Rfam database, which contains 3016 families. By comparison, millions of non-coding RNA sequences are known. Here, we established RNAcmap, a fully automatic pipeline that enables evolutionary coupling analysis for any RNA sequence. The homology search was based on the covariance model built by INFERNAL according to two secondary structure predictors: a folding-based algorithm RNAfold and the latest deep-learning method SPOT-RNA. RESULTS We showed that the performance of RNAcmap depends less on the specific evolutionary coupling tool than on the accuracy of the secondary structure predictor, with the best performance given by RNAcmap (SPOT-RNA). The performance of RNAcmap (SPOT-RNA) is comparable to that based on Rfam-supplied alignment and consistent for those sequences that are not in Rfam collections. Further improvement can be made with a simple meta predictor RNAcmap (SPOT-RNA/RNAfold) depending on which secondary structure predictor can find more homologous sequences. Reliable base-pairing information generated from RNAcmap, in particular for RNAs with many effective homologous sequences, will be useful for aiding RNA structure prediction. AVAILABILITY RNAcmap is available as a web server at https://sparks-lab.org/server/rnacmap/ and as a standalone application along with the datasets at https://github.com/sparks-lab-org/RNAcmap_standalone. A platform independent and fully configured docker image of RNAcmap is also provided at https://hub.docker.com/r/jaswindersingh2/rnacmap.
Affiliation(s)
- Tongchuan Zhang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
33
Pairing a high-resolution statistical potential with a nucleobase-centric sampling algorithm for improving RNA model refinement. Nat Commun 2021; 12:2777. [PMID: 33986288 PMCID: PMC8119458 DOI: 10.1038/s41467-021-23100-4] [Received: 01/10/2021] [Accepted: 04/13/2021]
Abstract
Refining modelled structures to approach experimental accuracy is one of the most challenging problems in molecular biology. Despite many years of effort, progress in protein or RNA structure refinement has been slow because the global minimum given by the energy scores is not at the experimentally determined “native” structure. Here, we propose a fully knowledge-based energy function that captures the full orientation dependence of base–base, base–oxygen and oxygen–oxygen interactions with the RNA backbone modelled by rotameric states and internal energies. A total of 4000 quantum-mechanical calculations were performed to reweight base–base statistical potentials for minimizing possible effects of indirect interactions. The resulting BRiQ knowledge-based potential, equipped with a nucleobase-centric sampling algorithm, provides a robust improvement in refining near-native RNA models generated by a wide variety of modelling techniques. Predicting RNA structure from sequence is challenging due to the relative sparsity of experimentally-determined RNA 3D structures for model training. Here, the authors propose a way to incorporate knowledge on interactions at the atomic and base–base level to refine the prediction of RNA structures.
34
McCafferty CL, Taylor DW, Marcotte EM. Improving integrative 3D modeling into low- to medium-resolution electron microscopy structures with evolutionary couplings. Protein Sci 2021; 30:1006-1021. [PMID: 33759266 PMCID: PMC8040867 DOI: 10.1002/pro.4067] [Received: 01/17/2021] [Revised: 03/16/2021] [Accepted: 03/16/2021]
Abstract
Electron microscopy (EM) continues to provide near-atomic resolution structures for well-behaved proteins and protein complexes. Unfortunately, structures of some complexes are limited to low- to medium-resolution due to biochemical or conformational heterogeneity. Thus, the application of unbiased systematic methods for fitting individual structures into EM maps is important. A method that employs co-evolutionary information obtained solely from sequence data could prove invaluable for quick, confident localization of subunits within these structures. Here, we incorporate the co-evolution of intermolecular amino acids as a new type of distance restraint in the integrative modeling platform in order to build three-dimensional models of atomic structures into EM maps ranging from 10-14 Å in resolution. We validate this method using four complexes of known structure, where we highlight the conservation of intermolecular couplings despite dynamic conformational changes using the BAM complex. Finally, we use this method to assemble the subunits of the bacterial holo-translocon into a model that agrees with previous biochemical data. The use of evolutionary couplings in integrative modeling improves systematic, unbiased fitting of atomic models into medium- to low-resolution EM maps, providing additional information to integrative models lacking in spatial data.
Affiliation(s)
- David W. Taylor
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, Texas, USA
- LIVESTRONG Cancer Institutes, Dell Medical School, Austin, Texas, USA
- Edward M. Marcotte
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, Texas, USA
35
Zeng HL, Aurell E. Inferring genetic fitness from genomic data. Phys Rev E 2020; 101:052409. [PMID: 32575265 DOI: 10.1103/physreve.101.052409] [Received: 01/10/2020] [Accepted: 05/04/2020]
Abstract
The genetic composition of a naturally developing population is considered to result from mutation, selection, genetic drift, and recombination. Selection is modeled as single-locus terms (additive fitness) and two-loci terms (pairwise epistatic fitness). The problem is posed to infer epistatic fitness from population-wide whole-genome data from a time series of a developing population. We generate such data in silico and show that in the quasilinkage equilibrium phase of Kimura, Neher, and Shraiman, which pertains at high enough recombination rates and low enough mutation rates, epistatic fitness can be quantitatively correctly inferred using inverse Ising-Potts methods.
Affiliation(s)
- Hong-Li Zeng
- School of Science, and New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Nordita, Royal Institute of Technology, and Stockholm University, SE-10691 Stockholm, Sweden
- Erik Aurell
- KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, 30-348 Kraków, Poland
36
Feng C, Tan YL, Cheng YX, Shi YZ, Tan ZJ. Salt-Dependent RNA Pseudoknot Stability: Effect of Spatial Confinement. Front Mol Biosci 2021; 8:666369. [PMID: 33928126 PMCID: PMC8078894 DOI: 10.3389/fmolb.2021.666369] [Received: 02/10/2021] [Accepted: 03/17/2021]
Abstract
Macromolecules, such as RNAs, reside in crowded cell environments, which could strongly affect the folded structures and stability of RNAs. The emergence of RNA-driven phase separation in biology further stresses the potential functional roles of molecular crowding. In this work, we employed a coarse-grained model that we previously developed to predict 3D structures and stability of the mouse mammary tumor virus (MMTV) pseudoknot under different spatial confinements over a wide range of salt concentrations. The results show that spatial confinements can not only enhance the compactness and stability of MMTV pseudoknot structures but also weaken the dependence of the RNA structure compactness and stability on salt concentration. Based on our microscopic analyses, we found that the effect of spatial confinement on the salt-dependent RNA pseudoknot stability mainly comes through the spatial suppression of extended conformations, which are prevalent in the partially/fully unfolded states, especially at low ion concentrations. Furthermore, our comprehensive analyses revealed that the thermal unfolding pathway of the pseudoknot can be significantly modulated by spatial confinements, since intermediate states with more extended conformations lose favor when spatial confinements are introduced.
Affiliation(s)
- Chenjie Feng
- Key Laboratory of Artificial Micro and Nano-structures of Ministry of Education, Center for Theoretical Physics, School of Physics and Technology, Wuhan University, Wuhan, China
- Ya-Lan Tan
- Key Laboratory of Artificial Micro and Nano-structures of Ministry of Education, Center for Theoretical Physics, School of Physics and Technology, Wuhan University, Wuhan, China
- Yu-Xuan Cheng
- Key Laboratory of Artificial Micro and Nano-structures of Ministry of Education, Center for Theoretical Physics, School of Physics and Technology, Wuhan University, Wuhan, China
- Ya-Zhou Shi
- Research Center of Nonlinear Science, School of Mathematics and Computer Science, Wuhan Textile University, Wuhan, China
- Zhi-Jie Tan
- Key Laboratory of Artificial Micro and Nano-structures of Ministry of Education, Center for Theoretical Physics, School of Physics and Technology, Wuhan University, Wuhan, China
37
Rivas E. Evolutionary conservation of RNA sequence and structure. Wiley Interdiscip Rev RNA 2021; 12:e1649. [PMID: 33754485 PMCID: PMC8250186 DOI: 10.1002/wrna.1649] [Received: 12/17/2020] [Revised: 02/24/2021] [Accepted: 02/25/2021]
Abstract
An RNA structure prediction from a single‐sequence RNA folding program is not evidence for an RNA whose structure is important for function. Random sequences have plausible and complex predicted structures not easily distinguishable from those of structural RNAs. How to tell when an RNA has a conserved structure is a question that requires looking at the evolutionary signature left by the conserved RNA. This question is important not just for long noncoding RNAs, which usually lack an identified function, but also for RNA binding protein motifs, which can be single-stranded RNAs or structures. Here we review recent advances using sequence and structural analysis to determine when RNA structure is conserved or not. Although covariation measures assess structural RNA conservation, one must distinguish covariation due to RNA structure from covariation due to independent phylogenetic substitutions. We review a statistical test to measure false positives expected under the null hypothesis of phylogenetic covariation alone (specificity). We also review a complementary test that measures power, that is, expected covariation derived from sequence variation alone (sensitivity). Power in the absence of covariation signals the absence of a conserved RNA structure. We analyze artifacts that falsely identify conserved RNA structure such as the misuse of programs that do not assess significance, the use of inappropriate statistics confounded by signals other than covariation, or misalignments that induce spurious covariation. Among artifacts that obscure the signal of a conserved RNA structure, we discuss the inclusion of pseudogenes in alignments which increase power but destroy covariation. This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics and Chemistry; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution
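The covariation measures discussed in this review can be illustrated with mutual information (MI) between two alignment columns, one of the simplest such statistics. This sketch is only illustrative: it applies no phylogenetic correction and no significance test, so, as the abstract warns, a high value on its own cannot separate structural covariation from covariation due to shared ancestry. The toy alignment is hypothetical.

```python
import math
from collections import Counter
from typing import List


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """MI (in bits) between columns i and j, skipping sequences gapped there."""
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Hypothetical alignment: columns 0 and 3 covary as Watson-Crick pairs,
# column 1 is perfectly conserved.
ALIGNMENT = ["GAAC", "CAAG", "AAAU", "UAAA"]
```

Columns 0 and 3 give 2 bits, the maximum for four equally frequent pair types, whereas the conserved column 1 gives 0 bits against any partner. That second case echoes the review's point about power: without sequence variation, no covariation can be observed, whatever the column's structural role.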
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
38
Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y. Improved RNA Secondary Structure and Tertiary Base-pairing Prediction Using Evolutionary Profile, Mutational Coupling and Two-dimensional Transfer Learning. Bioinformatics 2021; 37:2589-2600. [PMID: 33704363 DOI: 10.1093/bioinformatics/btab165] [Received: 12/01/2020] [Revised: 02/05/2021] [Accepted: 03/08/2021]
Abstract
MOTIVATION The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception of the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures efficiently at high resolution with existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has recently received much-needed improvement through deep learning on a large approximate dataset, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. RESULTS The new method allows large improvement not only in canonical base-pairs (RNA secondary structures) but even more so in base-pairing associated with tertiary interactions such as pseudoknots, noncanonical and lone base-pairs. In particular, it is highly accurate for RNAs with more than 1000 homologous sequences, achieving an F1-score (harmonic mean of sensitivity and precision) above 0.8 for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community with a powerful new tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. AVAILABILITY The standalone version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. The datasets used in this research can also be downloaded from the GitHub repository and the webserver mentioned above.
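The F1-score used to evaluate SPOT-RNA2 is the harmonic mean of sensitivity (recall) and precision over base pairs. A minimal sketch, assuming predicted and reference structures are given as sets of `(i, j)` pairs with `i < j` and that only exact matches count (some benchmarks also credit slightly shifted pairs):

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def base_pair_f1(predicted: Set[Pair], reference: Set[Pair]) -> float:
    """F1 = 2 * precision * sensitivity / (precision + sensitivity)."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0  # also covers empty predictions, avoiding division by zero
    precision = true_pos / len(predicted)
    sensitivity = true_pos / len(reference)
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical 4-bp hairpin stem; the prediction misses one pair and adds one.
REFERENCE = {(0, 10), (1, 9), (2, 8), (3, 7)}
PREDICTED = {(0, 10), (1, 9), (2, 8), (4, 6)}
```

With three of four pairs correct, precision and sensitivity are both 0.75, so the F1-score is 0.75. The >0.8 threshold in the abstract therefore requires both quantities to be high at once, since the harmonic mean is dominated by the smaller of the two.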
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Tongchuan Zhang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jaspreet Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
39
Abstract
Ribonucleic acid (RNA) molecules perform a variety of vital roles in all living cells. Their biological function depends on their structure and dynamics, both of which are difficult to determine experimentally but can be inferred theoretically from the RNA sequence. SimRNA is one of the computational methods for molecular simulations of RNA 3D structure formation. The method is based on a simplified (coarse-grained) representation of nucleotide chains, a statistically derived model of interactions (statistical potential), and the Monte Carlo method as a conformational sampling scheme. The current version of SimRNA (3.22) can predict basic topologies of RNA molecules with sizes up to about 50-70 nucleotides from their sequences alone, and of larger molecules if supplied with appropriate distance restraints. The user can specify various types of restraints, including secondary structure, pairwise atom-atom distances, and positions of atoms. SimRNA can also be used to study systems composed of several RNA chains. Because SimRNA is a folding simulation method, it allows folding pathways to be examined and provides an approximate view of energy landscapes.
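SimRNA samples conformations with a Monte Carlo scheme. The generic Metropolis acceptance rule at the heart of such schemes is sketched below on a toy one-dimensional energy function. The `double_well` energy, the uniform trial move, and the temperature are hypothetical stand-ins, not SimRNA's coarse-grained representation, move set, or statistical potential.

```python
import math
import random
from typing import Callable, List


def metropolis(
    energy: Callable[[float], float],
    x0: float,
    n_steps: int,
    step_size: float,
    temperature: float,
    seed: int = 0,
) -> List[float]:
    """Return the trajectory of a Metropolis Monte Carlo walk on `energy`."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    trajectory = [x]
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)  # symmetric trial move
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill ones with prob exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
        trajectory.append(x)
    return trajectory


def double_well(x: float) -> float:
    """Toy energy landscape with minima at x = -1 and x = +1, barrier at x = 0."""
    return (x * x - 1.0) ** 2
```

Starting from `x0 = 2.0` at a low temperature such as 0.05, the walk relaxes into the well at +1 and, with a barrier of height 1, essentially never crosses to -1. Raising the temperature makes barrier crossings frequent, which is the sense in which such simulations can probe folding pathways and give an approximate view of the energy landscape.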
40
Green AG, Elhabashy H, Brock KP, Maddamsetti R, Kohlbacher O, Marks DS. Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences. Nat Commun 2021; 12:1396. [PMID: 33654096 PMCID: PMC7925567 DOI: 10.1038/s41467-021-21636-z] [Received: 01/22/2020] [Accepted: 01/27/2021]
Abstract
Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale interaction prediction at residue resolution with a fast alignment concatenation method and a probabilistic score for the interaction of residues. Importantly, this method (EVcomplex2) is able to assess the likelihood of a protein interaction, as we show here applied to large-scale experimental datasets where the pairwise interactions are unknown. We predict 504 interactions de novo in the E. coli membrane proteome, including 243 that are newly discovered. While EVcomplex2 does not require available structures, coevolving residue pairs can be used to produce structural models of protein interactions, as done here for membrane complexes including the Flagellar Hook-Filament Junction and the Tol/Pal complex. Our understanding of the residue-level details of protein interactions remains incomplete. Here, the authors show sequence coevolution can be used to infer interacting proteins with residue-level details, including predicting 467 interactions de novo in the Escherichia coli cell envelope proteome.
Affiliation(s)
- Anna G Green
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Hadeer Elhabashy
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Sand 14, 72076, Tübingen, Germany; Department of Computer Science, University of Tübingen, WSI/ZBIT, Sand 14, 72076, Tübingen, Germany
- Kelly P Brock
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Rohan Maddamsetti
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Oliver Kohlbacher
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Sand 14, 72076, Tübingen, Germany; Department of Computer Science, University of Tübingen, WSI/ZBIT, Sand 14, 72076, Tübingen, Germany; Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 8, 72076, Tübingen, Germany; Institute for Translational Bioinformatics, University Hospital Tübingen, Sand 14, 72076, Tübingen, Germany
- Debora S Marks
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Sand 14, 72076, Tübingen, Germany; Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
41
Calonaci N, Jones A, Cuturello F, Sattler M, Bussi G. Machine learning a model for RNA structure prediction. NAR Genom Bioinform 2021; 2:lqaa090. [PMID: 33575634 PMCID: PMC7671377 DOI: 10.1093/nargab/lqaa090] [Received: 05/13/2020] [Revised: 10/06/2020] [Accepted: 10/20/2020]
Abstract
RNA function crucially depends on its structure. Thermodynamic models currently used for secondary structure prediction rely on computing the partition function of folding ensembles, and can thus estimate minimum free-energy structures and ensemble populations. These models sometimes fail in identifying native structures unless complemented by auxiliary experimental data. Here, we build a set of models that combine thermodynamic parameters, chemical probing data (DMS and SHAPE) and co-evolutionary data (direct coupling analysis) through a network that outputs perturbations to the ensemble free energy. Perturbations are trained to increase the ensemble populations of a representative set of known native RNA structures. In the chemical probing nodes of the network, a convolutional window combines neighboring reactivities, enlightening their structural information content and the contribution of local conformational ensembles. Regularization is used to limit overfitting and improve transferability. The most transferable model is selected through a cross-validation strategy that estimates the performance of models on systems on which they are not trained. With the selected model we obtain increased ensemble populations for native structures and more accurate predictions in an independent validation set. The flexibility of the approach allows the model to be easily retrained and adapted to incorporate arbitrary experimental information.
Affiliation(s)
- Nicola Calonaci
- International School for Advanced Studies, via Bonomea 265, 34136 Trieste, Italy
- Alisha Jones
- Institute of Structural Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Francesca Cuturello
- International School for Advanced Studies, via Bonomea 265, 34136 Trieste, Italy
- Michael Sattler
- Institute of Structural Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Giovanni Bussi
- International School for Advanced Studies, via Bonomea 265, 34136 Trieste, Italy
42
Marchand JA, Pierson Smela MD, Jordan THH, Narasimhan K, Church GM. TBDB: a database of structurally annotated T-box riboswitch:tRNA pairs. Nucleic Acids Res 2021; 49:D229-D235. [PMID: 32882008 PMCID: PMC7778990 DOI: 10.1093/nar/gkaa721] [Received: 07/01/2020] [Revised: 08/01/2020] [Accepted: 08/21/2020]
Abstract
T-box riboswitches constitute a large family of tRNA-binding leader sequences that play a central role in gene regulation in many gram-positive bacteria. Accurate inference of the tRNA binding to T-box riboswitches is critical to predict their cis-regulatory activity. However, there is no central repository of information on the tRNA binding specificities of T-box riboswitches, and de novo prediction of binding specificities requires advanced knowledge of computational tools to annotate riboswitch secondary structure features. Here, we present the T-box Riboswitch Annotation Database (TBDB, https://tbdb.io), an open-access database with a collection of 23,535 T-box riboswitch sequences, spanning the major phyla of 3,632 bacterial species. Among structural predictions, the TBDB also identifies specifier sequences, cognate tRNA binding partners, and downstream regulatory targets. To our knowledge, the TBDB presents the largest collection of feature, sequence, and structural annotations carried out on this important family of regulatory RNA.
Affiliation(s)
- Jorge A Marchand
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Merrick D Pierson Smela
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA; Wyss Institute for Biologically Inspired Engineering, Boston, MA 02115, USA
- Thomas H H Jordan
- Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Kamesh Narasimhan
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- George M Church
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Wyss Institute for Biologically Inspired Engineering, Boston, MA 02115, USA
43
Su H, Peng Z, Yang J. Recognition of small molecule-RNA binding sites using RNA sequence and structure. Bioinformatics 2021; 37:36-42. [PMID: 33416863 PMCID: PMC8034527 DOI: 10.1093/bioinformatics/btaa1092] [Received: 11/12/2019] [Revised: 12/12/2020] [Accepted: 12/23/2020]
Abstract
Motivation: RNA molecules have become attractive small-molecule drug targets for treating disease in recent years. Computer-aided drug design can be facilitated by detecting the RNA sites that bind small molecules. However, very limited progress has been reported for the prediction of small molecule–RNA binding sites. Results: We developed a novel method, RNAsite, to predict small molecule–RNA binding sites using sequence profile- and structure-based descriptors. RNAsite was shown to be competitive with state-of-the-art methods on the experimental structures of two independent test sets. When predicted structure models were used, RNAsite outperformed other methods by a large margin. The possibility of improving RNAsite by geometry-based binding pocket detection was investigated. The influence of RNA structural flexibility and of the conformational changes caused by ligand binding on RNAsite was also discussed. RNAsite is anticipated to be a useful tool for the design of RNA-targeting small-molecule drugs. Availability and implementation: http://yanglab.nankai.edu.cn/RNAsite. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
44
Reinharz V, Tlusty T. αβDCA method identifies unspecific binding but specific disruption of the group I intron by the StpA chaperone. RNA 2020; 26:1530-1540. [PMID: 32747608 PMCID: PMC7566574 DOI: 10.1261/rna.074336.119] [Received: 12/12/2019] [Accepted: 07/19/2020]
Abstract
Chaperone proteins, the most disordered among all protein groups, help RNAs fold into their functional structure by destabilizing misfolded configurations or stabilizing the functional ones. But disentangling the mechanism underlying RNA chaperoning is challenging, mostly because of the inherent disorder of the chaperones and the transient nature of their interactions with RNA. In particular, it is unclear how specific the interactions are and what role is played by amino acid charge and polarity patterns. Here, we address these questions for the RNA chaperone StpA. We adapted direct coupling analysis (DCA) into the αβDCA method, which can treat in tandem sequences written in two alphabets, nucleotides and amino acids. With αβDCA, we could analyze StpA-RNA interactions and show consistency with a previously proposed two-pronged mechanism: StpA disrupts specific positions in the group I intron while globally and loosely binding to the entire structure. Moreover, the interactions are strongly associated with the charge pattern: negatively charged regions in the destabilizing StpA amino terminus affect a few specific positions in the RNA, located in stems and in the pseudoknot. In contrast, positive regions in the carboxy terminus contain strongly coupled amino acids that promote nonspecific or weakly specific binding to the RNA. The present study opens new avenues to examine the functions of disordered proteins and to design disruptive proteins based on their charge patterns.
Affiliation(s)
- Vladimir Reinharz
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Department of Computer Science, Université du Québec à Montréal, Montréal, H2X 3Y7, Canada
- Tsvi Tlusty
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Department of Physics, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
45
Wilburn GW, Eddy SR. Remote homology search with hidden Potts models. PLoS Comput Biol 2020; 16:e1008085. [PMID: 33253143 PMCID: PMC7728182 DOI: 10.1371/journal.pcbi.1008085] [Received: 06/17/2020] [Revised: 12/10/2020] [Accepted: 10/27/2020]
Abstract
Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process with a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof-of-principle experiments.
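The Potts models mentioned here (and the DCA couplings used by αβDCA) score an aligned sequence with site-specific fields and pairwise couplings. A common form of the energy is E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), though sign conventions differ between papers. The minimal Python sketch below evaluates only that energy; it does not implement the HPM's insertion/deletion model or its importance-sampling alignment. The function name `potts_energy` and the toy two-site parameters are hypothetical.

```python
from itertools import combinations


def potts_energy(seq, alphabet, h, J):
    """Energy of one aligned sequence under a pairwise Potts model:
    E(s) = -sum_i h[i][s_i] - sum_{i<j} J[(i, j)][s_i][s_j].
    Lower energy means the sequence fits the model better.
    h: list of per-site field vectors; J: dict {(i, j): matrix} with i < j.
    """
    idx = [alphabet.index(c) for c in seq]  # map letters to state indices
    e = -sum(h[i][a] for i, a in enumerate(idx))
    for i, j in combinations(range(len(idx)), 2):
        coupling = J.get((i, j))
        if coupling is not None:  # absent pairs contribute nothing
            e -= coupling[idx[i]][idx[j]]
    return e


# Hypothetical toy model: two RNA sites, no fields, and a coupling that
# rewards Watson-Crick pairs, mimicking a base-paired column pair.
ALPHABET = "ACGU"
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
H = [[0.0] * 4 for _ in range(2)]
J = {(0, 1): [[1.0 if (a, b) in WC_PAIRS else 0.0 for b in ALPHABET] for a in ALPHABET]}
```

With these parameters `potts_energy("GC", ALPHABET, H, J)` gives -1.0 while `"GA"` gives 0.0, so the covarying pair is preferred, which is the kind of correlation a primary-sequence-only profile cannot express.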
Affiliation(s)
- Grey W. Wilburn
- Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America
- Sean R. Eddy
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
46
Li B, Cao Y, Westhof E, Miao Z. Advances in RNA 3D Structure Modeling Using Experimental Data. Front Genet 2020; 11:574485. [PMID: 33193680 PMCID: PMC7649352 DOI: 10.3389/fgene.2020.574485] [Received: 06/19/2020] [Accepted: 09/02/2020]
Abstract
RNA is a unique bio-macromolecule that can both record genetic information and perform biological functions in a variety of molecular processes, including transcription, splicing, translation, and even the regulation of protein function. RNAs adopt specific three-dimensional conformations to enable their functions. Experimental determination of high-resolution RNA structures using X-ray crystallography is laborious and demands expertise, which hinders our comprehension of RNA structural biology. The computational modeling of RNA structure was a milestone in the birth of bioinformatics. Although computational modeling has improved greatly over the last decade, with many successful cases, its accuracy is not only length-dependent but also varies with the complexity of the structure. To increase credibility, various experimental data have been integrated into computational modeling. In this review, we summarize the experiments that can be integrated into RNA structure modeling as well as the computational methods based on these experimental data. We also demonstrate how computational modeling can help the experimental determination of RNA structure. We highlight the recent advances in computational modeling that can offer reliable structure models using high-throughput experimental data.
Affiliation(s)
- Bing Li
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Eric Westhof
- Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, Strasbourg, France
- Zhichao Miao
- Translational Research Institute of Brain and Brain-Like Intelligence, Department of Anesthesiology, Shanghai Fourth People’s Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
- Newcastle Fibrosis Research Group, Institute of Cellular Medicine, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
47
Tian P, Best RB. Exploring the sequence fitness landscape of a bridge between protein folds. PLoS Comput Biol 2020; 16:e1008285. [PMID: 33048928 PMCID: PMC7553338 DOI: 10.1371/journal.pcbi.1008285] [Received: 05/22/2020] [Accepted: 08/24/2020]
Abstract
Most foldable protein sequences adopt only a single native fold. Recent protein design studies have, however, created protein sequences that fold into different structures upon a change of environment or a single point mutation, the best characterized example being the switch between the folds of the GA and GB binding domains of streptococcal protein G. To obtain further insight into the design of sequences that can switch folds, we have used a computational model for the fitness landscape of a single fold, built from the observed sequence variation of protein homologues. We have recently shown that such coevolutionary models can be used to design novel foldable sequences. By appropriately combining two of these models to describe the joint fitness landscape of GA and GB, we are able to describe the propensity of a given sequence for each of the two folds. We have successfully tested the combined model against the known series of designed GA/GB hybrids. Using Monte Carlo simulations on this landscape, we are able to identify pathways of mutations connecting the two folds. In the absence of a requirement for domain stability, the most frequent paths go via sequences in which neither domain is stably folded, reminiscent of the propensity for certain intrinsically disordered proteins to fold into different structures according to context. Even if the folded state is required to be stable, we find that there is nonetheless still a wide range of sequences that are close to the transition region and therefore likely fold switches, consistent with recent estimates that fold switching may be more widespread than had been thought.
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
- Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
48
Abstract
The ribosome translates the genetic code into proteins in all domains of life. Its size and complexity demand long-range interactions that regulate ribosome function. These interactions are largely unknown. Here, we apply a global coevolution method, statistical coupling analysis (SCA), to identify coevolving residue networks (sectors) within the 23S ribosomal RNA (rRNA) of the large ribosomal subunit. As in proteins, SCA reveals a hierarchical organization of evolutionary constraints with near-independent groups of nucleotides forming physically contiguous networks within the three-dimensional structure. Using a quantitative, continuous-culture-with-deep-sequencing assay, we confirm that the top two SCA-predicted sectors contribute to ribosome function. These sectors map to distinct ribosome activities, and their origins trace to phylogenetic divergences across all domains of life. These findings provide a foundation to map ribosome allostery, explore ribosome biogenesis, and engineer ribosomes for new functions. Despite differences in chemical structure, protein and RNA enzymes appear to share a common internal logic of interaction and assembly.
49
Watkins AM, Rangan R, Das R. FARFAR2: Improved De Novo Rosetta Prediction of Complex Global RNA Folds. Structure 2020; 28:963-976.e6. [PMID: 32531203 PMCID: PMC7415647 DOI: 10.1016/j.str.2020.05.011] [Received: 09/10/2019] [Revised: 04/27/2020] [Accepted: 05/20/2020]
Abstract
Predicting RNA three-dimensional structures from sequence could accelerate understanding of the growing number of RNA molecules being discovered across biology. Rosetta's Fragment Assembly of RNA with Full-Atom Refinement (FARFAR) has shown promise in community-wide blind RNA-Puzzle trials, but lack of a systematic and automated benchmark has left unclear what limits FARFAR performance. Here, we benchmark FARFAR2, an algorithm integrating RNA-Puzzle-inspired innovations with updated fragment libraries and helix modeling. In 16 of 21 RNA-Puzzles revisited without experimental data or expert intervention, FARFAR2 recovers native-like structures more accurate than models submitted during the RNA-Puzzles trials. Remaining bottlenecks include conformational sampling for >80-nucleotide problems and scoring function limitations more generally. Supporting these conclusions, preregistered blind models for adenovirus VA-I RNA and five riboswitch complexes predicted native-like folds with 3- to 14 Å root-mean-square deviation accuracies. We present a FARFAR2 webserver and three large model archives (FARFAR2-Classics, FARFAR2-Motifs, and FARFAR2-Puzzles) to guide future applications and advances.
Affiliation(s)
- Andrew Martin Watkins
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ramya Rangan
- Biophysics Program, Stanford University, Stanford, CA 94305, USA
- Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA; Biophysics Program, Stanford University, Stanford, CA 94305, USA.
50
PRIME-3D2D is a 3D2D model to predict binding sites of protein-RNA interaction. Commun Biol 2020; 3:384. [PMID: 32678300 PMCID: PMC7366699 DOI: 10.1038/s42003-020-1114-y] [Received: 12/19/2019] [Accepted: 06/29/2020]
Abstract
Protein–RNA interactions participate in many biological processes, so studying them can help us understand the functions of proteins and RNA. Although protein–RNA 3D3D models such as PRIME are useful for building 3D structural complexes, they cannot be used genome-wide owing to the lack of RNA 3D structures. To take full advantage of RNA secondary structures revealed by high-throughput sequencing, we present PRIME-3D2D to predict binding sites of protein–RNA interactions. PRIME-3D2D is almost as good as PRIME at modeling protein–RNA complexes. PRIME-3D2D can be used to predict binding sites on PDB data (MCC = 0.75/0.70 for binding sites in protein/RNA) and transcriptome-wide (MCC = 0.285 for binding sites in RNA). Tests on PDB and yeast transcriptome-wide data show that PRIME-3D2D performs better than other binding-site predictors. PRIME-3D2D can therefore be used to predict binding sites both on PDB structures and genome-wide, and it is freely available. Xie et al. report a new computational method, PRIME-3D2D, to predict binding sites of protein–RNA interactions by considering protein 3D structure and RNA 2D structure. It is freely available, performs better than other binding-site predictors, and is as good as PRIME at modeling protein–RNA complexes.
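The PRIME-3D2D abstract reports accuracy as the Matthews correlation coefficient (MCC), defined from a binary confusion matrix as MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The minimal Python sketch below computes it for per-residue or per-nucleotide binding-site labels. How the authors counted positives and negatives is not stated in the abstract, so the labeling scheme here is an assumption, and `mcc_from_labels` is a hypothetical helper name.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal sum is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mcc_from_labels(y_true, y_pred) -> float:
    """MCC for binary binding-site labels (1 = binding, 0 = non-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return mcc(tp, tn, fp, fn)
```

MCC ranges from -1 (every label inverted) through 0 (no better than chance) to 1 (perfect prediction). Because it uses all four cells of the confusion matrix, it stays informative when binding sites are a small minority of residues, which is why it is a common choice for binding-site benchmarks.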