1
Tse V, Guiterrez M, Townley J, Romano J, Pearl J, Chacaltana G, Players E, Das R, Sanford JR, Stone MD. OpenASO: RNA Rescue - designing splice-modulating antisense oligonucleotides through community science. bioRxiv 2024:2024.10.15.618608. [PMID: 39463988 PMCID: PMC11507933 DOI: 10.1101/2024.10.15.618608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Splice-modulating antisense oligonucleotides (ASOs) are precision RNA-based drugs that are becoming an established modality to treat human disease. Previously, we reported the discovery of ASOs that target a novel, putative intronic RNA structure to rescue splicing of multiple pathogenic variants of F8 exon 16 that cause hemophilia A. However, the conventional approach to discovering splice-modulating ASOs is both laborious and expensive. Here, we describe an alternative paradigm that integrates data-driven RNA structure prediction and community science to discover splice-modulating ASOs. Using a splicing-deficient pathogenic variant of F8 exon 16 as a model, we show that 25% of the top-scoring molecules designed in the Eterna OpenASO challenge have a statistically significant impact on enhancing exon 16 splicing. Additionally, we show that a distinct combination of ASOs designed by Eterna players can additively enhance the inclusion of the splicing-deficient exon 16 variant. Together, our data suggests that crowdsourcing designs from a community of citizen scientists may accelerate the discovery of splice-modulating ASOs with potential to treat human disease.
Affiliation(s)
- Victor Tse
  - Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Martin Guiterrez
  - Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Jill Townley
  - Eterna Massive Open Laboratory. Consortium authors listed in Supplemental Table 1
- Jonathan Romano
  - Eterna Massive Open Laboratory. Consortium authors listed in Supplemental Table 1
  - Howard Hughes Medical Institute, Stanford, CA 94305, USA
- Jennifer Pearl
  - Eterna Massive Open Laboratory. Consortium authors listed in Supplemental Table 1
- Guillermo Chacaltana
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Eterna Players
  - Eterna Massive Open Laboratory. Consortium authors listed in Supplemental Table 1
- Rhiju Das
  - Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
  - Department of Physics, Stanford University, Stanford, CA 94305, USA
  - Eterna Massive Open Laboratory. Consortium authors listed in Supplemental Table 1
  - Howard Hughes Medical Institute, Stanford, CA 94305, USA
- Jeremy R. Sanford
  - Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Michael D. Stone
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
2
Muñoz-Velasco I, Cruz-González A, Hernández-Morales R, Campillo-Balderas JA, Cottom-Salas W, Jácome R, Vázquez-Salazar A. Pioneering role of RNA in the early evolution of life. Genet Mol Biol 2024; 47 Suppl 1:e20240028. [PMID: 39437147 PMCID: PMC11445735 DOI: 10.1590/1678-4685-gmb-2024-0028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 06/26/2024] [Indexed: 10/25/2024] [Open Access]
Abstract
The catalytic, regulatory and structural properties of RNA, combined with their extraordinary ubiquity in cellular processes, are consistent with the proposal that this molecule played a much more conspicuous role in heredity and metabolism during the early stages of biological evolution. This review explores the pivotal role of RNA in the earliest life forms and its relevance in modern biological systems. It examines current models that study the early evolution of life, providing insights into the primordial RNA world and its legacy in contemporary biology.
Affiliation(s)
- Israel Muñoz-Velasco
  - Universidad Nacional Autónoma de México, Facultad de Ciencias, Departamento de Biología Celular, Mexico City, Mexico
- Adrián Cruz-González
  - Universidad Nacional Autónoma de México, Facultad de Ciencias, Departamento de Biología Evolutiva, Mexico City, Mexico
- Ricardo Hernández-Morales
  - Universidad Nacional Autónoma de México, Facultad de Ciencias, Departamento de Biología Evolutiva, Mexico City, Mexico
- Wolfgang Cottom-Salas
  - Universidad Nacional Autónoma de México, Facultad de Ciencias, Departamento de Biología Evolutiva, Mexico City, Mexico
- Rodrigo Jácome
  - Universidad Nacional Autónoma de México, Facultad de Ciencias, Departamento de Biología Evolutiva, Mexico City, Mexico
- Alberto Vázquez-Salazar
  - University of California Los Angeles, Department of Chemical and Biomolecular Engineering, California, USA
3
Matthies MC, Krueger R, Torda AE, Ward M. Differentiable partition function calculation for RNA. Nucleic Acids Res 2024; 52:e14. [PMID: 38038257 PMCID: PMC10853804 DOI: 10.1093/nar/gkad1168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 10/24/2023] [Accepted: 11/28/2023] [Indexed: 12/02/2023] [Open Access]
Abstract
Ribonucleic acid (RNA) is an essential molecule in a wide range of biological functions. In 1990, McCaskill introduced a dynamic programming algorithm for computing the partition function of an RNA sequence. McCaskill's algorithm is widely used today for understanding the thermodynamic properties of RNA. In this work, we introduce a generalization of McCaskill's algorithm that is well-defined over continuous inputs. Crucially, this enables us to implement an end-to-end differentiable partition function calculation. The derivative can be computed with respect to the input, or to any other fixed values, such as the parameters of the energy model. This builds a bridge between RNA thermodynamics and the tools of differentiable programming including deep learning as it enables the partition function to be incorporated directly into any end-to-end differentiable pipeline. To demonstrate the effectiveness of our new approach, we tackle the inverse folding problem directly using gradient optimization. We find that using the gradient to optimize the sequence directly is sufficient to arrive at sequences with a high probability of folding into the desired structure. This indicates that the gradients we compute are meaningful.
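For orientation, the kind of dynamic-programming recursion that a McCaskill-style partition function rests on can be sketched briefly. This is a minimal illustration and not the method of the paper: it assumes a toy energy model in which every canonical or G-U pair contributes one fixed free energy (`pair_energy`, a hypothetical parameter) rather than nearest-neighbor loop energies, and it operates on a discrete sequence rather than the continuous inputs the authors introduce. The function name and constants are illustrative only.

```python
import math

# Gas constant (kcal/(mol*K)) times temperature (37 degrees C in kelvin).
RT = 0.0019872 * 310.15

# Pairs allowed in this toy model: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, min_loop=3):
    """Sum of Boltzmann weights over all nested structures of seq.

    Assumption: each base pair contributes pair_energy (kcal/mol) and
    nothing else does; hairpin loops need at least min_loop unpaired bases.
    """
    n = len(seq)
    weight = math.exp(-pair_energy / RT)
    Q = [[1.0] * n for _ in range(n)]

    def get(i, j):
        # An empty interval has exactly one (empty) structure.
        return 1.0 if i > j else Q[i][j]

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is unpaired.
            total = get(i, j - 1)
            # Case 2: position j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    total += get(i, k - 1) * weight * get(k + 1, j - 1)
            Q[i][j] = total
    return get(0, n - 1)
```

For a sequence with a single admissible pair, such as `GAAAC`, the result is 1 (the open chain) plus the Boltzmann weight of that one pair.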
Affiliation(s)
- Marco C Matthies
  - Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Ryan Krueger
  - Department of Applied Mathematics, Harvard University, 29 Oxford St, Cambridge, MA 02138, USA
- Andrew E Torda
  - Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Max Ward
  - Department of Computer Science and Software Engineering, The University of Western Australia, 241, 35 Stirling Hwy, Crawley, WA 6009, Australia
4
Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. ACS Synth Biol 2023; 12:2750-2763. [PMID: 37671922 PMCID: PMC10510751 DOI: 10.1021/acssynbio.3c00358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Indexed: 09/07/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-d-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-d-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, abbreviated as P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find G-Z pairs have stability comparable to that of A-T pairs and should therefore be included as base pairs in structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include the P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing it with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases in which Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
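The normalized ensemble defect used above to score designs can be illustrated with a short sketch. It follows the commonly used definition (the average, over nucleotides, of the probability that a nucleotide is not in its target pairing state) and is not taken from the RNAstructure code. The input format is an assumption: `pair_probs` is a hypothetical dictionary mapping index pairs `(i, j)` with `i < j` to base-pair probabilities, which in practice would come from a partition function calculation, and the target is a non-empty dot-bracket string without pseudoknots.

```python
def parse_dot_bracket(structure):
    """Return a list where entry i is the partner of i, or None if unpaired."""
    partner = [None] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def normalized_ensemble_defect(pair_probs, target):
    """Average probability that a nucleotide deviates from the target state."""
    n = len(target)
    partner = parse_dot_bracket(target)

    # Total probability that each nucleotide is paired with anything.
    paired = [0.0] * n
    for (i, j), p in pair_probs.items():
        paired[i] += p
        paired[j] += p

    defect = 0.0
    for i in range(n):
        if partner[i] is None:
            # Target says unpaired: correct with probability 1 - P(paired).
            correct = 1.0 - paired[i]
        else:
            # Target says paired with partner[i]: look up that pair.
            key = (min(i, partner[i]), max(i, partner[i]))
            correct = pair_probs.get(key, 0.0)
        defect += 1.0 - correct
    return defect / n
```

A value of 0 means the ensemble matches the target at every position; larger values indicate more probability mass on off-target pairing states.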
Affiliation(s)
- Tuan M. Pham
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Terrel Miffin
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Hongying Sun
  - Department of Surgery, University of Rochester Medical Center, Rochester, New York 14642, United States
- Kenneth K. Sharp
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Xiaoyu Wang
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Mingyi Zhu
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Shuichi Hoshika
  - Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Steven A. Benner
  - Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Jason D. Kahn
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
5
Zhou T, Dai N, Li S, Ward M, Mathews DH, Huang L. RNA design via structure-aware multifrontier ensemble optimization. Bioinformatics 2023; 39:i563-i571. [PMID: 37387188 DOI: 10.1093/bioinformatics/btad252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] [Open Access]
Abstract
MOTIVATION: RNA design is the search for a sequence or set of sequences that will fold to desired structure, also known as the inverse problem of RNA folding. However, the sequences designed by existing algorithms often suffer from low ensemble stability, which worsens for long sequence design. Additionally, for many methods only a small number of sequences satisfying the MFE criterion can be found by each run of design. These drawbacks limit their use cases.
RESULTS: We propose an innovative optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) by iterative search and yields a very large number of successfully designed RNA sequences as byproducts. We develop a search method which leverages structure level and ensemble level information at different stages of the optimization: initialization, sampling, mutation, and updating. Our work, while being less complicated than others, is the first algorithm that is able to design thousands of RNA sequences for the puzzles from the Eterna100 benchmark. In addition, our algorithm solves the most Eterna100 puzzles among all the general optimization based methods in our study. The only baseline solving more puzzles than our work is dependent on handcrafted heuristics designed for a specific folding model. Surprisingly, our approach shows superiority on designing long sequences for structures adapted from the database of 16S Ribosomal RNAs.
AVAILABILITY AND IMPLEMENTATION: Our source code and data used in this article is available at https://github.com/shanry/SAMFEO.
Affiliation(s)
- Tianshuo Zhou
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Ning Dai
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Sizhen Li
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Max Ward
  - Department of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
- David H Mathews
  - Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, United States
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
- Liang Huang
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
6
Hudson-Smith NV, Alvarez-Reyes W, Yao X, He J, Rodriguez RS, Mitchell S, Abed MM, Spanolios E, Krause MOP, Haynes CL. NanoAdventure: Development of a Text-Based Adventure Game in English, Spanish, and Chinese for Communicating about Nanotechnology and the Nanoscale. J Chem Educ 2023; 100:2269-2280. [PMID: 38221949 PMCID: PMC10786637 DOI: 10.1021/acs.jchemed.3c00042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Video games and immersive, narrative experiences are often called upon to help students understand difficult scientific concepts, such as sense of scale. However, the development of educational video games requires expertise and, frequently, a sizable budget. Here, we report on the use of an interactive text-style video game, NanoAdventure, to communicate about sense of scale and nanotechnology to the public. NanoAdventure was developed on an open-source, free-to-use platform with simple coding and enhanced with free or low-cost assets. NanoAdventure was launched in three languages (English, Spanish, Chinese) and compared to textbook-style and blog-style control texts in a randomized study. Participants answered questions on their knowledge of nanotechnology and their attitudes toward nanotechnology before and after reading one randomly assigned text (textbook, blog, or NanoAdventure game). Our results demonstrate that interactive fiction is effective in communicating about sense of scale and nanotechnology as well as the relevance of nanotechnology to a general public. NanoAdventure was found to be the most "fun" and easy to read of all text styles by participants in a randomized trial. Here, we make the case for interactive "Choose Your Own Adventure" style games as another effective tool among educational game models for chemistry and science communication.
Affiliation(s)
- Natalie V. Hudson-Smith
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Wilanyi Alvarez-Reyes
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Xiaoxiao Yao
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Jiayi He
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Rebeca Sarahi Rodriguez
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Stephanie Mitchell
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Mahmoud Matar Abed
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Eleni Spanolios
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Miriam O. P. Krause
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Christy L. Haynes
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
7
Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. bioRxiv 2023:2023.06.06.543917. [PMID: 37333404 PMCID: PMC10274641 DOI: 10.1101/2023.06.06.543917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo-[1,2- a ]-1,3,5-triazin-(8 H )-4-one and 6-amino-3-(1'-β-D-2'-deoxyribofuranosyl)-5-nitro-(1 H )-pyridin-2-one, simply P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit a new set of free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find that G-Z pairs have stability comparable to A-T pairs and therefore should be considered quantitatively by structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases where Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
Affiliation(s)
- Tuan M. Pham
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Terrel Miffin
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Hongying Sun
  - Department of Surgery, University of Rochester Medical Center, Rochester, NY
- Kenneth K. Sharp
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Xiaoyu Wang
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Mingyi Zhu
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Jason D. Kahn
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
8
Dunkel H, Wehrmann H, Jensen LR, Kuss AW, Simm S. MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding. Int J Mol Sci 2023; 24:8884. [PMID: 37240230 PMCID: PMC10218863 DOI: 10.3390/ijms24108884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 05/11/2023] [Accepted: 05/13/2023] [Indexed: 05/28/2023] [Open Access]
Abstract
Non-coding RNA (ncRNA) classes take over important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals that the expressed novel ncRNAs and their classification are important to understand cell regulation and identify potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches of utilizing primary sequences and secondary structures as well as the late integration of both using machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes, including lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. In comparison to the actual best-performing tool ncRDense, we had a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) up to 12.000 nts and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
Affiliation(s)
- Heiko Dunkel
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
- Henning Wehrmann
  - Department of Biosciences, Molecular Cell Biology of Plants, Goethe University, 60438 Frankfurt am Main, Germany
- Lars R. Jensen
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Andreas W. Kuss
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Stefan Simm
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
9
Schaffter SW, Wintenberg ME, Murphy TM, Strychalski EA. Design Approaches to Expand the Toolkit for Building Cotranscriptionally Encoded RNA Strand Displacement Circuits. ACS Synth Biol 2023; 12:1546-1561. [PMID: 37134273 DOI: 10.1021/acssynbio.3c00079] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/05/2023]
Abstract
Cotranscriptionally encoded RNA strand displacement (ctRSD) circuits are an emerging tool for programmable molecular computation, with potential applications spanning in vitro diagnostics to continuous computation inside living cells. In ctRSD circuits, RNA strand displacement components are continuously produced together via transcription. These RNA components can be rationally programmed through base pairing interactions to execute logic and signaling cascades. However, the small number of ctRSD components characterized to date limits circuit size and capabilities. Here, we characterize over 200 ctRSD gate sequences, exploring different input, output, and toehold sequences and changes to other design parameters, including domain lengths, ribozyme sequences, and the order in which gate strands are transcribed. This characterization provides a library of sequence domains for engineering ctRSD components, i.e., a toolkit, enabling circuits with up to 4-fold more inputs than previously possible. We also identify specific failure modes and systematically develop design approaches that reduce the likelihood of failure across different gate sequences. Lastly, we show the ctRSD gate design is robust to changes in transcriptional encoding, opening a broad design space for applications in more complex environments. Together, these results deliver an expanded toolkit and design approaches for building ctRSD circuits that will dramatically extend capabilities and potential applications.
Affiliation(s)
- Samuel W Schaffter
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Molly E Wintenberg
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Terence M Murphy
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
10
Krüger A, Watkins AM, Wellington-Oguri R, Romano J, Kofman C, DeFoe A, Kim Y, Anderson-Lee J, Fisker E, Townley J, d'Aquino AE, Das R, Jewett MC. Community science designed ribosomes with beneficial phenotypes. Nat Commun 2023; 14:961. [PMID: 36810740 PMCID: PMC9944925 DOI: 10.1038/s41467-023-35827-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/30/2022] [Accepted: 01/04/2023] [Indexed: 02/23/2023] [Open Access]
Abstract
Functional design of ribosomes with mutant ribosomal RNA (rRNA) can expand opportunities for understanding molecular translation, building cells from the bottom-up, and engineering ribosomes with altered capabilities. However, such efforts are hampered by cell viability constraints, an enormous combinatorial sequence space, and limitations on large-scale, 3D design of RNA structures and functions. To address these challenges, we develop an integrated community science and experimental screening approach for rational design of ribosomes. This approach couples Eterna, an online video game that crowdsources RNA sequence design to community scientists in the form of puzzles, with in vitro ribosome synthesis, assembly, and translation in multiple design-build-test-learn cycles. We apply our framework to discover mutant rRNA sequences that improve protein synthesis in vitro and cell growth in vivo, relative to wild type ribosomes, under diverse environmental conditions. This work provides insights into rRNA sequence-function relationships and has implications for synthetic biology.
Affiliation(s)
- Antje Krüger
  - Department of Chemical and Biological Engineering, Chemistry of Life Processes Institute, and Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
  - Resilience US Inc, 9310 Athena Circle, La Jolla, CA, 92037, USA
- Andrew M Watkins
  - Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
  - Prescient Design, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Jonathan Romano
  - Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
  - Eterna Massive Open Laboratory, Stanford, CA, 94305, USA
  - Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, 14260, USA
- Camila Kofman
  - Department of Chemical and Biological Engineering, Chemistry of Life Processes Institute, and Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Alysse DeFoe
  - Department of Chemical and Biological Engineering, Chemistry of Life Processes Institute, and Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Yejun Kim
  - Department of Chemical and Biological Engineering, Chemistry of Life Processes Institute, and Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Eli Fisker
  - Eterna Massive Open Laboratory, Stanford, CA, 94305, USA
- Jill Townley
  - Eterna Massive Open Laboratory, Stanford, CA, 94305, USA
- Anne E d'Aquino
  - Department of Chemical and Biological Engineering, Chemistry of Life Processes Institute, and Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Rhiju Das
  - Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
  - Howard Hughes Medical Institute, Stanford University, Stanford, CA, 94305, USA
- Michael C Jewett
  - Department of Chemical and Biological Engineering, Chemistry of Life Processes Institute, and Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
  - Robert H. Lurie Comprehensive Cancer Center and Simpson Querrey Institute, Northwestern University, Chicago, IL, 60611, USA
11
RNA secondary structure packages evaluated and improved by high-throughput experiments. Nat Methods 2022; 19:1234-1242. [PMID: 36192461 PMCID: PMC9839360 DOI: 10.1038/s41592-022-01605-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 05/29/2020] [Accepted: 08/10/2022] [Indexed: 01/17/2023]
Abstract
Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.
|
12
|
Merleau NSC, Smerlak M. aRNAque: an evolutionary algorithm for inverse pseudoknotted RNA folding inspired by Lévy flights. BMC Bioinformatics 2022; 23:335. [PMID: 35964008 PMCID: PMC9375295 DOI: 10.1186/s12859-022-04866-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 07/29/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: We study the inverse folding problem for RNA, that is, the discovery of sequences that fold into given target secondary structures. RESULTS: We implement a Lévy mutation scheme in an updated version of aRNAque, an evolutionary inverse folding algorithm, and apply it to the design of RNAs with and without pseudoknots. We find that the Lévy mutation scheme increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. Compared to antaRNA, aRNAque requires more CPU time but is more successful in finding designed sequences that fold correctly into the target structures. CONCLUSION: We propose that a Lévy flight offers a better standard mutation scheme for optimizing RNA design. Our new version of aRNAque is available on GitHub as a Python script, and the benchmark results show improved performance on both the Pseudobase++ and Eterna100 datasets compared to existing inverse folding tools.
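The idea of a Lévy mutation scheme mentioned in this abstract can be illustrated with a minimal sketch. It is not the aRNAque implementation: the function names, the exponent value, and the uniform choice of replacement bases are all assumptions made for illustration. The number of mutated positions is drawn from a truncated power law, so most steps change one or two bases while occasional steps change many.

```python
import random

BASES = "ACGU"


def levy_mutation_count(length, exponent=1.5, rng=random):
    """Draw how many positions to mutate from a truncated power law.

    P(k) is proportional to k**(-exponent) for k = 1..length.
    The exponent value is an illustrative assumption.
    """
    sizes = range(1, length + 1)
    weights = [k ** -exponent for k in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]


def levy_mutate(seq, exponent=1.5, rng=random):
    """Return a copy of seq with a heavy-tailed number of point mutations."""
    k = levy_mutation_count(len(seq), exponent, rng)
    positions = rng.sample(range(len(seq)), k)  # distinct positions
    out = list(seq)
    for i in positions:
        # Always pick a base different from the current one.
        out[i] = rng.choice([b for b in BASES if b != out[i]])
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    parent = "GGGAAACCC"
    for _ in range(5):
        print(levy_mutate(parent, rng=rng))
```

A lower exponent makes large jumps more frequent; a very high exponent approaches a conventional one-point mutation scheme.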
Affiliation(s)
- Nono S. C. Merleau
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany
| | - Matteo Smerlak
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany
| |
|
13
|
Schaffter SW, Strychalski EA. Cotranscriptionally encoded RNA strand displacement circuits. Science Advances 2022; 8:eabl4354. [PMID: 35319994 PMCID: PMC8942360 DOI: 10.1126/sciadv.abl4354] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/20/2021] [Accepted: 02/01/2022] [Indexed: 05/21/2023]
Abstract
Engineered molecular circuits that process information in biological systems could address emerging human health and biomanufacturing needs. However, such circuits can be difficult to rationally design and scale. DNA-based strand displacement reactions have demonstrated the largest and most computationally powerful molecular circuits to date but are limited in biological systems due to the difficulty in genetically encoding components. Here, we develop scalable cotranscriptionally encoded RNA strand displacement (ctRSD) circuits that are rationally programmed via base pairing interactions. ctRSD circuits address the limitations of DNA-based strand displacement circuits by isothermally producing circuit components via transcription. We demonstrate circuit programmability in vitro by implementing logic and amplification elements, as well as multilayer cascades. Furthermore, we show that circuit kinetics are accurately predicted by a simple model of coupled transcription and strand displacement, enabling model-driven design. We envision ctRSD circuits will enable the rational design of powerful molecular circuits that operate in biological systems, including living cells.
|
14
|
Wayment-Steele HK, Kladwang W, Watkins AM, Kim DS, Tunguz B, Reade W, Demkin M, Romano J, Wellington-Oguri R, Nicol JJ, Gao J, Onodera K, Fujikawa K, Mao H, Vandewiele G, Tinti M, Steenwinckel B, Ito T, Noumi T, He S, Ishi K, Lee Y, Öztürk F, Chiu KY, Öztürk E, Amer K, Fares M, Das R. Deep learning models for predicting RNA degradation via dual crowdsourcing. Nat Mach Intell 2022; 4:1174-1184. [PMID: 36567960 PMCID: PMC9771809 DOI: 10.1038/s42256-022-00571-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/14/2021] [Accepted: 10/21/2022] [Indexed: 12/16/2022]
Abstract
Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ('Stanford OpenVaccine') on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102-130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
Affiliation(s)
- Hannah K. Wayment-Steele
- Department of Chemistry, Stanford University, Stanford, CA, USA; Eterna Massive Open Laboratory, Stanford, CA, USA
| | - Wipapat Kladwang
- Eterna Massive Open Laboratory, Stanford, CA, USA; Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Andrew M. Watkins
- Eterna Massive Open Laboratory, Stanford, CA, USA; Department of Biochemistry, Stanford University, Stanford, CA, USA; Prescient Design, Genentech, San Francisco, CA, USA
| | - Do Soon Kim
- Eterna Massive Open Laboratory, Stanford, CA, USA; Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Bojan Tunguz
- Department of Biochemistry, Stanford University, Stanford, CA, USA; NVIDIA Corporation, Santa Clara, CA, USA
| | | | | | - Jonathan Romano
- Eterna Massive Open Laboratory, Stanford, CA, USA; Department of Biochemistry, Stanford University, Stanford, CA, USA; Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA
| | | | - John J. Nicol
- Eterna Massive Open Laboratory, Stanford, CA, USA
| | | | | | | | | | - Gilles Vandewiele
- IDLab, Ghent University, Technologiepark-Zwijnaarde, Gent, Belgium
| | - Michele Tinti
- The Wellcome Centre for Anti-Infectives Research, College of Life Sciences, University of Dundee, Dundee, UK
| | - Bram Steenwinckel
- IDLab, Ghent University, Technologiepark-Zwijnaarde, Gent, Belgium
| | | | - Taiga Noumi
- Keyence Corporation, 1-3-14, Higashi-Nakajima, Higashi-Yodogawa-ku, Osaka, Japan
| | - Shujun He
- Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
| | | | - Youhan Lee
- Korea Atomic Energy Research Institute, Daejeon, Republic of Korea; Kakao Brain Corp, Seongnam, Gyeonggi-do, Republic of Korea
| | | | | | | | - Karim Amer
- Center for Informatics Science, Nile University, Sheikh Zayed, Giza, Egypt
| | - Mohamed Fares
- Center for Informatics Science, Nile University, Sheikh Zayed, Giza, Egypt; National Research Centre, Dokki, Cairo, Egypt
| | | | - Rhiju Das
- Eterna Massive Open Laboratory, Stanford, CA, USA; Department of Biochemistry, Stanford University, Stanford, CA, USA; Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA
| |
|
15
|
Wayment-Steele HK, Kladwang W, Watkins AM, Kim DS, Tunguz B, Reade W, Demkin M, Romano J, Wellington-Oguri R, Nicol JJ, Gao J, Onodera K, Fujikawa K, Mao H, Vandewiele G, Tinti M, Steenwinckel B, Ito T, Noumi T, He S, Ishi K, Lee Y, Öztürk F, Chiu A, Öztürk E, Amer K, Fares M, Participants E, Das R. Deep learning models for predicting RNA degradation via dual crowdsourcing. arXiv 2021:arXiv:2110.07531v2. [PMID: 34671698 PMCID: PMC8528079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 04/22/2022] [Indexed: 12/31/2022]
Abstract
Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 102-130-nucleotide diverse RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
Affiliation(s)
- Hannah K Wayment-Steele
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Eterna Massive Open Laboratory
| | - Wipapat Kladwang
- Department of Biochemistry, Stanford University, California 94305, USA
- Eterna Massive Open Laboratory
| | - Andrew M Watkins
- Department of Biochemistry, Stanford University, California 94305, USA
- Eterna Massive Open Laboratory
| | - Do Soon Kim
- Department of Biochemistry, Stanford University, California 94305, USA
- Eterna Massive Open Laboratory
| | - Bojan Tunguz
- Department of Biochemistry, Stanford University, California 94305, USA
- NVIDIA Corporation, Santa Clara, California 95051
| | | | | | - Jonathan Romano
- Department of Biochemistry, Stanford University, California 94305, USA
- Eterna Massive Open Laboratory
- Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York, 14260, USA
| | | | | | - Jiayang Gao
- High-flyer AI, Hangzhou, Zhejiang, China, 310000
| | | | | | - Hanfei Mao
- Yanfu Investments, Shanghai, China, 200000
| | - Gilles Vandewiele
- IDLab, Ghent University, Technologiepark-Zwijnaarde, Gent, Belgium, B-9052
| | - Michele Tinti
- College of Life Sciences, University of Dundee, Dundee DD1 4HN, United Kingdom
| | - Bram Steenwinckel
- IDLab, Ghent University, Technologiepark-Zwijnaarde, Gent, Belgium, B-9052
| | - Takuya Ito
- Universal Knowledge Inc., Tokyo 150-0013, Japan
| | - Taiga Noumi
- Keyence Corporation, 1-3-14, Higashi-Nakajima, Higashi-Yodogawa-ku, Osaka, 533-8555, Japan
| | - Shujun He
- Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
| | | | - Youhan Lee
- Kakao Brain, Seongnam, Gyeonggi-do, Republic of Korea
| | | | | | | | - Karim Amer
- Center for Informatics Science, Nile University, Sheikh Zayed, Giza, Egypt, 12588
| | - Mohamed Fares
- National Research Centre, Dokki, Cairo, Egypt, 12622
| | | | - Rhiju Das
- Department of Biochemistry, Stanford University, California 94305, USA
- Eterna Massive Open Laboratory
- Department of Physics, Stanford University, California 94305, USA
| |
|
16
|
Minuesa G, Alsina C, Garcia-Martin JA, Oliveros J, Dotu I. MoiRNAiFold: a novel tool for complex in silico RNA design. Nucleic Acids Res 2021; 49:4934-4943. [PMID: 33956139 PMCID: PMC8136780 DOI: 10.1093/nar/gkab331] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/12/2021] [Revised: 04/09/2021] [Accepted: 04/21/2021] [Indexed: 12/23/2022]
Abstract
Novel tools for in silico design of RNA constructs such as riboregulators are required in order to reduce time and cost to production for the development of diagnostic and therapeutic advances. Here, we present MoiRNAiFold, a versatile and user-friendly tool for de novo synthetic RNA design. MoiRNAiFold is based on Constraint Programming and it includes novel variable types, heuristics and restart strategies for Large Neighborhood Search. Moreover, this software can handle dozens of design constraints and quality measures and improves features for RNA regulation control of gene expression, such as Translation Efficiency calculation. We demonstrate that MoiRNAiFold outperforms any previous software in benchmarking structural RNA puzzles from EteRNA. Importantly, with regard to biologically relevant RNA designs, we focus on RNA riboregulators, demonstrating that the designed RNA sequences are functional both in vitro and in vivo. Overall, we have generated a powerful tool for de novo complex RNA design that we make freely available as a web server (https://moiraibiodesign.com/design/).
Affiliation(s)
- Gerard Minuesa
- Moirai Biodesign, c/ Baldiri Reixach s/n, Parc Científic de Barcelona (PCB), 08028 Barcelona, Spain
| | - Cristina Alsina
- Moirai Biodesign, c/ Baldiri Reixach s/n, Parc Científic de Barcelona (PCB), 08028 Barcelona, Spain
| | - Juan Antonio Garcia-Martin
- Bioinformatics for Genomics and Proteomics. National Centre for Biotechnology (CNB-CSIC). c/ Darwin 3, 28049 Madrid, Spain
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Universidad Carlos III de Madrid, 28911 Madrid, Spain
| | - Juan Carlos Oliveros
- Bioinformatics for Genomics and Proteomics. National Centre for Biotechnology (CNB-CSIC). c/ Darwin 3, 28049 Madrid, Spain
| | - Ivan Dotu
- Moirai Biodesign, c/ Baldiri Reixach s/n, Parc Científic de Barcelona (PCB), 08028 Barcelona, Spain
| |
|
17
|
Yallapragada VVB, Xu T, Walker SP, Tabirca S, Tangney M. Pepblock Builder VR - An Open-Source Tool for Gaming-Based Bio-Edutainment in Interactive Protein Design. Front Bioeng Biotechnol 2021; 9:674211. [PMID: 34055764 PMCID: PMC8160467 DOI: 10.3389/fbioe.2021.674211] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/28/2021] [Accepted: 03/24/2021] [Indexed: 11/13/2022]
Abstract
Proteins mediate and perform various fundamental functions of life. This versatility of protein function is an attribute of its 3D structure. In recent years, our understanding of protein 3D structure has been complemented with advances in computational and mathematical tools for protein modelling and protein design. 3D molecular visualisation is an essential part of every protein design and protein modelling workflow. Over the years, stand-alone and web-based molecular visualisation tools have been used to emulate a three-dimensional view on computers. The advent of virtual reality provided the scope for immersive control of molecular visualisation. While these technologies have significantly improved our insights into protein modelling, designing new proteins with a defined function remains a complicated process. Current tools to design proteins lack user-interactivity and demand high computational skills. In this work, we present the Pepblock Builder VR, a gaming-based molecular visualisation tool for bio-edutainment and understanding protein design. Simulating the concepts of protein design and incorporating gaming principles into molecular visualisation promotes effective game-based learning. Unlike traditional sequence-based protein design and fragment-based stitching, the Pepblock Builder VR provides a building block style environment for complex structure building. This gives users a unique visual structure building experience. Furthermore, the inclusion of virtual reality in the Pepblock Builder VR brings immersive learning and provides users with a "being there" experience in protein visualisation. The Pepblock Builder VR works both as a stand-alone and VR-based application, and with a gamified user interface, the Pepblock Builder VR aims to expand the horizons of scientific data generation to the masses.
Affiliation(s)
- Venkata V. B. Yallapragada
- Cancer Research @ UCC, University College Cork, Cork, Ireland
- SynBioCentre, University College Cork, Cork, Ireland
| | - Tianshu Xu
- School of Computer Science and Information Technology, University College Cork, Cork, Ireland
| | - Sidney P. Walker
- Cancer Research @ UCC, University College Cork, Cork, Ireland
- SynBioCentre, University College Cork, Cork, Ireland
| | - Sabin Tabirca
- School of Computer Science and Information Technology, University College Cork, Cork, Ireland
- Department of Computer Science, Transylvania University of Braşov, Braşov, Romania
| | - Mark Tangney
- Cancer Research @ UCC, University College Cork, Cork, Ireland
- SynBioCentre, University College Cork, Cork, Ireland
- APC Microbiome Ireland, University College Cork, Cork, Ireland
- iEd Hub, University College Cork, Cork, Ireland
| |
|
18
|
Schlick T, Portillo-Ledesma S, Myers CG, Beljak L, Chen J, Dakhel S, Darling D, Ghosh S, Hall J, Jan M, Liang E, Saju S, Vohr M, Wu C, Xu Y, Xue E. Biomolecular Modeling and Simulation: A Prospering Multidisciplinary Field. Annu Rev Biophys 2021; 50:267-301. [PMID: 33606945 PMCID: PMC8105287 DOI: 10.1146/annurev-biophys-091720-102019] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/13/2022]
Abstract
We reassess progress in the field of biomolecular modeling and simulation, following up on our perspective published in 2011. By reviewing metrics for the field's productivity and providing examples of success, we underscore the productive phase of the field, whose short-term expectations were overestimated and long-term effects underestimated. Such successes include prediction of structures and mechanisms; generation of new insights into biomolecular activity; and thriving collaborations between modeling and experimentation, including experiments driven by modeling. We also discuss the impact of field exercises and web games on the field's progress. Overall, we note tremendous success by the biomolecular modeling community in utilization of computer power; improvement in force fields; and development and application of new algorithms, notably machine learning and artificial intelligence. The combined advances are enhancing the accuracy and scope of modeling and simulation, establishing an exemplary discipline where experiment and theory or simulations are full partners.
Affiliation(s)
- Tamar Schlick
- Department of Chemistry, New York University, New York, New York 10003, USA
- Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
- New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200122, China
| | | | - Christopher G Myers
- Department of Chemistry, New York University, New York, New York 10003, USA
| | - Lauren Beljak
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Justin Chen
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Sami Dakhel
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Daniel Darling
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Sayak Ghosh
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Joseph Hall
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Mikaeel Jan
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Emily Liang
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Sera Saju
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Mackenzie Vohr
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Chris Wu
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Yifan Xu
- College of Arts and Science, New York University, New York, New York 10003, USA
| | - Eva Xue
- College of Arts and Science, New York University, New York, New York 10003, USA
| |
|
19
|
Thavarajah W, Hertz LM, Bushhouse DZ, Archuleta CM, Lucks JB. RNA Engineering for Public Health: Innovations in RNA-Based Diagnostics and Therapeutics. Annu Rev Chem Biomol Eng 2021; 12:263-286. [PMID: 33900805 PMCID: PMC9714562 DOI: 10.1146/annurev-chembioeng-101420-014055] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
RNA is essential for cellular function: From sensing intra- and extracellular signals to controlling gene expression, RNA mediates a diverse and expansive list of molecular processes. A long-standing goal of synthetic biology has been to develop RNA engineering principles that can be used to harness and reprogram these RNA-mediated processes to engineer biological systems to solve pressing global challenges. Recent advances in the field of RNA engineering are bringing this to fruition, enabling the creation of RNA-based tools to combat some of the most urgent public health crises. Specifically, new diagnostics using engineered RNAs are able to detect both pathogens and chemicals while generating an easily detectable fluorescent signal as an indicator. New classes of vaccines and therapeutics are also using engineered RNAs to target a wide range of genetic and pathogenic diseases. Here, we discuss the recent breakthroughs in RNA engineering enabling these innovations and examine how advances in RNA design promise to accelerate the impact of engineered RNA systems.
Affiliation(s)
- Walter Thavarajah
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, USA; Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, USA; Center for Water Research, Northwestern University, Evanston, Illinois 60208, USA
| | - Laura M Hertz
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, USA; Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, Illinois 60208, USA
| | - David Z Bushhouse
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, USA; Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, Illinois 60208, USA
| | - Chloé M Archuleta
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, USA; Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, USA; Center for Water Research, Northwestern University, Evanston, Illinois 60208, USA
| | - Julius B Lucks
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, USA; Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, USA; Center for Water Research, Northwestern University, Evanston, Illinois 60208, USA; Center for Engineering Sustainability and Resilience, Northwestern University, Evanston, Illinois 60208, USA
| |
|
20
|
Inverse RNA Folding Workflow to Design and Test Ribozymes that Include Pseudoknots. Methods Mol Biol 2021; 2167:113-143. [PMID: 32712918 DOI: 10.1007/978-1-0716-0716-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Ribozymes are RNAs that catalyze reactions. They occur in nature, and can also be evolved in vitro to catalyze novel reactions. This chapter provides detailed protocols for using inverse folding software to design a ribozyme sequence that will fold to a known ribozyme secondary structure and for testing the catalytic activity of the sequence experimentally. This protocol is able to design sequences that include pseudoknots, which is important as all naturally occurring full-length ribozymes have pseudoknots. The starting point is the known pseudoknot-containing secondary structure of the ribozyme and knowledge of any nucleotides whose identity is required for function. The output of the protocol is a set of sequences that have been tested for function. Using this protocol, we were previously successful at designing highly active double-pseudoknotted HDV ribozymes.
|
21
|
Galizi R, Duncan JN, Rostain W, Quinn CM, Storch M, Kushwaha M, Jaramillo A. Engineered RNA-Interacting CRISPR Guide RNAs for Genetic Sensing and Diagnostics. CRISPR J 2020; 3:398-408. [PMID: 33095053 DOI: 10.1089/crispr.2020.0029] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]
Abstract
CRISPR guide RNAs (gRNAs) can be programmed with relative ease to allow the genetic editing of nearly any DNA or RNA sequence. Here, we propose novel molecular architectures to achieve RNA-dependent modulation of CRISPR activity in response to specific RNA molecules. We designed and tested, in both living Escherichia coli cells and cell-free assays for rapid prototyping, cis-repressed RNA-interacting guide RNA (igRNA) that switch to their active state only upon interaction with small RNA fragments or long RNA transcripts, including pathogen-derived mRNAs of medical relevance such as the human immunodeficiency virus infectivity factor. The proposed CRISPR-igRNAs are fully customizable and easily adaptable to the majority if not all the available CRISPR-Cas variants to modulate a variety of genetic functions in response to specific cellular conditions, providing orthogonal activation and increased specificity. We thereby foresee a large scope of application for therapeutic, diagnostic, and biotech applications in both prokaryotic and eukaryotic systems.
Affiliation(s)
- Roberto Galizi
- Centre for Applied Entomology and Parasitology, School of Life Sciences, Keele University, Keele, United Kingdom
- Department of Life Sciences, Imperial College London, London, United Kingdom
| | - John N Duncan
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
| | - William Rostain
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
| | - Charlotte M Quinn
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, United Kingdom
| | - Marko Storch
- Department of Life Sciences, Imperial College London, London, United Kingdom
- London Biofoundry, Imperial College London, London, United Kingdom
| | - Manish Kushwaha
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
- Université Paris-Saclay, INRAE, AgroParisTech, MIcalis Institute, Paris, France
| | - Alfonso Jaramillo
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
- Warwick Integrative Synthetic Biology Centre (WISB), University of Warwick, Coventry, United Kingdom
| |
|
22
|
Badu S, Melnik R, Singh S. Mathematical and computational models of RNA nanoclusters and their applications in data-driven environments. Molecular Simulation 2020. [DOI: 10.1080/08927022.2020.1804564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
Affiliation(s)
- Shyam Badu
- MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
| | - Roderick Melnik
- MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
- BCAM-Basque Center for Applied Mathematics, Bilbao, Spain
| | - Sundeep Singh
- MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
| |
|
23
|
Wellington-Oguri R, Fisker E, Zada M, Wiley M, Townley J, Players E. Evidence of an Unusual Poly(A) RNA Signature Detected by High-Throughput Chemical Mapping. Biochemistry 2020; 59:2041-2046. [PMID: 32412236 DOI: 10.1021/acs.biochem.0c00215] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Homopolymeric adenosine RNA plays numerous roles in both cells and noncellular genetic material. We report herein an unusual poly(A) signature in chemical mapping data generated by the Eterna Massive Open Laboratory. Poly(A) sequences of length seven or more show unexpected results in the selective 2'-hydroxyl acylation read out by primer extension (SHAPE) and dimethyl sulfate (DMS) chemical probing. This unusual signature first appears in poly(A) sequences of length seven and grows to its maximum strength at length ∼10. In a long poly(A) sequence, the substitution of a single A by any other nucleotide disrupts the signature, but only for the 6 or so nucleotides on the 5' side of the substitution.
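As a small illustration of the length threshold described in this abstract, the sketch below locates poly(A) runs of at least seven nucleotides in a sequence. The function name and the example sequence are hypothetical, and this only finds the runs; it says nothing about the chemical mapping signal itself.

```python
import re


def poly_a_runs(seq, min_len=7):
    """Return (start, length) for every run of at least min_len consecutive A's.

    min_len=7 follows the threshold quoted in the abstract; start is 0-based.
    """
    pattern = re.compile(r"A{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    example = "GG" + "A" * 7 + "CC" + "A" * 3 + "GG"  # hypothetical sequence
    print(poly_a_runs(example))
```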
Affiliation(s)
- Roger Wellington-Oguri
- Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 94305, United States
| | - Eli Fisker
- Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 94305, United States
| | - Mathew Zada
- Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 94305, United States
| | - Michelle Wiley
- Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 94305, United States
| | - Jill Townley
- Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 94305, United States
| | - Eterna Players
- Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 94305, United States
| |
|
24
|
Abstract
A ribonucleic acid (RNA) sequence is a word over an alphabet of four elements {A, C, G, U} called bases. RNA sequences fold into secondary structures where some bases pair with one another, while others remain unpaired. The two fundamental problems in RNA algorithmics are to predict how sequences fold within some models of energy and to design sequences of bases that will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure has been known since the eighties to be solvable in cubic time, and in truly subcubic time by a recent result of Bringmann et al. (FOCS, 2016), whereas Lyngsø has shown it is computationally hard if pseudoknots are allowed (ICALP, 2004). In stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Condon (ICALP, 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has been unsettled for decades. In this article, we show that in the simplest model of energy, which is the Watson-Crick model, the design of secondary structures is computationally hard if one adds natural constraints of the form: index i of the sequence has to be labeled by base b. This negative result suggests that the same lower bound holds for more realistic models of energy. It is noteworthy that the additional constraints are by no means artificial: they are provided by all RNA design software packages and they do correspond to actual practice (e.g., the instances of the EteRNA project).
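The cubic-time prediction mentioned in the abstract above can be sketched with the textbook Nussinov-style dynamic program, which maximizes the number of nested A-U and G-C pairs. This is a minimal illustration under the assumption that the Watson-Crick model simply counts such pairs; the function name `max_watson_crick_pairs` and the `min_loop` parameter are hypothetical, and the sketch is neither the subcubic algorithm nor the hardness construction of the article.

```python
def max_watson_crick_pairs(seq, min_loop=0):
    """Maximum number of nested (pseudoknot-free) A-U / G-C pairs in seq.

    min_loop is the minimum number of unpaired bases required between the
    two partners of a pair (an assumed knob; 0 gives the simplest model).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = optimal pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some earlier base k.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_watson_crick_pairs("GGGAAACCC"))  # 3 nested G-C pairs
```

The three nested loops over `span`, `i`, and `k` give the cubic running time. The design problem runs in the opposite direction: given a target structure, find a sequence whose optimal fold is that structure, optionally with some positions fixed to given bases.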
Affiliation(s)
- Édouard Bonnet
- Univ Lyon, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP UMR5668, Lyon, France
| | - Paweł Rzążewski
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; Faculty of Mathematics, Informatics and Mechanics, Institute of Informatics, University of Warsaw, Warsaw, Poland
| | - Florian Sikora
- Université Paris-Dauphine, PSL University, CNRS, LAMSADE, Paris, France
| |
|
25
|
Lu W, Tang Y, Wu H, Huang H, Fu Q, Qiu J, Li H. Predicting RNA secondary structure via adaptive deep recurrent neural networks with energy-based filter. BMC Bioinformatics 2019; 20:684. [PMID: 31874602 PMCID: PMC6929275 DOI: 10.1186/s12859-019-3258-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/07/2023] Open
Abstract
Background: RNA secondary structure prediction is an important issue in structural bioinformatics, and RNA pseudoknotted secondary structure prediction represents an NP-hard problem. Recently, many different machine-learning methods, Markov models, and neural networks have been employed for this problem, with encouraging results regarding their predictive accuracy; however, their performance is usually limited by the requirements of the learning model and by over-fitting, which requires use of a fixed number of training features. Because most natural biological sequences have variable lengths, the sequences have to be truncated before the features are employed by the learning model, which not only leads to the loss of information but also destroys biological-sequence integrity. Results: To address this problem, we propose an adaptive-sequence-length deep-learning model and integrate an energy-based filter to remove the over-fitting base pairs. Conclusions: Comparative experiments conducted on the authoritative RNA STRAND dataset (RNA secondary STRucture and statistical Analysis Database) revealed a 12% higher accuracy relative to three currently used methods.
Affiliation(s)
- Weizhong Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
| | - Ye Tang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
| | - Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China; Anhui Key Laboratory of Intelligent Building Energy Efficiency, Anhui Jianzhu University, Hefei, Anhui, 230601, China
| | - Hongmei Huang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
| | - Qiming Fu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
| | - Jing Qiu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
| | - Haiou Li
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiang, 215000, China
| |
|
26
|
Guerrini CJ, Lewellyn M, Majumder MA, Trejo M, Canfield I, McGuire AL. Donors, authors, and owners: how is genomic citizen science addressing interests in research outputs? BMC Med Ethics 2019; 20:84. [PMID: 31752834 PMCID: PMC6868686 DOI: 10.1186/s12910-019-0419-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/15/2019] [Accepted: 10/14/2019] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Citizen science is increasingly prevalent in the biomedical sciences, including the field of human genomics. Genomic citizen science initiatives present new opportunities to engage individuals in scientific discovery, but they are also provoking new questions regarding who owns the outputs of the research, including intangible ideas and discoveries and tangible writings, tools, technologies, and products. The legal and ethical claims of participants to research outputs become stronger (and also more likely to conflict with those of institution-based researchers and other stakeholders) as participants become more involved, quantitatively and qualitatively, in the research process. It is not yet known, however, how genomic citizen science initiatives are managing the interests of their participants in accessing and controlling research outputs in practice. To help fill this gap, we conducted an in-depth review of relevant policies and practices of U.S.-based genomic citizen science initiatives. METHODS: We queried the peer-reviewed literature and grey literature to identify 22 genomic citizen science initiatives that satisfied six inclusion criteria. A data collection form was used to capture initiative features, policies, and practices relevant to participants' access to and control over research outputs. RESULTS: This analysis revealed that the genomic citizen science landscape is diverse and includes many initiatives that do not have institutional affiliations. Two trends that are in apparent tension were identified: commercialization and operationalization of a philosophy of openness. While most initiatives supported participants' access to research outputs, including datasets and published findings, none supported participants' control over results via intellectual property, licensing, or commercialization rights. However, several initiatives disclaimed their own rights to profit from outputs.
CONCLUSIONS: There are opportunities for citizen science initiatives to incorporate more features that support participants' access to and control over research outputs, consistent with their specific objectives, operations, and technical capabilities.
Affiliation(s)
- Christi J Guerrini
- Baylor College of Medicine, Center for Medical Ethics and Health Policy, 1 Baylor Plaza, Houston, TX, 77030, USA.
| | - Meaganne Lewellyn
- Baylor College of Medicine, Center for Medical Ethics and Health Policy, 1 Baylor Plaza, Houston, TX, 77030, USA
| | - Mary A Majumder
- Baylor College of Medicine, Center for Medical Ethics and Health Policy, 1 Baylor Plaza, Houston, TX, 77030, USA
| | - Meredith Trejo
- Baylor College of Medicine, Center for Medical Ethics and Health Policy, 1 Baylor Plaza, Houston, TX, 77030, USA
| | - Isabel Canfield
- Baylor College of Medicine, Center for Medical Ethics and Health Policy, 1 Baylor Plaza, Houston, TX, 77030, USA
| | - Amy L McGuire
- Baylor College of Medicine, Center for Medical Ethics and Health Policy, 1 Baylor Plaza, Houston, TX, 77030, USA
| |
|
27
|
Das R, Keep B, Washington P, Riedel-Kruse IH. Scientific Discovery Games for Biomedical Research. Annu Rev Biomed Data Sci 2019; 2:253-279. [PMID: 34308269 DOI: 10.1146/annurev-biodatasci-072018-021139] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/14/2023]
Abstract
Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.
Affiliation(s)
- Rhiju Das
- Department of Biochemistry and Department of Physics, Stanford University, Stanford, California 94305, USA
| | - Benjamin Keep
- Department of Learning Sciences, Stanford University, Stanford, California 94305, USA
| | - Peter Washington
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA
| | | |
|
28
|
Koodli RV, Keep B, Coppess KR, Portela F, Das R. EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame. PLoS Comput Biol 2019; 15:e1007059. [PMID: 31247029 PMCID: PMC6597038 DOI: 10.1371/journal.pcbi.1007059] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/18/2019] [Accepted: 04/30/2019] [Indexed: 11/18/2022] Open
Abstract
Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants' moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million player moves on 12 of the most-played Eterna puzzles as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN), EternaBrain, that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players' moves are partially stereotyped. Pipelining this CNN's move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and achieves similar or better performance than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.
Affiliation(s)
- Rohan V. Koodli
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
| | - Benjamin Keep
- Department of Education, Stanford University, Stanford, CA, United States of America
| | - Katherine R. Coppess
- Department of Physics, Stanford University, Stanford, CA, United States of America
| | - Fernando Portela
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
| | | | - Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
- Department of Physics, Stanford University, Stanford, CA, United States of America
| |
|
29
|
|
30
|
Jain S, Laederach A, Ramos SBV, Schlick T. A pipeline for computational design of novel RNA-like topologies. Nucleic Acids Res 2018; 46:7040-7051. [PMID: 30137633 PMCID: PMC6101589 DOI: 10.1093/nar/gky524] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/24/2017] [Revised: 05/22/2018] [Accepted: 05/24/2018] [Indexed: 12/11/2022] Open
Abstract
Designing novel RNA topologies is a challenge, with important therapeutic and industrial applications. We describe a computational pipeline for design of novel RNA topologies based on our coarse-grained RNA-As-Graphs (RAG) framework. RAG represents RNA structures as tree graphs and describes RNA secondary (2D) structure topologies (currently up to 13 vertices, ≈260 nucleotides). We have previously identified novel graph topologies that are RNA-like among these. Here we describe a systematic design pipeline and illustrate design for six broad design problems using recently developed tools for graph-partitioning and fragment assembly (F-RAG). Following partitioning of the target graph, corresponding atomic fragments from our RAG-3D database are combined using F-RAG, and the candidate atomic models are scored using a knowledge-based potential developed for 3D structure prediction. The sequences of the top scoring models are screened further using available tools for 2D structure prediction. The results indicate that our modular approach based on RNA-like topologies rather than specific 2D structures allows for greater flexibility in the design process, and generates a large number of candidate sequences quickly. Experimental structure probing using SHAPE-MaP for two sequences agrees with our predictions and suggests that our combined tools yield excellent candidates for further sequence and experimental screening.
Affiliation(s)
- Swati Jain
- Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA
| | - Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Silvia B V Ramos
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Tamar Schlick
- Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
- NYU-ECNU Center for Computational Chemistry at New York University Shanghai, Room 340, Geography Building, North Zhongshan Road, 3663 Shanghai, China
| |
|
31
|
Eastman P, Shi J, Ramsundar B, Pande VS. Solving the RNA design problem with reinforcement learning. PLoS Comput Biol 2018; 14:e1006176. [PMID: 29927936 PMCID: PMC6029810 DOI: 10.1371/journal.pcbi.1006176] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 12/30/2017] [Revised: 07/03/2018] [Accepted: 05/04/2018] [Indexed: 11/19/2022] Open
Abstract
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
Affiliation(s)
- Peter Eastman
- Department of Bioengineering, Stanford University, Stanford, CA, United States of America
| | - Jade Shi
- Department of Chemistry, Stanford University, Stanford, CA, United States of America
| | - Bharath Ramsundar
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
| | - Vijay S. Pande
- Department of Bioengineering, Stanford University, Stanford, CA, United States of America
| |
|
32
|
Geary C, Chworos A, Verzemnieks E, Voss NR, Jaeger L. Composing RNA Nanostructures from a Syntax of RNA Structural Modules. Nano Lett 2017; 17:7095-7101. [PMID: 29039189 PMCID: PMC6363482 DOI: 10.1021/acs.nanolett.7b03842] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Indexed: 05/26/2023]
Abstract
Natural stable RNAs fold and assemble into complex three-dimensional architectures by relying on the hierarchical formation of intricate, recurrent networks of noncovalent tertiary interactions. These sequence-dependent networks specify RNA structural modules enabling orientational and topological control of helical struts to form larger self-folding domains. Borrowing concepts from linguistics, we defined an extended structural syntax of RNA modules for programming RNA strands to assemble into complex, responsive nanostructures under both thermodynamic and kinetic control. Based on this syntax, various RNA building blocks promote the multimolecular assembly of objects with well-defined three-dimensional shapes as well as the isothermal folding of long RNAs into complex single-stranded nanostructures during transcription. This work offers a glimpse of the limitless potential of RNA as an informational medium for designing programmable and functional nanomaterials useful for synthetic biology, nanomedicine, and nanotechnology.
Affiliation(s)
- Cody Geary
- Department of Chemistry and Biochemistry, Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106-9510, United States
| | - Arkadiusz Chworos
- Department of Chemistry and Biochemistry, Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106-9510, United States
| | - Erik Verzemnieks
- Department of Chemistry and Biochemistry, Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106-9510, United States
| | - Neil R. Voss
- Biological, Chemical, and Physical Sciences Department, Roosevelt University, 1400 North Roosevelt Blvd., Schaumburg, Illinois 60173, United States
| | - Luc Jaeger
- Department of Chemistry and Biochemistry, Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106-9510, United States
| |
|
33
|
Freedman S, Mullane K. The academic-industrial complex: navigating the translational and cultural divide. Drug Discov Today 2017; 22:976-993. [PMID: 28336175 DOI: 10.1016/j.drudis.2017.03.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/21/2016] [Revised: 02/11/2017] [Accepted: 03/13/2017] [Indexed: 12/26/2022]
Abstract
In general, the fruits of academic discoveries can only be realized through joint efforts with industry. However, the poor reproducibility of much academic research has damaged credibility and jeopardized translational efforts that could benefit patients. Meanwhile, journals are rife with articles bemoaning the limited productivity and increasing costs of the biopharmaceutical industry and its resultant predilection for mergers and reorganizations while decreasing internal research efforts. The ensuing disarray and uncertainty have created tremendous opportunities for academia and industry to form even closer ties, and to embrace new operational and financial models to their joint benefit. This review article offers a personal perspective on the opportunities, models and approaches that harness the increased interface and growing interdependency between biomedical research institutes, the biopharmaceutical industry and the technological world.
Affiliation(s)
- Stephen Freedman
- Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, USA
| | - Kevin Mullane
- Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, USA
| |
|
34
|
Science and Culture: Putting a game face on biomedical research. Proc Natl Acad Sci U S A 2016; 113:6577-8. [PMID: 27302944 DOI: 10.1073/pnas.1607585113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022] Open
|
35
|
|