51
Kappel K, Zhang K, Su Z, Watkins AM, Kladwang W, Li S, Pintilie G, Topkar VV, Rangan R, Zheludev IN, Yesselman JD, Chiu W, Das R. Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat Methods 2020; 17:699-707. [PMID: 32616928 PMCID: PMC7386730 DOI: 10.1038/s41592-020-0878-9] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Received: 12/04/2019] [Accepted: 05/20/2020] [Indexed: 02/05/2023]
Abstract
The discovery and design of biologically important RNA molecules is outpacing three-dimensional structural characterization. Here, we demonstrate that cryo-electron microscopy can routinely resolve maps of RNA-only systems and that these maps enable subnanometer-resolution coordinate estimation when complemented with multidimensional chemical mapping and Rosetta DRRAFTER computational modeling. This hybrid 'Ribosolve' pipeline detects and falsifies homologies and conformational rearrangements in 11 previously unknown 119- to 338-nucleotide protein-free RNA structures: full-length Tetrahymena ribozyme, hc16 ligase with and without substrate, full-length Vibrio cholerae and Fusobacterium nucleatum glycine riboswitch aptamers with and without glycine, Mycobacterium SAM-IV riboswitch with and without S-adenosylmethionine, and the computer-designed ATP-TTR-3 aptamer with and without AMP. Simulation benchmarks, blind challenges, compensatory mutagenesis, cross-RNA homologies and internal controls demonstrate that Ribosolve can accurately resolve the global architectures of RNA molecules but does not resolve atomic details. These tests offer guidelines for making inferences in future RNA structural studies with similarly accelerated throughput.
Affiliation(s)
- Kalli Kappel
  - Biophysics Program, Stanford University, Stanford, CA, USA
- Kaiming Zhang
  - Department of Bioengineering, James H. Clark Center, Stanford University, Stanford, CA, USA
- Zhaoming Su
  - Department of Bioengineering, James H. Clark Center, Stanford University, Stanford, CA, USA
  - The State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Sichuan, China
- Andrew M Watkins
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
- Wipapat Kladwang
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
- Shanshan Li
  - Department of Bioengineering, James H. Clark Center, Stanford University, Stanford, CA, USA
- Grigore Pintilie
  - Department of Bioengineering, James H. Clark Center, Stanford University, Stanford, CA, USA
- Ved V Topkar
  - Biophysics Program, Stanford University, Stanford, CA, USA
- Ramya Rangan
  - Biophysics Program, Stanford University, Stanford, CA, USA
- Ivan N Zheludev
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
- Joseph D Yesselman
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, USA
- Wah Chiu
  - Biophysics Program, Stanford University, Stanford, CA, USA
  - Department of Bioengineering, James H. Clark Center, Stanford University, Stanford, CA, USA
  - Division of CryoEM and Bioimaging, SSRL, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA, USA
- Rhiju Das
  - Biophysics Program, Stanford University, Stanford, CA, USA
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
  - Department of Physics, Stanford University, Stanford, CA, USA
52
Kladwang W, Topkar VV, Liu B, Rangan R, Hodges TL, Keane SC, Al-Hashimi H, Das R. Anomalous Reverse Transcription through Chemical Modifications in Polyadenosine Stretches. Biochemistry 2020; 59:2154-2170. [PMID: 32407625 DOI: 10.1021/acs.biochem.0c00020] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/19/2023]
Abstract
Thermostable reverse transcriptases are workhorse enzymes underlying nearly all modern techniques for RNA structure mapping and for the transcriptome-wide discovery of RNA chemical modifications. Despite their wide use, these enzymes' behaviors at chemically modified nucleotides remain poorly understood. Wellington-Oguri et al. recently reported an apparent loss of chemical modification within putatively unstructured polyadenosine stretches modified by dimethyl sulfate or 2'-hydroxyl acylation, as probed by reverse transcription. Here, reanalysis of these and other publicly available data, capillary electrophoresis experiments on chemically modified RNAs, and nuclear magnetic resonance spectroscopy on (A)12 and variants show that this effect is unlikely to arise from an unusual structure of polyadenosine. Instead, tests of different reverse transcriptases on chemically modified RNAs and molecules synthesized with single 1-methyladenosines implicate a previously uncharacterized reverse transcriptase behavior: near-quantitative bypass through chemical modifications within polyadenosine stretches. All tested natural and engineered reverse transcriptases (MMLV; SuperScript II, III, and IV; TGIRT-III; and MarathonRT) exhibit this anomalous bypass behavior. Accurate DMS-guided structure modeling of the polyadenylated HIV-1 3' untranslated region requires taking this anomaly into account. Our results suggest that poly(rA-dT) hybrid duplexes can trigger an unexpectedly effective reverse transcriptase bypass and that chemical modifications in mRNA poly(A) tails may be generally undercounted.
Affiliation(s)
- Wipapat Kladwang
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, California 94305, United States
- Ved V Topkar
  - Biophysics Program, Stanford University, Stanford, California 94305, United States
- Bei Liu
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27710, United States
- Ramya Rangan
  - Biophysics Program, Stanford University, Stanford, California 94305, United States
- Tracy L Hodges
  - Biophysics Program, University of Michigan, Ann Arbor, Michigan 48109, United States
- Sarah C Keane
  - Biophysics Program, University of Michigan, Ann Arbor, Michigan 48109, United States
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Hashim Al-Hashimi
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27710, United States
  - Department of Chemistry, Duke University School of Medicine, Durham, North Carolina 27710, United States
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, California 94305, United States
  - Biophysics Program, Stanford University, Stanford, California 94305, United States
  - Department of Physics, Stanford University, Stanford, California 94305, United States
53
Wu J, Zhou C, Li J, Li C, Tao X, Leontis NB, Zirbel CL, Bisaro DM, Ding B. Functional analysis reveals G/U pairs critical for replication and trafficking of an infectious non-coding viroid RNA. Nucleic Acids Res 2020; 48:3134-3155. [PMID: 32083649 PMCID: PMC7102988 DOI: 10.1093/nar/gkaa100] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/18/2019] [Revised: 02/03/2020] [Accepted: 02/18/2020] [Indexed: 01/19/2023]
Abstract
While G/U pairs are present in many RNAs, the lack of molecular studies to characterize the roles of multiple G/U pairs within a single RNA limits our understanding of their biological significance. From known RNA 3D structures, we observed that the probability a G/U will form a Watson-Crick (WC) base pair depends on sequence context. We analyzed 17 G/U pairs in the 359-nucleotide genome of Potato spindle tuber viroid (PSTVd), a circular non-coding RNA that replicates and spreads systemically in host plants. Most putative G/U base pairs were experimentally supported by selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE). Deep sequencing PSTVd genomes from plants inoculated with a cloned master sequence revealed naturally occurring variants, and showed that G/U pairs are maintained to the same extent as canonical WC base pairs. Comprehensive mutational analysis demonstrated that nearly all G/U pairs are critical for replication and/or systemic spread. Two selected G/U pairs were found to be required for PSTVd entry into, but not for exit from, the host vascular system. This study identifies critical roles for G/U pairs in the survival of an infectious RNA, and increases understanding of structure-based regulation of replication and trafficking of pathogen and cellular RNAs.
Affiliation(s)
- Jian Wu
  - Department of Molecular Genetics, Center for Applied Plant Sciences, Center for RNA Biology, and Infectious Diseases Institute, The Ohio State University, Columbus, OH 43210, USA
  - Graduate Program in Molecular, Cellular, and Developmental Biology, The Ohio State University, Columbus, OH 43210, USA
- Cuiji Zhou
  - Department of Molecular Genetics, Center for Applied Plant Sciences, Center for RNA Biology, and Infectious Diseases Institute, The Ohio State University, Columbus, OH 43210, USA
- James Li
  - Department of Molecular Genetics, Center for Applied Plant Sciences, Center for RNA Biology, and Infectious Diseases Institute, The Ohio State University, Columbus, OH 43210, USA
- Chun Li
  - Department of Plant Pathology, Nanjing Agricultural University, Nanjing 210095, China
- Xiaorong Tao
  - Department of Plant Pathology, Nanjing Agricultural University, Nanjing 210095, China
- Neocles B Leontis
  - Department of Chemistry, Bowling Green State University, Bowling Green, OH 43403, USA
- Craig L Zirbel
  - Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
- David M Bisaro
  - Department of Molecular Genetics, Center for Applied Plant Sciences, Center for RNA Biology, and Infectious Diseases Institute, The Ohio State University, Columbus, OH 43210, USA
  - Graduate Program in Molecular, Cellular, and Developmental Biology, The Ohio State University, Columbus, OH 43210, USA
- Biao Ding
  - Department of Molecular Genetics, Center for Applied Plant Sciences, Center for RNA Biology, and Infectious Diseases Institute, The Ohio State University, Columbus, OH 43210, USA
  - Graduate Program in Molecular, Cellular, and Developmental Biology, The Ohio State University, Columbus, OH 43210, USA
54
Wellington-Oguri R, Fisker E, Zada M, Wiley M, Townley J, Players E. Evidence of an Unusual Poly(A) RNA Signature Detected by High-Throughput Chemical Mapping. Biochemistry 2020; 59:2041-2046. [PMID: 32412236 DOI: 10.1021/acs.biochem.0c00215] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/21/2022]
Abstract
Homopolymeric adenosine RNA plays numerous roles in both cells and noncellular genetic material. We report herein an unusual poly(A) signature in chemical mapping data generated by the Eterna Massive Open Laboratory. Poly(A) sequences of length seven or more show unexpected results in the selective 2'-hydroxyl acylation read out by primer extension (SHAPE) and dimethyl sulfate (DMS) chemical probing. This unusual signature first appears in poly(A) sequences of length seven and grows to its maximum strength at length ∼10. In a long poly(A) sequence, the substitution of a single A by any other nucleotide disrupts the signature, but only for the 6 or so nucleotides on the 5' side of the substitution.
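As a rough illustration of the length threshold described in this abstract, the following sketch locates poly(A) stretches of seven or more nucleotides in a sequence. The function name `poly_a_runs` and the default `min_length=7` are assumptions made for this example; the sketch only finds candidate stretches and does not reproduce the chemical mapping analysis itself.

```python
import re


def poly_a_runs(sequence, min_length=7):
    """Return (start, end) index pairs of runs of at least `min_length` adenosines.

    Indices are 0-based and `end` is exclusive, as in Python slicing.
    `min_length=7` mirrors the length at which the abstract says the
    signature first appears; it is an illustrative default only.
    """
    pattern = re.compile(r"A{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # A run of exactly seven A's flanked by other bases.
    print(poly_a_runs("GGAAAAAAACC"))
```

A single non-A substitution inside a long stretch splits it into two shorter runs, so a run interrupted this way is reported only if one of the remaining pieces still reaches `min_length`.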
Affiliation(s)
- Roger Wellington-Oguri
  - Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 93405, United States
- Eli Fisker
  - Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 93405, United States
- Mathew Zada
  - Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 93405, United States
- Michelle Wiley
  - Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 93405, United States
- Jill Townley
  - Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 93405, United States
- Eterna Players
  - Eterna Massive Open Laboratory, Stanford University, B400 Beckman Center, Stanford, California 93405, United States
55
Abstract
A ribonucleic acid (RNA) sequence is a word over an alphabet of four elements {A, C, G, U} called bases. RNA sequences fold into secondary structures where some bases pair with one another, while others remain unpaired. The two fundamental problems in RNA algorithmics are to predict how sequences fold within some models of energy and to design sequences of bases that will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure has been known to be solvable in cubic time since the eighties and in truly subcubic time by a recent result of Bringmann et al. (FOCS, 2016), whereas Lyngsø has shown it is computationally hard if pseudoknots are allowed (ICALP, 2004). In stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Condon (ICALP, 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has been unsettled for decades. In this article, we show that in the simplest model of energy, which is the Watson-Crick model, the design of secondary structures is computationally hard if one adds natural constraints of the form: index i of the sequence has to be labeled by base b. This negative result suggests that the same lower bound holds for more realistic models of energy. It is noteworthy that the additional constraints are by no means artificial: they are provided by all RNA design software packages and they correspond to actual practice (e.g., the instances of the EteRNA project).
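To make the constrained design setting concrete, here is a minimal sketch of the easy half of the problem: checking that a candidate sequence respects base constraints of the form "index i must be base b" and places Watson-Crick pairs at every paired position of a pseudoknot-free target. The dot-bracket encoding, the function names, and the `constraints` dictionary are assumptions chosen for illustration and are not taken from the article. Compatibility is only a necessary condition: a valid design must additionally fold into the target under the energy model, and that is the part the abstract shows to be computationally hard.

```python
# Watson-Crick model: only A-U and C-G pairs are allowed.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def pair_table(structure):
    """Map every paired index to its partner for a dot-bracket string."""
    stack, partner = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at index %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at index %d" % stack[-1])
    return partner


def is_compatible(sequence, structure, constraints=None):
    """Check base constraints and Watson-Crick pairing against a target.

    `constraints` maps 0-based indices to required bases (hypothetical
    encoding of the "index i is labeled by base b" constraints).
    """
    if len(sequence) != len(structure):
        return False
    for index, base in (constraints or {}).items():
        if sequence[index] != base:
            return False
    partner = pair_table(structure)
    return all((sequence[i], sequence[j]) in WATSON_CRICK
               for i, j in partner.items())


if __name__ == "__main__":
    print(is_compatible("GCAAAGC", "((...))", {2: "A"}))
```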
Affiliation(s)
- Édouard Bonnet
  - Univ Lyon, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP UMR5668, Lyon, France
- Paweł Rzążewski
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Faculty of Mathematics, Informatics and Mechanics, Institute of Informatics, University of Warsaw, Warsaw, Poland
- Florian Sikora
  - Université Paris-Dauphine, PSL University, CNRS, LAMSADE, Paris, France
56
Inverse folding with RNA-As-Graphs produces a large pool of candidate sequences with target topologies. J Struct Biol 2019; 209:107438. [PMID: 31874236 DOI: 10.1016/j.jsb.2019.107438] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/04/2019] [Revised: 12/18/2019] [Accepted: 12/19/2019] [Indexed: 02/07/2023]
Abstract
We present an RNA-As-Graphs (RAG) based inverse folding algorithm, RAG-IF, to design novel RNA sequences that fold onto target tree graph topologies. The algorithm can be used to enhance our recently reported computational design pipeline (Jain et al., NAR 2018). The RAG approach represents RNA secondary structures as tree and dual graphs, where RNA loops and helices are coarse-grained as vertices and edges, opening the use of graph theory methods to study, predict, and design RNA structures. Our recently developed computational pipeline for design utilizes graph partitioning (RAG-3D) and atomic fragment assembly (F-RAG) to design sequences to fold onto RNA-like tree graph topologies; the atomic fragments are taken from existing RNA structures that correspond to tree subgraphs. Because F-RAG may not produce the target folds for all designs, automated mutations by the RAG-IF algorithm enhance the candidate pool markedly. The crucial residues for mutation are identified by differences between the predicted and the target topology. A genetic algorithm then mutates the selected residues, and the successful sequences are optimized to retain only the minimal or essential mutations. Here we evaluate RAG-IF for 6 RNA-like topologies and generate a large pool of successful candidate sequences with a variety of minimal mutations. We find that RAG-IF adds robustness and efficiency to our RNA design pipeline, making inverse folding motivated by graph topology rather than secondary structure more productive.
57
Khatib F, Desfosses A, Koepnick B, Flatten J, Popović Z, Baker D, Cooper S, Gutsche I, Horowitz S. Building de novo cryo-electron microscopy structures collaboratively with citizen scientists. PLoS Biol 2019; 17:e3000472. [PMID: 31714936 PMCID: PMC6850521 DOI: 10.1371/journal.pbio.3000472] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
Abstract
With the rapid improvement of cryo-electron microscopy (cryo-EM) resolution, new computational tools are needed to assist and improve upon atomic model building and refinement options. This communication demonstrates that microscopists can now collaborate with the players of the computer game Foldit to generate high-quality de novo structural models. This development could greatly speed the generation of excellent cryo-EM structures when used in addition to current methods.
Affiliation(s)
- Firas Khatib
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, Dartmouth, Massachusetts, United States of America
- Ambroise Desfosses
  - Institut de Biologie Structurale, University Grenoble Alpes, CEA, CNRS, Grenoble, France
- Brian Koepnick
  - Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Jeff Flatten
  - Center for Game Science, Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- Zoran Popović
  - Center for Game Science, Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Seth Cooper
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, United States of America
- Irina Gutsche
  - Institut de Biologie Structurale, University Grenoble Alpes, CEA, CNRS, Grenoble, France
- Scott Horowitz
  - Department of Chemistry and Biochemistry and the Knoebel Institute for Healthy Aging, University of Denver, Denver, Colorado, United States of America
58
Denny SK, Greenleaf WJ. Linking RNA Sequence, Structure, and Function on Massively Parallel High-Throughput Sequencers. Cold Spring Harb Perspect Biol 2019; 11:a032300. [PMID: 30322887 PMCID: PMC6771372 DOI: 10.1101/cshperspect.a032300] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/07/2023]
Abstract
High-throughput sequencing methods have revolutionized our ability to catalog the diversity of RNAs and RNA-protein interactions that can exist in our cells. However, the relationship between RNA sequence, structure, and function is enormously complex, demonstrating the need for methods that can provide quantitative thermodynamic and kinetic measurements of macromolecular interaction with RNA, at a scale commensurate with the sequence diversity of RNA. Here, we discuss a class of methods that extend the core functionality of DNA sequencers to enable high-throughput measurements of RNA folding and RNA-protein interactions. Topics discussed include a description of the method and multiple applications to RNA-binding proteins, riboswitch design and engineering, and RNA tertiary structure energetics.
Affiliation(s)
- Sarah K Denny
  - Stanford University Department of Genetics, Stanford, California 94305
- William J Greenleaf
  - Stanford University Department of Genetics, Stanford, California 94305
  - Stanford University Department of Applied Physics, Stanford, California 94025
  - Chan Zuckerberg Biohub, San Francisco, California 94158
59
Ponnada A, Cooper S, Thapa-Chhetry B, Miller JA, John D, Intille S. Designing Videogames to Crowdsource Accelerometer Data Annotation for Activity Recognition Research. PROCEEDINGS OF THE ... ANNUAL SYMPOSIUM ON COMPUTER-HUMAN INTERACTION IN PLAY. ACM SIGCHI ANNUAL SYMPOSIUM ON COMPUTER-HUMAN INTERACTION IN PLAY 2019; 2019:135-147. [PMID: 31768505 PMCID: PMC6876631 DOI: 10.1145/3311350.3347153] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023]
Abstract
Human activity recognition using wearable accelerometers can enable in-situ detection of physical activities to support novel human-computer interfaces and interventions. However, developing valid algorithms that use accelerometer data to detect everyday activities often requires large amounts of training datasets, precisely labeled with the start and end times of the activities of interest. Acquiring annotated data is challenging and time-consuming. Applied games, such as human computation games (HCGs) have been used to annotate images, sounds, and videos to support advances in machine learning using the collective effort of "non-expert game players." However, their potential to annotate accelerometer data has not been formally explored. In this paper, we present two proof-of-concept, web-based HCGs aimed at enabling game players to annotate accelerometer data. Using results from pilot studies with Amazon Mechanical Turk players, we discuss key challenges, opportunities, and, more generally, the potential of using applied videogames for annotating raw accelerometer data to support activity recognition research.
60
Yesselman JD, Eiler D, Carlson ED, Gotrik MR, d'Aquino AE, Ooms AN, Kladwang W, Carlson PD, Shi X, Costantino DA, Herschlag D, Lucks JB, Jewett MC, Kieft JS, Das R. Computational design of three-dimensional RNA structure and function. NATURE NANOTECHNOLOGY 2019; 14:866-873. [PMID: 31427748 PMCID: PMC7324284 DOI: 10.1038/s41565-019-0517-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/26/2017] [Accepted: 06/24/2019] [Indexed: 05/30/2023]
Abstract
RNA nanotechnology seeks to create nanoscale machines by repurposing natural RNA modules. The field is slowed by the current need for human intuition during three-dimensional structural design. Here, we demonstrate that three distinct problems in RNA nanotechnology can be reduced to a pathfinding problem and automatically solved through an algorithm called RNAMake. First, RNAMake discovers highly stable single-chain solutions to the classic problem of aligning a tetraloop and its sequence-distal receptor, with experimental validation from chemical mapping, gel electrophoresis, solution X-ray scattering and crystallography with 2.55 Å resolution. Second, RNAMake automatically generates structured tethers that integrate 16S and 23S ribosomal RNAs into single-chain ribosomal RNAs that remain uncleaved by ribonucleases and assemble onto messenger RNA. Third, RNAMake enables the automated stabilization of small-molecule binding RNAs, with designed tertiary contacts that improve the binding affinity of the ATP aptamer and improve the fluorescence and stability of the Spinach RNA in cell extracts and in living Escherichia coli cells.
Affiliation(s)
- Joseph D Yesselman
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Daniel Eiler
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, CO, USA
- Erik D Carlson
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
- Michael R Gotrik
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Anne E d'Aquino
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
  - Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL, USA
- Alexandra N Ooms
  - Department of Cancer Genetics & Genomics, Stanford University School of Medicine, Stanford, CA, USA
- Wipapat Kladwang
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Paul D Carlson
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
- Xuesong Shi
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- David A Costantino
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, CO, USA
- Daniel Herschlag
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Chemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford ChEM-H (Chemistry, Engineering, and Medicine for Human Health), Stanford University, Stanford, CA, USA
- Julius B Lucks
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
  - Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL, USA
- Michael C Jewett
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
  - Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL, USA
- Jeffrey S Kieft
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, CO, USA
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Physics, Stanford University, Stanford, CA, USA
61
Wu MJ, Andreasson JOL, Kladwang W, Greenleaf W, Das R. Automated Design of Diverse Stand-Alone Riboswitches. ACS Synth Biol 2019; 8:1838-1846. [PMID: 31298841 PMCID: PMC6703183 DOI: 10.1021/acssynbio.9b00142] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/05/2023]
Abstract
Riboswitches that couple binding of ligands to conformational changes offer sensors and control elements for RNA synthetic biology and medical biotechnology. However, design of these riboswitches has required expert intuition or software specialized to transcription or translation outputs; design has been particularly challenging for applications in which the riboswitch output cannot be amplified by other molecular machinery. We present a fully automated design method called RiboLogic for such “stand-alone” riboswitches and test it via high-throughput experiments on 2875 molecules using RNA-MaP (RNA on a massively parallel array) technology. These molecules consistently modulate their affinity to the MS2 bacteriophage coat protein upon binding of flavin mononucleotide, tryptophan, theophylline, and microRNA miR-208a, achieving activation ratios of up to 20 and significantly better performance than control designs. By encompassing a wide diversity of stand-alone switches and highly quantitative data, the resulting ribologic-solves experimental data set provides a rich resource for further improvement of riboswitch models and design methods.
62
Yesselman JD, Tian S, Liu X, Shi L, Li JB, Das R. Updates to the RNA mapping database (RMDB), version 2. Nucleic Acids Res 2019; 46:D375-D379. [PMID: 30053264 PMCID: PMC5753257 DOI: 10.1093/nar/gkx873] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/16/2017] [Accepted: 09/19/2017] [Indexed: 12/20/2022]
Abstract
Chemical mapping is a broadly utilized technique for probing the structure and function of RNAs. The volume of chemical mapping data continues to grow as more researchers routinely employ this information and as experimental methods increase in throughput and information content. To create a central location for these data, we established an RNA mapping database (RMDB) 5 years ago. The RMDB, which is available at http://rmdb.stanford.edu, now contains chemical mapping data for over 800 entries, involving 134 000 natural and engineered RNAs, in vitro and in cellulo. The entries include large data sets from multidimensional techniques that focus on RNA tertiary structure and co-transcriptional folding, resulting in over 15 million residues probed. The database interface has been redesigned and now offers interactive graphical browsing of structural, thermodynamic and kinetic data at single-nucleotide resolution. The front-end interface now uses the force-directed RNA applet for secondary structure visualization and other JavaScript-based views of bar graphs and annotations. A new interface also streamlines the process for depositing new chemical mapping data to the RMDB.
Affiliation(s)
- Joseph D Yesselman
  - Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
- Siqi Tian
  - Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
- Xin Liu
  - Department of Genetics, Stanford University School of Medicine, Stanford CA 94305, USA
- Jin Billy Li
  - Department of Genetics, Stanford University School of Medicine, Stanford CA 94305, USA
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, Stanford CA 94305, USA
  - Department of Physics, Stanford University, Stanford, CA 94305, USA
63
Gaffney S, Ad O, Smaga S, Schepartz A, Townsend JP. GEM-NET: Lessons in Multi-Institution Teamwork Using Collaboration Software. ACS CENTRAL SCIENCE 2019; 5:1159-1169. [PMID: 31404233 PMCID: PMC6661976 DOI: 10.1021/acscentsci.9b00111] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/05/2019] [Indexed: 06/10/2023]
Abstract
The Center for Genetically Encoded Materials (C-GEM) is an NSF Phase I Center for Chemical Innovation that comprises six laboratories spread across three university campuses. Our success as a multi-institution research team demanded the development of a software infrastructure, GEM-NET, that allows all C-GEM members to work together seamlessly, as though everyone were in the same room. GEM-NET was designed to support both science and communication by integrating task management, scheduling, data sharing, and collaborative document and code editing with frictionless internal and public communication; it also maintains security over data and internal communications. In this Article, we document the design and implementation of GEM-NET: our objectives and motivating goals, how each component contributes to these goals, and the lessons learned throughout development. We also share open source code for several custom applications and document how GEM-NET can benefit users in multiple fields and teams that are both small and large. We anticipate that this knowledge will guide other multi-institution teams, regardless of discipline, to plan their software infrastructure and utilize it as swiftly and smoothly as possible.
Affiliation(s)
- Stephen G. Gaffney
- Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut 06510, United States
| | - Omer Ad
- Department of Chemistry, Yale University, New Haven, Connecticut 06510, United States
| | - Sarah Smaga
- Department of Chemistry, Yale University, New Haven, Connecticut 06510, United States
| | - Alanna Schepartz
- Department of Chemistry, Yale University, New Haven, Connecticut 06510, United States
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06510, United States
| | - Jeffrey P. Townsend
- Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut 06510, United States
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06510, United States
| |
|
64
|
Spasic A, Assmann SM, Bevilacqua PC, Mathews DH. Modeling RNA secondary structure folding ensembles using SHAPE mapping data. Nucleic Acids Res 2018; 46:314-323. [PMID: 29177466 PMCID: PMC5758915 DOI: 10.1093/nar/gkx1057] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 05/10/2017] [Accepted: 10/30/2017] [Indexed: 12/22/2022]
Abstract
RNA secondary structure prediction is widely used for developing hypotheses about the structures of RNA sequences, and structure can provide insight about RNA function. The accuracy of structure prediction is known to be improved using experimental mapping data that provide information about the pairing status of single nucleotides, and these data can now be acquired for whole transcriptomes using high-throughput sequencing. Prior methods for using these experimental data focused on predicting structures for sequences assuming that they populate a single structure. Most RNAs populate multiple structures, however, where the ensemble of strands populates structures with different sets of canonical base pairs. The focus on modeling single structures has been a bottleneck for accurately modeling RNA structure. In this work, we introduce Rsample, an algorithm for using experimental data to predict more than one RNA structure for sequences that populate multiple structures at equilibrium. We demonstrate, using SHAPE mapping data, that we can accurately model RNA sequences that populate multiple structures, including the relative probabilities of those structures. This program is freely available as part of the RNAstructure software package.
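The phrase "relative probabilities of those structures" can be made concrete with a small worked example. The sketch below is not the Rsample algorithm; it only illustrates the standard Boltzmann relation between the folding free energies of competing structures and their equilibrium populations. The function name and the two free-energy values are hypothetical and chosen purely for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(free_energies_kcal, temperature_k=310.15):
    """Return the equilibrium probability of each structure from its free energy.

    free_energies_kcal: folding free energies in kcal/mol, one per structure.
    temperature_k: absolute temperature (310.15 K is 37 degrees C).
    """
    rt = R_KCAL * temperature_k
    # Shift by the minimum energy before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    g_min = min(free_energies_kcal)
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical free energies (kcal/mol) for two competing structures.
probs = boltzmann_probabilities([-10.0, -9.0])
```

With these assumed inputs, the structure that is 1 kcal/mol more stable is favored, but the alternative structure still holds a non-negligible share of the population, which is why single-structure models can miss part of the ensemble.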
Affiliation(s)
- Aleksandar Spasic
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Sarah M Assmann
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
| | - Philip C Bevilacqua
- Department of Chemistry, Department of Biochemistry & Molecular Biology, Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
| | - David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA.,Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| |
|
65
|
Das R, Keep B, Washington P, Riedel-Kruse IH. Scientific Discovery Games for Biomedical Research. Annu Rev Biomed Data Sci 2019; 2:253-279. [PMID: 34308269 DOI: 10.1146/annurev-biodatasci-072018-021139] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/14/2023]
Abstract
Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.
Affiliation(s)
- Rhiju Das
- Department of Biochemistry and Department of Physics, Stanford University, Stanford, California 94305, USA
| | - Benjamin Keep
- Department of Learning Sciences, Stanford University, Stanford, California 94305, USA
| | - Peter Washington
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA
| | | |
|
66
|
Abstract
Online citizen science projects such as GalaxyZoo [1], Eyewire [2] and Phylo [3] have been very successful for data collection, annotation, and processing, but for the most part have harnessed human pattern recognition skills rather than human creativity. An exception is the game EteRNA [4], in which game players learn to build new RNA structures by exploring the discrete two-dimensional space of Watson-Crick base pairing possibilities. Building new proteins, however, is a more challenging task to present in a game, as both the representation and evaluation of a protein structure are intrinsically three-dimensional. We posed the challenge of de novo protein design in the online protein folding game Foldit [5]. Players were presented with a fully extended peptide chain and challenged to craft a folded protein structure with an amino acid sequence encoding that structure. After many iterations of player design, analysis of the top scoring solutions, and subsequent game improvement, Foldit players can now, starting from an extended polypeptide chain, generate a diversity of protein structures and sequences which encode them in silico. 146 Foldit player designs with sequences unrelated to naturally occurring proteins were encoded in synthetic genes; 56 were found to be expressed in E. coli with good solubility and to adopt stable monomeric folded structures in solution. The diversity of these structures is unprecedented in de novo protein design, representing 20 different folds, including a new fold not observed in natural proteins. High resolution structures were determined for four of the designs, and are nearly identical to the player models. This work makes explicit the considerable implicit knowledge contributing to success in de novo protein design, and shows that citizen scientists can discover creative new solutions to outstanding scientific challenges, such as the protein design problem.
|
67
|
Koodli RV, Keep B, Coppess KR, Portela F, Das R. EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame. PLoS Comput Biol 2019; 15:e1007059. [PMID: 31247029 PMCID: PMC6597038 DOI: 10.1371/journal.pcbi.1007059] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/18/2019] [Accepted: 04/30/2019] [Indexed: 11/18/2022]
Abstract
Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants' moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million player moves on 12 of the most-played Eterna puzzles as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN) EternaBrain that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players' moves are partially stereotyped. Pipelining this CNN's move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and achieves similar or better performance than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.
Affiliation(s)
- Rohan V. Koodli
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
| | - Benjamin Keep
- Department of Education, Stanford University, Stanford, CA, United States of America
| | - Katherine R. Coppess
- Department of Physics, Stanford University, Stanford, CA, United States of America
| | - Fernando Portela
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
| | | | - Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
- Department of Physics, Stanford University, Stanford, CA, United States of America
| |
|
68
|
Yu G, Chen X. Host-Guest Chemistry in Supramolecular Theranostics. Theranostics 2019; 9:3041-3074. [PMID: 31244941 PMCID: PMC6567976 DOI: 10.7150/thno.31653] [Citation(s) in RCA: 116] [Impact Index Per Article: 19.3] [Received: 11/19/2018] [Accepted: 02/24/2019] [Indexed: 12/12/2022]
Abstract
Macrocyclic hosts, such as cyclodextrins, calixarenes, cucurbiturils, and pillararenes, have shown marked advantages in disease diagnosis and therapy in recent years by exploiting host-guest molecular recognition. The dynamic nature of the non-covalent interactions and selective host-guest complexation endow the resultant nanomaterials with intriguing properties and promising potential in theranostics. Notably, the differences in microenvironment between abnormal and normal cells/tissues can be employed as stimuli to modulate the host-guest interactions, enabling precise diagnosis and specific delivery of drugs to lesion sites. In this review, we summarize the progress of supramolecular theranostics based on host-guest chemistry, which benefits from the hosts' distinctive topological structures and supramolecular chemistry. These state-of-the-art examples provide new methodologies to overcome the obstacles faced by traditional theranostic systems, promoting their clinical translation.
Affiliation(s)
| | - Xiaoyuan Chen
- Laboratory of Molecular Imaging and Nanomedicine, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, Maryland 20892, United States
| |
|
69
|
Taly A, Nitti F, Baaden M, Pasquali S. Molecular modelling as the spark for active learning approaches for interdisciplinary biology teaching. Interface Focus 2019; 9:20180065. [PMID: 31065338 DOI: 10.1098/rsfs.2018.0065] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Accepted: 02/13/2019] [Indexed: 02/04/2023]
Abstract
We present here an interdisciplinary workshop on the subject of biomolecules offered to undergraduate and high school students with the aim of boosting their interest toward all areas of science contributing to the study of life. The workshop involves mathematics, physics, chemistry, computer science and biology. Based on our own areas of research, molecular modelling is chosen as the central axis as it involves all disciplines. To provide a strong biological motivation for the study of the dynamics of biomolecules, the theme of the workshop is the origin of life. All sessions are built around active pedagogy, including games, and a final poster presentation.
Affiliation(s)
- A Taly
- Laboratoire de Biochimie Théorique, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
| | - F Nitti
- APC, Laboratoire d'Astroparticules et Cosmologie, Université Paris Diderot, CNRS/IN2P3, CEA/IRFU, Observatoire de Paris, Sorbonne Paris Cité, 10 rue Alice Domon et Léonie Duquet, 75205 Paris Cedex 13, France
| | - M Baaden
- Laboratoire de Biochimie Théorique, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
| | - S Pasquali
- Laboratoire Cibles Thérapeutiques et Conception de Médicaments, CNRS UMR 8038 Université Paris Descartes, Paris 75006, France
| |
|
70
|
Citizen science frontiers: Efficiency, engagement, and serendipitous discovery with human-machine systems. Proc Natl Acad Sci U S A 2019; 116:1902-1909. [PMID: 30718393 DOI: 10.1073/pnas.1807190116] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/18/2022]
Abstract
Citizen science has proved to be a unique and effective tool in helping science and society cope with the ever-growing data rates and volumes that characterize the modern research landscape. It also serves a critical role in engaging the public with research in a direct, authentic fashion and by doing so promotes a better understanding of the processes of science. To take full advantage of the onslaught of data being experienced across the disciplines, it is essential that citizen science platforms leverage the complementary strengths of humans and machines. This Perspectives piece explores the issues encountered in designing human-machine systems optimized for both efficiency and volunteer engagement, while striving to safeguard and encourage opportunities for serendipitous discovery. We discuss case studies from Zooniverse, a large online citizen science platform, and show that combining human and machine classifications can efficiently produce results superior to those of either one alone and how smart task allocation can lead to further efficiencies in the system. While these examples make clear the promise of human-machine integration within an online citizen science system, we then explore in detail how system design choices can inadvertently lower volunteer engagement, create exclusionary practices, and reduce opportunity for serendipitous discovery. Throughout we investigate the tensions that arise when designing a human-machine system serving the dual goals of carrying out research in the most efficient manner possible while empowering a broad community to authentically engage in this research.
|
71
|
Humans best judge how much to cooperate when facing hard problems in large groups. Sci Rep 2019; 9:5497. [PMID: 30940850 PMCID: PMC6445098 DOI: 10.1038/s41598-019-41773-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/05/2018] [Accepted: 02/28/2019] [Indexed: 11/08/2022]
Abstract
We report the results of a game-theoretic experiment with human players who solve problems of increasing complexity by cooperating in groups of increasing size. Our experimental environment is set up to make it complicated for players to use rational calculation for making the cooperative decisions. This environment is directly translated into a computer simulation, from which we extract the collaboration strategy that leads to the maximal attainable score. Based on this, we measure the error that players make when estimating the benefits of collaboration, and find that humans massively underestimate these benefits when facing easy problems or working alone or in small groups. In contrast, when confronting hard problems or collaborating in large groups, humans accurately judge the best level of collaboration and easily achieve the maximal score. Our findings are independent of groups’ composition and players’ personal traits. We interpret them as varying degrees of usefulness of social heuristics, which seems to depend on the size of the involved group and the complexity of the situation.
|
72
|
Jain S, Saju S, Petingi L, Schlick T. An extended dual graph library and partitioning algorithm applicable to pseudoknotted RNA structures. Methods 2019; 162-163:74-84. [PMID: 30928508 DOI: 10.1016/j.ymeth.2019.03.022] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/14/2019] [Revised: 02/28/2019] [Accepted: 03/22/2019] [Indexed: 12/18/2022]
Abstract
Exploring novel RNA topologies is imperative for understanding RNA structure and pursuing its design. Our RNA-As-Graphs (RAG) approach exploits graph theory tools and uses coarse-grained tree and dual graphs to represent RNA helices and loops by vertices and edges. Only dual graphs represent pseudoknotted RNAs fully. Here we develop a dual graph enumeration algorithm to generate an expanded library of dual graph topologies for 2-9 vertices, and extend our dual graph partitioning algorithm to identify all possible RNA subgraphs. Our enumeration algorithm connects smaller-vertex graphs, using all possible edge combinations, to build larger-vertex graphs and retain all non-isomorphic graph topologies, thereby more than doubling the size of our prior library to a total of 110,667 dual graph topologies. We apply our dual graph partitioning algorithm, which keeps pseudoknots and junctions intact, to all existing RNA structures to identify all possible substructures up to 9 vertices. In addition, our expanded dual graph library assigns graph topologies to all RNA graphs and subgraphs, rectifying prior inconsistencies. We update our RAG-3Dual database of RNA atomic fragments with all newly identified substructures and their graph IDs, increasing its size by more than 50 times. The enlarged dual graph library and RAG-3Dual database provide a comprehensive repertoire of graph topologies and atomic fragments to study yet undiscovered RNA molecules and design RNA sequences with novel topologies, including a variety of pseudoknotted RNAs.
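The idea of building candidate graphs from all possible edge combinations and retaining only non-isomorphic topologies can be illustrated with a brute-force sketch. This is not the authors' enumeration algorithm and it does not apply any dual-graph constraints (for example on vertex degree or connectivity); it simply enumerates small multigraphs and deduplicates them with a canonical form computed over all vertex relabelings. The function names are hypothetical, and the approach is only practical for very small vertex counts.

```python
from itertools import combinations_with_replacement, permutations


def canonical_form(n, edges):
    """Return the smallest sorted edge tuple over all relabelings of n vertices.

    Two multigraphs are isomorphic exactly when their canonical forms match.
    """
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(
            sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        )
        if best is None or relabelled < best:
            best = relabelled
    return best


def nonisomorphic_multigraphs(n, m):
    """Enumerate multigraphs with n vertices and m edges, up to isomorphism.

    Self-loops and parallel edges are allowed; disconnected graphs are kept.
    """
    possible_edges = [(u, v) for u in range(n) for v in range(u, n)]
    seen = set()
    # Try every multiset of m edges, keeping one representative per class.
    for edges in combinations_with_replacement(possible_edges, m):
        seen.add(canonical_form(n, edges))
    return sorted(seen)


two_vertex_two_edge = nonisomorphic_multigraphs(2, 2)
```

A practical enumeration at the scale described in the abstract would need a far more efficient isomorphism test than trying every permutation; the sketch only shows what "retain all non-isomorphic graph topologies" means operationally.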
Affiliation(s)
- Swati Jain
- Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA
| | - Sera Saju
- Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA
| | - Louis Petingi
- Computer Science Department, College of Staten Island, City University of New York, Staten Island, New York, NY 10314, USA
| | - Tamar Schlick
- Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA; Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA; NYU-East China Normal University Center for Computational Chemistry at New York University Shanghai, Room 340, Geography Building, North Zhongshan Road, 3663 Shanghai, China.
| |
|
73
|
Interactive programming paradigm for real-time experimentation with remote living matter. Proc Natl Acad Sci U S A 2019; 116:5411-5419. [PMID: 30824592 PMCID: PMC6431204 DOI: 10.1073/pnas.1815367116] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Abstract
Biology cloud laboratories are an emerging approach to lowering access barriers for life-science experimentation. However, suitable programming approaches and interfaces are lacking for both domain experts and lay users, especially ones that enable interaction with the living matter itself and not just the control of equipment. Here we present a programming paradigm for real-time interactive applications with remotely housed biological systems which is accessible and useful for scientists, programmers, and lay people. Our user studies show that scientists and nonscientists are able to rapidly develop a variety of applications, such as interactive biophysics experiments and games. This paradigm has the potential to make first-hand experiences with biology accessible to all of society and to accelerate the rate of scientific discovery. Recent advancements in life-science instrumentation and automation enable entirely new modes of human interaction with microbiological processes and corresponding applications for science and education through biology cloud laboratories. A critical barrier for remote and on-site life-science experimentation (for both experts and nonexperts alike) is the absence of suitable abstractions and interfaces for programming living matter. To this end we conceptualize a programming paradigm that provides stimulus and sensor control functions for real-time manipulation of physical biological matter. Additionally, a simulation mode facilitates higher user throughput, program debugging, and biophysical modeling. To evaluate this paradigm, we implemented a JavaScript-based web toolkit, “Bioty,” that supports real-time interaction with swarms of phototactic Euglena cells hosted on a cloud laboratory. Studies with remote and on-site users demonstrate that individuals with little to no biology knowledge and intermediate programming knowledge were able to successfully create and use scientific applications and games. 
This work informs the design of programming environments for controlling living matter in general, for living material microfabrication and swarm robotics applications, and for lowering the access barriers to the life sciences for professional and citizen scientists, learners, and the lay public.
|
74
|
Remote optimization of an ultracold atoms experiment by experts and citizen scientists. Proc Natl Acad Sci U S A 2018; 115:E11231-E11237. [PMID: 30413625 PMCID: PMC6275530 DOI: 10.1073/pnas.1716869115] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 01/25/2023]
Abstract
The emerging field of gamified citizen science continually probes the fault line between human and artificial intelligence. A better understanding of citizen scientists’ search strategies may lead to cognitive insights and provide inspiration for algorithmic improvements. Our project remotely engages both the general public and experts in the real-time optimization of an experimental laboratory setting. In this citizen science project the game and data acquisition are designed as a social science experiment aimed at extracting the collective search behavior of the players. A further understanding of these human skills will be a crucial challenge in the coming years, as hybrid intelligence solutions are pursued in corporate and research environments. We introduce a remote interface to control and optimize the experimental production of Bose–Einstein condensates (BECs) and find improved solutions using two distinct implementations. First, a team of theoreticians used a remote version of their dressed chopped random basis optimization algorithm (RedCRAB), and second, a gamified interface allowed 600 citizen scientists from around the world to participate in real-time optimization. Quantitative studies of player search behavior demonstrated that they collectively engage in a combination of local and global searches. This form of multiagent adaptive search prevents premature convergence by the explorative behavior of low-performing players while high-performing players locally refine their solutions. In addition, many successful citizen science games have relied on a problem representation that directly engaged the visual or experiential intuition of the players. Here we demonstrate that citizen scientists can also be successful in an entirely abstract problem visualization. This is encouraging because a much wider range of challenges could potentially be opened to gamification in the future.
|
75
|
Bellaousov S, Kayedkhordeh M, Peterson RJ, Mathews DH. Accelerated RNA secondary structure design using preselected sequences for helices and loops. RNA (NEW YORK, N.Y.) 2018; 24:1555-1567. [PMID: 30097542 PMCID: PMC6191713 DOI: 10.1261/rna.066324.118] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/07/2018] [Accepted: 08/06/2018] [Indexed: 06/08/2023]
Abstract
Nucleic acids can be designed to be nano-machines, pharmaceuticals, or probes. RNA secondary structures can form the basis of self-assembling nanostructures. There are only four natural RNA bases; therefore, it can be difficult to design sequences that fold to a single, specified structure because many other structures are often possible for a given sequence. One approach taken by state-of-the-art sequence design methods is to select sequences that fold to the specified structure using stochastic, iterative refinement. The goal of this work is to accelerate design. Many existing iterative methods select and refine sequences one base pair and one unpaired nucleotide at a time. Here, the hypothesis that sequences can be preselected in order to accelerate design was tested. To this end, a database was built of helix sequences that demonstrate thermodynamic features found in natural sequences and that also have little tendency to cross-hybridize. Additionally, a database was assembled of RNA loop sequences with low helix-formation propensity and little tendency to cross-hybridize with either the helices or other loops. These databases of preselected sequences accelerate the selection of sequences that fold with minimal ensemble defect by replacing some of the trial and error of current refinement approaches. When using the database of preselected sequences as compared to randomly chosen sequences, sequences for natural structures are designed 36 times faster, and random structures are designed six times faster. The sequences selected with the aid of the database have ensemble defects similar to those of sequences selected at random. The sequence database is part of the RNAstructure package at http://rna.urmc.rochester.edu/RNAstructure.html.
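The design objective mentioned above, the ensemble defect, is commonly defined as the expected number of nucleotides whose pairing state differs from the target structure at equilibrium. The sketch below computes it from a base-pair probability matrix. It assumes that matrix is already available (for example from a partition-function calculation, which is not shown), and the function names and the toy four-nucleotide input are hypothetical. It is an illustration of the quantity, not the code used in the paper.

```python
def parse_dot_bracket(structure):
    """Map each position to its pairing partner (None if unpaired)."""
    stack = []
    partner = [None] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def ensemble_defect(pair_probs, target):
    """Expected number of incorrectly paired nucleotides for a target structure.

    pair_probs: symmetric N x N matrix; pair_probs[i][j] is the equilibrium
        probability that nucleotides i and j are paired.
    target: dot-bracket string of length N (no pseudoknots).
    """
    partner = parse_dot_bracket(target)
    n = len(target)
    expected_correct = 0.0
    for i in range(n):
        if partner[i] is None:
            # Probability that nucleotide i is unpaired, as the target requires.
            expected_correct += 1.0 - sum(pair_probs[i])
        else:
            # Probability that nucleotide i pairs with its target partner.
            expected_correct += pair_probs[i][partner[i]]
    return n - expected_correct


# Toy input: positions 0 and 3 pair with probability 0.8; nothing else pairs.
toy_probs = [[0.0] * 4 for _ in range(4)]
toy_probs[0][3] = toy_probs[3][0] = 0.8
defect = ensemble_defect(toy_probs, "(..)")
```

A defect of zero means every nucleotide adopts its target state with probability one; dividing by the sequence length gives the normalized form often reported in design benchmarks.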
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| | - Mohammad Kayedkhordeh
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| | | | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| |
|
76
|
Horn B, Miller JA, Smith G, Cooper S. A Monte Carlo Approach to Skill-Based Automated Playtesting. PROCEEDINGS. AAAI ARTIFICIAL INTELLIGENCE AND INTERACTIVE DIGITAL ENTERTAINMENT CONFERENCE 2018; 2018:166-172. [PMID: 30613687 PMCID: PMC6319931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
In order to create well-crafted learning progressions, designers guide players as they present game skills and give ample time for the player to master those skills. However, analyzing the quality of learning progressions is challenging, especially during the design phase, as content is ever-changing. This research presents the application of Stratabots (automated player simulations based on models of players with varying sets of skills) to the human computation game Foldit. Stratabot performance analysis coupled with player data reveals a relatively smooth learning progression within tutorial levels, yet still shows evidence for improvement. Leveraging existing general gameplaying algorithms such as Monte Carlo Evaluation can reduce the development time of this approach to automated playtesting without losing predictive power of the player model.
Affiliation(s)
- Britton Horn
- Northeastern University, Boston, Massachusetts, USA
| | | | - Gillian Smith
- Worcester Polytechnic Institute, Worcester, Massachusetts, USA
| | - Seth Cooper
- Northeastern University, Boston, Massachusetts, USA
| |
|
77
|
Abstract
Crowdsourcing is a very effective technique for outsourcing work to a vast network usually comprising anonymous people. In this study, we review the application of crowdsourcing to modeling systems originating from systems biology. We consider a variety of verified approaches, including well-known projects such as EyeWire, FoldIt, and DREAM Challenges, as well as novel projects conducted at the European Center for Bioinformatics and Genomics. The latter projects utilized crowdsourced serious games to design models of dynamic biological systems, and it was demonstrated that these models could be used successfully to involve players without domain knowledge. We conclude the review of these systems by providing 10 guidelines to facilitate the efficient use of crowdsourcing.
|
78
|
Jenkinson J. Molecular Biology Meets the Learning Sciences: Visualizations in Education and Outreach. J Mol Biol 2018; 430:4013-4027. [DOI: 10.1016/j.jmb.2018.08.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/01/2018] [Revised: 08/10/2018] [Accepted: 08/22/2018] [Indexed: 10/28/2022]
79
Interactive and scalable biology cloud experimentation for scientific inquiry and education. Nat Biotechnol 2018; 34:1293-1298. [PMID: 27926727 DOI: 10.1038/nbt.3747] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
80
Jain S, Laederach A, Ramos SBV, Schlick T. A pipeline for computational design of novel RNA-like topologies. Nucleic Acids Res 2018; 46:7040-7051. [PMID: 30137633 PMCID: PMC6101589 DOI: 10.1093/nar/gky524] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 10/24/2017] [Revised: 05/22/2018] [Accepted: 05/24/2018] [Indexed: 12/11/2022] Open
Abstract
Designing novel RNA topologies is a challenge, with important therapeutic and industrial applications. We describe a computational pipeline for design of novel RNA topologies based on our coarse-grained RNA-As-Graphs (RAG) framework. RAG represents RNA structures as tree graphs and describes RNA secondary (2D) structure topologies (currently up to 13 vertices, ≈260 nucleotides). We have previously identified novel graph topologies that are RNA-like among these. Here we describe a systematic design pipeline and illustrate design for six broad design problems using recently developed tools for graph-partitioning and fragment assembly (F-RAG). Following partitioning of the target graph, corresponding atomic fragments from our RAG-3D database are combined using F-RAG, and the candidate atomic models are scored using a knowledge-based potential developed for 3D structure prediction. The sequences of the top scoring models are screened further using available tools for 2D structure prediction. The results indicate that our modular approach based on RNA-like topologies rather than specific 2D structures allows for greater flexibility in the design process, and generates a large number of candidate sequences quickly. Experimental structure probing using SHAPE-MaP for two sequences agrees with our predictions and suggests that our combined tools yield excellent candidates for further sequence and experimental screening.
Affiliation(s)
- Swati Jain
- Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA
- Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Silvia B V Ramos
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tamar Schlick
- Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
- NYU-ECNU Center for Computational Chemistry at New York University Shanghai, Room 340, Geography Building, North Zhongshan Road, 3663 Shanghai, China
81
Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nat Biotechnol 2018; 36:820-828. [PMID: 30125267 DOI: 10.1038/nbt.4225] [Citation(s) in RCA: 94] [Impact Index Per Article: 13.4] [Received: 12/24/2017] [Accepted: 07/19/2018] [Indexed: 01/11/2023]
Abstract
Pattern recognition and classification of images are key challenges throughout the life sciences. We combined two approaches for large-scale classification of fluorescence microscopy images. First, using the publicly available data set from the Cell Atlas of the Human Protein Atlas (HPA), we integrated an image-classification task into a mainstream video game (EVE Online) as a mini-game, named Project Discovery. Participation by 322,006 gamers over 1 year provided nearly 33 million classifications of subcellular localization patterns, including patterns that were not previously annotated by the HPA. Second, we used deep learning to build an automated Localization Cellular Annotation Tool (Loc-CAT). This tool classifies proteins into 29 subcellular localization patterns and can deal efficiently with multi-localization proteins, performing robustly across different cell types. Combining the annotations of gamers and deep learning, we applied transfer learning to create a boosted learner that can characterize subcellular protein distribution with F1 score of 0.72. We found that engaging players of commercial computer games provided data that augmented deep learning and enabled scalable and readily improved image classification.
82
Cooper S, Sterling ALR, Kleffner R, Silversmith WM, Siegel JB. Repurposing Citizen Science Games as Software Tools for Professional Scientists. FDG : PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FOUNDATIONS OF DIGITAL GAMES. INTERNATIONAL CONFERENCE ON THE FOUNDATIONS OF DIGITAL GAMES 2018; 2018:39. [PMID: 30465045 PMCID: PMC6241531 DOI: 10.1145/3235765.3235770] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
Abstract
Scientific software is often developed with professional scientists in mind, resulting in complex tools with a steep learning curve. Citizen science games, however, are designed for citizen scientists, that is, members of the general public. These games maintain scientific accuracy while placing design goals such as usability and enjoyment at the forefront. In this paper, we identify an emerging use of game-based technology, in the repurposing of citizen science games to be software tools for professional scientists in their work. We discuss our experience in two such repurposings: Foldit, a protein folding and design game, and Eyewire, a web-based 3D neuron reconstruction game. Based on this experience, we provide evidence that the software artifacts produced for citizen science can be useful for professional scientists, and provide an overview of key design principles we found to be useful in the process of repurposing.
83
Smittenaar P, Walker AK, McGill S, Kartsonaki C, Robinson-Vyas RJ, McQuillan JP, Christie S, Harris L, Lawson J, Henderson E, Howat W, Hanby A, Thomas GJ, Bhattarai S, Browning L, Kiltie AE. Harnessing citizen science through mobile phone technology to screen for immunohistochemical biomarkers in bladder cancer. Br J Cancer 2018; 119:220-229. [PMID: 29991697 PMCID: PMC6048059 DOI: 10.1038/s41416-018-0156-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/26/2018] [Revised: 05/18/2018] [Accepted: 05/31/2018] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND: Immunohistochemistry (IHC) is often used in personalisation of cancer treatments. Analysis of large data sets to uncover predictive biomarkers by specialists can be enormously time-consuming. Here we investigated crowdsourcing as a means of reliably analysing immunostained cancer samples to discover biomarkers predictive of cancer survival. METHODS: We crowdsourced the analysis of bladder cancer TMA core samples through the smartphone app 'Reverse the Odds'. Scores from members of the public were pooled and compared to a gold standard set scored by appropriate specialists. We also used crowdsourced scores to assess associations with disease-specific survival. RESULTS: Data were collected over 721 days, with 4,744,339 classifications performed. The average time per classification was approximately 15 s, with approximately 20,000 h total non-gaming time contributed. The correlation between crowdsourced and expert H-scores (staining intensity × proportion) varied from 0.65 to 0.92 across the markers tested, with six of 10 correlation coefficients at least 0.80. At least two markers (MRE11 and CK20) were significantly associated with survival in patients with bladder cancer, and a further three markers showed results warranting expert follow-up. CONCLUSIONS: Crowdsourcing through a smartphone app has the potential to accurately screen IHC data and greatly increase the speed of biomarker discovery.
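The comparison described in this abstract can be illustrated with a minimal sketch. It assumes the simplified H-score given in the abstract's parenthesis (staining intensity times proportion of stained cells) and a plain mean for pooling crowd ratings; the function names, the 0-3 intensity scale, and the pooling rule are hypothetical illustrations, not the study's actual pipeline.

```python
from math import sqrt

def h_score(intensity, proportion_pct):
    """Simplified H-score: staining intensity (assumed 0-3) x percentage of stained cells."""
    if not 0 <= intensity <= 3:
        raise ValueError("intensity is assumed to lie in 0..3")
    if not 0 <= proportion_pct <= 100:
        raise ValueError("proportion must be a percentage in 0..100")
    return intensity * proportion_pct

def pooled_crowd_score(ratings):
    """Pool several (intensity, proportion) ratings of one core by averaging (assumed rule)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(h_score(i, p) for i, p in ratings) / len(ratings)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)
```

With per-core pooled crowd scores in one list and expert scores in another, `pearson(crowd, expert)` gives a per-marker agreement figure of the kind reported in the abstract.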
Affiliation(s)
- Alexandra K Walker
- Cancer Research UK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Shaun McGill
- Cancer Research UK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Christiana Kartsonaki
- Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
- MRC Population Health Research Unit, University of Oxford, Oxford, OX3 7LF, UK
- Elizabeth Henderson
- Cancer Research UK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Will Howat
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, UK
- Andrew Hanby
- Leeds Institute of Cancer and Pathology (LICAP), St James's University Hospital, Leeds, LS9 7TF, UK
- Gareth J Thomas
- Cancer Sciences Unit, University of Southampton Faculty of Medicine, Southampton, SO16 6YD, UK
- Selina Bhattarai
- Leeds Teaching Hospitals NHS Trust, St James's Hospital, Leeds, LS7 9TF, UK
- Lisa Browning
- Department of Cellular Pathology, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- The NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Anne E Kiltie
- Cancer Research UK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford, OX3 7DQ, UK
84
Eastman P, Shi J, Ramsundar B, Pande VS. Solving the RNA design problem with reinforcement learning. PLoS Comput Biol 2018; 14:e1006176. [PMID: 29927936 PMCID: PMC6029810 DOI: 10.1371/journal.pcbi.1006176] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/30/2017] [Revised: 07/03/2018] [Accepted: 05/04/2018] [Indexed: 11/19/2022] Open
Abstract
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
Affiliation(s)
- Peter Eastman
- Department of Bioengineering, Stanford University, Stanford, CA, United States of America
- Jade Shi
- Department of Chemistry, Stanford University, Stanford, CA, United States of America
- Bharath Ramsundar
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
- Vijay S. Pande
- Department of Bioengineering, Stanford University, Stanford, CA, United States of America
85
Frigerio D, Pipek P, Kimmig S, Winter S, Melzheimer J, Diblíková L, Wachter B, Richter A. Citizen science and wildlife biology: Synergies and challenges. Ethology 2018. [DOI: 10.1111/eth.12746] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 11/28/2022]
Affiliation(s)
- Didone Frigerio
- Core facility Konrad Lorenz Forschungsstelle for Behaviour and Cognition; University of Vienna; Grünau im Almtal Austria
- Department of Behavioural Biology; University of Vienna; Vienna Austria
- Pavel Pipek
- Department of Ecology; Faculty of Science; Charles University; Prague Czech Republic
- Department of Invasive Ecology; Institute of Botany; The Czech Academy of Sciences; Průhonice Czech Republic
- Sophia Kimmig
- Department of Evolutionary Ecology; Leibniz Institute for Zoo and Wildlife Research; Berlin Germany
- Department of Ecological Dynamics; Leibniz Institute for Zoo and Wildlife Research; Berlin Germany
- Silvia Winter
- Institute for Integrative Nature Conservation Research and Division of Plant Protection; University of Natural Resources and Life Sciences Vienna; Vienna Austria
- Jörg Melzheimer
- Department of Evolutionary Ecology; Leibniz Institute for Zoo and Wildlife Research; Berlin Germany
- Lucie Diblíková
- Department of Ecology; Faculty of Science; Charles University; Prague Czech Republic
- Bettina Wachter
- Department of Evolutionary Ecology; Leibniz Institute for Zoo and Wildlife Research; Berlin Germany
- Anett Richter
- Helmholtz Centre for Environmental Research - UFZ; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig; Leipzig Germany
86
Baaden M, Delalande O, Ferey N, Pasquali S, Waldispühl J, Taly A. Ten simple rules to create a serious game, illustrated with examples from structural biology. PLoS Comput Biol 2018. [PMID: 29518072 PMCID: PMC5843163 DOI: 10.1371/journal.pcbi.1005955] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022] Open
Affiliation(s)
- Marc Baaden
- Laboratoire de Biochimie Théorique, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Olivier Delalande
- Institut de Génétique et Développement de Rennes, Univ. Rennes 1, Rennes, France
- Samuela Pasquali
- Laboratoire de Cristallographie et RMN Biologiques, Faculté des sciences pharmaceutiques et biologiques, Université Paris Descartes et Université Sorbonne Paris Cité, Paris, France
- Antoine Taly
- Laboratoire de Biochimie Théorique, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
87
Woods CT, Laederach A. Classification of RNA structure change by 'gazing' at experimental data. Bioinformatics 2018; 33:1647-1655. [PMID: 28130241 PMCID: PMC5447233 DOI: 10.1093/bioinformatics/btx041] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/14/2016] [Accepted: 01/20/2017] [Indexed: 11/12/2022] Open
Abstract
Motivation: Mutations (or Single Nucleotide Variants) in folded RiboNucleic Acid structures that cause local or global conformational change are riboSNitches. Predicting riboSNitches is challenging, as it requires making two, albeit related, structure predictions. The data most often used to experimentally validate riboSNitch predictions is Selective 2' Hydroxyl Acylation by Primer Extension, or SHAPE. Experimentally establishing a riboSNitch requires the quantitative comparison of two SHAPE traces: wild-type (WT) and mutant. Historically, SHAPE data was collected on electropherograms and change in structure was evaluated by 'gel gazing.' SHAPE data is now routinely collected with next generation sequencing and/or capillary sequencers. We aim to establish a classifier capable of simulating human 'gazing' by identifying features of the SHAPE profile that human experts agree 'looks' like a riboSNitch. Results: We find strong quantitative agreement between experts when RNA scientists 'gaze' at SHAPE data and identify riboSNitches. We identify dynamic time warping and seven other features predictive of the human consensus. The classSNitch classifier reported here accurately reproduces human consensus for 167 mutant/WT comparisons with an Area Under the Curve (AUC) above 0.8. When we analyze 2019 mutant traces for 17 different RNAs, we find that features of the WT SHAPE reactivity allow us to improve thermodynamic structure predictions of riboSNitches. This is significant, as accurate RNA structural analysis and prediction is likely to become an important aspect of precision medicine. Availability and Implementation: The classSNitch R package is freely available at http://classsnitch.r-forge.r-project.org. Contact: alain@email.unc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
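Dynamic time warping, named in this abstract as one predictive feature, can be sketched as follows. This is the textbook dynamic-programming form with an absolute-difference local cost, written as a generic illustration; it is not taken from the classSNitch package (which is in R), and the function name and cost choice are assumptions.

```python
def dtw_distance(trace_a, trace_b):
    """Classic dynamic time warping distance between two numeric traces.

    Uses |a - b| as the local cost and allows match, insertion, and deletion
    steps. Both traces are assumed to be non-empty sequences of floats, for
    example wild-type and mutant reactivity profiles.
    """
    n, m = len(trace_a), len(trace_b)
    if n == 0 or m == 0:
        raise ValueError("both traces must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning the first i points of
    # trace_a with the first j points of trace_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(trace_a[i - 1] - trace_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in trace_a only
                                     cost[i][j - 1],      # step in trace_b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

A small distance means the two traces have the same shape up to local stretching; a large one indicates a change that stretching cannot explain.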
Affiliation(s)
- Chanin Tolson Woods
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
88
Churkin A, Retwitzer MD, Reinharz V, Ponty Y, Waldispühl J, Barash D. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinform 2018; 19:350-358. [PMID: 28049135 PMCID: PMC6018860 DOI: 10.1093/bib/bbw120] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/26/2022] Open
Abstract
Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse, which was devised when formulating the inverse RNA folding problem. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
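As a rough illustration of the input that inverse-folding programs work from, the sketch below parses a target secondary structure in dot-bracket notation and checks whether a candidate sequence could form every required base pair. This is only a necessary condition: the programs reviewed here additionally fold candidates with a thermodynamic model, which this sketch does not attempt. All function names are hypothetical, and pseudoknot brackets are assumed to be absent.

```python
# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")}

def pair_table(structure):
    """Map each paired position to its partner for a dot-bracket string."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError("unsupported symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs

def is_compatible(sequence, structure):
    """True if every base pair in the target can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    pairs = pair_table(structure)
    return all((sequence[i], sequence[j]) in CANONICAL_PAIRS
               for i, j in pairs.items() if i < j)

def gc_content(sequence):
    """Fraction of G and C nucleotides, a common design constraint."""
    return sum(base in "GC" for base in sequence) / len(sequence)
```

For example, `is_compatible("GCAAGC", "((..))")` holds because positions 0-5 and 1-4 form G-C and C-G pairs, whereas an all-A sequence of the same length fails the check.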
Affiliation(s)
- Alexander Churkin
- Shamoon College of Engineering and Physics Department at Ben-Gurion University, Beer-Sheva, Israel
- Vladimir Reinharz
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
- School of Computer Science, McGill University, Montréal QC, Canada
- Yann Ponty
- Laboratoire d’informatique, École Polytechnique, Palaiseau, France
- Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
89
Alialy R, Tavakkol S, Tavakkol E, Ghorbani-Aghbologhi A, Ghaffarieh A, Kim SH, Shahabi C. A Review on the Applications of Crowdsourcing in Human Pathology. J Pathol Inform 2018. [PMID: 29531847 PMCID: PMC5841017 DOI: 10.4103/jpi.jpi_65_17] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/04/2022] Open
Abstract
The advent of digital pathology has introduced new avenues of diagnostic medicine. Among them, crowdsourcing has attracted researchers' attention in recent years, allowing them to engage thousands of untrained individuals in research and diagnosis. While several articles exist in this regard, prior works have not collectively documented them. We therefore aim to review the applications of crowdsourcing in human pathology in a semi-systematic manner. We first introduce a novel method to do a systematic search of the literature. Utilizing this method, we then collect hundreds of articles and screen them against a predefined set of criteria. Furthermore, we crowdsource part of the screening process to examine another potential application of crowdsourcing. Finally, we review the selected articles and characterize the prior uses of crowdsourcing in pathology.
Affiliation(s)
- Roshanak Alialy
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Sasan Tavakkol
- Department of Computer Science, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Elham Tavakkol
- Telemedicine Research Center, National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amir Ghorbani-Aghbologhi
- Department of Pathology and Laboratory Medicine, Davis Medical Center, University of California, Sacramento, CA, USA
- Alireza Ghaffarieh
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Seon Ho Kim
- Department of Computer Science, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Cyrus Shahabi
- Department of Computer Science, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
90
Yesselman JD, Das R. Modeling Small Noncanonical RNA Motifs with the Rosetta FARFAR Server. Methods Mol Biol 2018; 1490:187-98. [PMID: 27665600 DOI: 10.1007/978-1-4939-6433-8_12] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
Abstract
Noncanonical RNA motifs help define the vast complexity of RNA structure and function, and in many cases, these loops and junctions are on the order of only ten nucleotides in size. Unfortunately, despite their small size, there is no reliable method to determine the ensemble of lowest energy structures of junctions and loops at atomic accuracy. This chapter outlines straightforward protocols using a webserver for Rosetta Fragment Assembly of RNA with Full Atom Refinement (FARFAR) (http://rosie.rosettacommons.org/rna_denovo/submit) to model the 3D structure of small noncanonical RNA motifs for use in visualizing motifs and for further refinement or filtering with experimental data such as NMR chemical shifts.
Affiliation(s)
- Rhiju Das
- Biochemistry Department, Stanford University, Stanford, CA, 94305, USA
- Physics Department, Stanford University, Stanford, CA, 94305, USA
91
Yang X, Yoshizoe K, Taneda A, Tsuda K. RNA inverse folding using Monte Carlo tree search. BMC Bioinformatics 2017; 18:468. [PMID: 29110632 PMCID: PMC5674771 DOI: 10.1186/s12859-017-1882-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/08/2017] [Accepted: 10/26/2017] [Indexed: 11/10/2022] Open
Abstract
Background: Artificially synthesized RNA molecules provide important ways for creating a variety of novel functional molecules. State-of-the-art RNA inverse folding algorithms can design simple and short RNA sequences of specific GC content that fold into the target RNA structure. However, their performance is not satisfactory in complicated cases. Result: We present a new inverse folding algorithm called MCTS-RNA, which uses Monte Carlo tree search (MCTS), a technique that has shown exceptional performance in Computer Go recently, to represent and discover the essential part of the sequence space. To obtain high accuracy, initial sequences generated by MCTS are further improved by a series of local updates. Our algorithm can control the GC content precisely and can deal with pseudoknot structures. Using common benchmark datasets for evaluation, MCTS-RNA showed a lot of promise as a standard method of RNA inverse folding. Conclusion: MCTS-RNA is available at https://github.com/tsudalab/MCTS-RNA. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1882-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiufeng Yang
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-8561, Japan
- Kazuki Yoshizoe
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihombashi Chuo-ku, Tokyo, 103-0027, Japan
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, 3 Bunkyo-cho, Hirosaki, 036-8561, Japan
- Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-8561, Japan
- Center for Materials Research by Information Integration, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, 305-0047, Japan
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihombashi Chuo-ku, Tokyo, 103-0027, Japan
92
Gaston J, Cooper S. To Three or not to Three: Improving Human Computation Game Onboarding with a Three-Star System. PROCEEDINGS OF THE SIGCHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS. CHI CONFERENCE 2017; 2017:5034-5039. [PMID: 29082386 DOI: 10.1145/3025453.3025997] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
Abstract
While many popular casual games use three-star systems, which give players up to three stars based on their performance in a level, this technique has seen limited application in human computation games (HCGs). This gives rise to the question of what impact, if any, a three-star system will have on the behavior of players in HCGs. In this work, we examined the impact of a three-star system implemented in the protein folding HCG Foldit. We compared the basic game's introductory levels with two versions using a three-star system, where players were rewarded with more stars for completing levels in fewer moves. In one version, players could continue playing levels for as many moves as they liked, and in the other, players were forced to reset the level if they used more moves than required to achieve at least one star on the level. We observed that the three-star system encouraged players to use fewer moves, take more time per move, and replay completed levels more often. We did not observe an impact on retention. This indicates that three-star systems may be useful for reinforcing concepts introduced by HCG levels, or as a flexible means to encourage desired behaviors.
93
Ban K, Perc M, Levnajić Z. Robust clustering of languages across Wikipedia growth. ROYAL SOCIETY OPEN SCIENCE 2017; 4:171217. [PMID: 29134106 PMCID: PMC5666289 DOI: 10.1098/rsos.171217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2017] [Accepted: 09/18/2017] [Indexed: 06/07/2023]
Abstract
Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14 962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy.
Affiliation(s)
- Kristina Ban
- Faculty of Information Studies, Ljubljanska cesta 31A, 8000 Novo Mesto, Slovenia
- Matjaž Perc
- Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, 2000 Maribor, Slovenia
- CAMTP - Center for Applied Mathematics and Theoretical Physics, University of Maribor, Mladinska 3, 2000 Maribor, Slovenia
- Zoran Levnajić
- Faculty of Information Studies, Ljubljanska cesta 31A, 8000 Novo Mesto, Slovenia
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
94
Affiliation(s)
- Shawn M Douglas
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA.
95
Abstract
Inspired by the recent success of scientific-discovery games for predicting protein tertiary and RNA secondary structures, we have developed open software for coarse-grained RNA folding simulations, guided by human intuition. To determine the extent to which interactive simulations can accurately predict 3D RNA structures of increasing complexity and lengths (four RNAs with 22-47 nucleotides), an interactive experiment was conducted with 141 participants who had very little knowledge of nucleic acids systems and computer simulations, and had received only a brief description of the important forces stabilizing RNA structures. Their structures and full trajectories have been analyzed statistically and compared to standard replica exchange molecular dynamics simulations. Our analyses show that participants easily gain chemical intelligence to fold simple and nontrivial topologies, with little computer time, and this result opens the door to the use of human-guided simulations for RNA folding. Our experiment shows that interactive simulations have better chances of success when the user widely explores the conformational space. Interestingly, providing on-the-fly feedback of the root mean square deviation with respect to the experimental structure did not improve the quality of the proposed models.
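The root mean square deviation mentioned as on-the-fly feedback can be written down in a few lines. The sketch below assumes the model and the reference are already superimposed and list the same atoms (or coarse-grained beads) in the same order; optimal superposition, which structure comparisons usually perform first, is deliberately left out, and the function name is hypothetical.

```python
from math import sqrt

def rmsd(model, reference):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes both coordinate sets are pre-aligned and ordered identically;
    no superposition is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
                  for (xm, ym, zm), (xr, yr, zr) in zip(model, reference))
    return sqrt(squared / len(model))
```

The result is in the same length unit as the input coordinates, so a value near zero means the proposed model nearly coincides with the reference.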
96
Abstract
The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods may reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.
|
97
|
Abstract
The discoveries of myriad non-coding RNA molecules, each transiting through multiple flexible states in cells or virions, present major challenges for structure determination. Advances in high-throughput chemical mapping give new routes for characterizing entire transcriptomes in vivo, but the resulting one-dimensional data generally remain too information-poor to allow accurate de novo structure determination. Multidimensional chemical mapping (MCM) methods seek to address this challenge. Mutate-and-map (M2), RNA interaction groups by mutational profiling (RING-MaP and MaP-2D analysis) and multiplexed •OH cleavage analysis (MOHCA) measure how the chemical reactivities of every nucleotide in an RNA molecule change in response to modifications at every other nucleotide. A growing body of in vitro blind tests and compensatory mutation/rescue experiments indicates that MCM methods give consistently accurate secondary structures and global tertiary structures for ribozymes, ribosomal domains and ligand-bound riboswitch aptamers up to 200 nucleotides in length. Importantly, MCM analyses provide detailed information on structurally heterogeneous RNA states, such as ligand-free riboswitches that are functionally important but difficult to resolve with other approaches. The sequencing requirements of currently available MCM protocols scale at least quadratically with RNA length, precluding general application to transcriptomes or viral genomes at present. We propose a modify-cross-link-map (MXM) expansion to overcome this and other current limitations to resolving the in vivo 'RNA structurome'.
|
98
|
Condon A, Kirkpatrick B, Maňuch J. Design of nucleic acid strands with long low-barrier folding pathways. Nat Comput 2017; 16:261-284. [PMID: 28690474 PMCID: PMC5480305 DOI: 10.1007/s11047-016-9587-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
A major goal of natural computing is to design biomolecules, such as nucleic acid sequences, that can be used to perform computations. We design sequences of nucleic acids that are "guaranteed" to have long folding pathways relative to their length. With high probability, these sequences follow low-barrier folding pathways that visit a large number of distinct structures. Long folding pathways are interesting because they demonstrate that natural computing can potentially support long and complex computations. Formally, we provide the first scalable designs of molecules whose low-barrier folding pathways, with respect to a simple, stacked pair energy model, grow superlinearly with the molecule length, but for which all significantly shorter alternative folding pathways have an energy barrier that is [Formula: see text] times that of the low-barrier pathway for any [Formula: see text] and a sufficiently long sequence.
Affiliation(s)
- Anne Condon
- Department of Computer Science, University of British Columbia, Vancouver, Canada
- Ján Maňuch
- Department of Computer Science, University of British Columbia, Vancouver, Canada
|
99
|
Sloma MF, Mathews DH. Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures. RNA 2016; 22:1808-1818. [PMID: 27852924 PMCID: PMC5113201 DOI: 10.1261/rna.053694.115] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 07/31/2015] [Accepted: 09/08/2016] [Indexed: 05/10/2023]
Abstract
RNA secondary structure prediction is widely used to analyze RNA sequences. In an RNA partition function calculation, free energy nearest neighbor parameters are used in a dynamic programming algorithm to estimate statistical properties of the secondary structure ensemble. Previously, partition functions have largely been used to estimate the probability that a given pair of nucleotides form a base pair, the conditional stacking probability, the accessibility to binding of a continuous stretch of nucleotides, or a representative sample of RNA structures. Here it is demonstrated that an RNA partition function can also be used to calculate the exact probability of formation of hairpin loops, internal loops, bulge loops, or multibranch loops at a given position. This calculation can also be used to estimate the probability of formation of specific helices. Benchmarking on a set of RNA sequences with known secondary structures indicated that loops that were calculated to be more probable were more likely to be present in the known structure than less probable loops. Furthermore, highly probable loops are more likely to be in the known structure than the set of loops predicted in the lowest free energy structures.
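As a rough illustration of what an exact hairpin-loop probability means, the sketch below computes one for a very short sequence. It is not the method of the paper: it enumerates every nested structure by brute force instead of using a dynamic programming algorithm, and it replaces the nearest neighbor parameters with a made-up per-base-pair energy table. The names `PAIR_ENERGY`, `MIN_LOOP`, `structures` and `probabilities` are hypothetical and exist only for this example.

```python
import math

RT = 0.0019872 * 310.15  # gas constant (kcal/mol/K) times 310.15 K (37 °C)

# ASSUMPTION: toy energies in kcal/mol per base pair, chosen for illustration.
# These are not nearest neighbor parameters.
PAIR_ENERGY = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}
MIN_LOOP = 3  # minimum number of unpaired nucleotides enclosed by a hairpin


def structures(seq, i, j):
    """Yield each nested structure on seq[i..j] (inclusive) as a tuple of (p, q) pairs."""
    if i > j:
        yield ()
        return
    # Case 1: nucleotide i is unpaired.
    yield from structures(seq, i + 1, j)
    # Case 2: nucleotide i pairs with k; inside and outside regions fold independently.
    for k in range(i + MIN_LOOP + 1, j + 1):
        if seq[i] + seq[k] in PAIR_ENERGY:
            for inside in structures(seq, i + 1, k - 1):
                for outside in structures(seq, k + 1, j):
                    yield ((i, k),) + inside + outside


def boltzmann_weight(seq, pairs):
    """Boltzmann factor exp(-E/RT) of one structure under the toy energy model."""
    energy = sum(PAIR_ENERGY[seq[p] + seq[q]] for p, q in pairs)
    return math.exp(-energy / RT)


def probabilities(seq, i, j):
    """Return (P_pair, P_hairpin) for 0-based positions i < j.

    P_pair is the probability that i pairs with j; P_hairpin is the probability
    that (i, j) closes a hairpin loop, i.e. nothing between i and j is paired.
    """
    z = z_pair = z_hairpin = 0.0
    for pairs in structures(seq, 0, len(seq) - 1):
        w = boltzmann_weight(seq, pairs)
        z += w  # partition function: sum of weights over the whole ensemble
        if (i, j) in pairs:
            z_pair += w
            # Structures are nested, so checking the 5' partner is sufficient.
            if not any(i < p < j for p, _ in pairs):
                z_hairpin += w
    return z_pair / z, z_hairpin / z


if __name__ == "__main__":
    p_pair, p_hairpin = probabilities("GGGAAACCC", 0, 8)
    print(f"P(pair 0-8) = {p_pair:.4f}, P(hairpin closed by 0-8) = {p_hairpin:.4f}")
```

The hairpin probability can never exceed the base-pair probability for the same positions, because every structure that has the hairpin also has the pair. Enumeration grows exponentially with sequence length, which is why practical tools rely on dynamic programming.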
Affiliation(s)
- Michael F Sloma
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
|
100
|
Shaina H, UlAbdin Z, Webb BA, Arif MJ, Jamil A. De novo sequencing and transcriptome analysis of venom glands of endoparasitoid Aenasius arizonensis (Girault) (=Aenasius bambawalei Hayat) (Hymenoptera, Encyrtidae). Toxicon 2016; 121:134-144. [PMID: 27594666 DOI: 10.1016/j.toxicon.2016.08.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/15/2016] [Revised: 08/11/2016] [Accepted: 08/31/2016] [Indexed: 12/25/2022]
Abstract
Aenasius bambawalei Hayat (Encyrtidae: Hymenoptera), which has been synonymized with Aenasius arizonensis (Girault), is a small, newly discovered endoparasitoid of the cotton mealybug Phenacoccus solenopsis Tinsley (Pseudococcidae: Hemiptera). It completes its life cycle inside the body of its host and is a potential insect control tool. Despite the acquired knowledge regarding host-parasitoid interaction, little information is available on the factors of parasitoid origin able to modulate mealybug physiology. The components of A. arizonensis venom have not been well studied, but venom from other parasitoids and wasps contains biologically active proteins that have potential applications in pest management or may be of medicinal importance. To provide insight into the transcripts expressed in the venom gland of A. arizonensis, a transcriptomic database was developed using high-throughput RNA sequencing to analyze the genes expressed in the venom glands of this endoparasitic wasp. The resulting A. arizonensis RNA sequences were assembled de novo, and the contigs were then searched with BLAST against the NCBI non-redundant sequence database. Contigs that matched database sequences were mostly homologous to genes from hymenopteran parasitoids such as Nasonia vitripennis, Copidosoma floridanum, Fopius arisanus and Pteromalus puparum. Further analysis of the A. arizonensis database focused on selected genes encoding proteins potentially involved in host developmental arrest, disruption of the host immune system and host paralysis, and on transcripts that support these functions. Sequenced mRNAs predicted to encode full-length ORFs of calreticulin, serine protease precursor and arginine kinase proteins were identified, and the tissue-specific expression of these putative venom genes was analyzed by RT-PCR. In addition, the results demonstrate that de novo transcriptome assembly allows useful venom gene expression analysis in a species lacking a genome sequence database and may provide useful information for devising control tools for insect pests and other applications.
Affiliation(s)
- Hoor Shaina
- Department of Entomology, University of Agriculture Faisalabad, Pakistan
- Zain UlAbdin
- Department of Entomology, University of Agriculture Faisalabad, Pakistan
- Bruce A Webb
- Department of Entomology, University of Kentucky, Lexington, USA
- Amer Jamil
- Department of Biochemistry, University of Agriculture Faisalabad, Pakistan
|