1
Malik A, Zhang L, Gautam M, Dai N, Li S, Zhang H, Mathews DH, Huang L. LinearAlifold: Linear-time consensus structure prediction for RNA alignments. J Mol Biol 2024; 436:168694. [PMID: 38971557 PMCID: PMC11377157 DOI: 10.1016/j.jmb.2024.168694] [Received: 02/01/2024] [Revised: 06/28/2024] [Accepted: 07/01/2024] [Indexed: 07/08/2024]
Abstract
Predicting the consensus structure of a set of aligned RNA homologs is a convenient method to find conserved structures in an RNA genome, which has many applications including viral diagnostics and therapeutics. However, the most commonly used tool for this task, RNAalifold, is prohibitively slow for long sequences, due to a cubic scaling with the sequence length, taking over a day on 400 SARS-CoV-2 and SARS-related genomes (∼30,000 nt). We present LinearAlifold, a much faster alternative that scales linearly with both the sequence length and the number of sequences, based on our work LinearFold that folds a single RNA in linear time. Our work is orders of magnitude faster than RNAalifold (0.7 h on the above 400 genomes, or ∼36× speedup) and achieves higher accuracies when compared to a database of known structures. More interestingly, LinearAlifold's prediction on SARS-CoV-2 correlates well with experimentally determined structures, substantially outperforming RNAalifold. Finally, LinearAlifold supports two energy models (Vienna and BL*) and four modes: minimum free energy (MFE), maximum expected accuracy (MEA), ThreshKnot, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants. Our resource is at: https://github.com/LinearFold/LinearAlifold (code) and http://linearfold.org/linear-alifold (server).
Affiliation(s)
- Apoorv Malik
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Liang Zhang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Milan Gautam
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Ning Dai
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Sizhen Li
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- He Zhang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- David H Mathews
  - Dept. of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Dept. of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Liang Huang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
  - Dept. of Biochemistry & Biophysics, Oregon State University, Corvallis, OR 97330, USA
2
Forsdyke DR. Speciation, natural selection, and networks: three historians versus theoretical population geneticists. Theory Biosci 2024; 143:1-26. [PMID: 38282046 DOI: 10.1007/s12064-024-00412-9] [Received: 02/03/2023] [Accepted: 01/06/2024] [Indexed: 01/30/2024]
Abstract
In 1913, the geneticist William Bateson called for a halt in studies of genetic phenomena until evolutionary fundamentals had been sufficiently addressed at the molecular level. Nevertheless, in the 1960s, the theoretical population geneticists celebrated a "modern synthesis" of the teachings of Mendel and Darwin, with an exclusive role for natural selection in speciation. This was supported, albeit with minor reservations, by historians Mark Adams and William Provine, who taught it to generations of students. In subsequent decades, doubts were raised by molecular biologists and, despite the deep influence of various mentors, Adams and Provine noted serious anomalies and began to question traditional "just-so-stories." They were joined in challenging the genetic orthodoxy by a scientist-historian, Donald Forsdyke, who suggested that a "collective variation" postulated by Darwin's young research associate, George Romanes, and a mysterious "residue" postulated by Bateson, might relate to differences in short runs of DNA bases (oligonucleotides). The dispute between a small network of historians and a large network of geneticists can be understood in the context of national politics. Contrasts are drawn between democracies, where capturing the narrative makes reversal difficult, and dictatorships, where overthrow of a supportive dictator can result in rapid reversal.
Affiliation(s)
- Donald R Forsdyke
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, K7L3N6, Canada
3
Zhang H, Li S, Dai N, Zhang L, Mathews DH, Huang L. LinearCoFold and LinearCoPartition: linear-time algorithms for secondary structure prediction of interacting RNA molecules. Nucleic Acids Res 2023; 51:e94. [PMID: 37650626 PMCID: PMC10570024 DOI: 10.1093/nar/gkad664] [Received: 12/01/2022] [Revised: 06/15/2023] [Accepted: 08/17/2023] [Indexed: 09/01/2023]
Abstract
Many RNAs function through RNA-RNA interactions. Fast and reliable RNA structure prediction with consideration of RNA-RNA interaction is useful; however, existing tools are either too simplistic or too slow. To address this issue, we present LinearCoFold, which approximates the complete minimum free energy structure of two strands in linear time, and LinearCoPartition, which approximates the cofolding partition function and base pairing probabilities in linear time. LinearCoFold and LinearCoPartition are orders of magnitude faster than RNAcofold. For example, on a sequence pair with combined length of 26,190 nt, LinearCoFold is 86.8× faster than RNAcofold MFE mode, and LinearCoPartition is 642.3× faster than RNAcofold partition function mode. Surprisingly, LinearCoFold and LinearCoPartition's predictions have higher PPV and sensitivity of intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA-RNA interaction between SARS-CoV-2 genomic RNA (gRNA) and human U4 small nuclear RNA (snRNA), which has been experimentally studied, and observe that LinearCoFold's prediction correlates better with the wet lab results than RNAcofold's.
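The PPV and sensitivity figures mentioned in the abstract above are standard base-pair accuracy metrics. As a rough illustration only, the sketch below computes them for two dot-bracket structures. The function names are hypothetical and not taken from LinearCoFold. It assumes plain, pseudoknot-free dot-bracket notation, and it scores all base pairs rather than only the intermolecular ones evaluated in the paper.

```python
from __future__ import annotations


def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs encoded by a dot-bracket string.

    Hypothetical helper: handles only '(' and ')' brackets; every other
    character is treated as unpaired.
    """
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def ppv_and_sensitivity(predicted: str, reference: str) -> tuple[float, float]:
    """Compute (PPV, sensitivity) of predicted pairs against reference pairs.

    PPV = TP / (number of predicted pairs)
    sensitivity = TP / (number of reference pairs)
    A metric is reported as 0.0 when its denominator is zero.
    """
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_positives = len(pred & ref)
    ppv = true_positives / len(pred) if pred else 0.0
    sensitivity = true_positives / len(ref) if ref else 0.0
    return ppv, sensitivity


# Toy example: the prediction recovers one of the two reference pairs.
ppv, sens = ppv_and_sensitivity(predicted="(....)", reference="((..))")
```

In the toy example, the single predicted pair is correct, so PPV is high while sensitivity is lower because one reference pair is missed.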
Affiliation(s)
- He Zhang
  - Baidu Research, Sunnyvale, CA, USA
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
- Sizhen Li
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
- Ning Dai
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
- Liang Zhang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
- David H Mathews
  - Department of Biochemistry & Biophysics, Rochester, NY 14642, USA
  - Center for RNA Biology, Rochester, NY 14642, USA
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Liang Huang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
4
Riley AT, Robson JM, Green AA. Generative and predictive neural networks for the design of functional RNA molecules. bioRxiv 2023:2023.07.14.549043. [PMID: 37503279 PMCID: PMC10370010 DOI: 10.1101/2023.07.14.549043] [Indexed: 07/29/2023]
Abstract
RNA is a remarkably versatile molecule that has been engineered for applications in therapeutics, diagnostics, and in vivo information-processing systems. However, the complex relationship between the sequence and structural properties of an RNA molecule and its ability to perform specific functions often necessitates extensive experimental screening of candidate sequences. Here we present a generalized neural network architecture that utilizes the sequence and structure of RNA molecules (SANDSTORM) to inform functional predictions. We demonstrate that this approach achieves state-of-the-art performance across several distinct RNA prediction tasks, while learning interpretable abstractions of RNA secondary structure. We paired these predictive models with generative adversarial RNA design networks (GARDN), allowing the generative modelling of novel mRNA 5' untranslated regions and toehold switch riboregulators exhibiting a predetermined fitness. This approach enabled the design of novel toehold switches with a 43-fold increase in experimentally characterized dynamic range compared to those designed using classic thermodynamic algorithms. SANDSTORM and GARDN thus represent powerful new predictive and generative tools for the development of diagnostic and therapeutic RNA molecules with improved function.
Affiliation(s)
- Aidan T. Riley
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
  - Biological Design Center, Boston University, Boston, MA 02215, USA
- James M. Robson
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
  - Biological Design Center, Boston University, Boston, MA 02215, USA
- Alexander A. Green
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
  - Biological Design Center, Boston University, Boston, MA 02215, USA
  - Molecular Biology, Cell Biology & Biochemistry Program, Graduate School of Arts and Sciences, Boston University, Boston, MA 02215, USA