1
Sarrazin-Gendron R, Ghasemloo Gheidari P, Butyaev A, Keding T, Cai E, Zheng J, Mutalova R, Mounthanyvong J, Zhu Y, Nazarova E, Drogaris C, Erhart K, Brouillette A, Richard G, Pitchford R, Caisse S, Blanchette M, McDonald D, Knight R, Szantner A, Waldispühl J. Improving microbial phylogeny with citizen science within a mass-market video game. Nat Biotechnol 2024:10.1038/s41587-024-02175-6. [PMID: 38622344 DOI: 10.1038/s41587-024-02175-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 02/05/2024] [Indexed: 04/17/2024]
Abstract
Citizen science video games are designed primarily for users already inclined to contribute to science, which severely limits their accessibility for an estimated community of 3 billion gamers worldwide. We created Borderlands Science (BLS), a citizen science activity that is seamlessly integrated within a popular commercial video game played by tens of millions of gamers. This integration is facilitated by a novel game-first design of citizen science games, in which the game design aspect has the highest priority, and a suitable task is then mapped to the game design. BLS crowdsources a multiple alignment task of 1 million 16S ribosomal RNA sequences obtained from human microbiome studies. Since its initial release on 7 April 2020, over 4 million players have solved more than 135 million science puzzles, a task unsolvable by a single individual. Leveraging these results, we show that our multiple sequence alignment simultaneously improves microbial phylogeny estimations and UniFrac effect sizes compared to state-of-the-art computational methods. This achievement demonstrates that hyper-gamified scientific tasks attract massive crowds of contributors and offers invaluable resources to the scientific community.
Affiliation(s)
- Timothy Keding
  - School of Computer Science, McGill University, Montréal, QC, Canada
- Eddie Cai
  - School of Computer Science, McGill University, Montréal, QC, Canada
- Jiayue Zheng
  - School of Computer Science, McGill University, Montréal, QC, Canada
- Renata Mutalova
  - School of Computer Science, McGill University, Montréal, QC, Canada
- Yuxue Zhu
  - School of Computer Science, McGill University, Montréal, QC, Canada
- Elena Nazarova
  - School of Computer Science, McGill University, Montréal, QC, Canada
- Kornél Erhart
  - Massively Multiplayer Online Science, Gryon, Switzerland
- Daniel McDonald
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - Department of Computer Science, University of California, San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
- Attila Szantner
  - School of Computer Science, McGill University, Montréal, QC, Canada
  - Massively Multiplayer Online Science, Gryon, Switzerland
- Jérôme Waldispühl
  - School of Computer Science, McGill University, Montréal, QC, Canada.
2
Kaur J, Sharma A, Mundlia P, Sood V, Pandey A, Singh G, Barnwal RP. RNA-Small-Molecule Interaction: Challenging the "Undruggable" Tag. J Med Chem 2024. [PMID: 38498010 DOI: 10.1021/acs.jmedchem.3c01354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
RNA targeting, specifically with small molecules, is a relatively new and rapidly emerging avenue with the promise to expand the target space in the drug discovery field. From being "disregarded" as an "undruggable" messenger molecule to FDA approval of an RNA-targeting small-molecule drug Risdiplam, a radical change in perspective toward RNA has been observed in the past decade. RNAs serve important regulatory functions beyond canonical protein synthesis, and their dysregulation has been reported in many diseases. A deeper understanding of RNA biology reveals that RNA molecules can adopt a variety of structures, carrying defined binding pockets that can accommodate small-molecule drugs. Due to its functional diversity and structural complexity, RNA can be perceived as a prospective target for therapeutic intervention. This perspective highlights the proof of concept of RNA-small-molecule interactions, exemplified by targeting of various transcripts with functional modulators. The advent of RNA-oriented knowledge would help expedite drug discovery.
Affiliation(s)
- Jaskirat Kaur
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
- Akanksha Sharma
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
  - University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India
- Poonam Mundlia
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
- Vikas Sood
  - Department of Biochemistry, Jamia Hamdard, New Delhi 110062, India
- Ankur Pandey
  - Department of Chemistry, Panjab University, Chandigarh 160014, India
- Gurpal Singh
  - University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India
3
Greenwood T, Heitsch CE. How Parameters Influence SHAPE-Directed Predictions. Methods Mol Biol 2024; 2726:105-124. [PMID: 38780729 DOI: 10.1007/978-1-0716-3519-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model by converting the data into pseudoenergies. Here, we look at how much of the space of possible structures auxiliary data allows prediction methods to explore. We find that for a large class of RNA sequences, auxiliary data shifts the predictions significantly. Additionally, we find that predictions are highly sensitive to the parameters which define the auxiliary data pseudoenergies. In fact, the parameter space can typically be partitioned into regions where different structural predictions predominate.
4
Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 12/03/2023] Open
Abstract
RNA biology has risen to prominence after a remarkable discovery of diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts exert their regulatory functions by forming RNA-RNA complexes via base pairing with complementary sequences in other RNAs. An interplay between RNAs is essential, as it possesses various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlight the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, the relationship between pseudoknots, and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Affiliation(s)
- Francis Yew Fu Tieng
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Nur Alyaa Afifah Md Shahri
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Zeti-Azura Mohamed-Hussein
  - Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
- Learn-Han Lee
  - Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Nurul-Syakima Ab Mutalib
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
  - Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
5
Sieg JP, Jolley EA, Huot MJ, Babitzke P, Bevilacqua P. In vivo-like nearest neighbor parameters improve prediction of fractional RNA base-pairing in cells. Nucleic Acids Res 2023; 51:11298-11317. [PMID: 37855684 PMCID: PMC10639048 DOI: 10.1093/nar/gkad807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 09/11/2023] [Accepted: 09/27/2023] [Indexed: 10/20/2023] Open
Abstract
We conducted a thermodynamic analysis of RNA stability in Eco80 artificial cytoplasm, which mimics in vivo conditions, and compared it to transcriptome-wide probing of mRNA. Eco80 contains 80% of Escherichia coli metabolites, with biological concentrations of metal ions, including 2 mM free Mg²⁺ and 29 mM metabolite-chelated Mg²⁺. We used fluorescence-detected binding isotherms (FDBI) to conduct a thermodynamic analysis of 24 RNA helices and found that these helices, which have an average stability of -12.3 kcal/mol, are less stable in Eco80 by ΔΔG°37 ∼1 kcal/mol. The FDBI data were used to determine a set of Watson-Crick free energy nearest neighbor parameters (NNPs), which revealed that Eco80 reduces the stability of three NNPs. This information was used to adjust the NN model using the RNAstructure package. The in vivo-like adjustments have minimal effects on the prediction of RNA secondary structures determined in vitro and in silico, but markedly improve prediction of fractional RNA base pairing in E. coli, as benchmarked with our in vivo DMS and EDC RNA chemical probing data. In summary, our thermodynamic and chemical probing analyses of RNA helices indicate that RNA secondary structures are less stable in cells than in artificially stable in vitro buffer conditions.
Affiliation(s)
- Jacob P Sieg
  - Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Elizabeth A Jolley
  - Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Melanie J Huot
  - Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Paul Babitzke
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Philip C Bevilacqua
  - Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
6
Williams AM, Jolley EA, Santiago-Martínez MG, Chan CX, Gutell RR, Ferry JG, Bevilacqua PC. In vivo structure probing of RNA in Archaea: novel insights into the ribosome structure of Methanosarcina acetivorans. RNA (NEW YORK, N.Y.) 2023; 29:1610-1620. [PMID: 37491319 PMCID: PMC10578495 DOI: 10.1261/rna.079687.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 06/24/2023] [Indexed: 07/27/2023]
Abstract
Structure probing combined with next-generation sequencing (NGS) has provided novel insights into RNA structure-function relationships. To date, such studies have focused largely on bacteria and eukaryotes, with little attention given to the third domain of life, archaea. Furthermore, functional RNAs have not been extensively studied in archaea, leaving open questions about RNA structure and function within this domain of life. With archaeal species being diverse and having many similarities to both bacteria and eukaryotes, the archaea domain has the potential to be an evolutionary bridge. In this study, we introduce a method for probing RNA structure in vivo in the archaea domain of life. We investigated the structure of ribosomal RNA (rRNA) from Methanosarcina acetivorans, a well-studied anaerobic archaeal species, grown with either methanol or acetate. After probing the RNA in vivo with dimethyl sulfate (DMS), Structure-seq2 libraries were generated, sequenced, and analyzed. We mapped the reactivity of DMS onto the secondary structure of the ribosome, which we determined independently with comparative analysis, and confirmed the accuracy of DMS probing in M. acetivorans. Accessibility of the rRNA to DMS in the two carbon sources was found to be quite similar, although some differences were found. Overall, this study establishes the Structure-seq2 pipeline in the archaea domain of life and informs about ribosomal structure within M. acetivorans.
Affiliation(s)
- Allison M Williams
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Elizabeth A Jolley
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Cheong Xin Chan
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia
- Robin R Gutell
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- James G Ferry
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Philip C Bevilacqua
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, USA
7
Bai G, Yuan Q, Guo Q, Duan Y. Identification and phylogenetic analysis in Pterorhinus chinensis (Aves, Passeriformes, Leiothrichidae) based on complete mitogenome. Zookeys 2023; 1172:15-30. [PMID: 38312436 PMCID: PMC10838554 DOI: 10.3897/zookeys.1172.107098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 07/05/2023] [Indexed: 02/06/2024] Open
Abstract
The Black-throated Laughingthrush (Pterorhinus chinensis) is a bird belonging to the order Passeriformes and the family Leiothrichidae, and is found in Cambodia, China, Laos, Myanmar, Thailand and Vietnam. Pterorhinus chinensis was once classified as belonging to the genus Garrulax. However, recent research has reclassified it in the genus Pterorhinus. In this study, we sequenced and characterized the complete mitogenome of P. chinensis. The complete mitochondrial genome of P. chinensis is 17,827 bp in length. It consists of 13 PCGs, 22 tRNAs, two rRNAs, and two control regions. All genes are coded on the H-strand, except for one PCG (nad6) and eight tRNAs. All PCGs are initiated with ATG and stopped by five types of stop codons. Our comparative analyses show irregular gene rearrangement between trnT and trnP genes with another similar control region emerging between trnE and trnF genes compared with the ancestral mitochondrial gene order, called "duplicate CR gene order". The phylogenetic position of P. chinensis and phylogenetic relationships among members of Leiothrichidae are assessed based on complete mitogenomes. Phylogenetic relationships based on Bayesian inference and maximum likelihood methods showed that Garrulax and (Pterorhinus + Ianthocincla) formed a clade. Leiothrix and Liocichla also formed a clade. Our study provides support for the transfer of P. chinensis from Garrulax to Pterorhinus. Our results provide mitochondrial genome data to further understand the mitochondrial genome characteristics and taxonomic status of Leiothrichidae.
Affiliation(s)
- Guirong Bai
  - Key Laboratory for Conserving Wildlife with Small Populations in Yunnan, Southwest Forestry University, Kunming 650224, China
- Qingmiao Yuan
  - Key Laboratory for Conserving Wildlife with Small Populations in Yunnan, Southwest Forestry University, Kunming 650224, China
- Qiang Guo
  - Key Laboratory for Conserving Wildlife with Small Populations in Yunnan, Southwest Forestry University, Kunming 650224, China
- Yubao Duan
  - Key Laboratory for Conserving Wildlife with Small Populations in Yunnan, Southwest Forestry University, Kunming 650224, China
8
Rehman A, Huo QB, Du YZ. The First Complete Mitochondrial Genome of Genus Isocapnia (Plecoptera: Capniidae) and Phylogenetic Assignment of Superfamily Nemouroidea. Genes (Basel) 2023; 14:genes14050965. [PMID: 37239326 DOI: 10.3390/genes14050965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 04/18/2023] [Accepted: 04/21/2023] [Indexed: 05/28/2023] Open
Abstract
Capniidae are a family of stoneflies, also known as snow flies, that emerge in winter. The phylogeny of Capniidae is widely accepted to be based on morphological analysis. To date, only five Capniidae mitochondrial genomes have been sequenced. In addition, sampling is required to determine an accurate phylogenetic association because the generic classification of this family is still controversial and needs to be investigated further. In this study, the first mitogenome of genus Isocapnia was sequenced; it is 16,200 bp long and contains 37 genes (two rRNAs, 22 tRNAs, and 13 PCGs) and a control region. Twelve PCGs began with the common start codon ATN (ATG, ATA, or ATT), while nad5 used GTG. Eleven PCGs had TAN (TAA or TAG) as their last codon; however, cox1 and nad5 ended with a truncated stop codon, T. All tRNA genes showed the cloverleaf structure typical of metazoans, except tRNASer1 (AGN), which lacked the dihydrouridine arm. A phylogenetic analysis of the superfamily Nemouroidea was conducted using thirteen PCGs from 32 previously sequenced Plecoptera species. The Bayesian inference and maximum likelihood tree structures gave similar results across the thirteen PCGs. Our findings strongly supported Leuctridae + ((Capniidae + Taeniopterygidae) + (Nemouridae + Notonemouridae)). Ultimately, the best-supported generic phylogenetic relationship within Capniidae is as follows: (Isocapnia + (Capnia + Zwicknia) + (Apteroperla + Mesocapnia)). These findings will enable us to better understand the evolutionary relationships within the superfamily Nemouroidea and the generic classification and mitogenome structure of the family Capniidae.
Affiliation(s)
- Abdur Rehman
  - College of Plant Protection & Institute of Applied Entomology, Yangzhou University, Yangzhou 225009, China
- Qing-Bo Huo
  - College of Plant Protection & Institute of Applied Entomology, Yangzhou University, Yangzhou 225009, China
- Yu-Zhou Du
  - College of Plant Protection & Institute of Applied Entomology, Yangzhou University, Yangzhou 225009, China
  - Joint International Research Laboratory of Agriculture and Agri-Product Safety, The Ministry of Education, Yangzhou University, Yangzhou 225009, China
9
Lis JA. Molecular Apomorphies in the Secondary and Tertiary Structures of Length-Variable Regions (LVRs) of 18S rRNA Shed Light on the Systematic Position of the Family Thaumastellidae (Hemiptera: Heteroptera: Pentatomoidea). Int J Mol Sci 2023; 24:ijms24097758. [PMID: 37175465 PMCID: PMC10178826 DOI: 10.3390/ijms24097758] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/02/2023] [Revised: 04/17/2023] [Accepted: 04/19/2023] [Indexed: 05/15/2023] Open
Abstract
The SSU nrDNA, a small subunit of the nuclear ribosomal DNA (coding 18S rRNA), is one of the most frequently sequenced genes in molecular studies in Hexapoda. In insects, including true bugs (Hemiptera: Heteroptera), only its primary structures (i.e., aligned sequences) are predominantly used in phylogenetic reconstructions. It is known that including RNA secondary structures in the alignment procedure is essential for improving accuracy and robustness in phylogenetic tree reconstruction. Moreover, local plasticity in rRNAs might impact their tertiary structures and corresponding functions. To determine the systematic position of Thaumastellidae within the superfamily Pentatomoidea, the secondary and, for the first time among all Hexapoda, tertiary structures of 18S rRNAs in twelve pentatomoid families were compared and analysed. Results indicate that the shapes of the secondary and tertiary structures of the length-variable regions (LVRs) in the 18S rRNA are phylogenetically highly informative. Based on these results, it is suggested that the Thaumastellidae is maintained as an independent family within the superfamily Pentatomoidea, rather than as a part of the family Cydnidae. Moreover, the analyses indicate a close relationship between Sehirinae and Parastrachiidae, expressed in morpho-molecular synapomorphies in the predicted secondary and tertiary structures of the length-variable region L (LVR L).
Affiliation(s)
- Jerzy A Lis
  - Institute of Biology, University of Opole, Oleska 22, 45-052 Opole, Poland
10
Qiu X. Sequence similarity governs generalizability of de novo deep learning models for RNA secondary structure prediction. PLoS Comput Biol 2023; 19:e1011047. [PMID: 37068100 PMCID: PMC10138783 DOI: 10.1371/journal.pcbi.1011047] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/02/2023] [Revised: 04/27/2023] [Accepted: 03/25/2023] [Indexed: 04/18/2023] Open
Abstract
Making no use of physical laws or co-evolutionary information, de novo deep learning (DL) models for RNA secondary structure prediction have achieved far superior performances than traditional algorithms. However, their statistical underpinning raises the crucial question of generalizability. We present a quantitative study of the performance and generalizability of a series of de novo DL models, with a minimal two-module architecture and no post-processing, under varied similarities between seen and unseen sequences. Our models demonstrate excellent expressive capacities and outperform existing methods on common benchmark datasets. However, model generalizability, i.e., the performance gap between the seen and unseen sets, degrades rapidly as the sequence similarity decreases. The same trends are observed from several recent DL and machine learning models. And an inverse correlation between performance and generalizability is revealed collectively across all learning-based models with wide-ranging architectures and sizes. We further quantitate how generalizability depends on sequence and structure identity scores via pairwise alignment, providing unique quantitative insights into the limitations of statistical learning. Generalizability thus poses a major hurdle for deploying de novo DL models in practice and various pathways for future advances are discussed.
Affiliation(s)
- Xiangyun Qiu
  - Department of Physics, George Washington University, Washington DC, United States of America
11
Mattick JS. RNA out of the mist. Trends Genet 2023; 39:187-207. [PMID: 36528415 DOI: 10.1016/j.tig.2022.11.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/26/2022] [Revised: 11/08/2022] [Accepted: 11/27/2022] [Indexed: 12/23/2022]
Abstract
RNA has long been regarded primarily as the intermediate between genes and proteins. It was a surprise then to discover that eukaryotic genes are mosaics of mRNA sequences interrupted by large tracts of transcribed but untranslated sequences, and that multicellular organisms also express many long 'intergenic' and antisense noncoding RNAs (lncRNAs). The identification of small RNAs that regulate mRNA translation and half-life did not disturb the prevailing view that animals and plant genomes are full of evolutionary debris and that their development is mainly supervised by transcription factors. Gathering evidence to the contrary involved addressing the low conservation, expression, and genetic visibility of lncRNAs, demonstrating their cell-specific roles in cell and developmental biology, and their association with chromatin-modifying complexes and phase-separated domains. The emerging picture is that most lncRNAs are the products of genetic loci termed 'enhancers', which marshal generic effector proteins to their sites of action to control cell fate decisions during development.
Affiliation(s)
- John S Mattick
  - School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW 2052, Australia; UNSW RNA Institute, UNSW, Sydney, NSW 2052, Australia.
12
Zhao Q, Mao Q, Zhao Z, Yuan W, He Q, Sun Q, Yao Y, Fan X. RNA independent fragment partition method based on deep learning for RNA secondary structure prediction. Sci Rep 2023; 13:2861. [PMID: 36801945 PMCID: PMC9938198 DOI: 10.1038/s41598-023-30124-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2022] [Accepted: 02/16/2023] [Indexed: 02/19/2023] Open
Abstract
The non-coding RNA secondary structure largely determines its function. Hence, accuracy in structure acquisition is of great importance. Currently, this acquisition primarily relies on various computational methods. The prediction of the structures of long RNA sequences with high precision and reasonable computational cost remains challenging. Here, we propose a deep learning model, RNA-par, which could partition an RNA sequence into several independent fragments (i-fragments) based on its exterior loops. Each i-fragment secondary structure predicted individually could be further assembled to acquire the complete RNA secondary structure. In the examination of our independent test set, the average length of the predicted i-fragments was 453 nt, which was considerably shorter than that of complete RNA sequences (848 nt). The accuracy of the assembled structures was higher than that of the structures predicted directly using the state-of-the-art RNA secondary structure prediction methods. This proposed model could serve as a preprocessing step for RNA secondary structure prediction for enhancing the predictive performance (especially for long RNA sequences) and reducing the computational cost. In the future, predicting the secondary structure of long-sequence RNA with high accuracy can be enabled by developing a framework combining RNA-par with various existing RNA secondary structure prediction algorithms. Our models, test codes and test data are provided at https://github.com/mianfei71/RNAPar .
Affiliation(s)
- Qi Zhao
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169 Liaoning China
- Qian Mao
  - College of Light Industry, Liaoning University, Shenyang, 110036 Liaoning China
- Zheng Zhao
  - College of Artificial Intelligence, Dalian Maritime University, Dalian, 116026 Liaoning China
- Wenxuan Yuan
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169 Liaoning China
- Qiang He
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169 Liaoning China
- Qixuan Sun
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169 Liaoning China
- Yudong Yao
  - Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030 USA
- Xiaoya Fan
  - School of Software, Dalian University of Technology, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, 116620, Liaoning, China.
13
Morishita EC. Discovery of RNA-targeted small molecules through the merging of experimental and computational technologies. Expert Opin Drug Discov 2023; 18:207-226. [PMID: 36322542 DOI: 10.1080/17460441.2022.2134852] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 11/05/2022]
Abstract
INTRODUCTION The field of RNA-targeted small molecules is rapidly evolving, owing to the advances in experimental and computational technologies. With the identification of several bioactive small molecules that target RNA, including the FDA-approved risdiplam, the biopharmaceutical industry is gaining confidence in the field. This review, based on the literature obtained from PubMed, aims to disseminate information about the various technologies developed for targeting RNA with small molecules and propose areas for improvement to develop drugs more efficiently, particularly those linked to diseases with unmet medical needs. AREAS COVERED The technologies for the identification of RNA targets, screening of chemical libraries against RNA, assessing the bioactivity and target engagement of the hit compounds, structure determination, and hit-to-lead optimization are reviewed. Along with the description of the technologies, their strengths, limitations, and examples of how they can impact drug discovery are provided. EXPERT OPINION Many existing technologies employed for protein targets have been repurposed for use in the discovery of RNA-targeted small molecules. In addition, technologies tailored for RNA targets have been developed. Nevertheless, more improvements are necessary, such as artificial intelligence to dissect important RNA structures and RNA-small-molecule interactions and more powerful chemical probing and structure prediction techniques.
|
14
|
Cooper HB, Krause KL, Gardner PP. Finding priority bacterial ribosomes for future structural and antimicrobial research based upon global RNA and protein sequence analysis. PeerJ 2023; 11:e14969. [PMID: 36974140 PMCID: PMC10039652 DOI: 10.7717/peerj.14969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 02/07/2023] [Indexed: 03/29/2023] Open
Abstract
Ribosome-targeting antibiotics comprise over half of antibiotics used in medicine, but our fundamental knowledge of their binding sites is derived primarily from ribosome structures of non-pathogenic species. These include Thermus thermophilus, Deinococcus radiodurans and the archaean Haloarcula marismortui, as well as the commensal and sometimes pathogenic organism, Escherichia coli. Advancements in electron cryomicroscopy have allowed for the determination of more ribosome structures from pathogenic bacteria, with each study highlighting species-specific differences that had not been observed in the non-pathogenic structures. These observed differences suggest that more novel ribosome structures, particularly from pathogens, are required for a more accurate understanding of the level of diversity of the entire bacterial ribosome, with the potential of leading to innovative advancements in antibiotic research. In this study, high accuracy covariance and hidden Markov models were used to annotate ribosomal RNA and protein sequences respectively from genomic sequence, allowing us to determine the underlying ribosomal sequence diversity using phylogenetic methods. This analysis provided evidence that the current non-pathogenic ribosome structures are not sufficient representatives of some pathogenic bacteria, such as Campylobacter pylori, or of whole phyla such as Bacteroidota (Bacteroidetes).
Affiliation(s)
- Helena B. Cooper
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia
| | - Kurt L. Krause
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
| | - Paul P. Gardner
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
|
15
|
Maduranga KDG, Zadorozhnyy V, Ye Q. Symmetry-structured convolutional neural networks. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08168-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
16
|
Childs-Disney JL, Yang X, Gibaut QMR, Tong Y, Batey RT, Disney MD. Targeting RNA structures with small molecules. Nat Rev Drug Discov 2022; 21:736-762. [PMID: 35941229 PMCID: PMC9360655 DOI: 10.1038/s41573-022-00521-4] [Citation(s) in RCA: 173] [Impact Index Per Article: 86.5] [Accepted: 06/17/2022] [Indexed: 01/07/2023]
Abstract
RNA adopts 3D structures that confer varied functional roles in human biology and dysfunction in disease. Approaches to therapeutically target RNA structures with small molecules are being actively pursued, aided by key advances in the field including the development of computational tools that predict evolutionarily conserved RNA structures, as well as strategies that expand mode of action and facilitate interactions with cellular machinery. Existing RNA-targeted small molecules use a range of mechanisms including directing splicing - by acting as molecular glues with cellular proteins (such as branaplam and the FDA-approved risdiplam), inhibition of translation of undruggable proteins and deactivation of functional structures in noncoding RNAs. Here, we describe strategies to identify, validate and optimize small molecules that target the functional transcriptome, laying out a roadmap to advance these agents into the next decade.
Affiliation(s)
| | - Xueyi Yang
- Department of Chemistry, Scripps Research, Jupiter, FL, USA
| | | | - Yuquan Tong
- Department of Chemistry, Scripps Research, Jupiter, FL, USA
| | - Robert T Batey
- Department of Biochemistry, University of Colorado, Boulder, CO, USA.
|
17
|
Ross CJ, Ulitsky I. Discovering functional motifs in long noncoding RNAs. Wiley Interdisciplinary Reviews. RNA 2022; 13:e1708. [PMID: 34981665 DOI: 10.1002/wrna.1708] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/01/2021] [Revised: 11/19/2021] [Accepted: 12/04/2021] [Indexed: 12/27/2022]
Abstract
Long noncoding RNAs (lncRNAs) are products of pervasive transcription that closely resemble messenger RNAs on the molecular level, yet function through largely unknown modes of action. The current model is that the function of lncRNAs often relies on specific, typically short, conserved elements, connected by linkers in which specific sequences and/or structures are less important. This notion has fueled the development of both computational and experimental methods focused on the discovery of functional elements within lncRNA genes, based on diverse signals such as evolutionary conservation, predicted structural elements, or the ability to rescue loss-of-function phenotypes. In this review, we outline the main challenges that the different methods need to overcome, describe the recently developed approaches, and discuss their respective limitations. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- Caroline Jane Ross
- Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
| | - Igor Ulitsky
- Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
|
18
|
Gray M, Chester S, Jabbari H. KnotAli: informed energy minimization through the use of evolutionary information. BMC Bioinformatics 2022; 23:159. [PMID: 35505276 PMCID: PMC9063079 DOI: 10.1186/s12859-022-04673-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 04/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs) is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. RESULTS We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and Cacofold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
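As an illustration of the covariation signal that alignment-based methods draw on, here is a minimal sketch that scores a pair of alignment columns by mutual information. This is a generic textbook measure chosen for illustration; the abstract does not state which covariation statistic or noise correction KnotAli uses, and the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between alignment columns i and j.

    Rows with a gap in either column are ignored. High values suggest the
    two positions vary together, as compensating base-pair changes do.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    score = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        score += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return score


# Toy alignment: columns 0 and 3 always form a complementary pair,
# while columns 1 and 2 never change.
toy = ["GAAC", "CAAG", "AAAU", "UAAA"]
covarying = mutual_information(toy, 0, 3)
conserved = mutual_information(toy, 1, 2)
```

In the toy alignment the outer columns covary perfectly while the inner columns are fully conserved, so only the outer pair receives a positive score. Fully conserved columns carry no covariation signal, which is one reason such scores are typically combined with thermodynamic information rather than used alone.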
Affiliation(s)
- Mateo Gray
- Department of Computer Science, University of Victoria, Victoria, Canada
| | - Sean Chester
- Department of Computer Science, University of Victoria, Victoria, Canada
| | - Hosna Jabbari
- Department of Computer Science, University of Victoria, Victoria, Canada
- Institute on Aging and Lifelong Health, University of Victoria, Victoria, Canada
|
19
|
Noller HF, Donohue JP, Gutell RR. The universally conserved nucleotides of the small subunit ribosomal RNAs. RNA (New York, N.Y.) 2022; 28:623-644. [PMID: 35115361 PMCID: PMC9014874 DOI: 10.1261/rna.079019.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Accepted: 01/19/2022] [Indexed: 05/03/2023]
Abstract
The ribosomal RNAs, along with their substrates the transfer RNAs, contain the most highly conserved nucleotides in all of biology. We have assembled a database containing structure-based alignments of sequences of the small-subunit rRNAs from organisms that span the entire phylogenetic spectrum, to identify the nucleotides that are universally conserved. In its simplest (bacterial and archaeal) forms, the small-subunit rRNA has ∼1500 nt, of which we identify 140 that are absolutely invariant among the 1961 species in our alignment. We examine the positions and detailed structural and functional interactions of these universal nucleotides in the context of a half century of biochemical and genetic studies and high-resolution structures of ribosome functional complexes. The vast majority of these nucleotides are exposed on the subunit interface surface of the small subunit, where the functional processes of the ribosome take place. However, only 40 of them have been directly implicated in specific ribosomal functions, such as contacting the tRNAs, mRNA, or translation factors. The roles of many other invariant nucleotides may serve to constrain the positions and orientations of those nucleotides that are directly involved in function. Yet others can be rationalized by participation in unusual noncanonical tertiary structures that may uniquely allow correct folding of the rRNA to form a functional ribosome. However, there remain at least 50 nt whose universal conservation is not obvious, serving as a metric for the incompleteness of our understanding of ribosome structure and function.
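The notion of an "absolutely invariant" position can be sketched as a column scan over a multiple sequence alignment. This is a minimal illustration under the assumption that the alignment is a list of equal-length strings with `-` as the gap character; it is not the authors' pipeline, and the function name `invariant_columns` is hypothetical.

```python
def invariant_columns(alignment, gap="-"):
    """Return indices of columns holding one identical residue in every row.

    A column containing any gap is not counted as invariant.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    invariant = []
    for j in range(length):
        residues = {row[j] for row in alignment}
        if len(residues) == 1 and gap not in residues:
            invariant.append(j)
    return invariant


toy = ["GACU-A", "GGCUAA", "GUCU-A"]
positions = invariant_columns(toy)
```

The result depends entirely on the quality of the alignment, which is why the abstract stresses structure-based alignments: a single misaligned row is enough to make a truly conserved position look variable.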
Affiliation(s)
- Harry F Noller
- Center for Molecular Biology of RNA, Department of Molecular, Cell and Developmental Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA
| | - John Paul Donohue
- Center for Molecular Biology of RNA, Department of Molecular, Cell and Developmental Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA
| | - Robin R Gutell
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas 78712, USA
|
20
|
Rahman MA, Tutul AA, Abdullah SM, Bayzid MS. CHAPAO: Likelihood and hierarchical reference-based representation of biomolecular sequences and applications to compressing multiple sequence alignments. PLoS One 2022; 17:e0265360. [PMID: 35436292 PMCID: PMC9015123 DOI: 10.1371/journal.pone.0265360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2021] [Accepted: 02/28/2022] [Indexed: 11/18/2022] Open
Abstract
Background
High-throughput experimental technologies are generating tremendous amounts of genomic data, offering valuable resources to answer important questions and extract biological insights. Storing this sheer amount of genomic data has become a major concern in bioinformatics. General purpose compression techniques (e.g. gzip, bzip2, 7-zip) are being widely used due to their pervasiveness and relatively good speed. However, they are not customized for genomic data and may fail to leverage special characteristics and redundancy of the biomolecular sequences.
Results
We present a new lossless compression method CHAPAO (COmpressing Alignments using Hierarchical and Probabilistic Approach), which is especially designed for multiple sequence alignments (MSAs) of biomolecular data and offers very good compression gain. We have introduced a novel hierarchical referencing technique to represent biomolecular sequences which combines likelihood based analyses of the sequence similarities and graph theoretic algorithms. We performed an extensive evaluation study using a collection of real biological data from the avian phylogenomics project, 1000 plants project (1KP), and 16S and 23S rRNA datasets. We report the performance of CHAPAO in comparison with general purpose compression techniques as well as with MFCompress and Nucleotide Archival Format (NAF)—two of the best known methods especially designed for FASTA files. Experimental results suggest that CHAPAO offers significant improvements in compression gain over most other alternative methods. CHAPAO is freely available as an open source software at https://github.com/ashiq24/CHAPAO.
Conclusion
CHAPAO advances the state-of-the-art in compression algorithms and represents a potential alternative to the general purpose compression techniques as well as to the existing specialized compression techniques for biomolecular sequences.
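The general idea behind reference-based representation can be sketched as storing an aligned sequence as its differences from a reference sequence. This is only an illustration of that idea: CHAPAO's likelihood-based analysis, its hierarchical choice of references and its actual encoding are not described in the abstract and are not reproduced here. The function names `encode` and `decode` are hypothetical.

```python
def encode(reference, sequence):
    """Represent an aligned sequence as (position, residue) differences."""
    if len(reference) != len(sequence):
        raise ValueError("aligned sequences must have the same length")
    return [(i, c) for i, (r, c) in enumerate(zip(reference, sequence)) if r != c]


def decode(reference, diffs):
    """Rebuild the original sequence from the reference and its differences."""
    chars = list(reference)
    for i, c in diffs:
        chars[i] = c
    return "".join(chars)


reference = "ACGU-ACGU"
sequence = "ACGA-ACCU"
diffs = encode(reference, sequence)
restored = decode(reference, diffs)
```

The closer a sequence is to its reference, the shorter the difference list, which is why the choice of reference matters for compression gain; decoding recovers the sequence exactly, so the representation is lossless.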
Affiliation(s)
- Md Ashiqur Rahman
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Abdullah Aman Tutul
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Sifat Muhammad Abdullah
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Md. Shamsuzzoha Bayzid
- Department of Computer Science and Engineering/Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
|
21
|
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-time global prediction of conserved structures for RNA homologs with applications to SARS-CoV-2. Proc Natl Acad Sci U S A 2021; 118:e2116269118. [PMID: 34887342 PMCID: PMC8719904 DOI: 10.1073/pnas.2116269118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 11/05/2021] [Indexed: 12/26/2022] Open
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
| | - He Zhang
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
| | - Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Baidu Research, Sunnyvale, CA 94089
| | - Kaibo Liu
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
| | | | - David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642;
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
| | - Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331;
- Baidu Research, Sunnyvale, CA 94089
|
22
|
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2. bioRxiv: the preprint server for biology 2021:2020.11.23.393488. [PMID: 34816262 PMCID: PMC8609897 DOI: 10.1101/2020.11.23.393488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length, and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally-guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics. SIGNIFICANCE STATEMENT Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases including COVID-19. However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
| | - He Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | - Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | - Kaibo Liu
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | | | - David H. Mathews
- Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY
| | - Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
|
23
|
Radecki P, Uppuluri R, Deshpande K, Aviran S. Accurate detection of RNA stem-loops in structurome data reveals widespread association with protein binding sites. RNA Biol 2021; 18:521-536. [PMID: 34606413 PMCID: PMC8677038 DOI: 10.1080/15476286.2021.1971382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022] Open
Abstract
RNA molecules are known to fold into specific structures which often play a central role in their functions and regulation. In silico folding of RNA transcripts, especially when assisted with structure profiling (SP) data, is capable of accurately elucidating relevant structural conformations. However, such methods scale poorly to the swaths of SP data generated by transcriptome-wide experiments, which are becoming more commonplace and advancing our understanding of RNA structure and its regulation at global and local levels. This has created a need for tools capable of rapidly deriving structural assessments from SP data in a scalable manner. One such tool we previously introduced that aims to process such data is patteRNA, a statistical learning algorithm capable of rapidly mining big SP datasets for structural elements. Here, we present a reformulation of patteRNA's pattern recognition scheme that sees significantly improved precision without major compromises to computational overhead. Specifically, we developed a data-driven logistic classifier which interprets patteRNA's statistical characterizations of SP data in addition to local sequence properties as measured with a nearest neighbour thermodynamic model. Application of the classifier to human structurome data reveals a marked association between detected stem-loops and RNA binding protein (RBP) footprints. The results of our application demonstrate that upwards of 30% of RBP footprints occur within loops of stable stem-loop elements. Overall, our work arrives at a rapid and accurate method for automatically detecting families of RNA structure motifs and demonstrates the functional relevance of identifying them transcriptome-wide.
Affiliation(s)
- Pierce Radecki
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
| | - Rahul Uppuluri
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
| | - Kaustubh Deshpande
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
| | - Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
|
24
|
Zhao Q, Zhao Z, Fan X, Yuan Z, Mao Q, Yao Y. Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol 2021; 17:e1009291. [PMID: 34437528 PMCID: PMC8389396 DOI: 10.1371/journal.pcbi.1009291] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 12/05/2022] Open
Abstract
Secondary structure plays an important role in determining the function of noncoding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine learning (ML) technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on ML technologies and a tabularized summary of the most important methods in this field. The current pending challenges in the field of RNA secondary structure prediction and future trends are also discussed.
Affiliation(s)
- Qi Zhao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
| | - Zheng Zhao
- School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| | - Xiaoya Fan
- School of Software, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, Liaoning, China
| | - Zhengwei Yuan
- Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital of China Medical University, Shenyang, Liaoning, China
| | - Qian Mao
- College of Light Industry, Liaoning University, Shenyang, Liaoning, China
- Key Laboratory of Agroproducts Processing Technology, Changchun University, Changchun, Jilin, China
| | - Yudong Yao
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey, United States of America
|
25
|
Sweeney BA, Hoksza D, Nawrocki EP, Ribas CE, Madeira F, Cannone JJ, Gutell R, Maddala A, Meade CD, Williams LD, Petrov AS, Chan PP, Lowe TM, Finn RD, Petrov AI. R2DT is a framework for predicting and visualising RNA secondary structure using templates. Nat Commun 2021; 12:3494. [PMID: 34108470 PMCID: PMC8190129 DOI: 10.1038/s41467-021-23555-5] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 09/28/2020] [Accepted: 05/04/2021] [Indexed: 02/05/2023] Open
Abstract
Non-coding RNAs (ncRNA) are essential for all life, and their functions often depend on their secondary (2D) and tertiary structure. Despite the abundance of software for the visualisation of ncRNAs, few automatically generate consistent and recognisable 2D layouts, which makes it challenging for users to construct, compare and analyse structures. Here, we present R2DT, a method for predicting and visualising a wide range of RNA structures in standardised layouts. R2DT is based on a library of 3,647 templates representing the majority of known structured RNAs. R2DT has been applied to ncRNA sequences from the RNAcentral database and produced >13 million diagrams, creating the world's largest RNA 2D structure dataset. The software is amenable to community expansion, and is freely available at https://github.com/rnacentral/R2DT and a web server is found at https://rnacentral.org/r2dt.
Affiliation(s)
- Blake A Sweeney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - David Hoksza
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
| | - Eric P Nawrocki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Carlos Eduardo Ribas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Fábio Madeira
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Jamie J Cannone
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Robin Gutell
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Aparna Maddala
- School of Chemistry and Biochemistry, Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA, USA
| | - Caeden D Meade
- School of Chemistry and Biochemistry, Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA, USA
| | - Loren Dean Williams
- School of Chemistry and Biochemistry, Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA, USA
| | - Anton S Petrov
- School of Chemistry and Biochemistry, Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA, USA
| | - Patricia P Chan
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Todd M Lowe
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Anton I Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
| |
|
26
|
Vila-Sanjurjo A, Smith PM, Elson JL. Heterologous Inferential Analysis (HIA) and Other Emerging Concepts: In Understanding Mitochondrial Variation In Pathogenesis: There is no More Low-Hanging Fruit. Methods Mol Biol 2021; 2277:203-245. [PMID: 34080154 DOI: 10.1007/978-1-0716-1270-5_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Here we summarize our latest efforts to elucidate the role of mtDNA variants affecting the mitochondrial translation machinery, namely variants mapping to the mt-rRNA and mt-tRNA genes. Evidence is accumulating to suggest that the cellular response to interference with mitochondrial translation is different from that occurring as a result of mutations in genes encoding OXPHOS proteins. As a result, it appears safe to state that a complete view of mitochondrial disease will not be obtained until we understand the effect of mt-rRNA and mt-tRNA variants on mitochondrial protein synthesis. Despite the identification of a large number of potentially pathogenic variants in the mitochondrially encoded rRNA (mt-rRNA) genes, we lack direct methods to firmly establish their pathogenicity. In the absence of such methods, we have devised an indirect approach named heterologous inferential analysis (HIA) that can be used to make predictions concerning the disruptive potential of a large subset of mt-rRNA variants. We have used HIA to explore the mutational landscape of 12S and 16S mt-rRNA genes. Our HIA studies include a thorough classification of all rare variants reported in the literature as well as others obtained from studies performed in collaboration with physicians. HIA has also been used with non-mammalian mt-rRNA genes to elucidate how mitotypes influence the interaction of the individual and the environment. Regarding mt-tRNA variations, rapidly growing evidence shows that the spectrum of mutations causing mitochondrial disease might differ between the different mitochondrial haplogroups seen in human populations.
Affiliation(s)
- Antón Vila-Sanjurjo
- Departamento de Bioloxía, Facultade de Ciencias, Centro de Investigacións en Ciencias Avanzadas (CICA), Universidade da Coruña, A Coruña, Spain.
| | - Paul M Smith
- Department of Paediatrics, Royal Aberdeen Children's Hospital, Aberdeen, UK
| | - Joanna L Elson
- Biosciences Institute Newcastle, Newcastle University, Newcastle upon Tyne, UK.
- Human Metabolomics, North-West University, Potchefstroom, South Africa.
| |
|
27
|
Rivas E. RNA structure prediction using positive and negative evolutionary information. PLoS Comput Biol 2020; 16:e1008387. [PMID: 33125376 PMCID: PMC7657543 DOI: 10.1371/journal.pcbi.1008387] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/22/2020] [Revised: 11/11/2020] [Accepted: 09/24/2020] [Indexed: 12/22/2022]
Abstract
Knowing the structure of conserved structural RNAs is important to elucidate their function and mechanism of action. However, predicting a conserved RNA structure remains unreliable, even when using a combination of thermodynamic stability and evolutionary covariation information. Here we present a method to predict a conserved RNA structure that combines the following three features. First, it uses significant covariation due to RNA structure and removes spurious covariation due to phylogeny. Second, it uses negative evolutionary information: basepairs that have variation but no significant covariation are prevented from occurring. Lastly, it uses a battery of probabilistic folding algorithms that incorporate all positive covariation into one structure. The method, named CaCoFold (Cascade variation/covariation Constrained Folding algorithm), predicts a nested structure guided by a maximal subset of positive basepairs, and recursively incorporates all remaining positive basepairs into alternative helices. The alternative helices can be compatible with the nested structure such as pseudoknots, or overlapping such as competing structures, base triplets, or other 3D non-antiparallel interactions. We present evidence that CaCoFold predictions are consistent with structures modeled from crystallography. The availability of deeper comparative sequence alignments and recent advances in statistical analysis of RNA sequence covariation have made it possible to identify a reliable set of conserved base pairs, as well as a reliable set of non-basepairs (positions that vary without covarying). Predicting an overall consensus secondary structure consistent with a set of individual inferred pairs and non-pairs remains a problem. 
Current RNA structure prediction algorithms that predict nested secondary structures cannot use the full set of inferred covarying pairs, because covariation analysis also identifies important non-nested pairing interactions such as pseudoknots, base triples, and alternative structures. Moreover, although algorithms for incorporating negative constraints exist, negative information from covariation analysis (inferred non-pairs) has not been systematically exploited. Here I introduce an efficient approximate RNA structure prediction algorithm that incorporates all inferred pairs and excludes all non-pairs. Using this, and an improved visualization tool, I show that the method correctly identifies many non-nested structures in agreement with known crystal structures, and improves many curated consensus secondary structure annotations in RNA sequence alignment databases.
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
| |
|
28
|
Complete mtDNA genome of Otus sunia (Aves, Strigidae) and the relaxation of selective constrains on Strigiformes mtDNA following evolution. Genomics 2020; 112:3815-3825. [DOI: 10.1016/j.ygeno.2020.02.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/20/2019] [Revised: 02/10/2020] [Accepted: 02/28/2020] [Indexed: 11/17/2022]
|
29
|
Discovery of 20 novel ribosomal leader candidates in bacteria and archaea. BMC Microbiol 2020; 20:130. [PMID: 32448158 PMCID: PMC7247131 DOI: 10.1186/s12866-020-01823-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/19/2020] [Accepted: 05/14/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNAs perform many functions in addition to supplying coding templates, such as binding proteins. RNA-protein interactions are important in multiple processes in all domains of life, and the discovery of additional protein-binding RNAs expands the scope for studying such interactions. To find such RNAs, we exploited a form of ribosomal regulation. Ribosome biosynthesis must be tightly regulated to ensure that concentrations of rRNAs and ribosomal proteins (r-proteins) match. One regulatory mechanism is a ribosomal leader (r-leader), which is a domain in the 5' UTR of an mRNA whose genes encode r-proteins. When the concentration of one of these r-proteins is high, the protein binds the r-leader in its own mRNA, reducing gene expression and thus protein concentrations. To date, 35 types of r-leaders have been validated or predicted. RESULTS By analyzing additional conserved RNA structures on a multi-genome scale, we identified 20 novel r-leader structures. Surprisingly, these included new r-leaders in the highly studied organisms Escherichia coli and Bacillus subtilis. Our results reveal several cases where multiple unrelated RNA structures likely bind the same r-protein ligand, and uncover previously unknown r-protein ligands. Each r-leader consistently occurs upstream of r-protein genes, suggesting a regulatory function. That the predicted r-leaders function as RNAs is supported by evolutionary correlations in the nucleotide sequences that are characteristic of a conserved RNA secondary structure. The r-leader predictions are also consistent with the locations of experimentally determined transcription start sites. CONCLUSIONS This work increases the number of known or predicted r-leader structures by more than 50%, providing additional opportunities to study structural and evolutionary aspects of RNA-protein interactions. These results provide a starting point for detailed experimental studies.
|
30
|
Bowman JC, Petrov AS, Frenkel-Pinter M, Penev PI, Williams LD. Root of the Tree: The Significance, Evolution, and Origins of the Ribosome. Chem Rev 2020; 120:4848-4878. [PMID: 32374986 DOI: 10.1021/acs.chemrev.9b00742] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Indexed: 01/21/2023]
Abstract
The ribosome is an ancient molecular fossil that provides a telescope to the origins of life. Made from RNA and protein, the ribosome translates mRNA to coded protein in all living systems. Universality, economy, centrality and antiquity are ingrained in translation. The translation machinery dominates the set of genes that are shared as orthologues across the tree of life. The lineage of the translation system defines the universal tree of life. The function of a ribosome is to build ribosomes; to accomplish this task, ribosomes make ribosomal proteins, polymerases, enzymes, and signaling proteins. Every coded protein ever produced by life on Earth has passed through the exit tunnel, which is the birth canal of biology. During the root phase of the tree of life, before the last common ancestor of life (LUCA), exit tunnel evolution is dominant and unremitting. Protein folding coevolved with evolution of the exit tunnel. The ribosome shows that protein folding initiated with intrinsic disorder, supported through a short, primitive exit tunnel. Folding progressed to thermodynamically stable β-structures and then to kinetically trapped α-structures. The latter were enabled by a long, mature exit tunnel that partially offset the general thermodynamic tendency of all polypeptides to form β-sheets. RNA chaperoned the evolution of protein folding from the very beginning. The universal common core of the ribosome, with a mass of nearly 2 million Daltons, was finalized by LUCA. The ribosome entered stasis after LUCA and remained in that state for billions of years. Bacterial ribosomes never left stasis. Archaeal ribosomes have remained near stasis, except for the superphylum Asgard, which has accreted rRNA post LUCA. Eukaryotic ribosomes in some lineages appear to be logarithmically accreting rRNA over the last billion years. Ribosomal expansion in Asgard and Eukarya has been incremental and iterative, without substantial remodeling of pre-existing basal structures. 
The ribosome preserves information on its history.
Affiliation(s)
- Jessica C Bowman
- Center for the Origins of Life, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Anton S Petrov
- Center for the Origins of Life, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Moran Frenkel-Pinter
- Center for the Origins of Life, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Petar I Penev
- Center for the Origins of Life, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Loren Dean Williams
- Center for the Origins of Life, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| |
|
31
|
Willmott D, Murrugarra D, Ye Q. Improving RNA secondary structure prediction via state inference with deep recurrent neural networks. COMPUTATIONAL AND MATHEMATICAL BIOPHYSICS 2020. [DOI: 10.1515/cmb-2020-0002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/24/2022]
Abstract
The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the secondary structure of an RNA, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Typical tools for state inference, such as hidden Markov models, exhibit poor performance in RNA state inference, owing in part to their inability to recognize nonlocal dependencies. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performances on many different classification problems.
This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the Nearest Neighbor Thermodynamic Model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected MFE structures. Accuracy is highly dependent on the success of our state inference method, and investigating the global features of our state predictions reveals that accuracy of both our state inference and structure inference methods are highly dependent on the similarity of pairing patterns of the sequence to the training dataset. Availability of a large training dataset is critical to the success of this approach. Code available at https://github.com/dwillmott/rna-state-inf.
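The paired/unpaired labelling that the abstract calls RNA state inference can be illustrated with a small sketch. It assumes the known structure is given in dot-bracket notation, which the abstract does not mention, and the function name `paired_states` is hypothetical. It only shows how target labels could be derived from a known structure; it is not the authors' LSTM pipeline or their SHAPE-generation step.

```python
def paired_states(dot_bracket: str) -> list[int]:
    """Label each position 1 (paired) or 0 (unpaired).

    Assumes simple dot-bracket input using only '(', ')' and '.';
    pseudoknot brackets are not handled in this sketch.
    """
    states = [0] * len(dot_bracket)
    open_positions = []  # stack of indices of unmatched '('
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            states[i] = 1
            states[j] = 1
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return states


if __name__ == "__main__":
    # A toy hairpin with an internal loop, not taken from the paper.
    print(paired_states("((..((...))..))"))
```

A per-position label vector of this kind is what a sequence classifier would be trained to predict from the nucleotide sequence alone.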
Affiliation(s)
| | - David Murrugarra
- Department of Mathematics , University of Kentucky , Lexington , KY 40506-0027 USA
| | - Qiang Ye
- Department of Mathematics , University of Kentucky , Lexington , KY 40506-0027 USA
| |
|
32
|
Shi S, Zhang XL, Yang L, Du W, Zhao XL, Wang YJ. Prediction of RNA Secondary Structure Using Quantum-inspired Genetic Algorithms. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190916154103] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background: The prediction of RNA secondary structure using optimization algorithms is key to understanding the real structure of an RNA. Evolutionary algorithms (EAs) are popular strategies for RNA secondary structure prediction. However, compared to most state-of-the-art software based on DPAs, the performance of EAs is far from satisfactory.
Objective: Therefore, a more powerful strategy is required to improve the performance of EAs when applied to the prediction of RNA secondary structures.
Methods: The idea of quantum computing is introduced here, yielding a new strategy to find all possible legal paired bases under the constraint of minimum free energy. The state of a stem pool with size N is encoded as a population of QGA, which is represented by N quantum bits rather than classical bits. The updating of populations is accomplished by so-called quantum crossover operations, quantum mutation operations and quantum rotation operations.
Results: The numerical results show that the performance of traditional EAs is significantly improved by using QGA with regard to not only prediction accuracy and sensitivity but also complexity. Moreover, for RNA sequences of short to medium length, QGA even improves on the state-of-the-art software based on DPAs in terms of both prediction accuracy and sensitivity.
Conclusion: This work sheds an interesting light on the applications of quantum computing to RNA structure prediction.
Affiliation(s)
- Sha Shi
- Engineering Research Centre of Molecular and Neuro Imaging Ministry of Education, School of Life Science and Technology, Xidian University, Xi’an, China
| | - Xin-Li Zhang
- Xinxiang Medical University, Xinxiang, Henan, China
| | - Le Yang
- The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
| | - Wei Du
- The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Xian-Li Zhao
- Northwestern Women and Children’s Hospital, Xi'an, China
| | - Yun-Jiang Wang
- State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China
| |
|
33
|
Zhou G, Loper J, Geman S. Base-pair ambiguity and the kinetics of RNA folding. BMC Bioinformatics 2019; 20:666. [PMID: 31830902 PMCID: PMC6909616 DOI: 10.1186/s12859-019-3303-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/06/2019] [Accepted: 12/02/2019] [Indexed: 01/28/2023]
Abstract
Background A pairings of nucleotide sequences. Given this forbidding free-energy landscape, mechanisms have evolved that contribute to a directed and efficient folding process, including catalytic proteins and error-detecting chaperones. Among structural RNA molecules we make a distinction between “bound” molecules, which are active as part of ribonucleoprotein (RNP) complexes, and “unbound,” with physiological functions performed without necessarily being bound in RNP complexes. We hypothesized that unbound molecules, lacking the partnering structure of a protein, would be more vulnerable than bound molecules to kinetic traps that compete with native stem structures. We defined an “ambiguity index”—a normalized function of the primary and secondary structure of an individual molecule that measures the number of kinetic traps available to nucleotide sequences that are paired in the native structure, presuming that unbound molecules would have lower indexes. The ambiguity index depends on the purported secondary structure, and was computed under both the comparative (“gold standard”) and an equilibrium-based prediction which approximates the minimum free energy (MFE) structure. Arguing that kinetically accessible metastable structures might be more biologically relevant than thermodynamic equilibrium structures, we also hypothesized that MFE-derived ambiguities would be less effective in separating bound and unbound molecules. Results We have introduced an intuitive and easily computed function of primary and secondary structures that measures the availability of complementary sequences that could disrupt the formation of native stems on a given molecule—an ambiguity index. Using comparative secondary structures, the ambiguity index is systematically smaller among unbound than bound molecules, as expected. Furthermore, the effect is lost when the presumably more accurate comparative structure is replaced instead by the MFE structure. 
Conclusions A statistical analysis of the relationship between the primary and secondary structures of non-coding RNA molecules suggests that stem-disrupting kinetic traps are substantially less prevalent in molecules not participating in RNP complexes. In that this distinction is apparent under the comparative but not the MFE secondary structure, the results highlight a possible deficiency in structure predictions when based upon assumptions of thermodynamic equilibrium.
Affiliation(s)
| | - Jackson Loper
- Data Science Institute, Columbia University, New York, NY, USA
| | - Stuart Geman
- Division of Applied Mathematics, Brown University, Providence, RI, USA
| |
|
34
|
Singh J, Hanson J, Paliwal K, Zhou Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 2019; 10:5407. [PMID: 31776342 PMCID: PMC6881452 DOI: 10.1038/s41467-019-13395-9] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 06/12/2019] [Accepted: 11/01/2019] [Indexed: 01/03/2023]
Abstract
The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction, including those noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only <250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of >10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
| | - Jack Hanson
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD, 4222, Australia.
| |
|
35
|
Shi S, Zhang XL, Zhao XL, Yang L, Du W, Wang YJ. Prediction of the RNA Secondary Structure Using a Multi-Population Assisted Quantum Genetic Algorithm. Hum Hered 2019; 84:1-8. [PMID: 31461710 DOI: 10.1159/000501480] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/06/2019] [Accepted: 06/13/2019] [Indexed: 12/15/2022]
Abstract
Quantum-inspired genetic algorithms (QGAs) were recently introduced for the prediction of RNA secondary structures, and they showed some superiority over the existing popular strategies. In this paper, for RNA secondary structure prediction, we introduce a new QGA named multi-population assisted quantum genetic algorithm (MAQGA). In contrast to the existing QGAs, our strategy involves multi-populations which evolve together in a cooperative way in each iteration, and the genetic exchange between various populations is performed by an operator transfer operation. The numerical results show that the performances of existing genetic algorithms (evolutionary algorithms [EAs]), including traditional EAs and QGAs, can be significantly improved by using our approach. Moreover, for RNA sequences with middle-short length, the MAQGA improves even this state-of-the-art software in terms of both prediction accuracy and sensitivity.
Affiliation(s)
- Sha Shi
- Engineering Research Center of Molecular and Neuroimaging, Ministry of Education of China, and School of Life Science and Technology, Xidian University, Xi'an, China
| | | | - Xian-Li Zhao
- Northwestern Women and Children's Hospital, Xi'an, China
| | - Le Yang
- The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an Jiaotong University, Xi'an, China
| | - Wei Du
- The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Yun-Jiang Wang
- The State Key Laboratory of Integrated Services Network (ISN), Xidian University, Xi'an, China
| |
|
36
|
Jiang L, Peng L, Tang M, You Z, Zhang M, West A, Ruan Q, Chen W, Merilä J. Complete mitochondrial genome sequence of the Himalayan Griffon, Gyps himalayensis (Accipitriformes: Accipitridae): Sequence, structure, and phylogenetic analyses. Ecol Evol 2019; 9:8813-8828. [PMID: 31410282 PMCID: PMC6686361 DOI: 10.1002/ece3.5433] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/01/2019] [Revised: 06/12/2019] [Accepted: 06/17/2019] [Indexed: 11/12/2022]
Abstract
This is the first study to describe the mitochondrial genome of the Himalayan Griffon, Gyps himalayensis, which is an Old World vulture belonging to the family Accipitridae and occurring along the Himalayas and the adjoining Tibetan Plateau. Its mitogenome is a closed circular molecule 17,381 bp in size containing 13 protein-coding genes, 22 tRNA coding genes, two rRNA-coding genes, a control region (CR), and an extra pseudo-control region (CCR) that are conserved in most Accipitridae mitogenomes. The overall base composition of the G. himalayensis mitogenome is 24.55% A, 29.49% T, 31.59% C, and 14.37% G, which is typical for bird mitochondrial genomes. The alignment of the Accipitridae species control regions showed high levels of genetic variation and abundant AT content. At the 5' end of the domain I region, a long continuous poly-C sequence was found. Two tandem repeats were found in the pseudo-control regions. Phylogenetic analysis with Bayesian inference and maximum likelihood based on 13 protein-coding genes indicated that the relationships at the family level were (Falconidae + (Cathartidae + (Sagittariidae + (Accipitridae + Pandionidae)))). In the Accipitridae clade, G. himalayensis is more closely related to Aegypius monachus than to Spilornis cheela. The complete mitogenome of G. himalayensis provides a potentially useful resource for further exploration of the taxonomic status and phylogenetic history of Gyps species.
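The base-composition figures quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the helper `base_composition` is a hypothetical name, the toy sequence is invented, and only the four percentages come from the abstract.

```python
from collections import Counter


def base_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T among the unambiguous bases."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Percentages as reported in the abstract for the whole mitogenome.
reported = {"A": 24.55, "T": 29.49, "C": 31.59, "G": 14.37}
at_content = round(reported["A"] + reported["T"], 2)  # 54.04
gc_content = round(reported["G"] + reported["C"], 2)  # 45.96

if __name__ == "__main__":
    print(f"A+T = {at_content}%, G+C = {gc_content}%")
    # Toy sequence, not from the paper: 2 A, 1 C, 1 G, 1 T.
    print(base_composition("AACGT"))
```

The four reported values sum to 100%, with A+T at 54.04% and G+C at 45.96% for the whole mitogenome.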
Affiliation(s)
- Lichun Jiang
- Key Laboratory for Molecular Biology and Biopharmaceutics, School of Life Science and Technology, Mianyang Normal University, Mianyang, Sichuan, China
- Ecological Security and Protection Key Laboratory of Sichuan Province, Mianyang Normal University, Mianyang, Sichuan, China
| | - Liqing Peng
- Ecological Security and Protection Key Laboratory of Sichuan Province, Mianyang Normal University, Mianyang, Sichuan, China
| | - Min Tang
- Ecological Security and Protection Key Laboratory of Sichuan Province, Mianyang Normal University, Mianyang, Sichuan, China
| | - Zhangqiang You
- Ecological Security and Protection Key Laboratory of Sichuan Province, Mianyang Normal University, Mianyang, Sichuan, China
| | - Min Zhang
- Key Laboratory for Molecular Biology and Biopharmaceutics, School of Life Science and Technology, Mianyang Normal University, Mianyang, Sichuan, China
| | - Andrea West
- Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Vic, Australia
| | - Qiping Ruan
- Key Laboratory for Molecular Biology and Biopharmaceutics, School of Life Science and Technology, Mianyang Normal University, Mianyang, Sichuan, China
| | - Wei Chen
- Key Laboratory for Molecular Biology and Biopharmaceutics, School of Life Science and Technology, Mianyang Normal University, Mianyang, Sichuan, China
- Ecological Security and Protection Key Laboratory of Sichuan Province, Mianyang Normal University, Mianyang, Sichuan, China
| | - Juha Merilä
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, Faculty Biological & Environmental Sciences, University of Helsinki, Helsinki, Finland
| |
|
37
|
Zhang Z, Cheng Q, Ge Y. The complete mitochondrial genome of Rhynchocypris oxycephalus (Teleostei: Cyprinidae) and its phylogenetic implications. Ecol Evol 2019; 9:7819-7837. [PMID: 31346443 PMCID: PMC6635945 DOI: 10.1002/ece3.5369] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/31/2019] [Revised: 05/21/2019] [Accepted: 05/22/2019] [Indexed: 01/18/2023]
Abstract
Rhynchocypris oxycephalus (Teleostei: Cyprinidae) is a typical small cold-water fish that is widely distributed, mainly in East Asia. Here, we sequenced the complete mitochondrial genome of R. oxycephalus and studied its phylogenetic implications. The R. oxycephalus mitogenome is 16,609 bp in length (GenBank accession no.: MH885043), and it contains 13 protein-coding genes (PCGs), two rRNA genes, 22 tRNA genes, and two noncoding regions (the control region and the putative origin of light-strand replication). Twelve PCGs start with ATG, while COI uses GTG as the start codon. The secondary structure of tRNA-Ser (AGN) lacks the dihydrouracil (DHU) arm. The control region is 943 bp in length, with a termination-associated sequence, six conserved sequence blocks (CSB-1, CSB-2, CSB-3, CSB-D, CSB-E, CSB-F), and a repetitive sequence. Phylogenetic analysis was performed with maximum likelihood and Bayesian methods based on the concatenated nucleotide sequence of 13 PCGs and the complete sequence without the control region, and the result revealed that R. oxycephalus is most closely related to R. percnurus and most distantly related to R. kumgangensis. The genus Rhynchocypris is revealed as a polyphyletic group, and R. kumgangensis has a distant relationship with other Rhynchocypris species. In addition, the COI and ND2 genes are considered the most suitable DNA barcoding genes in the genus Rhynchocypris. This work provides additional molecular information for studying R. oxycephalus conservation genetics and evolutionary relationships.
Affiliation(s)
- Zhichao Zhang
- Key Laboratory of Oceanic and Polar Fisheries, Ministry of Agriculture and Rural Affairs, East China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Shanghai, China
- Wuxi Fisheries College, Nanjing Agricultural University, Wuxi, China
| | - Qiqun Cheng
- Key Laboratory of Oceanic and Polar Fisheries, Ministry of Agriculture and Rural Affairs, East China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Shanghai, China
| | - Yushuang Ge
- Key Laboratory of Oceanic and Polar Fisheries, Ministry of Agriculture and Rural Affairs, East China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Shanghai, China
- College of Marine Sciences, Shanghai Ocean University, Shanghai, China
| |
|
38
|
Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 11/18/2022]
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
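The benchmarking ideas in this abstract can be illustrated with a minimal sketch. It assumes that accuracy is reported as sensitivity and positive predictive value (PPV) over base pairs, and that "flexibility" means a pair (i, j) also counts as matched when a partner is shifted by one position in the other structure. The abstract names neither the metrics nor the exact rule, so the function names and the `slip` parameter below are hypothetical.

```python
def pair_matches(pair, reference, slip=1):
    """Return True if `pair` is in `reference`, allowing one side to shift by up to `slip`."""
    i, j = pair
    if (i, j) in reference:
        return True
    for d in range(1, slip + 1):
        # Assumed flexibility rule: shift one partner of the pair, keep the other fixed.
        if {(i - d, j), (i + d, j), (i, j - d), (i, j + d)} & reference:
            return True
    return False


def score_prediction(predicted, accepted, slip=1):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs."""
    predicted, accepted = set(predicted), set(accepted)
    # PPV: fraction of predicted pairs that are found in the accepted structure.
    matched_predicted = sum(pair_matches(p, accepted, slip) for p in predicted)
    # Sensitivity: fraction of accepted pairs that are recovered by the prediction.
    matched_accepted = sum(pair_matches(p, predicted, slip) for p in accepted)
    ppv = matched_predicted / len(predicted) if predicted else 0.0
    sensitivity = matched_accepted / len(accepted) if accepted else 0.0
    return sensitivity, ppv


accepted = [(1, 10), (2, 9), (3, 8)]
predicted = [(1, 10), (2, 8), (4, 20)]
print(score_prediction(predicted, accepted, slip=1))  # flexible scoring
print(score_prediction(predicted, accepted, slip=0))  # exact matching only
```

Setting `slip=0` gives strict matching, which makes it easy to see how much of a reported accuracy depends on the flexibility rule. Statistical testing across a set of structures, which the abstract also recommends, is not covered by this sketch.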
Affiliation(s)
- David H Mathews
- Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States.
| |
|
39
|
Mitchell D, Renda AJ, Douds CA, Babitzke P, Assmann SM, Bevilacqua PC. In vivo RNA structural probing of uracil and guanine base-pairing by 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC). RNA 2019; 25:147-157. [PMID: 30341176 PMCID: PMC6298566 DOI: 10.1261/rna.067868.118] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/26/2018] [Accepted: 10/18/2018] [Indexed: 05/09/2023]
Abstract
Many biological functions performed by RNAs arise from their in vivo structures. The structure of the same RNA can differ in vitro and in vivo owing in part to the influence of molecules ranging from protons to secondary metabolites to proteins. Chemical reagents that modify the Watson-Crick (WC) face of unprotected RNA bases report on the absence of base-pairing and so are of value in determining structures adopted by RNAs. Reagents have thus been sought that can report on the native RNA structures that prevail in living cells. Dimethyl sulfate (DMS) and glyoxal penetrate cell membranes and inform on RNA secondary structure in vivo through modification of adenine (A), cytosine (C), and guanine (G) bases. Uracil (U) bases, however, have thus far eluded characterization in vivo. Herein, we show that the water-soluble carbodiimide 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) is capable of modifying the WC face of U and G in vivo, favoring the former nucleobase by a factor of ∼1.5, and doing so in the eukaryote rice, as well as in the Gram-negative bacterium Escherichia coli. While both EDC and glyoxal target Gs, EDC reacts with Gs in their typical neutral state, while glyoxal requires Gs to populate the rare anionic state. EDC may thus be more generally useful; however, comparison of the reactivity of EDC and glyoxal may allow the identification of Gs with perturbed pKas in vivo and genome-wide. Overall, use of EDC with DMS allows in vivo probing of the base-pairing status of all four RNA bases.
MESH Headings
- Base Pairing
- Base Sequence
- Escherichia coli/chemistry
- Escherichia coli/genetics
- Ethyldimethylaminopropyl Carbodiimide
- Glyoxal
- Guanine/chemistry
- Indicators and Reagents
- Molecular Probe Techniques
- Molecular Probes
- Molecular Structure
- Nucleic Acid Conformation
- Oryza/chemistry
- Oryza/genetics
- RNA/chemistry
- RNA/genetics
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Plant/chemistry
- RNA, Plant/genetics
- RNA, Ribosomal, 16S/chemistry
- RNA, Ribosomal, 16S/genetics
- RNA, Ribosomal, 5.8S/chemistry
- RNA, Ribosomal, 5.8S/genetics
- Uracil/chemistry
Affiliation(s)
- David Mitchell
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for RNA Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Andrew J Renda
- Center for RNA Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Catherine A Douds
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for RNA Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Paul Babitzke
- Center for RNA Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Sarah M Assmann
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Philip C Bevilacqua
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for RNA Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| |
|
40
|
Al-Allaf FA, Abduljaleel Z, Athar M, Taher MM, Khan W, Mehmet H, Colakogullari M, Apostolidou S, Bigger B, Waddington S, Coutelle C, Themis M, Al-Ahdal MN, Al-Mohanna FA, Al-Hassnan ZN, Bouazzaoui A. Modifying inter-cistronic sequence significantly enhances IRES dependent second gene expression in bicistronic vector: Construction of optimised cassette for gene therapy of familial hypercholesterolemia. Noncoding RNA Res 2018; 4:1-14. [PMID: 30891532 PMCID: PMC6404380 DOI: 10.1016/j.ncrna.2018.11.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/26/2018] [Revised: 11/21/2018] [Accepted: 11/21/2018] [Indexed: 01/23/2023] Open
Abstract
Internal ribosome entry site (IRES) sequences have become a valuable tool in the construction of gene transfer and therapeutic vectors for multi-cistronic gene expression from a single mRNA transcript. The optimal conditions for effective use of this sequence to construct a functional expression vector are not precisely defined, but it is generally assumed that the IRES-dependent expression of the second gene in such a cassette is less efficient than the cap-dependent expression of the first gene. Tailoring the inter-cistronic sequence significantly enhances IRES-dependent expression of the second gene in a bicistronic vector, which supports construction of an optimised cassette for gene therapy of familial hypercholesterolemia. We tailored the size of the inter-cistronic spacer sequence at the 5′ region of the IRES sequence using sequential deletions and demonstrated that the expression of the 3′ gene can be significantly increased to levels similar to the cap-dependent expression of the 5′ gene. Maximum expression efficiency of the downstream gene was obtained when the spacer was composed of 18–141 base pairs. In this case a single mRNA transcriptional unit containing both the first and the second cistron was detected. In contrast, constructs with spacer sequences of 216 bp or longer generated a single transcriptional unit containing only the first cistron. This suggests that long spacers may affect transcription termination. When the spacer was 188 bp, both transcripts were produced simultaneously in most transfected cells, while a fraction of the cells expressed only the first but not the second gene. Expression analyses of vectors containing optimised cassettes confirm that efficient gene transfer and biological activity of the expressed transgenic proteins can be achieved in the transduced cells.
Furthermore, computational analysis was carried out by molecular dynamics (MD) simulation to determine the most emerges as viable containing specific binding site and bridging of 5′ and 3′ ends involving direct RNA-RNA contacts and RNA-protein interactions. These results provide a mechanistic basis for translation stimulation and RNA resembling for the synergistic stimulation of cap-dependent translation.
Affiliation(s)
- Faisal A Al-Allaf
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia.,Science and Technology Unit, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia.,Molecular Diagnostics Unit, Department of Laboratory and Blood Bank, King Abdullah Medical City, Makkah, 21955, Saudi Arabia.,Gene Therapy Research Group, Department of Molecular and Cell Medicine, Faculty of Medicine, Imperial College London, South Kensington, London, SW7 2AZ, UK.,Institute of Reproductive and Developmental Biology, Division of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
| | - Zainularifeen Abduljaleel
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia.,Science and Technology Unit, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia
| | - Mohammad Athar
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia.,Science and Technology Unit, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia
| | - Mohiuddin M Taher
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia.,Science and Technology Unit, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia
| | - Wajahatullah Khan
- Department of Basic Sciences, College of Science and Health Professions, King Saud Bin Abdulaziz University for Health Sciences, PO Box 3124, Riyadh, 11426, Saudi Arabia
| | - Huseyin Mehmet
- Institute of Reproductive and Developmental Biology, Division of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
| | - Mukaddes Colakogullari
- Institute of Reproductive and Developmental Biology, Division of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
| | - Sophia Apostolidou
- Institute of Reproductive and Developmental Biology, Division of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
| | - Brian Bigger
- Gene Therapy Research Group, Department of Molecular and Cell Medicine, Faculty of Medicine, Imperial College London, South Kensington, London, SW7 2AZ, UK
| | - Simon Waddington
- Gene Therapy Research Group, Department of Molecular and Cell Medicine, Faculty of Medicine, Imperial College London, South Kensington, London, SW7 2AZ, UK
| | - Charles Coutelle
- Gene Therapy Research Group, Department of Molecular and Cell Medicine, Faculty of Medicine, Imperial College London, South Kensington, London, SW7 2AZ, UK
| | - Michael Themis
- Gene Therapy Research Group, Department of Molecular and Cell Medicine, Faculty of Medicine, Imperial College London, South Kensington, London, SW7 2AZ, UK
| | - Mohammed N Al-Ahdal
- Department of Infection and Immunity, King Faisal Specialist Hospital & Research Center, Riyadh, 11211, Saudi Arabia
| | - Futwan A Al-Mohanna
- Department of Cell Biology, King Faisal Specialist Hospital and Research Center, Riyadh, 11211, Saudi Arabia
| | - Zuhair N Al-Hassnan
- Department of Medical Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, 11211, Saudi Arabia
| | - Abdellatif Bouazzaoui
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia.,Science and Technology Unit, Umm Al-Qura University, P.O. Box 715, Makkah, 21955, Saudi Arabia
| |
|
41
|
Extracting information from RNA SHAPE data: Kalman filtering approach. PLoS One 2018; 13:e0207029. [PMID: 30462682 PMCID: PMC6248965 DOI: 10.1371/journal.pone.0207029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/07/2018] [Accepted: 10/23/2018] [Indexed: 01/26/2023] Open
Abstract
RNA SHAPE experiments have become important and successful sources of information for RNA structure prediction. In such experiments, chemical reagents are used to probe RNA backbone flexibility at the nucleotide level, which in turn provides information on base pairing and therefore secondary structure. Little is known, however, about the statistics of such SHAPE data. In this work, we explore different representations of noise in SHAPE data and propose a statistically sound framework for extracting reliable reactivity information from multiple SHAPE replicates. Our analyses of RNA SHAPE experiments underscore that a normal noise model is not adequate to represent their data. We propose instead a log-normal representation of noise and discuss its relevance. Under this assumption, we observe that processing simulated SHAPE data by directly averaging different replicates leads to bias. Such bias can be reduced by analyzing the data following a log transformation, either by log-averaging or Kalman filtering. Application of Kalman filtering has the additional advantage that a prior on the nucleotide reactivities can be introduced. We show that the performance of Kalman filtering is then directly dependent on the quality of that prior. We conclude the paper with guidelines on signal processing of RNA SHAPE data.
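The averaging bias described in this abstract can be reproduced with a small simulation. The sketch below assumes multiplicative log-normal noise on a single true reactivity and compares direct averaging of replicates with log-averaging (the geometric mean). All function names and parameter values are illustrative assumptions, and the Kalman filtering approach itself is not implemented here.

```python
import math
import random
import statistics


def simulate_replicates(true_reactivity, sigma, n_replicates, rng):
    """Draw replicates as true_reactivity * exp(noise), with noise ~ N(0, sigma^2) (assumed model)."""
    return [true_reactivity * math.exp(rng.gauss(0.0, sigma)) for _ in range(n_replicates)]


def direct_average(replicates):
    """Arithmetic mean of the raw replicate values."""
    return statistics.fmean(replicates)


def log_average(replicates):
    """Average in log space, then transform back (geometric mean)."""
    return math.exp(statistics.fmean([math.log(x) for x in replicates]))


def compare_estimators(true_reactivity=1.0, sigma=0.5, n_replicates=3, n_trials=5000, seed=0):
    """Return the mean direct-average and mean log-average estimate over many simulated experiments."""
    rng = random.Random(seed)
    direct, logged = [], []
    for _ in range(n_trials):
        reps = simulate_replicates(true_reactivity, sigma, n_replicates, rng)
        direct.append(direct_average(reps))
        logged.append(log_average(reps))
    return statistics.fmean(direct), statistics.fmean(logged)


mean_direct, mean_log = compare_estimators()
print(f"direct averaging: {mean_direct:.3f}, log-averaging: {mean_log:.3f}")
```

Under this assumed noise model the direct average overestimates the true reactivity by a factor of about exp(sigma^2 / 2) in expectation, while the log-average is off by only exp(sigma^2 / (2n)) for n replicates, so it lands closer to the true value.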
|
42
|
Kutchko KM, Madden EA, Morrison C, Plante KS, Sanders W, Vincent HA, Cruz Cisneros MC, Long KM, Moorman NJ, Heise MT, Laederach A. Structural divergence creates new functional features in alphavirus genomes. Nucleic Acids Res 2018; 46:3657-3670. [PMID: 29361131 PMCID: PMC6283419 DOI: 10.1093/nar/gky012] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 07/21/2017] [Revised: 12/10/2017] [Accepted: 01/05/2018] [Indexed: 12/03/2022] Open
Abstract
Alphaviruses are mosquito-borne pathogens that cause human diseases ranging from debilitating arthritis to lethal encephalitis. Studies with Sindbis virus (SINV), which causes fever, rash, and arthralgia in humans, and Venezuelan equine encephalitis virus (VEEV), which causes encephalitis, have identified RNA structural elements that play key roles in replication and pathogenesis. However, a complete genomic structural profile has not been established for these viruses. We used the structural probing technique SHAPE-MaP to identify structured elements within the SINV and VEEV genomes. Our SHAPE-directed structural models recapitulate known RNA structures, while also identifying novel structural elements, including a new functional element in the nsP1 region of SINV whose disruption causes a defect in infectivity. Although RNA structural elements are important for multiple aspects of alphavirus biology, we found the majority of RNA structures were not conserved between SINV and VEEV. Our data suggest that alphavirus RNA genomes are highly divergent structurally despite similar genomic architecture and sequence conservation; still, RNA structural elements are critical to the viral life cycle. These findings reframe traditional assumptions about RNA structure and evolution: rather than structures being conserved, alphaviruses frequently evolve new structures that may shape interactions with host immune systems or co-evolve with viral proteins.
Affiliation(s)
- Katrina M Kutchko
- Department of Biology, UNC-Chapel Hill, USA
- Curriculum in Bioinformatics and Computational Biology, UNC-Chapel Hill, USA
| | - Emily A Madden
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
| | | | | | - Wes Sanders
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
| | | | | | | | - Nathaniel J Moorman
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
| | - Mark T Heise
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Department of Genetics, UNC-Chapel Hill, USA
| | - Alain Laederach
- Department of Biology, UNC-Chapel Hill, USA
- Curriculum in Bioinformatics and Computational Biology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
| |
|
43
|
Deng H, Cheema J, Zhang H, Woolfenden H, Norris M, Liu Z, Liu Q, Yang X, Yang M, Deng X, Cao X, Ding Y. Rice In Vivo RNA Structurome Reveals RNA Secondary Structure Conservation and Divergence in Plants. Mol Plant 2018; 11:607-622. [PMID: 29409859 PMCID: PMC5886760 DOI: 10.1016/j.molp.2018.01.008] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 10/06/2017] [Revised: 01/11/2018] [Accepted: 01/25/2018] [Indexed: 05/07/2023]
Abstract
RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might be associated with the dynamics of biological function. Higher N6-methyladenosine (m6A) modification tends to be associated with less RNA structure in the 3′ UTR, whereas GC content does not significantly affect in vivo mRNA structure to maintain efficient biological processes such as translation. Comparative analysis of the RNA structurome between rice and Arabidopsis revealed that higher GC content does not lead to stronger structure and less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure correspond to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a separate layer of selection to the sequence between monocot and dicot. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptational flexibility during angiosperm evolution.
Affiliation(s)
- Hongjing Deng
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK; College of Life Sciences, University of Chinese Academy of Sciences, 100049, Beijing, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Jitender Cheema
- Department of Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Hang Zhang
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Hugh Woolfenden
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Matthew Norris
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Zhenshan Liu
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Qi Liu
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Xiaofei Yang
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Minglei Yang
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Xian Deng
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaofeng Cao
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.
| | - Yiliang Ding
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.
| |
|
44
|
Ledda M, Aviran S. PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures. Genome Biol 2018; 19:28. [PMID: 29495968 PMCID: PMC5833111 DOI: 10.1186/s13059-018-1399-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/06/2017] [Accepted: 01/30/2018] [Indexed: 02/08/2023] Open
Abstract
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Affiliation(s)
- Mirko Ledda
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
- Integrative Genetics and Genomics Graduate Group, UC Davis, 1 Shields Ave, Davis, 95616 USA
| | - Sharon Aviran
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
| |
|
45
|
Wang Y, Cao JJ, Li WH. Complete Mitochondrial Genome of Suwallia teleckojensis (Plecoptera: Chloroperlidae) and Implications for the Higher Phylogeny of Stoneflies. Int J Mol Sci 2018; 19:E680. [PMID: 29495588 PMCID: PMC5877541 DOI: 10.3390/ijms19030680] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/24/2018] [Revised: 02/20/2018] [Accepted: 02/24/2018] [Indexed: 11/21/2022] Open
Abstract
Stoneflies comprise an ancient group of insects, but the phylogenetic position of Plecoptera and the phylogenetic relationships within Plecoptera have long been controversial, and more molecular data are required to reconstruct a precise phylogeny. Herein, we present the complete mitogenome of a stonefly, Suwallia teleckojensis, which is 16,146 bp in length and consists of 13 protein-coding genes (PCGs), 2 ribosomal RNAs (rRNAs), 22 transfer RNAs (tRNAs) and a control region (CR). Most PCGs initiate with the standard start codon ATN. However, ND5 and ND1 start with GTG and TTG. Typical termination codons TAA and TAG were found in eleven PCGs, and the remaining two PCGs (COII and ND5) have incomplete termination codons. All transfer RNA genes (tRNAs) have the classic cloverleaf secondary structure, with the exception of tRNASer(AGN), which lacks the dihydrouridine (DHU) arm. Secondary structures of the two ribosomal RNAs were drawn with reference to previous models. A large tandem repeat region, two potential stem-loop (SL) structures, a poly-N structure (2 poly-A, 1 poly-T and 1 poly-C), and four conserved sequence blocks (CSBs) were detected in the control region. Finally, both maximum likelihood (ML) and Bayesian inference (BI) analyses suggested that the Capniidae is monophyletic, and that the other five stonefly families form a monophyletic group. In this study, S. teleckojensis was closely related to Sweltsa longistyla, and Chloroperlidae and Perlidae were supported as sister groups.
Affiliation(s)
- Ying Wang
- Department of Plant Protection, Henan Institute of Science and Technology, Xinxiang 453003, Henan, China.
| | - Jin-Jun Cao
- Department of Plant Protection, Henan Institute of Science and Technology, Xinxiang 453003, Henan, China.
| | - Wei-Hai Li
- Department of Plant Protection, Henan Institute of Science and Technology, Xinxiang 453003, Henan, China.
| |
|
46
|
Stormo GD. An Overview of RNA Sequence Analyses: Structure Prediction, ncRNA Gene Identification, and RNAi Design. Curr Protoc Bioinformatics 2018; 43:12.1.1-12.1.3. [DOI: 10.1002/0471250953.bi1201s43] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Affiliation(s)
- Gary D. Stormo
- Washington University School of Medicine, Saint Louis, Missouri
| |
|
47
|
A method to improve prediction of secondary structure for large single RNA sequences. Biochem Biophys Res Commun 2018; 496:523-528. [PMID: 29339162 DOI: 10.1016/j.bbrc.2018.01.086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2018] [Accepted: 01/11/2018] [Indexed: 11/20/2022]
Abstract
The function of a particular RNA molecule within an organic system is principally determined by its structure. The physical methods currently available for structure determination are time-consuming and expensive. Hence, computational methods for structure prediction are often used. Predicting the structure of a large single RNA sequence remains an open research problem. In the present work, a method is introduced to improve the secondary structure predicted for large single RNA sequences by the Mfold program, which uses the concept of minimum free energy. The Mfold program contains a constraint option that allows some helices to be forced in the predicted structure. In our method, some of the first-formed hairpins that a statistical study indicates are likely to be present in the real structure are forced in the Mfold predicted structure. The results show that the Mfold predicted structure moves closer to the real structure, which provides evidence for kinetic folding of RNA.
|
48
|
Wu YZ, Rédei D, Eger J, Wang YH, Wu HY, Carapezza A, Kment P, Cai B, Sun XY, Guo PL, Luo JY, Xie Q. Phylogeny and the colourful history of jewel bugs (Insecta: Hemiptera: Scutelleridae). Cladistics 2017; 34:502-516. [DOI: 10.1111/cla.12224] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Accepted: 08/22/2017] [Indexed: 11/30/2022] Open
Affiliation(s)
- Yan-Zhuo Wu
- Institute of Entomology; College of Life Sciences; Nankai University; 94 Weijin Road, Nankai District Tianjin 300071 China
| | - Dávid Rédei
- Institute of Entomology; College of Life Sciences; Nankai University; 94 Weijin Road, Nankai District Tianjin 300071 China
| | - Joseph Eger
- Dow AgroSciences; LLC; 2606 S. Dundee Street Tampa FL 32629 USA
| | - Yan-Hui Wang
- Institute of Entomology; College of Life Sciences; Nankai University; 94 Weijin Road, Nankai District Tianjin 300071 China
- Department of Ecology and Evolution; College of Life Sciences; Sun Yat-sen University; No. 135 Xingangxi Road Guangzhou 510275 Guangdong China
- State Key Laboratory of Biocontrol; Sun Yat-sen University; 135 Xingangxi Road Guangzhou 510275 Guangdong China
| | - Hao-Yang Wu
- Institute of Entomology; College of Life Sciences; Nankai University; 94 Weijin Road, Nankai District Tianjin 300071 China
- Department of Ecology and Evolution; College of Life Sciences; Sun Yat-sen University; No. 135 Xingangxi Road Guangzhou 510275 Guangdong China
- State Key Laboratory of Biocontrol; Sun Yat-sen University; 135 Xingangxi Road Guangzhou 510275 Guangdong China
| | - Attilio Carapezza
- University of Palermo; Via Sandro Botticelli, 15 I-90144 Palermo Italy
| | - Petr Kment
- Department of Entomology; National Museum; Cirkusová 1740 CZ-193 00 Praha 9 Czech Republic
| | - Bo Cai
- Hainan Entry-Exit Inspection and Quarantine Bureau; 9 West Haixiu Road Haikou Hainan 570311 China
| | - Xiao-Ya Sun
- Institute of Entomology; College of Life Sciences; Nankai University; 94 Weijin Road, Nankai District Tianjin 300071 China
| | - Peng-Lei Guo
- Institute of Entomology; College of Life Sciences; Nankai University; 94 Weijin Road, Nankai District Tianjin 300071 China
| | - Jiu-Yang Luo
- Institute of Entomology; College of Life Sciences; Nankai University; 94 Weijin Road, Nankai District Tianjin 300071 China
| | - Qiang Xie
- Department of Ecology and Evolution; College of Life Sciences; Sun Yat-sen University; No. 135 Xingangxi Road Guangzhou 510275 Guangdong China
| |
|
49
|
Tan Z, Fu Y, Sharma G, Mathews DH. TurboFold II: RNA structural alignment and secondary structure prediction informed by multiple homologs. Nucleic Acids Res 2017; 45:11570-11581. [PMID: 29036420 PMCID: PMC5714223 DOI: 10.1093/nar/gkx815] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 02/06/2017] [Accepted: 09/12/2017] [Indexed: 12/26/2022] Open
Abstract
This paper presents TurboFold II, an extension of the TurboFold algorithm for predicting secondary structures for multiple RNA homologs. TurboFold II augments the structure prediction capabilities of TurboFold by additionally providing multiple sequence alignments. Probabilities for alignment of nucleotide positions between all pairs of input sequences are iteratively estimated in TurboFold II by incorporating information from both the sequence identity and secondary structures. A multiple sequence alignment is obtained from these probabilities by using a probabilistic consistency transformation and a hierarchically computed guide tree. To assess TurboFold II, its sequence alignment and structure predictions were compared with leading tools, including methods that focus on alignment alone and methods that provide both alignment and structure prediction. TurboFold II has alignment accuracy comparable to that of MAFFT and higher accuracy than other tools. TurboFold II also has structure prediction accuracy comparable to that of the original TurboFold algorithm, which is one of the most accurate methods. TurboFold II is part of the RNAstructure software package, which is freely available for download at http://rna.urmc.rochester.edu under a GPL license.
Affiliation(s)
- Zhen Tan
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Yinghan Fu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
- David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
50
Ray AK, Naiyer S, Singh SS, Bhattacharya A, Bhattacharya S. Application of SHAPE reveals in vivo RNA folding under normal and growth-stressed conditions in the human parasite Entamoeba histolytica. Mol Biochem Parasitol 2017; 219:42-51. [PMID: 29175581 DOI: 10.1016/j.molbiopara.2017.11.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2017] [Revised: 11/06/2017] [Accepted: 11/07/2017] [Indexed: 11/30/2022]
Abstract
Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) is a versatile, sequence-independent method to probe RNA structure in vivo and in vitro. It has so far been tried mainly with model organisms. We show that cells of Entamoeba histolytica, a protozoan parasite of humans, are hypersensitive to the in vivo SHAPE reagent NAI and show rapid loss of viability and RNA integrity. We optimized treatment conditions with 5.8S rRNA and Eh_U3 snoRNA to obtain NAI modification while retaining RNA integrity. The modification patterns were highly reproducible. The in vivo folding of 5.8S rRNA differed from its in vitro folding and correlated well with the known in vivo interactions of 5.8S rRNA with proteins. The Eh_U3 snoRNA also showed many differences between its in vivo and in vitro folding, which correlated with conserved interactions of this RNA with 18S rRNA and the 5'-ETS. Further, Eh_U3 snoRNA obtained from serum-starved cells showed an open 3'-hinge structure, indicating disruption of the 5'-ETS interaction. This could contribute to the observed slow processing of pre-rRNA in starved cells. Our work shows the applicability of SHAPE to study in vivo RNA folding in a parasite and will encourage the use of this reagent for RNA structure analysis in other such organisms.
Affiliation(s)
- Ashwini Kumar Ray
- School of Environmental Sciences, Jawaharlal Nehru University, New Delhi, India
- Sarah Naiyer
- School of Environmental Sciences, Jawaharlal Nehru University, New Delhi, India
- Alok Bhattacharya
- School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
- Sudha Bhattacharya
- School of Environmental Sciences, Jawaharlal Nehru University, New Delhi, India