51
Sun H, Skogerbø G, Wang Z, Liu W, Li Y. Structural relationships between highly conserved elements and genes in vertebrate genomes. PLoS One 2008; 3:e3727. [PMID: 19008958 PMCID: PMC2579482 DOI: 10.1371/journal.pone.0003727] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/17/2008] [Accepted: 10/26/2008] [Indexed: 02/03/2023]
Abstract
Large numbers of sequence elements have been identified as highly conserved among vertebrate genomes. These highly conserved elements (HCEs) are often located in or around genes that are involved in transcription regulation and early development. They have been shown to be involved in cis-regulatory activities through both in vivo and computational studies. We have investigated the structural relationships between such elements and genes in six vertebrate genomes (human, mouse, rat, chicken, zebrafish and tetraodon) and detected several thousand cases of conserved HCE-gene associations, as well as cases of HCEs with no common target genes. A few examples underscore the potential significance of our findings for several individual genes. We found that conserved associations between HCEs and genes are not restricted by the absolute genomic distance between them. Notably, long-range associations were identified, and the molecular functions of the associated genes do not show any particular overrepresentation of the functional categories previously reported. HCEs in close proximity are found to be linked with different sets of genes. The results reflect the highly complex correlation between HCEs and their putative target genes.
Affiliation(s)
- Hong Sun
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Biological Technologies, Wyeth Research, Cambridge, Massachusetts, United States of America
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Zhongxin Biotechnology Shanghai Co. Ltd., Shanghai, China
- Geir Skogerbø
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Zhen Wang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Wei Liu
- Biological Technologies, Wyeth Research, Cambridge, Massachusetts, United States of America
- Yixue Li
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China
52
Stoletzki N. Conflicting selection pressures on synonymous codon use in yeast suggest selection on mRNA secondary structures. BMC Evol Biol 2008; 8:224. [PMID: 18671878 PMCID: PMC2533328 DOI: 10.1186/1471-2148-8-224] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 10/02/2007] [Accepted: 07/31/2008] [Indexed: 01/14/2023]
Abstract
BACKGROUND Eukaryotic mRNAs often contain secondary structures in their untranslated regions that are involved in expression regulation. Whether secondary structures in the protein coding regions are of functional importance remains unclear: laboratory studies suggest stable secondary structures within the protein coding sequence interfere with translation, while several bioinformatic studies indicate stable mRNA structures are more frequent than expected. RESULTS In contrast to several studies testing for unexpected structural stabilities, I directly compare the selective constraint of sites that differ in their structural importance. That is, for each nucleotide, I identify whether it is paired with another nucleotide, or unpaired, in the predicted secondary structure. I assume paired sites are more important for the predicted secondary structure than unpaired sites. I look at protein coding yeast sequences and use optimal codons and synonymous substitutions to test for structural constraints. As expected under selection for secondary structures, paired sites experience higher constraint than unpaired sites, i.e. significantly lower numbers of conserved optimal codons and consistently lower numbers of synonymous substitutions. This is true for structures predicted by different algorithms. CONCLUSION The results of this study are consistent with purifying selection on mRNA secondary structures in yeast protein coding sequences and suggest their biological importance. One should be aware, however, that the accuracy of structure prediction is unknown for mRNAs and that interrelated selective forces may contribute as well. Note that if selection pressures other than translational selection affect synonymous (and optimal) codon use, this may lead to under- or over-estimates of the selective strength on optimal codon use, depending on the strength and direction of translational selection.
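The central classification in this design, paired versus unpaired sites in a predicted structure, can be sketched in a few lines. This is a minimal illustration, not the author's pipeline: it assumes the prediction is available in plain dot-bracket notation (`(`/`)` for each base pair, `.` for unpaired bases, as many folding programs emit), and the function names are hypothetical.

```python
from __future__ import annotations


def pairing_partners(dot_bracket: str) -> list[int | None]:
    """Map each position of a dot-bracket structure to its partner index, or None if unpaired."""
    partners: list[int | None] = [None] * len(dot_bracket)
    open_positions: list[int] = []  # stack of '(' positions still waiting for a partner
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            partners[i], partners[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return partners


def count_paired_unpaired(dot_bracket: str, sites: list[int]) -> tuple[int, int]:
    """Split 0-based sites (e.g. synonymous substitution positions) into paired and unpaired counts."""
    partners = pairing_partners(dot_bracket)
    paired = sum(1 for s in sites if partners[s] is not None)
    return paired, len(sites) - paired
```

For example, `count_paired_unpaired("((..)).", [0, 2, 6])` returns `(1, 2)`: site 0 is paired, sites 2 and 6 are not. Comparing such counts, normalized by the number of paired and unpaired positions, is one simple way to contrast constraint between the two site classes.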
Affiliation(s)
- Nina Stoletzki
- Ludwig-Maximilians-Universität, Biocenter, Grosshadernerstr. 2, D-82151 Planegg-Martinsried, Germany.
53
Abstract
Background Recent evidence suggests that the number and variety of functional RNAs (ncRNAs as well as cis-acting RNA elements within mRNAs) is much higher than previously thought; thus, the ability to computationally predict and analyze RNAs has taken on new importance. We have computationally studied the secondary structures in an alignment of six Aspergillus genomes. Little is known about the RNAs present in this set of fungi, and this diverse set of genomes has an optimal level of sequence conservation for observing the correlated evolution of base-pairs seen in RNAs. Methodology/Principal Findings We report the results of a whole-genome search for evolutionarily conserved secondary structures, as well as the results of clustering these predicted secondary structures by structural similarity. We find a total of 7450 predicted secondary structures, including a new predicted hairpin motif, ∼60 bp long, found primarily inside introns. We find no evidence for microRNAs. Different types of genomic regions are over-represented in different classes of predicted secondary structures. Exons contain the longest motifs (primarily long, branched hairpins), 5′ UTRs primarily contain groupings of short hairpins located near the start codon, and 3′ UTRs contain very little secondary structure compared to other regions. There is a large concentration of short hairpins just inside the boundaries of exons. The density of predicted intronic RNAs increases with the length of introns, and the density of predicted secondary structures within mRNA coding regions increases with the number of introns in a gene. Conclusions/Significance There are many conserved, high-confidence RNAs of unknown function in these Aspergillus genomes, as well as interesting spatial distributions of predicted secondary structures. This study increases our knowledge of secondary structure in these Aspergillus organisms.
Affiliation(s)
- Abigail Manson McGuire
- The Broad Institute of M.I.T. and Harvard, Cambridge, Massachusetts, United States of America
- James E. Galagan
- The Broad Institute of M.I.T. and Harvard, Cambridge, Massachusetts, United States of America
54
Biro JC. Correlation between nucleotide composition and folding energy of coding sequences with special attention to wobble bases. Theor Biol Med Model 2008; 5:14. [PMID: 18664268 PMCID: PMC2515297 DOI: 10.1186/1742-4682-5-14] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 07/16/2008] [Accepted: 07/29/2008] [Indexed: 01/22/2023]
Abstract
Background The secondary structure and complexity of mRNA influence its accessibility to regulatory molecules (proteins, micro-RNAs), its stability and its level of expression. The mobile elements of the RNA sequence, the wobble bases, are expected to regulate the formation of structures encompassing coding sequences. Results The sequence/folding energy (FE) relationship was studied by statistical, bioinformatic methods in 90 CDS containing 26,370 codons. I found that the FE (dG) associated with coding sequences is significant and negative (407 kcal/1000 bases, mean ± S.E.M.), indicating that these sequences are able to form structures. However, the FE has only a small free component, less than 10% of the total. The contribution of the 1st and 3rd codon bases to the FE is larger than the contribution of the 2nd (central) bases. It is possible to achieve a ~4-fold change in FE by altering the wobble bases in synonymous codons. The sequence/FE relationship can be described with a simple algorithm, and the total FE can be predicted solely from the sequence composition of the nucleic acid. The contributions of different synonymous codons to the FE are additive, and one codon cannot replace another. The accumulated contributions of the synonymous codons of an amino acid to the total folding energy of an mRNA are strongly correlated with the relative amount of that amino acid in the translated protein. Conclusion Synonymous codons are not interchangeable with regard to their role in determining the mRNA FE and the relative amounts of amino acids in the translated protein, even if they are indistinguishable with respect to amino acid coding.
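Comparisons among first, second and third (wobble) codon positions start from splitting a coding sequence by codon position. The following is a minimal sketch, assuming an in-frame CDS with no partial codon; the function name is hypothetical, and the folding energies themselves would come from a separate RNA folding program, which is not shown.

```python
from __future__ import annotations

from collections import Counter


def codon_position_composition(cds: str) -> dict[int, dict[str, float]]:
    """Base frequencies at codon positions 1, 2 and 3 (position 3 holds the wobble bases)."""
    seq = cds.upper().replace("T", "U")  # accept DNA or RNA input
    if not seq or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty in-frame CDS whose length is a multiple of 3")
    composition: dict[int, dict[str, float]] = {}
    for position in (1, 2, 3):
        bases = seq[position - 1::3]  # every third base, starting at this codon position
        counts = Counter(bases)
        composition[position] = {b: counts[b] / len(bases) for b in "ACGU"}
    return composition
```

For `"ATGGCTAAA"` (codons AUG, GCU, AAA), position 1 is two-thirds A, while the wobble position holds one G, one U and one A and no C. Altering only position 3 within synonymous codons changes this third-position profile without changing the encoded protein.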
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 612 S Flower Str., #1220, Los Angeles, CA 90017, USA.
55
Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures. BMC Genomics 2008; 9:284. [PMID: 18549495 PMCID: PMC2442090 DOI: 10.1186/1471-2164-9-284] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 02/21/2008] [Accepted: 06/12/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genomes possess different levels of non-randomness, in particular, an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short-range where neighboring nucleotides influence the choice of base at a site, to the long-range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression. RESULTS We present novel data and approaches that show that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. A whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences has been carried out. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, the abundance of strong local SS (< -25 kcal/mol) was a factor of two to ten greater than a random expectation model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thus, eliminating short-range signals as possible contributors to any observed phenomena. CONCLUSION We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20-1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.
56
Kahali B, Basak S, Ghosh TC. Delving Deeper into the Unexpected Correlation Between Gene Expressivity and Codon Usage Bias of Escherichia coli Genome. J Biomol Struct Dyn 2008; 25:655-61. [DOI: 10.1080/07391102.2008.10507212] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/17/2023]
57
St. Laurent G, Wahlestedt C. Noncoding RNAs: couplers of analog and digital information in nervous system function? Trends Neurosci 2007; 30:612-21. [DOI: 10.1016/j.tins.2007.10.002] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Received: 07/31/2007] [Revised: 10/03/2007] [Accepted: 10/04/2007] [Indexed: 12/14/2022]
58
On the origin of synonymous codon usage divergence between thermophilic and mesophilic prokaryotes. FEBS Lett 2007; 581:5825-30. [DOI: 10.1016/j.febslet.2007.11.054] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/19/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 01/24/2023]
59
Uncoupling RNA virus replication from transcription via the polymerase: functional and evolutionary insights. EMBO J 2007; 26:5120-30. [PMID: 18034156 DOI: 10.1038/sj.emboj.7601931] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 07/23/2007] [Accepted: 10/29/2007] [Indexed: 01/11/2023]
Abstract
Many eukaryotic positive-strand RNA viruses transcribe subgenomic (sg) mRNAs that are virus-derived messages that template the translation of a subset of viral proteins. Currently, the premature termination (PT) mechanism of sg mRNA transcription, a process thought to operate in a variety of viruses, is best understood in tombusviruses. The viral RNA elements involved in regulating this mechanism have been well characterized in several systems; however, no corresponding protein factors have been identified yet. Here we show that tombusvirus genome replication can be effectively uncoupled from sg mRNA transcription in vivo by C-terminal modifications in its RNA-dependent RNA polymerase (RdRp). Systematic analysis of the PT transcriptional pathway using viral genomes harboring mutant RdRps revealed that the C-terminus functions primarily at an early step in this mechanism by mediating both efficient and accurate production of minus-strand templates for sg mRNA transcription. Our results also suggest a simple evolutionary scheme by which the virus could gain or enhance its transcriptional activity, and define global folding of the viral RNA genome as a previously unappreciated determinant of RdRp evolution.
60
Forsdyke DR. Molecular sex: The importance of base composition rather than homology when nucleic acids hybridize. J Theor Biol 2007; 249:325-30. [PMID: 17868701 DOI: 10.1016/j.jtbi.2007.07.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 03/20/2007] [Revised: 05/23/2007] [Accepted: 07/24/2007] [Indexed: 12/24/2022]
Abstract
On learning that nucleic acid hybridization had been achieved in a test tube, Huxley hailed the discovery of "molecular sex." The description was apt, since sex involves recombination, which requires hybridization that, in turn, depends on a successful homology search. Conversely, when the homology search fails, recombination fails. In yeast, this failure has been attributed to "simple sequence divergence." But sequence divergence does not impair nucleic acid hybridization in a simple way. Most natural single-stranded nucleic acids are predisposed to adopt higher-order structures containing stem-loops. Tomizawa showed that the rate-limiting step in the hybridization of single-stranded sequences is an initial "kissing" exploration between complementary loops, which must first be appropriately extruded and aligned. Successful duplex formation requires successful synchronization of matching higher-order structures, which depends not so much on the degree of similarity between their base sequences as on the closeness of their base compositions (GC%). In these terms, we can understand how the anti-recombinational effect of GC% differences supports the duplication both of genes within a genome and of genomes within a genus (speciation).
Affiliation(s)
- Donald R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario, Canada K7L3N6.
61
Kochetov AV, Palyanov A, Titov II, Grigorovich D, Sarai A, Kolchanov NA. AUG_hairpin: prediction of a downstream secondary structure influencing the recognition of a translation start site. BMC Bioinformatics 2007; 8:318. [PMID: 17760957 PMCID: PMC2001202 DOI: 10.1186/1471-2105-8-318] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 01/25/2007] [Accepted: 08/30/2007] [Indexed: 12/17/2022]
Abstract
Background The translation start site plays an important role in the control of translation efficiency of eukaryotic mRNAs. The recognition of the start AUG codon by eukaryotic ribosomes is considered to depend on its nucleotide context. However, the fraction of eukaryotic mRNAs with the start codon in a suboptimal context is relatively large. It may be expected that mRNA should possess some features providing efficient translation, including the proper recognition of a translation start site. It has been experimentally shown that a downstream hairpin located at certain positions with respect to the start codon can compensate in part for a suboptimal AUG context and also increase translation from non-AUG initiation codons. Prediction of such a compensatory hairpin may be useful in the evaluation of eukaryotic mRNA translation properties. Results We evaluated the interdependency between the start codon context and mRNA secondary structure at the beginning of the CDS: a suboptimal start codon context was found to correlate significantly with higher base-pairing probabilities at positions 13–17 of the CDS of human and mouse mRNAs. It is likely that downstream hairpins are used to enhance translation of some mammalian mRNAs in vivo. We have therefore developed a tool, AUG_hairpin, to predict local stem-loop structures located within the defined region at the beginning of the mRNA coding part. The implemented algorithm is based on the available published experimental data on CDS-located stem-loop structures influencing the recognition of upstream start codons. Conclusion The occurrence of a potential secondary structure downstream of a start AUG codon in a suboptimal context (or downstream of a potential non-AUG start codon) may provide researchers with a testable hypothesis about the presence of an additional regulatory signal influencing the mRNA translation initiation rate and start codon choice.
AUG_hairpin, which has a convenient Web-interface with adjustable parameters, will make such an evaluation easy and efficient.
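To screen one's own sequences for a suboptimal start context before looking for a compensating hairpin, a common rule of thumb grades an AUG by the two most influential positions of the Kozak consensus: a purine (A or G) at -3 and a G at +4, counting the A of AUG as +1. This sketch works under that assumption only; it is not the criterion or code used by AUG_hairpin, and the function name is hypothetical.

```python
def kozak_context(mrna: str, aug_index: int) -> str:
    """Grade the start-codon context of the AUG whose A is at 0-based aug_index.

    "strong": purine at -3 and G at +4; "adequate": only one of the two; "weak": neither.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError(f"no AUG at index {aug_index}")
    if aug_index < 3 or aug_index + 3 >= len(seq):
        raise ValueError("sequence must extend to positions -3 and +4")
    purine_at_minus3 = seq[aug_index - 3] in "AG"  # A of AUG is +1, so -3 is three bases upstream
    g_at_plus4 = seq[aug_index + 3] == "G"  # first base after the AUG
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"
```

For example, `kozak_context("GCCACCAUGG", 6)` grades the AUG as strong, whereas `"GCCUCCAUGC"` at the same index is weak. Under the abstract's reasoning, weak or adequate starts are the ones for which a hairpin around CDS positions 13–17 would be most worth checking.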
Affiliation(s)
- Alex V Kochetov
- Institute of Cytology and Genetics, Lavrentieva 10, Novosibirsk 630090, Russia
- Novosibirsk State University, Novosibirsk 630090, Russia
- Andrey Palyanov
- Institute of Cytology and Genetics, Lavrentieva 10, Novosibirsk 630090, Russia
- Igor I Titov
- Institute of Cytology and Genetics, Lavrentieva 10, Novosibirsk 630090, Russia
- Dmitry Grigorovich
- Institute of Cytology and Genetics, Lavrentieva 10, Novosibirsk 630090, Russia
- Akinori Sarai
- Kyushu Institute of Technology, Iizuka, 820-8502, Japan
- Nikolay A Kolchanov
- Institute of Cytology and Genetics, Lavrentieva 10, Novosibirsk 630090, Russia
- Novosibirsk State University, Novosibirsk 630090, Russia
62
Forsdyke DR. Calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues. J Theor Biol 2007; 248:745-53. [PMID: 17698086 DOI: 10.1016/j.jtbi.2007.07.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 05/30/2007] [Revised: 07/05/2007] [Accepted: 07/09/2007] [Indexed: 12/16/2022]
Abstract
The stability of a folded single-stranded nucleic acid depends on the composition and order of its constituent bases and may be assessed by taking into account the pairing energies of its constituent dinucleotides. To assess the possible biological significance of a computed structure, Maizel and coworkers in the 1980s compared the energy of folding of a natural single-stranded RNA sequence with the energies of several versions of the same sequence produced by shuffling base order. However, in the 2000s many took as self-evident the view that shuffling at the mononucleotide level (single bases) was conceptually wrong and should be replaced by shuffling at the level of dinucleotides (retaining pairs of adjacent bases). Folding energies then became indistinguishable from those of corresponding shuffled sequences, and doubt was cast on the importance of secondary structures. Nevertheless, some continued productively to employ the single-base shuffling approach, the justification for which is the topic of this paper. The fact that dinucleotide pairing energies are needed to calculate structure does not imply that shuffling must leave dinucleotides intact. Base shuffling allows determination of the relative contributions of base composition and base order to total folding energy. The potential for secondary structure arises from pressures acting at both DNA and RNA levels and is abundant throughout genomes, with a probable primary role in recombination. Within a gene the potential can often be accommodated, and base order and composition work together (values have the same negative sign) in contributing to total folding energy. But sometimes protein-coding pressure on base order conflicts with the pressure for secondary structure, and the values have opposite signs. Total folding energy can be deemed of potential biological significance when the average of several readings is significantly less than zero.
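The single-base shuffling argument amounts to splitting total folding energy into a part explained by base composition (the mean energy of shuffled copies, which keep composition but lose order) and a part attributable to base order (the remainder). Below is a minimal sketch of that bookkeeping, assuming a caller-supplied `energy` function; in practice that value would come from an RNA folding program, and the function names and parameters (`n_shuffles`, `seed`) are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Tuple


def mononucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Randomly permute single bases: composition is kept, order (and dinucleotides) is not."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def decompose_folding_energy(
    seq: str,
    energy: Callable[[str], float],
    n_shuffles: int = 10,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (total, composition-dependent, order-dependent) folding energy.

    The composition-dependent part is the mean energy of base-shuffled copies;
    the order-dependent part is what the natural base order adds beyond that.
    """
    rng = random.Random(seed)  # fixed seed keeps the shuffles reproducible
    total = energy(seq)
    shuffled = [energy(mononucleotide_shuffle(seq, rng)) for _ in range(n_shuffles)]
    composition_part = mean(shuffled)
    order_part = total - composition_part
    return total, composition_part, order_part
```

With a toy energy such as `lambda s: -1.0 * sum(s[i:i + 2] in ("GC", "CG") for i in range(len(s) - 1))`, which merely counts GC/CG steps and is not a real folding model, the two parts always sum to the total. A negative order-dependent part means the natural order favours structure beyond what composition alone predicts; a positive one corresponds to the opposite-sign case described above.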
Affiliation(s)
- Donald R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario, Canada K7L3N6.
63
Buratti E, Dhir A, Lewandowska MA, Baralle FE. RNA structure is a key regulatory element in pathological ATM and CFTR pseudoexon inclusion events. Nucleic Acids Res 2007; 35:4369-83. [PMID: 17580311 PMCID: PMC1935003 DOI: 10.1093/nar/gkm447] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 12/22/2022]
Abstract
Genomic variations deep in the intronic regions of pre-mRNA molecules are increasingly reported to affect splicing events. However, there is no general explanation of why apparently similar variations may have either no effect on splicing or cause significant splicing alterations. In this work we have examined the structural architecture of pseudoexons previously described in ATM and CFTR patients. The ATM case derives from the deletion of a repressor element and is characterized by an aberrant 5′ss selection despite the presence of better alternatives. The CFTR pseudoexon instead derives from the creation of a new 5′ss that is used while a nearby pre-existing donor-like sequence is never selected. Our results indicate that RNA structure is a major splicing regulatory factor in both cases. Furthermore, manipulation of the original RNA structures can lead to pseudoexon inclusion following the exposure of unused 5′ss that are already present in the wild-type intronic sequences but are prevented from being recognized because of their location in RNA stem structures. Our data show that intrinsic structural features of introns must be taken into account to understand the mechanism of pseudoexon activation in genetic diseases. Our observations may help to improve diagnostic prediction programmes and, eventually, therapeutic targeting.
64
Steigele S, Huber W, Stocsits C, Stadler PF, Nieselt K. Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions. BMC Biol 2007; 5:25. [PMID: 17577407 PMCID: PMC1914338 DOI: 10.1186/1741-7007-5-25] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 11/21/2006] [Accepted: 06/18/2007] [Indexed: 01/06/2023]
Abstract
Background Non-coding RNAs (ncRNAs) are an emerging focus for both computational analysis and experimental research, resulting in a growing number of novel, non-protein-coding transcripts with often unknown functions. Whole-genome screens in higher eukaryotes, for example, provided evidence for a surprisingly large number of ncRNAs. To supplement these searches, we performed a computational analysis of seven yeast species and searched for new ncRNAs and RNA motifs. Results A comparative analysis of the genomes of seven yeast species yielded roughly 2800 genomic loci that showed the hallmarks of evolutionarily conserved RNA secondary structures. A total of 74% of these regions overlapped with annotated non-coding or coding genes in yeast. Coding sequences that carry predicted structured RNA elements belong to a limited number of groups with common functions, suggesting that these RNA elements are involved in post-transcriptional regulation and/or cellular localization. About 700 conserved RNA structures were found outside annotated coding sequences and known ncRNA genes. Many of these predicted elements overlapped with UTR regions of particular classes of protein-coding genes. In addition, a number of RNA elements overlapped with previously characterized antisense transcripts. Transcription of about 120 predicted elements located in promoter regions and other, previously unannotated, intergenic regions was supported by tiling array experiments, ESTs, or SAGE data. Conclusion Our computational predictions strongly suggest that yeasts harbor a substantial pool of several hundred novel ncRNAs. In addition, we describe a large number of RNA structures in coding sequences and also within antisense transcripts that were previously characterized using tiling arrays.
Affiliation(s)
- Stephan Steigele
- Wilhelm-Schickard-Institut für Informatik, ZBIT-Center for Bioinformatics Tübingen, University of Tübingen, Sand-14, D-72076 Tübingen, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics (IZBI), University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Wolfgang Huber
- EMBL Outstation Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Claudia Stocsits
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics (IZBI), University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics (IZBI), University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Kay Nieselt
- Wilhelm-Schickard-Institut für Informatik, ZBIT-Center for Bioinformatics Tübingen, University of Tübingen, Sand-14, D-72076 Tübingen, Germany
65
Washietl S, Pedersen JS, Korbel JO, Stocsits C, Gruber AR, Hackermüller J, Hertel J, Lindemeyer M, Reiche K, Tanzer A, Ucla C, Wyss C, Antonarakis SE, Denoeud F, Lagarde J, Drenkow J, Kapranov P, Gingeras TR, Guigó R, Snyder M, Gerstein MB, Reymond A, Hofacker IL, Stadler PF. Structured RNAs in the ENCODE selected regions of the human genome. Genome Res 2007; 17:852-64. [PMID: 17568003 PMCID: PMC1891344 DOI: 10.1101/gr.5650707] [Citation(s) in RCA: 136] [Impact Index Per Article: 8.0] [Received: 06/16/2006] [Accepted: 12/12/2006] [Indexed: 12/16/2022]
Abstract
Functional RNA structures play an important role both in the context of noncoding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy-directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that also show signs of selection pressure at the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
Affiliation(s)
- Stefan Washietl
- Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria.
66
Backofen R, Bernhart SH, Flamm C, Fried C, Fritzsch G, Hackermüller J, Hertel J, Hofacker IL, Missal K, Mosig A, Prohaska SJ, Rose D, Stadler PF, Tanzer A, Washietl S, Will S. RNAs everywhere: genome-wide annotation of structured RNAs. J Exp Zool B Mol Dev Evol 2007; 308:1-25. [PMID: 17171697 DOI: 10.1002/jez.b.21130] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 12/26/2022]
Abstract
Starting with the discovery of microRNAs and the advent of genome-wide transcriptomics, non-protein-coding transcripts have moved from a fringe topic to a central field of research in molecular biology. In this contribution we review the state of the art of "computational RNomics", i.e., the bioinformatics approaches to genome-wide RNA annotation. Instead of rehashing results from recently published surveys in detail, we focus here on the open problem in the field, namely the (functional) annotation of the plethora of putative RNAs. A series of exploratory studies is used to provide non-trivial examples for the discussion of some of the difficulties.
67
Abstract
Background: Comparative genomics approaches, in which orthologous DNA regions are compared and inter-species conserved regions are identified, have proven extremely powerful for identifying non-coding regulatory regions located in intergenic or intronic regions. However, non-coding functional elements can also be located within coding regions, as is common for exonic splicing enhancers, some transcription factor binding sites, and RNA secondary structure elements affecting mRNA stability, localization, or translation. Since these functional elements are located in regions that are themselves highly conserved because they code for a protein, they have generally escaped detection by comparative genomics approaches.
Results: We introduce a comparative genomics approach for detecting non-coding functional elements located within coding regions. Codon evolution is modeled as a mixture of codon substitution models, where each component of the mixture describes the evolution of codons under a specific type of coding selective pressure. We show how to compute the posterior distribution of the entropy and parsimony scores under this null model of codon evolution. The method is applied to a set of growth hormone 1 orthologous mRNA sequences, and a known exonic splicing element is detected. Analysis of a set of CORTBP2 orthologous genes reveals a region of several hundred base pairs under strong non-coding selective pressure whose function remains unknown.
Conclusion: Non-coding functional elements, in particular those involved in post-transcriptional regulation, are likely to be much more prevalent than is currently known. With the numerous genome sequencing projects underway, comparative genomics approaches like the one proposed here are likely to become increasingly powerful at detecting such elements.
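The entropy and parsimony scores mentioned in this abstract are per-column statistics of a multiple alignment. A minimal sketch, assuming a gap-free nucleotide column and a rooted binary tree written as nested tuples (the tree, leaf names and function names are illustrative): Shannon entropy in bits, and Fitch parsimony, i.e. the minimum number of substitutions the column needs on the tree. It does not reproduce the paper's codon-mixture null model, which is what turns these observed scores into evidence of non-coding selection.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the characters in one alignment column."""
    n = len(column)
    return sum(-(c / n) * log2(c / n) for c in Counter(column).values())


def fitch_score(tree, states: dict) -> int:
    """Fitch parsimony score of one column on a rooted binary tree.

    `tree` is a nested tuple of leaf names, e.g. (("human", "chimp"), "mouse");
    `states` maps each leaf name to its character in this column.
    """
    def visit(node):
        if isinstance(node, str):                 # leaf: singleton state set
            return {states[node]}, 0
        left, right = node                        # internal node (binary)
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:                                # children agree: no change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]
```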
Collapse
Affiliation(s)
- Hui Chen
- McGill Centre for Bioinformatics, McGill University, 3775 University St., room 332, Montreal, QC, Canada H3A 2B4
- Mathieu Blanchette
- McGill Centre for Bioinformatics, McGill University, 3775 University St., room 332, Montreal, QC, Canada H3A 2B4
68
Raponi M, Baralle FE, Pagani F. Reduced splicing efficiency induced by synonymous substitutions may generate a substrate for natural selection of new splicing isoforms: the case of CFTR exon 12. Nucleic Acids Res 2006; 35:606-13. [PMID: 17172597 PMCID: PMC1802620 DOI: 10.1093/nar/gkl1087] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
Alternative splicing has been associated with increased evolutionary change and with recent exon creation or loss. The addition of a new exon can be explained by its inclusion in only a fraction of the transcripts, which leaves the original form intact and gives the new form the possibility to evolve independently, but the exon loss phenomenon is less clear. To explore the mechanism that could underlie the lower splicing efficiency of CFTR exon 12 in primates, we have analyzed the effect of multiple synonymous variations. Random patterns of synonymous variations were created in CFTR exon 12, and the majority of them induced exon inclusion, suggesting that the human gene has suboptimal splicing efficiency. In addition, the effect of each single synonymous substitution on splicing is strongly dependent on the exonic context and does not correlate with available in silico exon splicing prediction programs. We propose that random synonymous substitutions may lead to a reduced splicing efficiency that can result in a variable proportion of exon loss. If this phenomenon occurs in in-frame exons, and to an extent tolerated by the cells, it can have an important evolutionary effect, since it may generate a substrate for natural selection of new splicing isoforms.
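Which single-nucleotide changes in an exon are synonymous follows mechanically from the genetic code. The sketch below, assuming the standard genetic code (NCBI translation table 1), a DNA alphabet and an in-frame coding sequence, lists every single-nucleotide substitution that leaves the encoded amino acid unchanged. It is an illustrative helper with a hypothetical name, not the authors' procedure for designing their variant patterns, and it counts stop-to-stop changes as synonymous.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, ... GGG (TCAG order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_point_variants(cds: str) -> list:
    """All (position, ref, alt) single-nucleotide changes in an in-frame CDS
    that keep the encoded amino acid (or stop) unchanged."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    variants = []
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        amino_acid = CODON_TABLE[codon]
        for offset, ref in enumerate(codon):
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:offset] + alt + codon[offset + 1:]
                if CODON_TABLE[mutant] == amino_acid:
                    variants.append((start + offset, ref, alt))
    return variants
```

Drawing random subsets from this list is one way to build patterns of multiple synonymous changes of the kind described in the abstract.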
Affiliation(s)
- Franco Pagani
- To whom correspondence should be addressed: Tel: +39 040 37571; Fax: +39 040 226555
69
Singh NN, Singh RN, Androphy EJ. Modulating role of RNA structure in alternative splicing of a critical exon in the spinal muscular atrophy genes. Nucleic Acids Res 2006; 35:371-89. [PMID: 17170000 PMCID: PMC1802598 DOI: 10.1093/nar/gkl1050] [Citation(s) in RCA: 154] [Impact Index Per Article: 8.6] [Indexed: 11/13/2022]
Abstract
Humans have two nearly identical copies of the survival motor neuron (SMN) gene, SMN1 and SMN2. Homozygous loss of SMN1 causes spinal muscular atrophy (SMA). SMN2 is unable to prevent the disease due to skipping of exon 7. Using a systematic approach of in vivo selection, we have previously demonstrated that a weak 5' splice site (ss) serves as the major cause of skipping of SMN2 exon 7. Here we show the inhibitory impact of RNA structure on the weak 5' ss of exon 7. We call this structure terminal stem-loop 2 (TSL2). Confirming the inhibitory nature of TSL2, point mutations that destabilize TSL2 promote exon 7 inclusion in SMN2, whereas strengthening of TSL2 promotes exon 7 skipping even in SMN1. We also demonstrate that TSL2 negatively affects the recruitment of U1 snRNP at the 5' ss of exon 7. Using enzymatic structure probing, we confirm that the sequence at the junction of exon 7/intron 7 folds into TSL2 and show that mutations in TSL2 cause predicted structural changes in this region. Our findings reveal for the first time the critical role of RNA structure in regulation of alternative splicing of human SMN.
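The logic of destabilizing versus strengthening a stem can be illustrated by checking which base pairs of a given structure a point mutation breaks. A minimal sketch, assuming the structure is supplied in pseudoknot-free dot-bracket notation and counting only Watson-Crick and G-U pairs as intact; the function names are hypothetical, and the hairpin in the checks is a toy, not the TSL2 sequence. A real assessment would compare free energies computed by a folding program rather than counting broken pairs.

```python
# Watson-Crick and G-U wobble pairs (RNA alphabet).
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def base_pairs(dot_bracket: str) -> list:
    """Parse a pseudoknot-free dot-bracket string into sorted (i, j) pairs, i < j."""
    stack, pairs = [], []
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)


def pairs_disrupted(seq: str, dot_bracket: str, pos: int, alt: str) -> list:
    """Base pairs involving `pos` that are no longer Watson-Crick or G-U
    once seq[pos] is replaced by `alt`."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return [
        (i, j)
        for i, j in base_pairs(dot_bracket)
        if pos in (i, j) and (mutant[i], mutant[j]) not in CANONICAL
    ]
```

On the toy hairpin GGGAAACCC with structure (((...))), changing position 1 from G to A breaks the (1, 7) pair, whereas changing position 7 from C to U leaves a G-U wobble in place.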
Affiliation(s)
- Natalia N Singh
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01605-2324, USA.
70
Dewan KK. Secondary structure formations of conotoxin genes: A possible role in mediating variability. Biochem Biophys Res Commun 2006; 349:701-8. [PMID: 16949043 DOI: 10.1016/j.bbrc.2006.08.081] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 08/12/2006] [Accepted: 08/16/2006] [Indexed: 11/18/2022]
Abstract
Small venomous peptides called conotoxins, produced by predatory marine snails of the genus Conus, present an interesting case for mutational studies. They are highly variable in amino acid sequence, yet they possess highly conserved structural elements defined by cysteine residues that form disulfide bridges along the length of the mature peptide. It has been observed that the codons specifying these cysteines are also highly conserved. It is unknown how such codon conservation is maintained within the mature conotoxin gene, since this entire region undergoes an accelerated rate of mutation. There is evidence suggesting that nucleic acids influence the mechanisms that dictate where, and how frequently, mutations occur in DNA. Nucleic acids exert this effect primarily through secondary structures, which create local peaks and troughs in the energy profile of these transient formations. Secondary structure predictions of several conotoxin genes were analyzed to see whether the highly variable and the highly conserved regions of the conotoxin differ in predicted structural stability. Regions of the DNA encompassing the conserved Cys codons (and several other conserved amino acid codons) were found to correspond to predicted secondary structures of higher stability. In stark contrast, the more variable regions of the conotoxin correlate with regions of lower stability. This striking correlation suggests a simple model in which a mutator cannot readily access these highly conserved regions of the conotoxin gene, giving them relative resistance to change.
Affiliation(s)
- Kalyan Kumar Dewan
- Unichem Biosciences R and D Centre, Society for Innovation and Development, Indian Institute of Science, Bangalore 560 012, Karnataka, India.
71
Xing Y, Wang Q, Lee C. Evolutionary divergence of exon flanks: a dissection of mutability and selection. Genetics 2006; 173:1787-91. [PMID: 16702427 PMCID: PMC1526697 DOI: 10.1534/genetics.106.057919] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/06/2006] [Accepted: 05/01/2006] [Indexed: 01/29/2023]
Abstract
The intronic sequences flanking exon-intron junctions (i.e., exon flanks) are important for splice site recognition and pre-mRNA splicing. Recent studies show a higher degree of sequence conservation at flanks of alternative exons, compared to flanks of constitutive exons. In this article we performed a detailed analysis of the evolutionary divergence of exon flanks between human and chimpanzee, aiming to dissect the impact of mutability and selection on their evolution. Inside exon flanks, sites that might reside in ancestral CpG dinucleotides evolved significantly faster than sites outside of ancestral CpG dinucleotides. This result reflects a systematic variation of mutation rates (mutability) at exon flanks, depending on the local CpG contexts. Remarkably, we observed a significant reduction of the nucleotide substitution rate in flanks of alternatively spliced exons, independent of the site-by-site variation in mutability due to different CpG contexts. Our data provide concrete evidence for increased purifying selection at exon flanks associated with regulation of alternative splicing.
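The CpG-stratified comparison described here can be sketched as a divergence computed separately for two classes of sites. This is a simplified illustration, assuming two gap-free aligned sequences and treating a site as possibly ancestral-CpG when it is the C or G of a CG dinucleotide in either species (our proxy, not necessarily the authors' rule); the function names are hypothetical and no multiple-hit correction is applied.

```python
def cpg_context(seq: str, i: int) -> bool:
    """True if position i is the C or the G of a CG dinucleotide in seq."""
    is_c_of_cg = seq[i] == "C" and seq[i + 1:i + 2] == "G"
    is_g_of_cg = seq[i] == "G" and i > 0 and seq[i - 1] == "C"
    return is_c_of_cg or is_g_of_cg


def divergence_by_cpg(seq_a: str, seq_b: str) -> dict:
    """Proportion of differing sites between two aligned, gap-free sequences,
    split by whether a site is in CpG context in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = seq_a.upper(), seq_b.upper()
    tallies = {"CpG": [0, 0], "non-CpG": [0, 0]}  # class -> [sites, differences]
    for i in range(len(a)):
        key = "CpG" if cpg_context(a, i) or cpg_context(b, i) else "non-CpG"
        tallies[key][0] += 1
        tallies[key][1] += int(a[i] != b[i])
    return {k: (d / n if n else float("nan")) for k, (n, d) in tallies.items()}
```

Comparing the two proportions, and then comparing flanks of alternative versus constitutive exons within each class, mirrors the stratification the abstract describes.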
Affiliation(s)
- Yi Xing
- Molecular Biology Institute, Center for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095-1570, USA
72
Chen JM, Férec C, Cooper DN. A systematic analysis of disease-associated variants in the 3' regulatory regions of human protein-coding genes II: the importance of mRNA secondary structure in assessing the functionality of 3' UTR variants. Hum Genet 2006; 120:301-33. [PMID: 16807757 DOI: 10.1007/s00439-006-0218-x] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Received: 04/03/2006] [Accepted: 05/29/2006] [Indexed: 12/13/2022]
Abstract
In an attempt both to catalogue 3' regulatory region (3' RR)-mediated disease and to improve our understanding of the structure and function of the 3' RR, we have performed a systematic analysis of disease-associated variants in the 3' RRs of human protein-coding genes. We have previously analysed the variants that have occurred in two specific domains/motifs of the 3' untranslated region (3' UTR) as well as in the 3' flanking region. Here we have focused upon 83 known variants within the upstream sequence (USS; between the translational termination codon and the upstream core polyadenylation signal sequence) of the 3' UTR. To place these variants in their proper context, we first performed a comprehensive survey of known cis-regulatory elements within the USS and the mechanisms by which they effect post-transcriptional gene regulation. Although this survey supports the view that RNA regulatory elements function within the context of specific secondary structures, there are no general rules governing how secondary structure might exert its influence. We have therefore addressed this question by systematically evaluating both functional and non-functional (based upon in vitro reporter gene and/or electrophoretic mobility shift assay data) USS variant-containing sequences against known cis-regulatory motifs within the context of predicted RNA secondary structures. This has allowed us not only to establish a reliable and objective means of performing secondary structure prediction but also to identify consistent patterns of secondary structural change that could help discriminate functional USS variants from their non-functional counterparts. The resulting rules were then used to infer, from their predicted secondary structures, the potential functionality of some of the remaining functionally uncharacterized USS variants. This led us to identify not only further patterns of secondary structural change but also several potential novel cis-regulatory motifs within the 3' UTRs studied.
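One simple way to quantify a secondary structural change between a wild-type and a variant prediction is base-pair distance: the number of base pairs present in one structure but not in the other. A sketch, assuming both structures have already been predicted (for example by a folding program) and are given as pseudoknot-free dot-bracket strings of equal length; the function names are hypothetical, and the study's own discrimination rules are not reproduced here.

```python
def pair_set(dot_bracket: str) -> set:
    """Set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def base_pair_distance(wild_type: str, variant: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(wild_type) != len(variant):
        raise ValueError("structures must describe sequences of equal length")
    return len(pair_set(wild_type) ^ pair_set(variant))
```

A shifted helix counts as both lost and gained pairs, so base-pair distance can register a large change even when the overall shape looks similar; that sensitivity is one reason other structure-comparison measures also exist.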
73
Xing Y, Lee C. Alternative splicing and RNA selection pressure--evolutionary consequences for eukaryotic genomes. Nat Rev Genet 2006; 7:499-509. [PMID: 16770337 DOI: 10.1038/nrg1896] [Citation(s) in RCA: 206] [Impact Index Per Article: 11.4] [Indexed: 11/09/2022]
Abstract
Genome-wide analyses of alternative splicing have established its nearly ubiquitous role in gene regulation in many organisms. Genome sequencing and comparative genomics have made it possible to look in detail at the evolutionary history of specific alternative exons or splice sites, resulting in a flurry of publications in recent years. Here, we consider how alternative splicing has contributed to the evolution of modern genomes, and discuss constraints on evolution associated with alternative splicing that might have important medical implications.
Affiliation(s)
- Yi Xing
- Molecular Biology Institute, Center for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, USA
74
Shabalina SA, Ogurtsov AY, Spiridonov NA. A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res 2006; 34:2428-37. [PMID: 16682450 PMCID: PMC1458515 DOI: 10.1093/nar/gkl287] [Citation(s) in RCA: 151] [Impact Index Per Article: 8.4] [Indexed: 12/16/2022]
Abstract
Single-stranded mRNA molecules form secondary structures through complementary self-interactions. Several hypotheses have been proposed on the relationship between the nucleotide sequence, encoded amino acid sequence and mRNA secondary structure. We performed the first transcriptome-wide in silico analysis of the human and mouse mRNA foldings and found a pronounced periodic pattern of nucleotide involvement in mRNA secondary structure. We show that this pattern is created by the structure of the genetic code, and the dinucleotide relative abundances are important for the maintenance of mRNA secondary structure. Although synonymous codon usage contributes to this pattern, it is intrinsic to the structure of the genetic code and manifests itself even in the absence of synonymous codon usage bias at the 4-fold degenerate sites. While all codon sites are important for the maintenance of mRNA secondary structure, degeneracy of the code allows regulation of stability and periodicity of mRNA secondary structure. We demonstrate that the third degenerate codon sites contribute most strongly to mRNA stability. These results convincingly support the hypothesis that redundancies in the genetic code allow transcripts to satisfy requirements for both protein structure and RNA structure. Our data show that selection may be operating on synonymous codons to maintain a more stable and ordered mRNA secondary structure, which is likely to be important for transcript stability and translation. We also demonstrate that functional domains of the mRNA [5′-untranslated region (5′-UTR), CDS and 3′-UTR] preferentially fold onto themselves, while the start codon and stop codon regions are characterized by relaxed secondary structures, which may facilitate initiation and termination of translation.
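The periodic pattern described here concerns how often each codon position is involved in base pairing. A minimal sketch for tabulating that from one predicted structure, assuming a dot-bracket string that starts at the first nucleotide of the CDS and stays in frame; the function name is hypothetical, the structures in the checks are toys, and the authors' exact measure and folding parameters may differ.

```python
def paired_fraction_by_codon_position(dot_bracket: str) -> list:
    """Fraction of paired nucleotides at codon positions 1, 2 and 3 of an
    in-frame CDS structure ('(' or ')' = paired, '.' = unpaired)."""
    if not dot_bracket or len(dot_bracket) % 3:
        raise ValueError("expected a non-empty in-frame CDS structure")
    paired, total = [0, 0, 0], [0, 0, 0]
    for i, ch in enumerate(dot_bracket):
        k = i % 3                  # 0, 1, 2 correspond to codon positions 1, 2, 3
        total[k] += 1
        paired[k] += ch in "()"    # bool adds as 0 or 1
    return [p / t for p, t in zip(paired, total)]
```

Averaging these three fractions over many transcripts, rather than reading a single structure, is what would reveal a transcriptome-wide periodicity.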
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
75
Ding Y, Chan CY, Lawrence CE. Clustering of RNA secondary structures with application to messenger RNAs. J Mol Biol 2006; 359:554-71. [PMID: 16631786 DOI: 10.1016/j.jmb.2006.01.056] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 12/16/2005] [Accepted: 01/12/2006] [Indexed: 01/08/2023]
Abstract
There is growing evidence of translational gene regulation at the mRNA level, and of the important roles of RNA secondary structure in these regulatory processes. Because mRNAs likely exist in a population of structures, the popular free energy minimization approach may not be well suited to prediction of mRNA structures in studies of post-transcriptional regulation. Here, we describe an alternative procedure for the characterization of mRNA structures, in which structures sampled from the Boltzmann-weighted ensemble of RNA secondary structures are clustered. Based on a random sample of full-length human mRNAs, we find that the minimum free energy (MFE) structure often poorly represents the Boltzmann ensemble, that the ensemble often contains multiple structural clusters, and that the centroids of a small number of structural clusters more effectively characterize the ensemble. We show that cluster-level characteristics and statistics are statistically reproducible. In a comparison between mRNAs and structural RNAs, similarity is observed for the number of clusters and the energy gap between the MFE structure and the sampled ensemble. However, for structural RNAs, there are more high-frequency base-pairs in both the Boltzmann ensemble and the clusters, and the clusters are more compact. The clustering features have been incorporated into the Sfold software package for nucleic acid folding and design.
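In this line of work, the centroid of a set of structures is taken as the structure containing every base pair found in more than half of them, which minimizes the summed base-pair distance to the set. The sketch below computes that centroid for one cluster, assuming the Boltzmann-sampled structures (for example from Sfold) are already available as pseudoknot-free dot-bracket strings; sampling and the clustering step itself are not shown, and the function names are hypothetical.

```python
from collections import Counter


def pairs_of(dot_bracket: str) -> set:
    """Set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def centroid(samples: list) -> str:
    """Structure made of every base pair present in more than half of `samples`."""
    if not samples:
        raise ValueError("need at least one sampled structure")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all sampled structures must have the same length")
    counts = Counter(p for s in samples for p in pairs_of(s))
    out = ["."] * n
    for (i, j), c in counts.items():
        if c > len(samples) / 2:  # majority pairs never conflict or cross
            out[i], out[j] = "(", ")"
    return "".join(out)
```

Because two conflicting or crossing pairs cannot both occur in a majority of pseudoknot-free samples, the selected pairs always form a valid dot-bracket structure.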
Affiliation(s)
- Ye Ding
- Wadsworth Center, New York State Department of Health, Center for Medical Science, 150 New Scotland Avenue, Albany, NY 12208, USA.