1
Zimnyakov DA, Alonova MV, Lavrukhin MS, Lyapina AM, Feodorova VA. Polarization- and Chaos-Game-Based Fingerprinting of Molecular Targets of Listeria monocytogenes Vaccine and Fully Virulent Strains. Curr Issues Mol Biol 2023; 45:10056-10078. [PMID: 38132474 PMCID: PMC10742786 DOI: 10.3390/cimb45120628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/07/2023] [Accepted: 12/11/2023] [Indexed: 12/23/2023]
Abstract
Two approaches to the synthesis of 2D binary identifiers ("fingerprints") of DNA-associated symbol sequences are considered in this paper. One of these approaches is based on the simulation of polarization-dependent diffraction patterns formed by reading the modeled DNA-associated 2D phase-modulating structures with a coherent light beam. In this case, 2D binarized distributions of close-to-circular extreme polarization states are applied as fingerprints of analyzed nucleotide sequences. The second approach is based on the transformation of the DNA-associated chaos game representation (CGR) maps into finite-dimensional binary matrices. In both cases, the differences between the structures of the analyzed and reference symbol sequences are quantified by calculating the correlation coefficient of the synthesized binary matrices. A comparison of the approaches under consideration is carried out using symbol sequences corresponding to nucleotide sequences of the hly gene from the vaccine and wild-type strains of Listeria monocytogenes as the analyzed objects. These strains differ in terms of the number of substituted nucleotides in relation to the vaccine strain selected as a reference. The results of the performed analysis allow us to conclude that the identification of structural differences in the DNA-associated symbolic sequences is significantly more efficient when using the binary distributions of close-to-circular extreme polarization states. The approach may be applicable to the genetic differentiation of infected from vaccinated animals (DIVA).
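The abstract does not state how the correlation coefficient between two binary fingerprints is computed. As a minimal sketch, assuming it is the Pearson correlation taken over all matrix entries, the comparison could look like the Python below. The function name `binary_matrix_correlation` and the toy 2×2 matrices are illustrative only and are not taken from the paper.

```python
from math import sqrt


def binary_matrix_correlation(a, b):
    """Pearson correlation between two equally sized 0/1 matrices.

    Assumption: the matrices are compared entry by entry after flattening.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("matrices must be non-empty and of equal size")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant matrix")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy fingerprints, not data from the paper.
reference = [[1, 0], [0, 1]]
identical = [[1, 0], [0, 1]]
inverted = [[0, 1], [1, 0]]
```

Under this assumption, a value near 1 would indicate a fingerprint almost identical to the reference, and lower values would indicate more structural differences.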
Affiliation(s)
- Dmitry A. Zimnyakov
- Physics Department, Yury Gagarin State Technical University of Saratov, 77 Polytechnicheskaya Str., 410054 Saratov, Russia
- Laboratory for Fundamental and Applied Research, Saratov State University of Genetics, Biotechnology and Engineering Named after N.I. Vavilov, 335 Sokolovaya Str., 410005 Saratov, Russia
- Marina V. Alonova
- Physics Department, Yury Gagarin State Technical University of Saratov, 77 Polytechnicheskaya Str., 410054 Saratov, Russia
- Maxim S. Lavrukhin
- Laboratory for Fundamental and Applied Research, Saratov State University of Genetics, Biotechnology and Engineering Named after N.I. Vavilov, 335 Sokolovaya Str., 410005 Saratov, Russia
- Anna M. Lyapina
- Laboratory for Fundamental and Applied Research, Saratov State University of Genetics, Biotechnology and Engineering Named after N.I. Vavilov, 335 Sokolovaya Str., 410005 Saratov, Russia
- Valentina A. Feodorova
- Laboratory for Fundamental and Applied Research, Saratov State University of Genetics, Biotechnology and Engineering Named after N.I. Vavilov, 335 Sokolovaya Str., 410005 Saratov, Russia
- Department for Microbiology and Biotechnology, Saratov State University of Genetics, Biotechnology and Engineering Named after N.I. Vavilov, 335 Sokolovaya Str., 410005 Saratov, Russia
2
Löchel HF, Heider D. Chaos game representation and its applications in bioinformatics. Comput Struct Biotechnol J 2021; 19:6263-6271. [PMID: 34900136 PMCID: PMC8636998 DOI: 10.1016/j.csbj.2021.11.008] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/26/2021] [Revised: 11/04/2021] [Accepted: 11/05/2021] [Indexed: 11/18/2022]
Abstract
Chaos game representation (CGR), a milestone in graphical bioinformatics, has become a powerful tool regarding alignment-free sequence comparison and feature encoding for machine learning. The algorithm maps a sequence to 2-dimensional space, while an extension of the CGR, the so-called frequency matrix representation (FCGR), transforms sequences of different lengths into equal-sized images or matrices. The CGR is a generalized Markov chain and includes various properties, which allow a unique representation of a sequence. Therefore, it has a broad spectrum of applications in bioinformatics, such as sequence comparison and phylogenetic analysis and as an encoding of sequences for machine learning. This review introduces the construction of CGRs and FCGRs, their applications on DNA and proteins, and gives an overview of recent applications and progress in bioinformatics.
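The construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the review's reference implementation. It assumes the commonly used corner layout (A bottom-left, C top-left, G top-right, T bottom-right) and a start at the centre of the unit square. The names `cgr_points` and `fcgr` are hypothetical.

```python
# Assumed corner assignment; other layouts are equally valid conventions.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each point is halfway to the base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            raise ValueError(f"unsupported symbol: {base!r}")
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Count CGR points on a 2^k x 2^k grid (one cell per k-mer)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for i, (x, y) in enumerate(cgr_points(seq)):
        if i + 1 < k:
            continue  # fewer than k bases read, so no complete k-mer yet
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

Because every sequence is reduced to the same 2^k × 2^k grid, sequences of different lengths yield matrices of equal size, which is the property the abstract highlights for FCGR.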
Affiliation(s)
- Hannah Franziska Löchel
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032 Marburg, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35032 Marburg, Germany
3
Szitenberg A, Cha S, Opperman CH, Bird DM, Blaxter ML, Lunt DH. Genetic Drift, Not Life History or RNAi, Determine Long-Term Evolution of Transposable Elements. Genome Biol Evol 2016; 8:2964-2978. [PMID: 27566762 PMCID: PMC5635653 DOI: 10.1093/gbe/evw208] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Accepted: 08/20/2016] [Indexed: 12/11/2022]
Abstract
Transposable elements (TEs) are a major source of genome variation across the branches of life. Although TEs may play an adaptive role in their host's genome, they are more often deleterious, and purifying selection is an important factor controlling their genomic loads. In contrast, life history, mating system, GC content, and RNAi pathways have been suggested to account for the disparity of TE loads in different species. Previous studies of fungal, plant, and animal genomes have reported conflicting results regarding the direction in which these genomic features drive TE evolution. Many of these studies have had limited power, however, because they studied taxonomically narrow systems, comparing only a limited number of phylogenetically independent contrasts, and did not address long-term effects on TE evolution. Here, we test the long-term determinants of TE evolution by comparing 42 nematode genomes spanning over 500 million years of diversification. This analysis includes numerous transitions between life history states, and RNAi pathways, and evaluates if these forces are sufficiently persistent to affect the long-term evolution of TE loads in eukaryotic genomes. Although we demonstrate statistical power to detect selection, we find no evidence that variation in these factors influence genomic TE loads across extended periods of time. In contrast, the effects of genetic drift appear to persist and control TE variation among species. We suggest that variation in the tested factors are largely inconsequential to the large differences in TE content observed between genomes, and only by these large-scale comparisons can we distinguish long-term and persistent effects from transient or random changes.
Affiliation(s)
- Amir Szitenberg
- Evolutionary Biology Group, School of Environmental Sciences, University of Hull, England, United Kingdom; The Dead Sea and Arava Science Center, Israel
- Soyeon Cha
- Department of Plant Pathology, North Carolina State University
- David M Bird
- Department of Plant Pathology, North Carolina State University
- Mark L Blaxter
- School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Scotland
- David H Lunt
- Evolutionary Biology Group, School of Environmental Sciences, University of Hull, England, United Kingdom
4
Martin F, Barends S, Jaeger S, Schaeffer L, Prongidi-Fix L, Eriani G. Cap-assisted internal initiation of translation of histone H4. Mol Cell 2011; 41:197-209. [PMID: 21255730 DOI: 10.1016/j.molcel.2010.12.019] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 05/20/2010] [Revised: 09/08/2010] [Accepted: 11/10/2010] [Indexed: 11/30/2022]
Abstract
In eukaryotes, a crucial step of translation initiation is the binding of the multifactor complex eIF4F to the 5' end of the mRNA, a prerequisite to recruitment of the activated small ribosomal 43S particle. Histone H4 mRNAs have short 5'UTRs, which do not conform to the conventional scanning-initiation model. Here we show that the ORF of histone mRNA contains two structural elements critical for translation initiation. One of the two structures binds eIF4E without the need of the cap. Ribosomal 43S particles become tethered to this site and directly loaded in the vicinity of the AUG. The other structure, 19 nucleotides downstream of the initiation codon, forms a three-way helix junction, which sequesters the m(7)G cap. This element facilitates direct positioning of the ribosome on the cognate start codon. This unusual translation initiation mode might be considered as a hybrid mechanism between the canonical and the IRES-driven translation initiation process.
Affiliation(s)
- Franck Martin
- Architecture et Réactivité de l'ARN, Université de Strasbourg, CNRS, Institut de Biologie Moléculaire et Cellulaire, 15 rue René Descartes, 67084 Strasbourg CEDEX, France
5
Abstract
From the late 1980s onward, the term "bioinformatics" mostly has been used to refer to computational methods for comparative analysis of genome data. However, the term was originally more widely defined as the study of informatic processes in biotic systems. In this essay, I will trace this early history (from a personal point of view) and I will argue that the original meaning of the term is re-emerging.
Affiliation(s)
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Department of Biology, Faculty of Science, Utrecht University, Utrecht, The Netherlands.
6
On the origin of synonymous codon usage divergence between thermophilic and mesophilic prokaryotes. FEBS Lett 2007; 581:5825-30. [DOI: 10.1016/j.febslet.2007.11.054] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/19/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 01/24/2023]
7
Jaeger S, Martin F, Rudinger-Thirion J, Giegé R, Eriani G. Binding of human SLBP on the 3'-UTR of histone precursor H4-12 mRNA induces structural rearrangements that enable U7 snRNA anchoring. Nucleic Acids Res 2006; 34:4987-95. [PMID: 16982637 PMCID: PMC1635294 DOI: 10.1093/nar/gkl666] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
In metazoans, cell-cycle-dependent histones are produced from poly(A)-lacking mRNAs. The 3′ end of histone mRNAs is formed by an endonucleolytic cleavage of longer precursors between a conserved stem–loop structure and a purine-rich histone downstream element (HDE). The cleavage requires at least two trans-acting factors: the stem–loop binding protein (SLBP), which binds to the stem–loop and the U7 snRNP, which anchors to histone pre-mRNAs by annealing to the HDE. Using RNA structure-probing techniques, we determined the secondary structure of the 3′-untranslated region (3′-UTR) of mouse histone pre-mRNAs H4–12, H1t and H2a–614. Surprisingly, the HDE is embedded in hairpin structures and is therefore not easily accessible for U7 snRNP anchoring. Probing of the 3′-UTR in complex with SLBP revealed structural rearrangements leading to an overall opening of the structure especially at the level of the HDE. Electrophoretic mobility shift assays demonstrated that the SLBP-induced opening of HDE actually facilitates U7 snRNA anchoring on the histone H4–12 pre-mRNAs 3′ end. These results suggest that initial binding of the SLBP functions in making the HDE more accessible for U7 snRNA anchoring.
Affiliation(s)
- Gilbert Eriani
- To whom correspondence should be addressed. Tel: +33 3 88 41 70 42; Fax: +33 3 88 60 22 18
8
Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 2006; 7:98-108. [PMID: 16418745 DOI: 10.1038/nrg1770] [Citation(s) in RCA: 590] [Impact Index Per Article: 32.8] [Indexed: 11/08/2022]
Abstract
Although the assumption of the neutral theory of molecular evolution - that some classes of mutation have too small an effect on fitness to be affected by natural selection - seems intuitively reasonable, over the past few decades the theory has been in retreat. At least in species with large populations, even synonymous mutations in exons are not neutral. By contrast, in mammals, neutrality of these mutations is still commonly assumed. However, new evidence indicates that even some synonymous mutations are subject to constraint, often because they affect splicing and/or mRNA stability. This has implications for understanding disease, optimizing transgene design, detecting positive selection and estimating the mutation rate.
Affiliation(s)
- J V Chamary
- Center for Integrative Genomics, University of Lausanne, Switzerland.
9
Chamary JV, Hurst LD. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol 2005; 6:R75. [PMID: 16168082 PMCID: PMC1242210 DOI: 10.1186/gb-2005-6-9-r75] [Citation(s) in RCA: 236] [Impact Index Per Article: 12.4] [Received: 04/27/2005] [Revised: 06/08/2005] [Accepted: 07/20/2005] [Indexed: 12/24/2022]
Abstract
BACKGROUND: In mammals, contrary to what is usually assumed, recent evidence suggests that synonymous mutations may not be selectively neutral. This position has proven contentious, not least because of the absence of a viable mechanism. Here we test whether synonymous mutations might be under selection owing to their effects on the thermodynamic stability of mRNA, mediated by changes in secondary structure. RESULTS: We provide numerous lines of evidence that are all consistent with the above hypothesis. Most notably, by simulating evolution and reallocating the substitutions observed in the mouse lineage, we show that the location of synonymous mutations is non-random with respect to stability. Importantly, the preference for cytosine at 4-fold degenerate sites, diagnostic of selection, can be explained by its effect on mRNA stability. Likewise, by interchanging synonymous codons, we find naturally occurring mRNAs to be more stable than simulant transcripts. Housekeeping genes, whose proteins are under strong purifying selection, are also under the greatest pressure to maintain stability. CONCLUSION: Taken together, our results provide evidence that, in mammals, synonymous sites do not evolve neutrally, at least in part owing to selection on mRNA stability. This has implications for the application of synonymous divergence in estimating the mutation rate.
Affiliation(s)
- JV Chamary
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
10
Abstract
Tracing the history of molecular changes in coronaviruses using phylogenetic methods can provide powerful insights into the patterns of modification to sequences that underlie alteration to selective pressure and molecular function in the SARS-CoV (severe acute respiratory syndrome coronavirus) genome. The topology and branch lengths of the phylogenetic relationships among the family Coronaviridae, including SARS-CoV, have been estimated using the replicase polyprotein. The spike protein fragments S1 (involved in receptor-binding) and S2 (involved in membrane fusion) have been found to have different mutation rates. Fragment S1 can be further divided into two regions (S1A, which comprises approximately the first 400 nucleotides, and S1B, comprising the next 280) that also show different rates of mutation. The phylogeny presented on the basis of S1B shows that SARS-CoV is closely related to MHV (murine hepatitis virus), which is known to bind the murine receptor CEACAM1. The predicted structure, accessibility and mutation rate of the S1B region is also presented. Because anti-SARS drugs based on S2 heptads have short half-lives and are difficult to manufacture, our findings suggest that the S1B region might be of interest for anti-SARS drug discovery.
Affiliation(s)
- Pietro Liò
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
11
Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A 2004; 101:3480-5. [PMID: 14990797 PMCID: PMC373487 DOI: 10.1073/pnas.0307827100] [Citation(s) in RCA: 230] [Impact Index Per Article: 11.5] [Indexed: 11/18/2022]
Abstract
Analysis of genome-wide codon bias shows that only two parameters effectively differentiate the genome-wide codon bias of 100 eubacterial and archaeal organisms. The first parameter correlates with genome GC content, and the second parameter correlates with context-dependent nucleotide bias. Both of these parameters may be calculated from intergenic sequences. Therefore, genome-wide codon bias in eubacteria and archaea may be predicted from intergenic sequences that are not translated. When these two parameters are calculated for genes from nonmammalian eukaryotic organisms, genes from the same organism again have similar values, and genome-wide codon bias may also be predicted from intergenic sequences. In mammals, genes from the same organism are similar only in the second parameter, because GC content varies widely among isochores. Our results suggest that, in general, genome-wide codon bias is determined primarily by mutational processes that act throughout the genome, and only secondarily by selective forces acting on translated sequences.
Affiliation(s)
- Swaine L Chen
- Department of Developmental Biology, Stanford University School of Medicine, Beckman Center, B300, Stanford, CA 94304, USA.
12
Abstract
Our thesis is that the DNA composition and structure of genomes are selected in part by mutation bias (GC pressure) and in part by ecology. To illustrate this point, we compare and contrast the oligonucleotide composition and the mosaic structure in 36 complete genomes and in 27 long genomic sequences from archaea and eubacteria. We report the following findings: (1) High-GC-content genomes show a large underrepresentation of short distances between G(n) and C(n) homopolymers with respect to distances between A(n) and T(n) homopolymers; we discuss selection versus mutation bias hypotheses. (2) The oligonucleotide compositions of the genomes of Neisseria (meningitidis and gonorrhoeae), Helicobacter pylori and Rhodobacter capsulatus are more biased than the other sequenced genomes. (3) The genomes of free-living species or nonchronic pathogens show more mosaic-like structure than genomes of chronic pathogens or intracellular symbionts. (4) Genome mosaicity of intracellular parasites has a maximum corresponding to the average gene length; in the genomes of free-living and nonchronic pathogens the maximum occurs at larger length scales. This suggests that free-living species can incorporate large pieces of DNA from the environment, whereas for intracellular parasites there are recombination events between homologous genes. We discuss the consequences in terms of evolution of genome size. (5) Intracellular symbionts and obligate pathogens show a small, but not zero, amount of chromosome mosaicity, suggesting that recombination events occur in these species.
Affiliation(s)
- Pietro Liò
- Department of Zoology, University of Cambridge, United Kingdom.
13
Knight RD, Freeland SJ, Landweber LF. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2001; 2:RESEARCH0010. [PMID: 11305938 PMCID: PMC31479 DOI: 10.1186/gb-2001-2-4-research0010] [Citation(s) in RCA: 210] [Impact Index Per Article: 9.1] [Received: 11/21/2000] [Revised: 02/01/2001] [Accepted: 02/13/2001] [Indexed: 11/28/2022]
Abstract
BACKGROUND: Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts responses (slope and intercept of the regression line on genome GC content) of individual codons and amino acids to genome composition. RESULTS: Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure. CONCLUSIONS: Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.
Affiliation(s)
- Robin D Knight
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Stephen J Freeland
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Laura F Landweber
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
14
Chen Y, Carlini DB, Baines JF, Parsch J, Braverman JM, Tanda S, Stephan W. RNA secondary structure and compensatory evolution. Genes Genet Syst 1999; 74:271-86. [PMID: 10791023 DOI: 10.1266/ggs.74.271] [Citation(s) in RCA: 66] [Impact Index Per Article: 2.6] [Indexed: 11/23/2022]
Abstract
The classic concept of epistatic fitness interactions between genes has been extended to study interactions within gene regions, especially between nucleotides that are important in maintaining pre-mRNA/mRNA secondary structures. It is shown that the majority of linkage disequilibria found within the Drosophila Adh gene are likely to be caused by epistatic selection operating on RNA secondary structures. A recently proposed method of RNA secondary structure prediction based on DNA sequence comparisons is reviewed and applied to several types of RNAs, including tRNA, rRNA, and mRNA. The patterns of covariation in these RNAs are analyzed based on Kimura's compensatory evolution model. The results suggest that this model describes the substitution process in the pairing regions (helices) of RNA secondary structures well when the helices are evolutionarily conserved and thermodynamically stable, but fails in some other cases. Epistatic selection maintaining pre-mRNA/mRNA secondary structures is compared to weak selective forces that determine features such as base composition and synonymous codon usage. The relationships among these forces and their relative strengths are addressed. Finally, our mutagenesis experiments using the Drosophila Adh locus are reviewed. These experiments analyze long-range compensatory interactions between the 5' and 3' ends of Adh mRNA, the different constraints on secondary structures in introns and exons, and the possible role of secondary structures in RNA splicing.
Affiliation(s)
- Y Chen
- Department of Biology, University of Rochester, NY 14627, USA
15
Huynen M, Gutell R, Konings D. Assessing the reliability of RNA folding using statistical mechanics. J Mol Biol 1997; 267:1104-12. [PMID: 9150399 DOI: 10.1006/jmbi.1997.0889] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.0] [Indexed: 02/04/2023]
Abstract
We have analyzed the base-pairing probability distributions of 16 S and 16 S-like, and 23 S and 23 S-like ribosomal RNAs of Archaea, Bacteria, chloroplasts, mitochondria and Eukarya, as predicted by the partition function approach for RNA folding introduced by McCaskill. A quantitative analysis of the reliability of RNA folding is done by comparing the base-pairing probability distributions with the structures predicted by comparative sequence analysis (comparative structures). We distinguish two factors that show a relationship to the reliability of RNA minimum free energy structure. The first factor is the dominance of one particular base-pair or the absence of base-pairing for a given base within the base-pairing probability distribution (BPPD). We characterize the BPPD per base, including the probability of not base-pairing, by its Shannon entropy (S). The S value indicates the uncertainty about the base-pairing of a base: low S values result from BPPDs that are strongly dominated by a single base-pair or by the absence of base-pairing. We show that bases with low S values have a relatively high probability that their minimum free energy (MFE) structure corresponds to the comparative structure. The BPPDs of prokaryotes that live at high temperatures (thermophilic Archaea and Bacteria) have, calculated at 37 degrees C, lower S values than the BPPDs of prokaryotes that live at lower temperatures (mesophilic and psychrophilic Archaea and Bacteria). This reflects an adaptation of the ribosomal RNAs to the environmental temperature. A second factor that is important to consider with regard to the reliability of MFE structure folding is a variable degree of applicability of the thermodynamic model of RNA folding for different groups of RNAs. 
Here we show that among the bases that show low S values, the Archaea and Bacteria have similar, high probabilities (0.96 and 0.94 in 16 S and 0.93 and 0.91 in 23 S, respectively) that the MFE structure corresponds to the comparative structure. These probabilities are lower in the chloroplasts (16 S 0.91, 23 S 0.79), mitochondria (16 S-like 0.89, 23 S-like 0.69) and Eukarya (18 S 0.81, 28 S 0.86).
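The per-base Shannon entropy S described in this abstract can be illustrated with a short Python sketch. It assumes the input is the list of probabilities that one base pairs with each possible partner, with the remainder taken as the probability of staying unpaired, and it uses the natural logarithm, since the abstract does not state the log base. The function name `base_pairing_entropy` is hypothetical.

```python
from math import log


def base_pairing_entropy(pair_probs):
    """Shannon entropy of one base's pairing distribution.

    pair_probs: probabilities of pairing with each candidate partner.
    The unpaired probability is taken as 1 minus their sum (assumption).
    """
    probs = list(pair_probs)
    total = sum(probs)
    if any(p < 0 for p in probs) or total > 1 + 1e-9:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    unpaired = max(0.0, 1.0 - total)
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * log(p) for p in probs + [unpaired] if p > 0)
```

A base dominated by a single pairing, or by the unpaired state, gives S close to 0, which is the low-uncertainty case the abstract associates with more reliable minimum free energy predictions.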
Affiliation(s)
- M Huynen
- Center for Nonlinear Studies, Los Alamos National Laboratory, NM 87545, USA
16
Teerink H, Voorma HO, Thomas AA. The human insulin-like growth factor II leader 1 contains an internal ribosomal entry site. Biochim Biophys Acta 1995; 1264:403-8. [PMID: 8547330 DOI: 10.1016/0167-4781(95)00185-9] [Citation(s) in RCA: 52] [Impact Index Per Article: 1.8] [Indexed: 01/31/2023]
Abstract
Insulin-like growth factor II is a small peptide growth hormone, encoded by four mRNAs with unique 5' untranslated regions and identical coding regions. The 5' untranslated region transcribed from promoter 1 is 598 nt (leader 1). The properties of this leader 1 suggest a strong regulation of translation; the high G + C-content, the presence of an upstream open reading frame, and the length of the 5' UTR are 3 elements which prohibit efficient translation and which may modulate expression. In this paper we show that the human IGFII leader 1 harbours sequence elements that allow translation initiation to occur by internal initiation on the IGF sequence. This mode of initiation was described first for picornaviral mRNAs, that are naturally uncapped. The IGFII leader 1-dependent expression in HeLa cells was resistant to infection with poliovirus; abrogation of cap-dependent initiation by poliovirus had apparently no effect on IGFII expression. Moreover, a downstream CAT-cistron in a bicistronic construct was translated upon insertion of the leader 1 sequence. The translational properties of the IGFII leader 1 suggest that internal initiation on this leader may be modulated during proliferation or differentiation, enabling cell-stage dependent expression of IGFII.
Affiliation(s)
- H Teerink
- Department of Molecular Cell Biology, University of Utrecht, The Netherlands
17
Abstract
Recognition of function of newly sequenced DNA fragments is an important area of computational molecular biology. Here we present an extensive review of methods for prediction of functional sites, tRNA, and protein-coding genes and discuss possible further directions of research in this area.
Affiliation(s)
- M S Gelfand
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow region, Russia
18
Bronson EC, Anderson JN. Nucleotide composition as a driving force in the evolution of retroviruses. J Mol Evol 1994; 38:506-32. [PMID: 8028030 DOI: 10.1007/bf00178851] [Citation(s) in RCA: 56] [Impact Index Per Article: 1.9] [Indexed: 01/28/2023]
Abstract
All complete retrovirus sequences in the GenEMBL database were examined with the goal of assessing possible relationships between the nucleotide composition of retroviral genomes, the amino acid composition of retroviral proteins, and evolutionary strategies used by retroviruses. The results demonstrated that the genome of each viral lineage has a characteristic base composition and that the variations between groups are related to retroviral phylogeny. By analogy to microbial species, we suggest that the variations arise from group-specific patterns of directional mutations where the bias can be exerted on any of the four nucleotides. It is most likely that the mutational patterns are introduced during reverse transcription, and a direct participation of reverse transcriptase in the process is suspected. A straightforward strategy was used to analyze the compositional relationship between nucleotides and encoded amino acids. The procedure entailed calculations of amino acid frequencies from nucleotide content and the comparison of the calculated values to the observed amino acid frequencies in retroviruses. The results revealed an excellent correspondence between variation in genomic base composition and variation in amino acid composition of proteins with the compositional differences extending into all major coding regions of the viruses. Because of the magnitude and dispersion of these effects, and because of the nonconservative nature of many of the substitutions between groups with different genomic biases, we suggest that the variations in protein composition driven by biased nucleotide frequencies are an important factor in shaping the characteristic phenotypes of the different viral lineages. A clue to the nature of the evolutionary forces that are responsible for the generation of nucleotide biases was provided by the observation that viruses with radically different base frequencies most often inhabit the same cell type. 
This observation, along with analysis of amino acid and nucleotide replacement patterns between and within reverse transcriptase sequences from the various groups, permitted us to advance a model for the evolution of retroviruses. According to the model, speciation could initiate when daughter virions from a single progenitor vary in the direction of their mutational bias. These variations would exert a pleiotropic effect on the frequencies of nucleotides in all viral genes and consequently on the frequencies of amino acids in the encoded proteins. The variants with the most extreme compositional differences would have a selective advantage because their different precursor requirements would enable them to occupy different ecological niches within a single cell.(ABSTRACT TRUNCATED AT 400 WORDS)
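The abstract mentions calculating amino acid frequencies from nucleotide content but does not give the procedure. One plausible reading, sketched below in Python, treats the three codon positions as independent draws from the genomic base frequencies, sums codon probabilities over synonymous codons of the standard genetic code, and renormalizes after excluding stop codons. This is an assumption for illustration and may differ from the authors' calculation. The function name `expected_amino_acid_frequencies` is hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order (one-letter amino acids, * = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_amino_acid_frequencies(base_freqs):
    """Expected amino acid frequencies if codon positions are independent.

    base_freqs: mapping with keys T, C, A, G (counts or fractions).
    """
    total = sum(base_freqs[b] for b in BASES)
    if total <= 0:
        raise ValueError("base frequencies must sum to a positive value")
    probs = {b: base_freqs[b] / total for b in BASES}
    expected = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons do not encode an amino acid
        p = probs[codon[0]] * probs[codon[1]] * probs[codon[2]]
        expected[aa] = expected.get(aa, 0.0) + p
    norm = sum(expected.values())
    return {aa: p / norm for aa, p in expected.items()}
```

Comparing such expected values with observed amino acid frequencies is the kind of check the abstract describes. Under this sketch, an A-rich genome predicts more lysine (AAA, AAG) than a uniform one.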
Affiliation(s)
- E C Bronson
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907
19
Hogeweg P, Hesper B. Evolutionary dynamics and the coding structure of sequences: Multiple coding as a consequence of crossover and high mutation rates. Comput Chem 1992. [DOI: 10.1016/0097-8485(92)80044-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]