1
Forsdyke DR. Genomic compliance with Chargaff's second parity rule may have originated non-adaptively, but stem-loops now function adaptively. J Theor Biol 2024; 595:111943. [PMID: 39277166 DOI: 10.1016/j.jtbi.2024.111943] [Received: 03/30/2024] [Revised: 07/06/2024] [Accepted: 09/07/2024] [Indexed: 09/17/2024]
Abstract
Of Chargaff's four rules on DNA base quantity, his second parity rule (PR-2) is the most contentious. Various biometricians (e.g., Sueoka, Lobry) regarded PR-2 compliance as a non-adaptive feature of modern genomes that could be modeled through interrelations among mutation rates. However, PR-2 compliance with stem-loop potential was considered adaptively relevant by biochemists familiar with analyses of nucleic acid structure (e.g., of Crick) and of meiotic recombination (e.g., of Kleckner). Meanwhile, other biometricians had shown that PR-2 complementarity extended beyond individual bases (1-mers) to oligonucleotides (k-mers), possibly reflecting "advantageous DNA structure" (Nussinov). An "introns early" hypothesis (Reanney, Forsdyke) had suggested a primordial nucleic acid world with recombination-mediated error-correction requiring genome-wide stem-loop potential to have evolved prior to localized intrusions of protein-encoding potential (exons). Thus, a primordial genome was equivalent to one long intron. Indeed, when assessed as the base order-dependent component (correcting for local influences of GC%), modern genes, especially when evolving rapidly under positive Darwinian selection, display high intronic stem-loop potential. This suggests forced migration from neighboring exons by competing protein-encoding potential. PR-2 compliance may have first arisen non-adaptively. Primary prototypic structures were later strengthened by their adaptive contribution to recombination. Thus, contentious views may actually be in harmony.
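Single-strand PR-2 compliance, and its extension from 1-mers to k-mers, can be quantified by comparing how often each word occurs on one strand with how often its reverse complement occurs on that same strand. The sketch below is not the method of any author cited above. It assumes one common formulation: a normalized count difference averaged over word/reverse-complement pairs. The function names and the scoring rule are illustrative choices.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping words with non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def pr2_deviation(seq: str, k: int = 1) -> float:
    """Mean |n(w) - n(rc(w))| / (n(w) + n(rc(w))) over word/reverse-complement pairs.

    0.0 means perfect single-strand symmetry at this k (PR-2 compliance);
    1.0 means no observed word ever co-occurs with its reverse complement.
    Pairs in which neither word occurs are skipped.
    """
    counts = kmer_counts(seq.upper(), k)
    seen: set[str] = set()
    scores = []
    for word in map("".join, product("ACGT", repeat=k)):
        rc = reverse_complement(word)
        if word in seen or word == rc:
            continue  # already paired, or a reverse palindrome such as "AT"
        seen.update({word, rc})
        total = counts[word] + counts[rc]
        if total:
            scores.append(abs(counts[word] - counts[rc]) / total)
    return sum(scores) / len(scores) if scores else 0.0
```

A score near zero for k = 1, 2, 3, ... on a long natural sequence would correspond to PR-2 complementarity extending from single bases to oligonucleotides.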
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario K7L3N6, Canada.
2
Felipe Benites L, Stephens TG, Van Etten J, James T, Christian WC, Barry K, Grigoriev IV, McDermott TR, Bhattacharya D. Hot springs viruses at Yellowstone National Park have ancient origins and are adapted to thermophilic hosts. Commun Biol 2024; 7:312. [PMID: 38594478 PMCID: PMC11003980 DOI: 10.1038/s42003-024-05931-1] [Received: 11/15/2023] [Accepted: 02/16/2024] [Indexed: 04/11/2024]
Abstract
Geothermal springs house unicellular red algae in the class Cyanidiophyceae that dominate the microbial biomass at these sites. Little is known about host-virus interactions in these environments. We analyzed the virus community associated with red algal mats in three neighboring habitats (creek, endolithic, soil) at Lemonade Creek, Yellowstone National Park (YNP), USA. We find that despite proximity, each habitat houses a unique collection of viruses, with the giant viruses, Megaviricetes, dominant in all three. The early branching phylogenetic position of genes encoded on metagenome assembled virus genomes (vMAGs) suggests that the YNP lineages are of ancient origin and not due to multiple invasions from mesophilic habitats. The existence of genomic footprints of adaptation to thermophily in the vMAGs is consistent with this idea. The Cyanidiophyceae at geothermal sites originated ca. 1.5 Bya and are therefore relevant to understanding biotic interactions on the early Earth.
Affiliation(s)
- L Felipe Benites
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Timothy G Stephens
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Julia Van Etten
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Graduate Program in Ecology and Evolution, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Timeeka James
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- William C Christian
- Department of Land Resources and Environmental Sciences, Montana State University, Bozeman, Montana, USA
- Kerrie Barry
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Igor V Grigoriev
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
- Timothy R McDermott
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, Montana, USA
- Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA.
3
Arias PM, Butler J, Randhawa GS, Soltysiak MPM, Hill KA, Kari L. Environment and taxonomy shape the genomic signature of prokaryotic extremophiles. Sci Rep 2023; 13:16105. [PMID: 37752120 PMCID: PMC10522608 DOI: 10.1038/s41598-023-42518-y] [Received: 06/06/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023]
Abstract
This study provides comprehensive quantitative evidence suggesting that adaptations to extreme temperatures and pH imprint a discernible environmental component in the genomic signature of microbial extremophiles. Both supervised and unsupervised machine learning algorithms were used to analyze genomic signatures, each computed as the k-mer frequency vector of a 500 kbp DNA fragment arbitrarily selected to represent a genome. Computational experiments classified/clustered genomic signatures extracted from a curated dataset of [Formula: see text] extremophile (temperature, pH) bacteria and archaea genomes, at multiple scales of analysis, [Formula: see text]. The supervised learning resulted in high accuracies for taxonomic classifications at [Formula: see text], and medium to medium-high accuracies for environment category classifications of the same datasets at [Formula: see text]. For [Formula: see text], our findings were largely consistent with amino acid compositional biases and codon usage patterns in coding regions, previously attributed to extreme environment adaptations. The unsupervised learning of unlabelled sequences identified several exemplars of hyperthermophilic organisms with large similarities in their genomic signatures, in spite of belonging to different domains in the Tree of Life.
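A genomic signature in the sense used above is a vector of k-mer frequencies computed from one DNA fragment. The sketch below builds such a vector. Entries follow lexicographic order over the 4^k possible words and are normalized to sum to 1. It assumes overlapping k-mers counted on one strand, with windows containing non-ACGT symbols skipped. The study's own pipeline may differ, including how fragments were chosen, how vectors were normalized, and how they were fed to the classifiers. The Euclidean distance is only one illustrative way to compare two signatures.

```python
import math
from itertools import product


def kmer_frequency_vector(fragment: str, k: int) -> list[float]:
    """Frequencies of all 4**k k-mers in `fragment`, in lexicographic order (AA.., AC.., ...)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    fragment = fragment.upper()
    for i in range(len(fragment) - k + 1):
        j = index.get(fragment[i:i + k])
        if j is not None:  # None for windows containing N or other ambiguity codes
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(words)


def signature_distance(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two signatures computed with the same k."""
    return math.dist(u, v)
```

In practice, a fragment of the size mentioned above (500 kbp) would be passed in, and the resulting vectors would serve as features for supervised or unsupervised learning.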
Affiliation(s)
- Pablo Millán Arias
- School of Computer Science, University of Waterloo, Waterloo, ON, Canada.
- Joseph Butler
- Department of Biology, University of Western Ontario, London, ON, Canada
- Gurjit S Randhawa
- School of Mathematical and Computational Sciences, University of Prince Edward Island, Charlottetown, PE, Canada
- Kathleen A Hill
- Department of Biology, University of Western Ontario, London, ON, Canada
- Lila Kari
- School of Computer Science, University of Waterloo, Waterloo, ON, Canada
4
Oldrieve GR, Malacart B, López-Vidal J, Matthews KR. The genomic basis of host and vector specificity in non-pathogenic trypanosomatids. Biol Open 2022; 11:bio059237. [PMID: 35373253 PMCID: PMC9099014 DOI: 10.1242/bio.059237] [Received: 01/19/2022] [Accepted: 03/25/2022] [Indexed: 11/20/2022]
Abstract
Trypanosoma theileri, a non-pathogenic parasite of bovines, has a predicted surface protein architecture that likely aids survival in its mammalian host. Its surface proteins are encoded by genes that account for ∼10% of its genome. A non-pathogenic parasite of sheep, Trypanosoma melophagium, is transmitted by the sheep ked and is closely related to T. theileri. To explore host and vector specificity between these species, we sequenced the T. melophagium genome and transcriptome and assembled an annotated draft genome. T. melophagium was compared to 43 kinetoplastid genomes, including T. theileri. T. melophagium and T. theileri have AT-biased genomes, with the greatest AT bias among publicly available trypanosomatids. This trend may result from selection acting to decrease the genomic nucleotide cost. The T. melophagium genome is 6.3 Mb smaller than that of T. theileri, and large families of proteins characteristic of the predicted surface of T. theileri were found to be absent or greatly reduced in T. melophagium. Instead, T. melophagium has modestly expanded protein families associated with the avoidance of complement-mediated lysis. We propose that the contrasting genomic features of these species are linked to their mode of transmission from their insect vector to their mammalian host. This article has an associated First Person interview with the first author of the paper.
Affiliation(s)
- Guy R. Oldrieve
- Institute of Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
5
Villain E, Fort P, Kajava AV. Aspartate-phobia of thermophiles as a reaction to deleterious chemical transformations. Bioessays 2021; 44:e2100213. [PMID: 34791689 DOI: 10.1002/bies.202100213] [Received: 09/09/2021] [Revised: 11/01/2021] [Accepted: 11/02/2021] [Indexed: 11/08/2022]
Abstract
Prokaryotes growing at high temperatures have a high proportion of charged residues in their proteins to stabilize their 3D structure. By mining 175 disparate bacterial and archaeal proteomes we found that, against the general trend for charged residues, the frequency of aspartic acid residues decreases strongly as natural growth temperature increases. In search of the explanation, we hypothesized that the reason for such unusual correlation is the deleterious consequences of spontaneous chemical transformations of aspartate at high temperatures. Our subsequent statistical analysis supported this hypothesis. This finding reveals that organisms have likely adapted to high temperatures by minimizing the harmful consequences of spontaneous chemical transformations.
Affiliation(s)
- Etienne Villain
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, 1919 Route de Mende, Montpellier, France
- Philippe Fort
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, 1919 Route de Mende, Montpellier, France
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, 1919 Route de Mende, Montpellier, France
6
Neutralism versus selectionism: Chargaff's second parity rule, revisited. Genetica 2021; 149:81-88. [PMID: 33880685 PMCID: PMC8057000 DOI: 10.1007/s10709-021-00119-5] [Received: 03/04/2021] [Accepted: 04/09/2021] [Indexed: 11/03/2022]
Abstract
Of Chargaff's four "rules" on DNA base frequencies, the functional interpretation of his second parity rule (PR2) is the most contentious. Thermophile base compositions (GC%) were taken by Galtier and Lobry (1997) as favoring Sueoka's neutral PR2 hypothesis over Forsdyke's selective PR2 hypothesis, namely that mutations improving local within-species recombination efficiency had generated a genome-wide potential for the strands of duplex DNA to separate and initiate recombination through the "kissing" of the tips of stem-loops. However, following Chargaff's GC rule, base composition mainly reflects a species-specific, genome-wide, evolutionary pressure. GC% could not have consistently followed the dictates of temperature, since it plays fundamental roles in both sustaining species integrity and, through primarily neutral genome-wide mutation, fostering speciation. Evidence for a local within-species recombination-initiating role of base order was obtained with a novel technology that masked the contribution of base composition to nucleic acid folding energy. Forsdyke's results were consistent with his PR2 hypothesis, appeared to resolve some root problems in biology and provided a theoretical underpinning for alignment-free taxonomic analyses using relative oligonucleotide frequencies (k-mer analysis). Moreover, consistent with Chargaff's cluster rule, discovery of the thermoadaptive role of the "purine-loading" of open reading frames made less tenable the Galtier-Lobry anti-selectionist arguments.
7
Quan CL, Gao F. Quantitative analysis and assessment of base composition asymmetry and gene orientation bias in bacterial genomes. FEBS Lett 2019; 593:918-925. [PMID: 30941752 DOI: 10.1002/1873-3468.13374] [Received: 01/31/2019] [Revised: 03/28/2019] [Accepted: 03/31/2019] [Indexed: 11/10/2022]
Abstract
Base composition asymmetry and gene orientation bias are two common genomic structures in bacterial genomes. Here, correlation coefficients between nucleotide disparities and coding sequence (CDS) skew have been calculated, which provides insights into their relationship from an individual genome perspective. We find that GC and RY disparities correlate significantly with CDS skew: around 60% of the bacterial genomes under study have correlation coefficients > 0.9. We then present a model for quantitative assessment of nucleotide disparity and CDS skew in which a numerical index, R2, is used for evaluation. We find that skew curves with higher R2 perform better in predicting replication origins in bacteria.
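Skew curves like those assessed above are usually built by accumulating a per-window disparity along the chromosome. The sketch below computes a cumulative GC skew, summing (G − C)/(G + C) over consecutive non-overlapping windows. It then reports where the curve reaches its minimum, a widely used heuristic for locating the replication origin. This is not the R2-based assessment model proposed by Quan and Gao. The window size and the function names are illustrative assumptions.

```python
def cumulative_gc_skew(genome: str, window: int = 1000) -> list[float]:
    """Cumulative sum of (G - C) / (G + C) over consecutive, non-overlapping windows."""
    genome = genome.upper()
    curve: list[float] = []
    running = 0.0
    for start in range(0, len(genome), window):
        chunk = genome[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute zero skew.
        running += (g - c) / (g + c) if g + c else 0.0
        curve.append(running)
    return curve


def predicted_origin(genome: str, window: int = 1000) -> int:
    """0-based coordinate at the end of the window where the cumulative skew is lowest.

    Heuristic: reading along one strand, the curve falls across the replichore where
    that strand is lagging (C-rich) and rises where it is leading (G-rich), so the
    minimum marks the switch point near the origin (the maximum, near the terminus).
    """
    curve = cumulative_gc_skew(genome, window)
    if not curve:
        raise ValueError("genome must be non-empty")
    lowest = min(range(len(curve)), key=curve.__getitem__)
    return min((lowest + 1) * window, len(genome))
```

An RY (purine minus pyrimidine) disparity curve can be built the same way by counting A + G against C + T in each window.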
Affiliation(s)
- Chun-Lan Quan
- Department of Physics, School of Science, Tianjin University, China
- Feng Gao
- Department of Physics, School of Science, Tianjin University, China; Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), China
8
Patel A, Matsakas L, Rova U, Christakopoulos P. A perspective on biotechnological applications of thermophilic microalgae and cyanobacteria. Bioresour Technol 2019; 278:424-434. [PMID: 30685131 DOI: 10.1016/j.biortech.2019.01.063] [Received: 11/06/2018] [Revised: 01/12/2019] [Accepted: 01/15/2019] [Indexed: 05/18/2023]
Abstract
The importance of expanding our knowledge of microorganisms derived from extreme environments stems from their potential for developing novel and sustainable technologies for our health, food, and environment. Microalgae and cyanobacteria represent a group of diverse microorganisms that inhabit a wide range of environments, are capable of oxygenic photosynthesis, and form thick microbial mats even in extreme environments. Studies of thermophilic microorganisms have shown considerable biotechnological potential owing to their optimum growth and metabolism at high temperatures (≥50 °C), which is supported by their thermostable enzymes. Microalgal and cyanobacterial communities in high-temperature ecosystems account for a large part of the total ecosystem biomass and productivity, and can be exploited to generate several value-added products of agricultural, pharmaceutical, nutraceutical, and industrial relevance. This review provides an overview of the current status of biotechnological applications of thermophilic microalgae and cyanobacteria, with an outlook on the challenges and future prospects.
Affiliation(s)
- Alok Patel
- Biochemical Process Engineering, Division of Chemical Engineering, Department of Civil, Environmental, and Natural Resources Engineering, Luleå University of Technology, SE-971 87 Luleå, Sweden
- Leonidas Matsakas
- Biochemical Process Engineering, Division of Chemical Engineering, Department of Civil, Environmental, and Natural Resources Engineering, Luleå University of Technology, SE-971 87 Luleå, Sweden
- Ulrika Rova
- Biochemical Process Engineering, Division of Chemical Engineering, Department of Civil, Environmental, and Natural Resources Engineering, Luleå University of Technology, SE-971 87 Luleå, Sweden
- Paul Christakopoulos
- Biochemical Process Engineering, Division of Chemical Engineering, Department of Civil, Environmental, and Natural Resources Engineering, Luleå University of Technology, SE-971 87 Luleå, Sweden
9
Seward EA, Kelly S. Selection-driven cost-efficiency optimization of transcripts modulates gene evolutionary rate in bacteria. Genome Biol 2018; 19:102. [PMID: 30064467 PMCID: PMC6066932 DOI: 10.1186/s13059-018-1480-7] [Received: 03/15/2018] [Accepted: 07/11/2018] [Indexed: 11/23/2022]
Abstract
BACKGROUND Most amino acids are encoded by multiple synonymous codons. However, synonymous codons are not used equally, and this biased codon use varies between different organisms. It has previously been shown that both selection acting to increase codon translational efficiency and selection acting to decrease codon biosynthetic cost contribute to differences in codon bias. However, it is unknown how these two factors interact or how they affect molecular sequence evolution. RESULTS Through analysis of 1320 bacterial genomes, we show that bacterial genes are subject to multi-objective selection-driven optimization of codon use. Here, selection acts to simultaneously decrease transcript biosynthetic cost and increase transcript translational efficiency, with highly expressed genes under the greatest selection. This optimization is not simply a consequence of the more translationally efficient codons being less expensive to synthesize. Instead, we show that transfer RNA gene copy number alters the cost-efficiency trade-off of synonymous codons such that, for many species, selection acting on transcript biosynthetic cost and selection acting on translational efficiency act in opposition. Finally, we show that genes highly optimized to reduce cost and increase efficiency show reduced rates of synonymous and non-synonymous mutation. CONCLUSIONS This analysis provides a simple mechanistic explanation for variation in evolutionary rate between genes that depends on selection-driven cost-efficiency optimization of the transcript. These findings reveal how optimization of resource allocation to messenger RNA synthesis is a critical factor that determines both the evolution and composition of genes.
Affiliation(s)
- Emily A Seward
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
10
Abstract
Genome and transcript sequences are composed of long strings of nucleotide monomers (A, C, G, and T/U) that require different quantities of nitrogen atoms for biosynthesis. Here, it is shown that the strength of selection acting on transcript nitrogen content is influenced by the amount of nitrogen plants require to conduct photosynthesis. Specifically, plants that require more nitrogen to conduct photosynthesis experience stronger selection on transcript sequences to use synonymous codons that cost less nitrogen to biosynthesize. It is further shown that the strength of selection acting on transcript nitrogen cost constrains molecular sequence evolution such that genes experiencing stronger selection evolve at a slower rate. Together these findings reveal that the plant molecular clock is set by photosynthetic efficiency, and provide a mechanistic explanation for changes in plant speciation rates that occur concomitant with improvements in photosynthetic efficiency and changes in the environment such as light, temperature, and atmospheric CO2 concentration.
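The nitrogen cost of a transcript follows directly from the number of nitrogen atoms in each nucleobase. The ribose and phosphate parts of a nucleotide contain none. The sketch below uses those atom counts to score mRNA codons and to pick the cheapest synonymous codon for two example amino acids. It counts base atoms only. It does not reproduce the study's estimate of selection strength. The codon table is a deliberately partial, illustrative subset of the standard genetic code.

```python
# Nitrogen atoms per nucleobase: adenine C5H5N5, guanine C5H5N5O, cytosine C4H5N3O,
# uracil C4H4N2O2 (thymine, C5H6N2O2, also has 2). Ribose and phosphate contain no N.
NITROGEN_ATOMS = {"A": 5, "G": 5, "C": 3, "U": 2, "T": 2}

# Illustrative subset of the standard genetic code, written as mRNA codons.
SYNONYMOUS_CODONS = {
    "Leu": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
}


def nitrogen_cost(seq: str) -> int:
    """Total nitrogen atoms in the bases of a single-stranded sequence."""
    return sum(NITROGEN_ATOMS[base] for base in seq.upper())


def cheapest_synonyms(amino_acid: str) -> list[str]:
    """Synonymous codons of `amino_acid` with the lowest nitrogen cost."""
    costs = {codon: nitrogen_cost(codon) for codon in SYNONYMOUS_CODONS[amino_acid]}
    lowest = min(costs.values())
    return sorted(codon for codon, n in costs.items() if n == lowest)
```

For leucine, CUU costs 7 nitrogen atoms and CUG costs 10. Under nitrogen-cost selection, synonymous sites would therefore be expected to drift toward U- and C-ending codons.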
Affiliation(s)
- Steven Kelly
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
11
Dissimilar substitution rates between two strands of DNA influence codon usage pattern in some human genes. Gene 2018; 645:179-187. [PMID: 29229516 DOI: 10.1016/j.gene.2017.12.011] [Received: 10/10/2017] [Revised: 12/05/2017] [Accepted: 12/07/2017] [Indexed: 11/23/2022]
Abstract
We describe the codon usage of some important human genes and their expression potential in E. coli. By comparing various codon usage parameters, we distinguished effects due to selection from those due to mutational pressure. Variation in GC3s explains a significant proportion of the variation in codon usage patterns. The codons CGC, CGG, CTG and GCG showed strong positive correlations with GC3, suggesting that codon usage has been influenced by GC bias. We also found that ACC (Thr, RSCU = 1.77), GCC (Ala, RSCU = 1.67), CCC (Pro, RSCU = 1.54) and TCC (Ser, RSCU = 1.47) were frequently used, indicating that C was common at the 2nd and 3rd codon positions. Correspondence analysis revealed that the F1 axis was significantly correlated with various GC contents, suggesting that compositional properties under mutation pressure might affect codon usage bias. Nc-GC3 plot analysis suggested that both mutation pressure and natural selection might affect codon usage bias, which is also supported by neutrality plot analysis. The dinucleotides CT, TG and AG were significantly over-represented, and CG, TA, AT, TT and GT were under-represented, owing to a high rate of spontaneous mutation resulting from cytosine deamination.
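The RSCU values quoted above compare each codon's count with the count expected if all synonymous codons for its amino acid were used equally. The sketch below computes RSCU for a coding sequence. The codon table is a partial, hypothetical subset chosen for brevity. A real analysis would cover every degenerate amino acid and would usually exclude Met, Trp and stop codons.

```python
from collections import Counter

# Illustrative subset of the standard genetic code (DNA alphabet).
SYNONYMOUS_CODONS = {
    "Thr": ["ACT", "ACC", "ACA", "ACG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage: observed count / mean count over the synonymous family.

    RSCU = 1 means uniform use; > 1 means the codon is used more often than expected.
    Families absent from `cds` are omitted because their RSCU is undefined.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values: dict[str, float] = {}
    for codons in SYNONYMOUS_CODONS.values():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue
        expected = family_total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values
```

On this scale, ACC with RSCU = 1.77 is used about 1.77 times as often as it would be under uniform use of the four Thr codons.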
12
13
Quantitative analysis of correlation between AT and GC biases among bacterial genomes. PLoS One 2017; 12:e0171408. [PMID: 28158313 PMCID: PMC5291525 DOI: 10.1371/journal.pone.0171408] [Received: 12/18/2016] [Accepted: 01/20/2017] [Indexed: 01/03/2023]
Abstract
Owing to different replication mechanisms on the leading and lagging strands, nucleotide composition asymmetries are widespread in bacterial genomes. In general, the leading strand is enriched in guanine (G) and thymine (T), and the lagging strand in adenine (A) and cytosine (C). However, some bacteria, such as Bacillus subtilis, have been found to contain more A than T on the leading strand. To investigate this difference, we analyzed nucleotide asymmetry in terms of the correlation between AT and GC biases. In this study, we propose a windowless method, the Z-curve Correlation Coefficient (ZCC) index, based on the Z-curve method, and analyzed more than 2000 bacterial genomes. We find that the majority of bacteria show negative correlations between AT and GC biases, while most genomes in Firmicutes and Tenericutes have positive ZCC indexes. The presence of PolC, purine asymmetry and a stronger gene preference for the leading strand are not confined to Firmicutes but are also likely in other phyla dominated by positive ZCC indexes. The method also provides new insight into other relevant features such as aerobism, and can be applied to analyze correlations between other biases, such as RY (purine/pyrimidine) and MK (amino/keto) biases.
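The Z-curve maps the first n bases of a sequence to three cumulative coordinates:

- x_n = (A_n + G_n) − (C_n + T_n), purine vs pyrimidine;
- y_n = (A_n + C_n) − (G_n + T_n), amino vs keto;
- z_n = (A_n + T_n) − (G_n + C_n), weak vs strong.

The AT disparity A_n − T_n equals (x_n + y_n)/2, and the GC disparity G_n − C_n equals (x_n − y_n)/2. The sketch below assumes that the ZCC index is a Pearson correlation between these two cumulative curves. That is an assumption: the paper's exact definition and any normalization may differ, so `at_gc_disparity_correlation` is a hypothetical stand-in, not the published index.

```python
import math


def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Z-curve points (x_n, y_n, z_n) for n = 1..len(seq), from cumulative base counts."""
    a = c = g = t = 0
    points = []
    for base in seq.upper():
        if base == "A":
            a += 1
        elif base == "C":
            c += 1
        elif base == "G":
            g += 1
        elif base == "T":
            t += 1
        # Non-ACGT symbols leave the counts unchanged (the point repeats).
        points.append(((a + g) - (c + t),   # x: purine vs pyrimidine (RY)
                       (a + c) - (g + t),   # y: amino vs keto (MK)
                       (a + t) - (g + c)))  # z: weak vs strong H-bonding
    return points


def pearson(u: list[float], v: list[float]) -> float:
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    if su == 0 or sv == 0:
        raise ValueError("a disparity curve is constant; correlation undefined")
    return cov / (su * sv)


def at_gc_disparity_correlation(seq: str) -> float:
    """Pearson r between cumulative AT disparity (A_n - T_n) and GC disparity (G_n - C_n)."""
    points = z_curve(seq)
    at = [(x + y) / 2 for x, y, _ in points]  # equals A_n - T_n
    gc = [(x - y) / 2 for x, y, _ in points]  # equals G_n - C_n
    return pearson(at, gc)
```

A positive value means A-over-T excess accumulates together with G-over-C excess, the pattern the abstract reports for most Firmicutes and Tenericutes.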
14
Seward EA, Kelly S. Dietary nitrogen alters codon bias and genome composition in parasitic microorganisms. Genome Biol 2016; 17:226. [PMID: 27842572 PMCID: PMC5109750 DOI: 10.1186/s13059-016-1087-9] [Received: 07/21/2016] [Accepted: 10/12/2016] [Indexed: 11/12/2022]
Abstract
BACKGROUND Genomes are composed of long strings of nucleotide monomers (A, C, G and T) that are either scavenged from the organism's environment or built from metabolic precursors. The biosynthesis of each nucleotide differs in atomic requirements with different nucleotides requiring different quantities of nitrogen atoms. However, the impact of the relative availability of dietary nitrogen on genome composition and codon bias is poorly understood. RESULTS Here we show that differential nitrogen availability, due to differences in environment and dietary inputs, is a major determinant of genome nucleotide composition and synonymous codon use in both bacterial and eukaryotic microorganisms. Specifically, low nitrogen availability species use nucleotides that require fewer nitrogen atoms to encode the same genes compared to high nitrogen availability species. Furthermore, we provide a novel selection-mutation framework for the evaluation of the impact of metabolism on gene sequence evolution and show that it is possible to predict the metabolic inputs of related organisms from an analysis of the raw nucleotide sequence of their genes. CONCLUSIONS Taken together, these results reveal a previously hidden relationship between cellular metabolism and genome evolution and provide new insight into how genome sequence evolution can be influenced by adaptation to different diets and environments.
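Because both strands of a genome must be synthesized, each base pair carries the nitrogen of two bases. An A:T pair has 5 + 2 = 7 atoms and a G:C pair has 5 + 3 = 8. Genome-wide nitrogen demand therefore rises linearly with GC content. The sketch below makes that arithmetic explicit. It counts base atoms only and ignores the metabolic cost of precursors. It is not the selection-mutation framework described in the abstract, and the function names are illustrative.

```python
# Nitrogen atoms per DNA nucleobase (sugar and phosphate contain none).
NITROGEN_ATOMS = {"A": 5, "C": 3, "G": 5, "T": 2}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_nitrogen(seq: str) -> int:
    """Nitrogen atoms in the bases of both strands of a double-stranded DNA segment."""
    return sum(NITROGEN_ATOMS[b] + NITROGEN_ATOMS[COMPLEMENT[b]] for b in seq.upper())


def nitrogen_per_bp(gc_fraction: float) -> float:
    """Expected N atoms per base pair: 7 * (1 - GC) + 8 * GC, i.e. 7 + GC."""
    return 7.0 + gc_fraction
```

By this count, a genome at 30% GC uses about 7.3 nitrogen atoms per base pair and one at 60% GC about 7.6, roughly a 4% difference.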
Affiliation(s)
- Emily A Seward
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
15
Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016; 6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022]
Abstract
We analyzed the functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the dataset of 40 million single nucleotide polymorphisms (SNPs) from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest-density SNP collection for any higher plant. We have shown that DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We observed a sharp decline in nucleotide diversity that begins about 250 nucleotides upstream of the transcription start and reaches its minimum exactly at the transcription start. We found that transcription termination sites have remarkably symmetrical patterns of SNP density, implying the presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, an area rich in regulatory regions.
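Nucleotide diversity in its classic sense (Nei and Li's π) is the average number of pairwise differences per site among aligned sequences. Computing it in consecutive windows yields profiles like the dip around transcription starts described above. The sketch below assumes a small in-memory alignment in which gaps and ambiguity codes are compared as ordinary characters. The study worked from genome-wide SNP calls with its own pipeline, which this does not reproduce.

```python
from itertools import combinations


def nucleotide_diversity(alignment: list[str]) -> float:
    """Average number of pairwise differences per site (pi) among aligned sequences."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    if length == 0 or any(len(s) != length for s in alignment):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(alignment, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


def diversity_profile(alignment: list[str], window: int) -> list[float]:
    """pi in consecutive non-overlapping windows, e.g. to profile diversity around a TSS."""
    length = len(alignment[0])
    return [
        nucleotide_diversity([s[i:i + window] for s in alignment])
        for i in range(0, length - window + 1, window)
    ]
```

Aligning windows on a landmark such as the transcription start and averaging across genes would give the kind of position-specific profile summarized in the abstract.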
Affiliation(s)
- Tatiana V Tatarinova
- Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Yuri Nikolsky
- Vavilov Institute of General Genetics, Moscow, Russia; F1 Genomics, San Diego, CA, USA; School of Systems Biology, George Mason University, VA, USA
- Dmitry Chebotarov
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
- Kenneth L McNally
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
16
Umu SU, Poole AM, Dobson RC, Gardner PP. Avoidance of stochastic RNA interactions can be harnessed to control protein expression levels in bacteria and archaea. eLife 2016; 5. [PMID: 27642845 PMCID: PMC5028192 DOI: 10.7554/elife.13479] [Received: 12/02/2015] [Accepted: 08/14/2016] [Indexed: 11/23/2022]
Abstract
A critical assumption of gene expression analysis is that mRNA abundances broadly correlate with protein abundance, but these two are often imperfectly correlated. Some of the discrepancy can be accounted for by two important mRNA features: codon usage and mRNA secondary structure. We present a new global factor, called mRNA:ncRNA avoidance, and provide evidence that avoidance increases translational efficiency. We also demonstrate a strong selection for the avoidance of stochastic mRNA:ncRNA interactions across prokaryotes, and that these have a greater impact on protein abundance than mRNA structure or codon usage. By generating synonymously variant green fluorescent protein (GFP) mRNAs with different potential for mRNA:ncRNA interactions, we demonstrate that GFP levels correlate well with interaction avoidance. Therefore, taking stochastic mRNA:ncRNA interactions into account enables precise modulation of protein abundance. DOI:http://dx.doi.org/10.7554/eLife.13479.001
Many genes carry information for making proteins. To make a protein, a working copy of the information stored in DNA is first copied into a molecule of messenger RNA. These RNA messages are then interpreted by the ribosome, the molecular machine that makes proteins. Many messages are produced from each gene, and each message can be read multiple times. Thus, it should follow that the number of messages produced dictates the number of proteins made. However, this is not the case and the number of proteins produced cannot be completely predicted from knowing the number of messenger RNAs. Cells control how much of a given protein they produce through interactions between the messenger RNAs and other regulatory RNAs. The regulatory RNAs bind directly to a message and impede protein production. Because there are millions of RNAs in a cell, these interactions have evolved to be highly specific.
Nevertheless, it seems inevitable that messenger RNAs would encounter other RNAs too, which could short-circuit gene regulation and lead to less protein being produced. Umu et al. have now asked if such short-circuit events are selected against during evolution. Computational tools were used to predict the strength of binding between the RNAs found in the dominant forms of microbial life on Earth: the bacteria and the archaea. This approach revealed that the majority of messenger RNAs bind more weakly to the most common RNA molecules found in cells than would be expected by chance. Weakened binding should prevent the RNA molecules from becoming tangled with each other and ensure that protein levels are not perturbed by unintended interactions between highly expressed messages and other RNAs. To test this hypothesis further, Umu et al. generated versions of the gene for a green fluorescent protein that differed only in how well their messenger RNAs could avoid interacting with the most abundant RNAs in E. coli cells. Those messengers that were designed to avoid interacting with other RNAs yielded far more protein than those that were not. The findings show that taking this kind of avoidance into account can improve predictions about how much protein will be produced and should therefore make it easier to control protein production in experimental systems. Finally, the messenger RNAs of some bacteria do not show such clear avoidance. However, these bacteria have a more complex internal cell structure. This finding hints at an alternative means for avoiding short-circuiting events that could be used by more complicated cells, such as those of animals and plants, which also contain much larger numbers of RNAs.
Affiliation(s)
- Sinan Uğur Umu
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand; Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Anthony M Poole
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand; Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Renwick Cj Dobson
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand; Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand; Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Australia
- Paul P Gardner
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand; Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand; BioProtection Research Centre, University of Canterbury, Christchurch, New Zealand
17
Abstract
The deconstruction of biomass is a pivotal process in manufacturing target products with microbial cells and their enzymes. However, the enzymes that play a significant role in biomass breakdown remain relatively unexplored. Thermophilic microorganisms are of special interest as a source of novel thermostable enzymes, and many of them possess properties suitable for biotechnological and commercial use. Indeed, there is considerable demand for a new generation of stable enzymes that can withstand the severe conditions of industrial processes and thereby replace or supplement traditional chemical processes. This manuscript reviews the role of thermophilic microorganisms as a source of thermostable enzymes, the factors affecting these enzymes, recent patents on thermophiles and, especially, their wide spectrum of commercial and biotechnological applications.
18
Gao L, Liu Y, Sun H, Li C, Zhao Z, Liu G. Advances in mechanisms and modifications for rendering yeast thermotolerance. J Biosci Bioeng 2016; 121:599-606. [DOI: 10.1016/j.jbiosc.2015.11.002] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 06/30/2015] [Revised: 11/05/2015] [Accepted: 11/08/2015] [Indexed: 10/22/2022]
19
Chargaff’s Cluster Rule. Evol Bioinform Online 2016. [DOI: 10.1007/978-3-319-28755-3_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
20
Jain K, Krause K, Grewe F, Nelson GF, Weber APM, Christensen AC, Mower JP. Extreme features of the Galdieria sulphuraria organellar genomes: a consequence of polyextremophily? Genome Biol Evol 2014; 7:367-80. [PMID: 25552531 PMCID: PMC4316638 DOI: 10.1093/gbe/evu290] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/15/2022]
Abstract
Nuclear genome sequencing from extremophilic eukaryotes has revealed clues about the mechanisms of adaptation to extreme environments, but the functional consequences of extremophily on organellar genomes are unknown. To address this issue, we assembled the mitochondrial and plastid genomes from a polyextremophilic red alga, Galdieria sulphuraria strain 074 W, and performed a comparative genomic analysis with other red algae and more broadly across eukaryotes. The mitogenome is highly reduced in size and genetic content and exhibits the highest guanine–cytosine skew of any known genome and the fastest substitution rate among all red algae. The plastid genome contains a large number of intergenic stem-loop structures but is otherwise rather typical in size, structure, and content in comparison with other red algae. We suggest that these unique genomic modifications result not only from the harsh conditions in which Galdieria lives but also from its unusual capability to grow heterotrophically, endolithically, and in the dark. These conditions place additional mutational pressures on the mitogenome due to the increased reliance on the mitochondrion for energy production, whereas the decreased reliance on photosynthesis and the presence of numerous stem-loop structures may shield the plastome from similar genomic stress.
Affiliation(s)
- Kanika Jain
- Center for Plant Science Innovation, University of Nebraska-Lincoln; School of Biological Sciences, University of Nebraska-Lincoln
- Kirsten Krause
- Department of Arctic and Marine Biology, UiT The Arctic University of Norway, Tromsø, Norway
- Felix Grewe
- Center for Plant Science Innovation, University of Nebraska-Lincoln; Department of Agronomy and Horticulture, University of Nebraska-Lincoln
- Gaven F Nelson
- Center for Plant Science Innovation, University of Nebraska-Lincoln; School of Biological Sciences, University of Nebraska-Lincoln
- Andreas P M Weber
- Institute of Plant Biochemistry, Cluster of Excellence on Plant Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
- Jeffrey P Mower
- Center for Plant Science Innovation, University of Nebraska-Lincoln; Department of Agronomy and Horticulture, University of Nebraska-Lincoln
21
Overlapping genes: a new strategy of thermophilic stress tolerance in prokaryotes. Extremophiles 2014; 19:345-53. [PMID: 25503326 DOI: 10.1007/s00792-014-0720-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/09/2014] [Accepted: 12/01/2014] [Indexed: 12/29/2022]
Abstract
Overlapping genes (OGs) have drawn increasing research attention in recent years. However, the significance of OGs in prokaryotic genomes has remained unexplored. Thermophiles have been shown to eliminate intergenic regions as an adaptation to high temperature. It is therefore possible that prokaryotes increase their OG content to adapt to high temperature. To test this hypothesis, we carried out a comparative study of OG frequency in 256 prokaryotic genomes comprising both thermophiles and non-thermophiles. Thermophiles exhibited a higher frequency of overlapping genes than non-thermophiles, and overlap frequency correlated with optimal growth temperature (OGT) in prokaryotes. Long overlap frequency correlated positively with OGT, resulting in an abundance of long overlaps in thermophiles compared with non-thermophiles. On the other hand, short overlap (1-4 nucleotides) frequency (SOF) showed no direct correlation with OGT. However, the correlations of SOF with CAIavg (the extent of variation of codon usage bias, measured as the mean codon adaptation index of all genes in a given genome) and with IG% (the proportion of intergenic regions) indicate that short overlaps might upregulate these two factors, which are already known to be vital forces in thermophilic adaptation. From this evidence, we propose that OG content bears a strong link to thermophily: long overlaps are important for genome compaction, and short overlaps are important for upholding a high CAIavg. These findings should help clarify the significance of overlapping gene content in prokaryotic genomes.
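The short/long overlap classification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: genes are given as hypothetical 1-based, inclusive (start, end) coordinates with strand ignored, only genes adjacent in start order are compared, overlaps of 1-4 nt count as short (as in the abstract) and anything longer as long, and each frequency is defined here as overlaps per adjacent gene pair. The function names are hypothetical.

```python
def adjacent_overlaps(genes: list[tuple[int, int]]) -> list[int]:
    """Overlap lengths (nt) between genes that are adjacent in start order.

    genes: hypothetical (start, end) pairs, 1-based and inclusive, strand ignored.
    """
    ordered = sorted(genes)
    overlaps = []
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        # Positive only when the next gene starts inside the previous one.
        length = end1 - start2 + 1
        if length > 0:
            overlaps.append(length)
    return overlaps


def overlap_frequencies(genes: list[tuple[int, int]], short_max: int = 4) -> tuple[float, float]:
    """Return (short, long) overlap frequencies per adjacent gene pair."""
    pairs = len(genes) - 1
    if pairs <= 0:
        return 0.0, 0.0
    overlaps = adjacent_overlaps(genes)
    short = sum(1 for o in overlaps if o <= short_max)
    return short / pairs, (len(overlaps) - short) / pairs


genes = [(1, 100), (97, 200), (150, 300), (400, 500)]
print(adjacent_overlaps(genes))    # [4, 51]
print(overlap_frequencies(genes))  # one short and one long overlap over 3 pairs: (1/3, 1/3)
```

A gene nested entirely inside its predecessor would be assigned an overlap longer than itself by this simple rule, so real annotations would need extra handling for that case.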
22
Goncearenco A, Berezovsky IN. The fundamental tradeoff in genomes and proteomes of prokaryotes established by the genetic code, codon entropy, and physics of nucleic acids and proteins. Biol Direct 2014; 9:29. [PMID: 25496919 PMCID: PMC4273451 DOI: 10.1186/s13062-014-0029-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 09/03/2014] [Accepted: 12/01/2014] [Indexed: 11/26/2022]
Abstract
Background: Mutations in nucleotide sequences provide a foundation for genetic variability, and selection is the driving force of evolution and molecular adaptation. Despite considerable progress in understanding selective forces and their compositional determinants, the very nature of the underlying mutational biases remains unclear.
Results: We explore here a fundamental tradeoff, which analytically describes the mutual adjustment of nucleotide and amino acid compositions and its possible effect on mutational biases. The tradeoff is determined by the interplay between the genetic code, optimization of codon entropy, and demands on the structure and stability of nucleic acids and proteins.
Conclusion: The tradeoff is a unifying property of all prokaryotes regardless of differences in their phylogenies, life styles, and extreme environments. It underlies the mutational biases characteristic of genomes with different nucleotide and amino acid compositions, providing a foundation for evolution and adaptation.
Reviewers: This article was reviewed by Eugene Koonin, Michael Gromiha, and Alexander Schleiffer.
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Unit and Department of Informatics, University of Bergen, N-5008, Bergen, Norway; current address: Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, USA.
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117597, Singapore, Singapore.
23
Saha SK, Goswami A, Dutta C. Association of purine asymmetry, strand-biased gene distribution and PolC within Firmicutes and beyond: a new appraisal. BMC Genomics 2014; 15:430. [PMID: 24899249 PMCID: PMC4070872 DOI: 10.1186/1471-2164-15-430] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/07/2013] [Accepted: 05/08/2014] [Indexed: 11/10/2022]
Abstract
Background: The Firmicutes often possess three conspicuous genome features: marked purine asymmetry (PAS) across the two strands of replication, strand-biased gene distribution (SGD), and the presence of two isoforms of the DNA polymerase III alpha subunit, PolC and DnaE. Despite considerable research effort, it is not clear whether the co-existence of PAS, PolC and/or SGD is an essential and exclusive characteristic of the Firmicutes. The nature of any correlations between these three features within and beyond the Firmicutes lineages has also remained elusive. The present study was designed to address these issues.
Results: A large-scale analysis of diverse bacterial genomes indicates that PAS, PolC and SGD are neither essential nor exclusive features of the Firmicutes. PolC prevails in four bacterial phyla (Firmicutes, Fusobacteria, Tenericutes and Thermotogae), while PAS occurs only in subsets of Firmicutes, Fusobacteria and Tenericutes. There are five major compositional trends in Firmicutes: (I) explicit PAS, or G + A dominance, along the entire leading strand (LeS); (II) G dominance alone along the leading strand; (III) alternating stretches of purine-rich and pyrimidine-rich sequence; (IV) G + T dominance along the leading strand; and (V) no identifiable pattern in base usage. Strong SGD has been observed not only in genomes with PAS but also in genomes with G dominance along their leading strands, an observation that contradicts the notion that PAS and SGD co-occur in Firmicutes. PolC-containing non-Firmicutes organisms often have alternating stretches of purine (R)-dominant and pyrimidine (Y)-dominant sequence along their genomes, and most of them show relatively weak but significant SGD. Firmicutes with G + A dominance or G dominance along the LeS usually show distinct base usage patterns at the three codon positions of genes. Probable molecular mechanisms that might have produced such usage patterns are proposed.
Conclusion: Co-occurrence of PAS, strong SGD and PolC should not be regarded as a genome signature of the Firmicutes. The presence of PAS in a species may warrant PolC and strong SGD, but PolC and/or SGD do not necessarily imply PAS.
Affiliation(s)
- Chitra Dutta
- Structural Biology & Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
24
Characterization of a heat-active archaeal β-glucosidase from a hydrothermal spring metagenome. Enzyme Microb Technol 2014; 57:48-54. [DOI: 10.1016/j.enzmictec.2014.01.010] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 11/04/2013] [Revised: 01/14/2014] [Accepted: 01/27/2014] [Indexed: 11/20/2022]
25
Goncearenco A, Ma BG, Berezovsky IN. Molecular mechanisms of adaptation emerging from the physics and evolution of nucleic acids and proteins. Nucleic Acids Res 2013; 42:2879-92. [PMID: 24371267 PMCID: PMC3950714 DOI: 10.1093/nar/gkt1336] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/15/2022]
Abstract
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.
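One common way to quantify the kind of ApG over-representation mentioned above is Karlin's dinucleotide relative abundance, rho_XY = f_XY / (f_X * f_Y), where values above 1 indicate that XY occurs more often than its mononucleotide frequencies predict. The abstract does not say that this measure was used; the sketch below is an illustrative assumption. It counts a single strand only (Karlin's genomic signature usually pools a sequence with its reverse complement) and skips any dinucleotide that contains a non-ACGT character.

```python
from collections import Counter


def dinucleotide_rho(seq: str) -> dict[str, float]:
    """Single-strand dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y)."""
    s = seq.upper()
    mono = Counter(b for b in s if b in "ACGT")
    # Only count neighbours that are both unambiguous bases.
    di = Counter(s[i:i + 2] for i in range(len(s) - 1)
                 if s[i] in "ACGT" and s[i + 1] in "ACGT")
    n1, n2 = sum(mono.values()), sum(di.values())
    if n1 == 0 or n2 == 0:
        return {}
    rho = {}
    for x in "ACGT":
        for y in "ACGT":
            fx, fy = mono[x] / n1, mono[y] / n1
            if fx and fy:  # undefined when either base is absent
                rho[x + y] = (di[x + y] / n2) / (fx * fy)
    return rho


print(round(dinucleotide_rho("AGAG")["AG"], 3))  # 2.667
```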
Affiliation(s)
- Alexander Goncearenco
- CBU, University of Bergen, 5020 Bergen, Norway, Department of Informatics, University of Bergen, 5020 Bergen, Norway, Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671 Singapore and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, 76100, Israel
26
Prabha R, Singh DP, Gupta SK, de Farias ST, Rai A. Comparative analysis to identify determinants of changing life style in Thermosynechococcus elongatus BP-1, a thermophilic cyanobacterium. Bioinformation 2013; 9:299-308. [PMID: 23559749 PMCID: PMC3607189 DOI: 10.6026/97320630009299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/04/2012] [Accepted: 12/22/2012] [Indexed: 11/23/2022]
Abstract
A comparative genomic analysis of all forty whole-genome sequences available for cyanobacteria (3 thermophiles, Thermosynechococcus elongatus BP-1, Synechococcus sp. JA-2-3B'a (2-13) and Synechococcus sp. JA-3-3Ab, and 37 mesophiles) was performed to identify genomic and proteomic factors responsible for the behaviour of T. elongatus BP-1, a thermophilic unicellular cyanobacterium with an optimum growth temperature (OGT) of 55°C. Most genomic and proteomic characteristics of this cyanobacterium pointed, contrary to expectation, to mesophilic behaviour, while mutational bias and selection pressure are thought to be responsible for its high OGT. Contradictory results were obtained for T. elongatus with respect to thermophilic behaviour for synonymous codon usage, CvP-bias and amino acid composition. Its calculated J2 index is the lowest among all cyanobacterial genomes. Except for proline and termination codons, T. elongatus showed the synonymous codon usage pattern expected for mesophiles. Among cyanobacterial genomes, most genomic and proteomic determinants place T. elongatus very close to the mesophiles, and the whole genome of this organism appears to reflect a continuous gain of mesophilic rather than thermophilic behaviour.
Affiliation(s)
- Ratna Prabha
- National Bureau of Agriculturally Important Microorganisms, Indian Council of Agricultural Research, Kushmaur, Maunath Bhanjan 275101, India
- Department of Biotechnology, Mewar University, Gangrar, Chittorgarh, Rajasthan, India
- Dhananjaya P Singh
- National Bureau of Agriculturally Important Microorganisms, Indian Council of Agricultural Research, Kushmaur, Maunath Bhanjan 275101, India
- Shailendra K Gupta
- CSIR-Indian Institute of Toxicology Research, Mahatma Gandhi Marg, Kaisarbagh, Lucknow 226001, India
- Anil Rai
- Indian Agricultural Statistical Research Institute, Indian Council of Agricultural Research, Pusa, New Delhi 110 012, India
27
Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012; 13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022]
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B1 index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
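The most basic of these measures, the GC skew (G - C)/(G + C) computed in sliding windows along a genome, can be sketched as below; the ∆GC skew, B1 index and GC skew index reviewed above are not implemented here. The window and step sizes are arbitrary assumptions. A cumulative curve is included because, in many bacterial chromosomes, its minimum and maximum are used as rough indicators of the replication origin and terminus.

```python
def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one sequence; 0.0 if it contains no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int = 1000, step: int = 500) -> list[tuple[int, float]]:
    """(0-based window start, skew) for sliding windows along seq."""
    last_start = max(len(seq) - window, 0)
    return [(i, gc_skew(seq[i:i + window])) for i in range(0, last_start + 1, step)]


def cumulative_gc_skew(seq: str, window: int = 1000, step: int = 500) -> list[tuple[int, float]]:
    """Running sum of the windowed skew values (the shape matters, not the scale)."""
    total = 0.0
    curve = []
    for start, skew in windowed_gc_skew(seq, window, step):
        total += skew
        curve.append((start, total))
    return curve


print(windowed_gc_skew("GGGGCCCC", window=4, step=4))    # [(0, 1.0), (4, -1.0)]
print(cumulative_gc_skew("GGGGCCCC", window=4, step=4))  # [(0, 1.0), (4, 0.0)]
```

When step is smaller than window, neighbouring windows overlap, which smooths the curve but means the cumulative sum is not a simple base count.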
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
28
Khrustalev VV, Barkovsky EV. A blueprint for a mutationist theory of replicative strand asymmetries formation. Curr Genomics 2012; 13:55-64. [PMID: 22942675 PMCID: PMC3269017 DOI: 10.2174/138920212799034730] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/01/2011] [Revised: 09/15/2011] [Accepted: 09/29/2011] [Indexed: 11/26/2022]
Abstract
In the present review, we summarize current knowledge on replicative strand asymmetries in prokaryotic genomes and outline a cornerstone for a theory of their formation. According to our recent work, the probability of nonsense mutation caused by replication-associated mutational pressure is higher for genes on lagging strands than for genes on leading strands in both bacterial and archaeal genomes. The lower density of open reading frames on lagging strands can be explained by faster rates of nonsense mutation in the genes situated on them. According to the asymmetries in nucleotide usage at fourfold and twofold degenerate sites, the direction of replication-associated mutational pressure for genes on lagging strands is usually the same as the direction of transcription-associated mutational pressure. This implies that lagging strands should accumulate more 8-oxo-G, uracil and 5-formyl-uracil. In our opinion, the consequences of cytosine deamination (C to T transitions) do not lead to decreased cytosine usage in genes on lagging strands because of the consequences of thymine oxidation (T to C transitions), while guanine oxidation (causing G to T transversions) makes the main contribution to the decrease in guanine usage at fourfold degenerate sites of genes on lagging strands. Nucleotide usage asymmetries and bias in the density of coding regions can also be found in archaeal genomes, although the percentage of "inversed" asymmetries is much higher in them than in bacterial genomes. "Homogenized" and "inversed" replicative strand asymmetries in archaeal genomes can be used as retrospective indexes for detecting OriC translocations and large inversions.
Affiliation(s)
- Vladislav V Khrustalev
- Department of General Chemistry, Belarusian State Medical University, Dzerzhinskogo 83, Minsk, Belarus
29
Gao J, Wang W. Analysis of structural requirements for thermo-adaptation from orthologs in microbial genomes. Ann Microbiol 2012. [DOI: 10.1007/s13213-012-0420-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/14/2022]
30
Mahale KN, Kempraj V, Dasgupta D. Does the growth temperature of a prokaryote influence the purine content of its mRNAs? Gene 2012; 497:83-9. [PMID: 22305982 DOI: 10.1016/j.gene.2012.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/22/2011] [Accepted: 01/19/2012] [Indexed: 11/20/2022]
Abstract
The formation and breaking of hydrogen bonds between nucleic acid bases depend on temperature. The high G+C content of some organisms was surmised to be an adaptation for survival at high temperature because of the thermal stability of G:C pairs. However, a survey of genomic GC% and optimum growth temperature (OGT) across several prokaryotes found no direct relation between them. A significantly high purine (R = A or G) content in mRNAs is also seen as a selective response for survival among thermophiles. Nevertheless, the biological relevance of thermophiles loading their unstable mRNAs with excess purines (purine-loading or R-loading) is not persuasive. Here, we analysed the mRNA sequences from the genomes of 168 prokaryotes (obtained from the NCBI Genome database) with OGTs ranging from -5 °C to 100 °C to verify the relation between R-loading and OGT. Our analysis fails to demonstrate any correlation between R-loading of the mRNA pool and the OGT of a prokaryote. The percentage of purine-loaded mRNAs in prokaryotes shows a rough negative correlation with genomic GC% (r² = 0.655, slope = -1.478, P < 0.001). We conclude that genomic GC% and bias against certain combinations of nucleotides drive the mRNA-synonymous (sense) strands of DNA towards variations in R-loading.
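Purine content can be summarized per mRNA in more than one way. The sketch below computes the purine fraction (A + G)/(A + C + G + T), a purine-loading index expressed as (A + G) - (C + T) per 1,000 bases, and the percentage of mRNAs whose purine fraction exceeds 0.5. The 0.5 cutoff for calling an mRNA "purine-loaded" is an assumption, since the abstract does not state the authors' criterion; U is read as T so RNA strings are accepted, and the function names are hypothetical.

```python
def _base_counts(seq: str) -> dict[str, int]:
    """Counts of A, C, G, T (U treated as T); other characters are ignored."""
    s = seq.upper().replace("U", "T")
    return {b: s.count(b) for b in "ACGT"}


def purine_fraction(seq: str) -> float:
    """(A + G) / (A + C + G + T)."""
    n = _base_counts(seq)
    total = sum(n.values())
    return (n["A"] + n["G"]) / total if total else 0.0


def purine_load_per_kb(seq: str) -> float:
    """(A + G) - (C + T), scaled to 1,000 bases."""
    n = _base_counts(seq)
    total = sum(n.values())
    if not total:
        return 0.0
    return 1000 * ((n["A"] + n["G"]) - (n["C"] + n["T"])) / total


def percent_purine_loaded(mrnas: list[str], threshold: float = 0.5) -> float:
    """Percentage of mRNAs whose purine fraction exceeds a (hypothetical) threshold."""
    if not mrnas:
        return 0.0
    loaded = sum(1 for m in mrnas if purine_fraction(m) > threshold)
    return 100 * loaded / len(mrnas)


print(purine_load_per_kb("AAGC"))               # 500.0
print(percent_purine_loaded(["AAGC", "CCUU"]))  # 50.0
```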
31
Satapathy SS, Dutta M, Ray SK. Higher tRNA diversity in thermophilic bacteria: A possible adaptation to growth at high temperature. Microbiol Res 2010; 165:609-16. [DOI: 10.1016/j.micres.2009.12.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/20/2009] [Revised: 10/31/2009] [Accepted: 12/20/2009] [Indexed: 10/19/2022]
32
Qu H, Wu H, Zhang T, Zhang Z, Hu S, Yu J. Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes. Res Microbiol 2010; 161:838-46. [PMID: 20868744 DOI: 10.1016/j.resmic.2010.09.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 06/27/2010] [Accepted: 08/03/2010] [Indexed: 11/15/2022]
Abstract
Nucleotide compositional asymmetry (NCA) between leading and lagging strands (LeS and LaS) is dynamic and diverse among eubacterial genomes owing to different mutation and selection forces. A thorough investigation is needed to study the relationship between nucleotide composition dynamics and gene distribution biases. Using a collection of 364 eubacterial genomes grouped according to a DnaE-based scheme (DnaE1-DnaE1, DnaE2-DnaE1 and DnaE3-PolC), we investigated NCA and nucleotide composition gradients at the three codon positions and found universal G-enrichment on the LeS in all groups. This was due to strong selection for codons with G at the first codon position (cp1) and to mutation pressure that led to more G-ending (cp3) codons. Moreover, a slight T-enrichment of the LeS, due to cytosine-deamination mutations at cp3, was universal among DnaE1-DnaE1 and DnaE2-DnaE1 genomes but was not clearly seen among DnaE3-PolC genomes, in which A-enrichment of the LeS was proposed to result from selection unique to polC and from a mutation bias toward A-richness at cp1 that may be a result of transcription-coupled DNA repair mechanisms. Furthermore, strand-biased gene distribution enhances the purine-richness of the LeS in DnaE3-PolC genomes and the T-richness of the LeS in DnaE1-DnaE1 and DnaE2-DnaE1 genomes.
Affiliation(s)
- Hongzhu Qu
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
33
Chattopadhyay S, Sahoo S, Kanner WA, Chakrabarti J. Pressures in archaeal protein coding genes: a comparative study. Comp Funct Genomics 2010; 4:56-65. [PMID: 18629113 PMCID: PMC2447400 DOI: 10.1002/cfg.246] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/13/2002] [Accepted: 11/25/2002] [Indexed: 11/06/2022]
Abstract
Our studies of the bases of codons from 11 completely sequenced archaeal genomes show that, moving from species with GC-rich protein-coding genes to species with AT-rich ones, the differences between G and C and between A and T, the purine load (AG content), and the overall persistence within codons (i.e., the tendency of a base to be followed by the same base) all increase almost simultaneously, although the extent of increase differs among the three codon positions. These findings suggest that deviations from the second parity rule (through increasing differences between complementary base contents) and increasing purine load hinder the formation of intra-strand Watson-Crick base-paired secondary structures in mRNAs (synonymous with the protein-coding genes we dealt with), thereby increasing translational efficiency. We hypothesize that archaeal species with AT-rich protein-coding genes might have better translational efficiency than their GC-rich counterparts.
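The codon-position statistics described above can be sketched as follows. The normalizations are assumptions made for illustration, not necessarily the authors' formulas: complementary-base differences are expressed as |G - C|/(G + C) and |A - T|/(A + T), which are 0 when complementary bases are equally frequent as the second parity rule predicts; purine load is the A + G fraction; and persistence is the fraction of within-codon neighbour pairs (positions 1-2 and 2-3) that carry the same base. Function names are hypothetical.

```python
from collections import Counter


def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons (a trailing partial codon is dropped)."""
    s = seq.upper().replace("U", "T")
    return [s[i:i + 3] for i in range(0, len(s) - 2, 3)]


def position_stats(seq: str, pos: int) -> dict[str, float]:
    """Composition statistics for codon position pos (0, 1 or 2)."""
    n = Counter(codon[pos] for codon in codons(seq))
    g, c, a, t = n["G"], n["C"], n["A"], n["T"]
    total = g + c + a + t
    return {
        "gc_diff": abs(g - c) / (g + c) if g + c else 0.0,
        "at_diff": abs(a - t) / (a + t) if a + t else 0.0,
        "purine": (a + g) / total if total else 0.0,
    }


def within_codon_persistence(seq: str) -> float:
    """Fraction of within-codon neighbour pairs (positions 1-2 and 2-3) that are identical."""
    pairs = [(codon[i], codon[i + 1]) for codon in codons(seq) for i in (0, 1)]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


print(position_stats("AAAGCT", 0))         # {'gc_diff': 1.0, 'at_diff': 1.0, 'purine': 1.0}
print(within_codon_persistence("AAAGCT"))  # 0.5
```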
Affiliation(s)
- Sujay Chattopadhyay
- Department of Theoretical Physics, Indian Association for the Cultivation of Science, Jadavpur, Calcutta 700 032, India.
34
Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010; 11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022]
Abstract
Background: The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases at the wobble position have been documented between organisms and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates, but not in all plants, nor in invertebrates or cold-blooded vertebrates.
Results: Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress-responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses.
Conclusions: Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or which are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage, and gene methylation.
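GC3 itself is simple to compute. The sketch below returns the G + C fraction at the third positions of complete codons and, as one rough way to look at the 5'-to-3' increase described above, GC3 within consecutive, roughly equal bins of codons. The binning scheme and function names are illustrative assumptions, not the authors' method.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of complete codons."""
    thirds = cds.upper()[2::3]  # index 3k+2 exists only when codon k is complete
    acgt = sum(b in "ACGT" for b in thirds)
    gc = sum(b in "GC" for b in thirds)
    return gc / acgt if acgt else 0.0


def gc3_profile(cds: str, n_bins: int = 2) -> list[float]:
    """GC3 in n_bins consecutive groups of codons, ordered 5' to 3'."""
    s = cds.upper()
    codon_list = [s[i:i + 3] for i in range(0, len(s) - 2, 3)]
    k = len(codon_list)
    return [gc3("".join(codon_list[b * k // n_bins:(b + 1) * k // n_bins]))
            for b in range(n_bins)]


print(gc3("ATGGCCAAA"))                # third positions G, C, A -> 2/3
print(gc3_profile("AAAAAAGGGCCC", 2))  # [0.0, 1.0]
```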
Affiliation(s)
- Tatiana V Tatarinova
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.
35
Lee S, Weon S, Lee S, Kang C. Relative codon adaptation index, a sensitive measure of codon usage bias. Evol Bioinform Online 2010; 6:47-55. [PMID: 20535230 PMCID: PMC2880845 DOI: 10.4137/ebo.s4608] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 12/04/2022]
Abstract
We propose a simple, sensitive measure of synonymous codon usage bias, the Relative Codon Adaptation Index (rCAI), that discriminates better between highly biased and unbiased regions than the widely used Codon Adaptation Index (CAI). CAI is a geometric mean of the relative usage of codons in a gene, and is calculated using a codon usage table trained on a set of highly expressed genes. In contrast, rCAI is computed by subtracting the background codon usage, trained on the two noncoding frames of highly expressed genes, from the codon usage in the coding frame. Because noncoding frames would not be expected to show codon bias, rCAI has a higher signal-to-noise ratio than CAI. Translation efficiency and protein abundance correlate as well as or better with rCAI than with CAI or other measures such as the 'effective number of codons' and 'SCUMBLE offsets'. Within overlapping coding regions, one of the two coding frames dominates in codon usage bias according to rCAI. rCAI could therefore potentially substitute for CAI in diverse applications.
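To make the CAI definition above concrete, here is a minimal Python sketch of the classic CAI: each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The pseudocount of 0.5 for unseen codons and the exclusion of stop codons, Met and Trp are assumptions (conventions vary). The rCAI background subtraction itself is not reproduced here, because the abstract does not give its exact formula; the `offset` parameter only shows how codons from the two shifted, noncoding frames could be read out. All names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq: str, offset: int = 0) -> list[str]:
    """Split `seq` into consecutive complete triplets starting at `offset` (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes: list[str], pseudocount: float = 0.5) -> dict[str, float]:
    """w(c) = count(c) / count of the most used synonymous codon in the reference set.

    The pseudocount keeps unseen codons from getting w = 0 (an assumption).
    Stop codons and the single-codon amino acids Met (M) and Trp (W) are excluded.
    """
    counts = Counter(c for gene in reference_genes for c in codons(gene) if c in CODE)
    best: dict[str, float] = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0.0), counts[codon] + pseudocount)
    return {codon: (counts[codon] + pseudocount) / best[aa]
            for codon, aa in CODE.items() if aa not in "*MW"}


def cai(gene: str, w: dict[str, float]) -> float:
    """Geometric mean of w over the gene's scorable codons in reading frame 0."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    if not logs:
        raise ValueError("gene has no scorable codons")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["GCTGCTGCTGCC"])  # toy reference set: Ala codons only
print(cai("GCTGCT", w))             # 1.0: only the preferred Ala codon GCT is used
print(codons("ATGAAA", offset=1))   # ['TGA']: triplets read from a shifted frame
```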
Affiliation(s)
- Soohyun Lee
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, 335 Gwahangno, Yuseong-gu, Daejeon 305-701, Korea
36
Kennedy R, Lladser ME, Wu Z, Zhang C, Yarus M, De Sterck H, Knight R. Natural and artificial RNAs occupy the same restricted region of sequence space. RNA 2010; 16:280-9. [PMID: 20032164 PMCID: PMC2811657 DOI: 10.1261/rna.1923210] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 05/07/2023]
Abstract
Different chemical and mutational processes within genomes give rise to sequences with different compositions and perhaps different capacities for evolution. The evolution of functional RNAs may occur on a "neutral network" in which sequences with any given function can easily mutate to sequences with any other. This neutral network hypothesis is more likely if there is a particular region of composition that contains sequences that are functional in general, and if many different functions are possible within this preferred region of composition. We show that sequence preferences in active sites recovered by in vitro selection combine with biophysical folding rules to support the neutral network hypothesis. These simple active-site specifications and folding preferences obtained by artificial selection experiments recapture the previously observed purine bias and specific spread along the GC axis of naturally occurring aptamers and ribozymes isolated from organisms. Other types of RNA, such as miRNA precursors and spliceosomal RNAs, which act primarily through complementarity to other nucleic acids, do not share these preferences. These universal evolved sequence features are therefore intrinsic to RNA molecules that bind small-molecule targets or catalyze reactions.
MESH Headings
- Aptamers, Nucleotide/chemistry
- Aptamers, Nucleotide/genetics
- Aptamers, Nucleotide/metabolism
- Base Composition
- Base Sequence
- Binding Sites/genetics
- Biophysical Phenomena
- Computational Biology
- Models, Genetic
- Models, Molecular
- Models, Statistical
- Mutation
- Nucleic Acid Conformation
- Poisson Distribution
- RNA/chemistry
- RNA/genetics
- RNA/metabolism
- RNA, Catalytic/chemistry
- RNA, Catalytic/genetics
- RNA, Catalytic/metabolism
- SELEX Aptamer Technique
- Selection, Genetic
Affiliation(s)
- Ryan Kennedy
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
37
Poptsova MS, Larionov SA, Ryadchenko EV, Rybalko SD, Zakharov IA, Loskutov A. Hidden chromosome symmetry: in silico transformation reveals symmetry in 2D DNA walk trajectories of 671 chromosomes. PLoS One 2009; 4:e6396. [PMID: 19636424 PMCID: PMC2712679 DOI: 10.1371/journal.pone.0006396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/07/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022]
Abstract
Maps of the 2D DNA walks of 671 chromosomes show compositional complexity changing from a symmetrical half-turn in bacteria to pseudo-random trajectories in archaea, fungi and humans. In silico transformation of gene order and strand position returns most of the analyzed chromosomes to a symmetrical, bacterial-like state with one transition point. The transformed chromosomal sequences also reveal remarkable segmental compositional symmetry between regions on different strands located equidistantly from the transition point. Despite extensive chromosome rearrangement, the relation between gene numbers on opposite strands for chromosomes of different taxa varies within narrow limits around unity, with Pearson coefficient r = 0.98. A similar relation is observed for total gene length (r = 0.86) and for cumulative GC (r = 0.95) and AT (r = 0.97) skews. This also holds for human coding sequences (CDS), which make up only a few percent of the entire chromosome length. We found that the frequency distributions of the lengths of gene clusters (runs of genes located contiguously on the same strand) have similar values for both strands. Eukaryotic gene distribution is believed to be non-random. The contributions of different subsystems to the noted symmetries and distributions, and evolutionary aspects of symmetry, are discussed.
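A 2D DNA walk of the kind described above can be sketched in Python as follows. The step convention (A and T move along the x-axis, G and C along the y-axis) is an assumption: published walks use several conventions and the abstract does not say which one was used. With this convention, y is the cumulative G - C difference (an unnormalized cumulative GC skew) and x is the cumulative A - T difference. Taking the minimum of the cumulative G - C curve as the single "transition point" is a common heuristic for bacterial chromosomes and is shown only for illustration; the function names are hypothetical.

```python
Step = tuple[int, int]

# Assumed step convention: A/T on the x-axis, G/C on the y-axis.
STEPS: dict[str, Step] = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk_2d(seq: str) -> list[Step]:
    """Return the walk's coordinates: the origin, then the position after each base."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases such as N do not move the walk
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def transition_point(path: list[Step]) -> int:
    """Index of the minimum of the cumulative G - C curve (the first one if tied)."""
    return min(range(len(path)), key=lambda i: path[i][1])


print(dna_walk_2d("AGTC"))                      # [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(transition_point(dna_walk_2d("CCGGG")))  # 2
```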
Affiliation(s)
- Maria S Poptsova
- University of Connecticut, Storrs, Connecticut, United States of America.
38
Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria. Archaea 2009; 2:159-67. [PMID: 19054742 DOI: 10.1155/2008/829730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results.
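The genomic signatures used as CART input are tetranucleotide frequencies. A minimal Python sketch of computing such a 256-dimensional vector from one strand is shown below; whether the study pooled reverse complements or normalized differently is not stated in the abstract, so this single-strand, overlapping-window version is an assumption, and the function name is hypothetical. The decision-tree classifier itself is not shown.

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 possible 4-mers


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequencies of overlapping 4-mers; windows with non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


freqs = tetranucleotide_frequencies("GAGAGA")
print(freqs["GAGA"], freqs["AGAG"])  # 2/3 and 1/3 of the three windows
```

One such vector per genome, paired with that organism's optimal growth temperature, is the kind of table a classification tree could then be trained on.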
39
Wang J, Ma BG, Zhang HY, Chen LL, Zhang SC. How does gene expression level contribute to thermophilic adaptation of prokaryotes? An exploration based on predictors. Gene 2008; 421:32-6. [PMID: 18621118 DOI: 10.1016/j.gene.2008.06.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/02/2008] [Revised: 06/16/2008] [Accepted: 06/17/2008] [Indexed: 11/17/2022]
Abstract
By analyzing the predicted gene expression levels of 33 prokaryotes with growth temperatures ranging from below 10 °C to above 100 °C, a universal positive correlation was found between the percentage of predicted highly expressed genes and the organisms' optimal growth temperature. A physical interpretation of the correlation revealed that highly expressed genes are statistically more thermostable than lowly expressed genes. These findings suggest that gene expression level may contribute significantly to prokaryotic thermal adaptation, and provide evidence for translational selection pressure on the thermostability of natural proteins during evolution.
Affiliation(s)
- Ji Wang
- Department of Marine Biology, Ocean University of China, Qingdao 266003, P. R. China
40
Moura G, Pinheiro M, Arrais J, Gomes AC, Carreto L, Freitas A, Oliveira JL, Santos MAS. Large scale comparative codon-pair context analysis unveils general rules that fine-tune evolution of mRNA primary structure. PLoS One 2007; 2:e847. [PMID: 17786218 PMCID: PMC1952141 DOI: 10.1371/journal.pone.0000847] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.1] [Received: 05/25/2007] [Accepted: 07/31/2007] [Indexed: 11/18/2022]
Abstract
Background Codon usage and codon-pair context are important gene primary structure features that influence mRNA decoding fidelity. In order to identify general rules that shape codon-pair context and minimize mRNA decoding error, we have carried out a large-scale comparative codon-pair context analysis of 119 fully sequenced genomes. Methodologies/Principal Findings We have developed mathematical and software tools for large-scale comparative codon-pair context analysis. These methodologies unveiled general and species-specific codon-pair context rules that govern the evolution of mRNAs in the three domains of life. We show that the evolution of bacterial and archaeal mRNA primary structure is mainly dependent on constraints imposed by the translational machinery, while in eukaryotes DNA methylation and trinucleotide repeats impose strong biases on codon-pair context. Conclusions The data highlight fundamental differences between prokaryotic and eukaryotic mRNA decoding rules, which are partially independent of codon usage.
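The abstract does not detail the statistics behind the codon-pair context analysis. As a simplified illustration only (not necessarily the authors' method), the Python sketch below counts adjacent in-frame codon pairs across a set of ORFs and reports each pair's observed/expected ratio, where the expectation comes from the row and column totals of the 5'-codon by 3'-codon contingency table, that is, assuming the two positions are independent. Pairs are never counted across ORF boundaries. The function names are hypothetical.

```python
from collections import Counter


def codon_pairs(orf: str) -> Counter:
    """Count adjacent in-frame codon pairs (codon i, codon i + 1) in one ORF."""
    seq = orf.upper()
    cods = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return Counter(zip(cods, cods[1:]))


def pair_observed_expected(orfs: list[str]) -> dict[tuple[str, str], float]:
    """Observed / expected count per codon pair, with expectations taken from the
    row and column totals of the 5'-codon x 3'-codon contingency table."""
    pairs: Counter = Counter()
    for orf in orfs:
        pairs.update(codon_pairs(orf))
    total = sum(pairs.values())
    first: Counter = Counter()   # totals for the 5' codon of each pair
    second: Counter = Counter()  # totals for the 3' codon of each pair
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    return {(a, b): n / (first[a] * second[b] / total) for (a, b), n in pairs.items()}


ratios = pair_observed_expected(["AAAGGGAAACCC"])
print(ratios[("GGG", "AAA")])  # about 3: observed once, expected 1/3 times
```

Ratios well above or below 1 flag codon pairs that are over- or under-represented relative to independent codon usage; a real analysis would add a significance measure and far more sequence data.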
Affiliation(s)
- Gabriela Moura
- Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal
- Miguel Pinheiro
- Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal
- Joel Arrais
- Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal
- Ana Cristina Gomes
- Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal
- Laura Carreto
- Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal
- Adelaide Freitas
- Department of Mathematics, University of Aveiro, Aveiro, Portugal
- José L. Oliveira
- Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal
- Manuel A. S. Santos
- Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal
41
Affiliation(s)
- Claire Torchet
- Institut Jacques-Monod, Biochimie de l'Evolution et Adaptabilité Moléculaire, Université Paris VI, Tour 43, 2 place Jussieu, 75251 Paris Cedex 05, France
42
Hu J, Zhao X, Yu J. Replication-associated purine asymmetry may contribute to strand-biased gene distribution. Genomics 2007; 90:186-94. [PMID: 17532183 DOI: 10.1016/j.ygeno.2007.04.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 10/22/2006] [Revised: 03/09/2007] [Accepted: 04/02/2007] [Indexed: 11/19/2022]
Abstract
Among prokaryotic genomes, the distribution of genes on the leading and lagging strands of the replication fork is known to be biased. Several hypotheses explaining this strand-biased gene distribution (SGD) have been proposed, but none have been tested or supported by sufficient data analyses. In this work we have analyzed 211 prokaryotic genomes in terms of compositional strand asymmetries and the presence or absence of polC and have found that SGD correlates not only with polC, but also with purine asymmetry (PAS). Furthermore, SGD, PAS, and polC are all features associated with a group of low-GC, gram-positive bacteria (Firmicutes). We conclude that PAS is a characteristic of organisms with a heterodimeric DNA polymerase III alpha-subunit constituted by polC and dnaE, which may play a direct role in the maintenance of SGD.
Affiliation(s)
- Jianfei Hu
- College of Life Sciences, Peking University, Beijing 100871, China.
43
Mydland LT, Frøyland JRK, Skrede A. Composition of individual nucleobases in diets containing different products from bacterial biomass grown on natural gas, and digestibility in mink (Mustela vison). J Anim Physiol Anim Nutr (Berl) 2007; 92:1-8. [DOI: 10.1111/j.1439-0396.2007.00674.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]
44
Paz A, Mester D, Nevo E, Korol A. Looking for organization patterns of highly expressed genes: purine-pyrimidine composition of precursor mRNAs. J Mol Evol 2007; 64:248-60. [PMID: 17211550 DOI: 10.1007/s00239-006-0135-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/21/2006] [Accepted: 11/19/2006] [Indexed: 01/05/2023]
Abstract
We analyzed precursor messenger RNAs (pre-mRNAs) of 12 eukaryotic species. In each species, three groups of highly expressed genes (ribosomal proteins, heat shock proteins, and aminoacyl-tRNA synthetases) were compared with a control group of randomly selected genes. The purine-pyrimidine (R-Y) composition of pre-mRNAs of the three targeted gene groups proved to differ significantly from the control. The exons of the three groups tested have higher purine contents and R-tract abundance and lower abundance of Y-tracts compared to the control (R-tract: a tract of sequential purines, Rn with n ≥ 5; Y-tract: a tract of sequential pyrimidines, Yn with n ≥ 5). In species widely employing "intron definition" in the splicing process, the Y content of introns of the three targeted groups appeared to be higher than in the control group. Furthermore, in all examined species, the introns of the targeted genes have a lower abundance of R-tracts compared to the control. We hypothesized that the R-Y composition of the targeted gene groups contributes to a high rate and efficiency of both splicing and translation, in addition to the mRNA coding role. This is presumably achieved by (1) reducing the possibility of the formation of secondary structures in the mRNA, (2) using the R-tracts and R-biased sequences as exonic splicing enhancers, (3) lowering the number of targets for pyrimidine tract binding protein in the exons, and (4) reducing the number of target sequences for binding of serine/arginine-rich (SR) proteins in the introns, thereby allowing SR proteins to bind to proper (exonic) targets.
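The R-tract and Y-tract definitions above (runs of at least five purines or pyrimidines) translate directly into regular expressions. The Python sketch below counts such tracts and the overall purine fraction of a sequence. Counting each maximal run once (so a run of ten purines is one R-tract, not several) is an assumption about how tracts were tallied, and the function names are hypothetical.

```python
import re


def tract_counts(seq: str, min_len: int = 5) -> tuple[int, int]:
    """Count maximal purine runs (R-tracts) and pyrimidine runs (Y-tracts) of length >= min_len."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    r_tracts = re.findall(rf"[AG]{{{min_len},}}", seq)  # greedy, so each maximal run matches once
    y_tracts = re.findall(rf"[CT]{{{min_len},}}", seq)
    return len(r_tracts), len(y_tracts)


def purine_fraction(seq: str) -> float:
    """Fraction of A or G among unambiguous bases."""
    seq = seq.upper().replace("U", "T")
    bases = [b for b in seq if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(b in "AG" for b in bases) / len(bases)


print(tract_counts("AAGGAGTCTTCCA"))  # (1, 1): AAGGAG is an R-tract, TCTTCC a Y-tract
```

Applied separately to the exons and introns of each gene group, these two functions give the kind of per-region R-Y summaries the abstract compares against a control group.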
Affiliation(s)
- A Paz
- Institute of Evolution, Haifa University, Mount Carmel, Haifa, 31905, Israel
45
Lin FH, Forsdyke DR. Prokaryotes that grow optimally in acid have purine-poor codons in long open reading frames. Extremophiles 2006; 11:9-18. [PMID: 16957882 DOI: 10.1007/s00792-006-0005-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/20/2006] [Accepted: 03/29/2006] [Indexed: 10/24/2022]
Abstract
In nucleic acids the N-glycosyl bonds between purines and their ribose sugar moieties are broken under acid conditions. If one strand of a duplex DNA segment were more vulnerable to mutation than the other, then the archaeon Picrophilus torridus, with an optimum growth pH near zero, could have adapted by decreasing the purine content of that strand. Yet, P. torridus has an optimum growth temperature near 60 °C, and thermophiles prefer purine-rich codons. We found that, as in other thermophiles, high growth temperature correlates with the use of purine-rich codons. The extra purines are often in third (non-amino-acid-determining) codon positions. However, as in other acidophiles, as open reading frame lengths increase, there is increased use of purine-poor codons, particularly those without purines in second (amino-acid-determining) codon positions. Thus, P. torridus can be seen as adapting (a) to temperature by increasing its purines in all open reading frames without greatly impacting protein amino acid compositions, and (b) to pH by decreasing purines in longer open reading frames, thereby potentially impacting protein amino acid compositions. It is proposed that longer open reading frames, being larger mutational targets, have become less vulnerable to depurination by virtue of pyrimidine-for-purine substitutions.
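The abstract contrasts purines at the third (non-amino-acid-determining) and second (amino-acid-determining) codon positions as open reading frame length changes. A minimal Python sketch of the per-position purine tally is below; it assumes an in-frame ORF and ignores a trailing partial codon. Grouping ORFs by length and relating the results to growth temperature or pH is not shown, and the function name is hypothetical.

```python
def purine_by_codon_position(orf: str) -> tuple[float, float, float]:
    """Fraction of A or G at codon positions 1, 2 and 3 over all complete codons."""
    seq = orf.upper()
    n = len(seq) // 3
    if n == 0:
        raise ValueError("ORF contains no complete codon")
    # seq[3 * i + p] is the base at codon position p + 1 of codon i
    f1, f2, f3 = (sum(seq[3 * i + p] in "AG" for i in range(n)) / n for p in range(3))
    return f1, f2, f3


print(purine_by_codon_position("AAGCTA"))  # codons AAG and CTA -> (0.5, 0.5, 1.0)
```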
Affiliation(s)
- Feng-Hsu Lin
- Department of Biochemistry, Queen's University, K7L3N6, Kingston, ON, Canada
46
Synonymous codon usage and its potential link with optimal growth temperature in prokaryotes. Gene 2006; 385:128-36. [PMID: 16989961 DOI: 10.1016/j.gene.2006.05.033] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 01/10/2006] [Accepted: 05/29/2006] [Indexed: 12/01/2022]
Abstract
The relationship between codon usage in prokaryotes and their ability to grow at extreme temperatures has been given much attention over the past years. Previous studies have suggested that the difference in synonymous codon usage between (hyper)thermophiles and mesophiles is a consequence of a selective pressure linked to growth temperature. Here, we performed an updated analysis of the variation in synonymous codon usage with growth temperature; our study includes a large number of species from a wide taxonomic and growth temperature range. The presence of psychrophilic species in our study allowed us to test whether the same selective pressure acts on synonymous codon usage at very low growth temperature. Our results show that the synonymous codon usage for Arg (through the AGG, AGA and CGT codons) is the most discriminating factor between (hyper)thermophilic and non-thermophilic species, thus confirming previous studies. We report the unusual clustering of an Archaeal psychrophile with the thermophilic and hyperthermophilic species on the synonymous codon usage factorial map; the other psychrophiles in our study cluster with the mesophilic species. Our conclusion is that the difference in synonymous codon usage between (hyper)thermophilic and non-thermophilic species cannot be clearly attributed to a selective pressure linked to growth at high temperatures.
47
Forsdyke DR. Conflict Resolution. In: Evolutionary Bioinformatics. Springer; 2006. [DOI: 10.1007/978-0-387-33419-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
48
Smit S, Yarus M, Knight R. Natural selection is not required to explain universal compositional patterns in rRNA secondary structure categories. RNA 2006; 12:1-14. [PMID: 16373489 PMCID: PMC1370880 DOI: 10.1261/rna.2183806] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Indexed: 05/05/2023]
Abstract
We have encountered an unexpected property of rRNA secondary structures that may generalize to all RNAs. Analysis of 8892 ribosomal RNA sequences and structures from a wide range of species revealed unexpected universal compositional trends. First, different categories of rRNA secondary structure (stems, loops, bulges, and junctions) have distinct, characteristic base compositions. Second, the observed patterns of variation are similar among sequences from large and small rRNA subunits and all domains of life, despite extensive evolutionary divergence. Surprisingly, these differences do not seem to be related to selection for different compositions in different structural categories, but rather relate to the overall composition of the molecule: Randomized RNAs with no evolutionary history show the same structure-dependent compositional biases as rRNAs. These compositional trends may improve the accuracy of RNA secondary structure prediction, because they allow us to compare predicted structures against known compositional preferences. They also suggest caution in interpreting differences in the rate of change of the GC content in different parts of the molecule as evidence of differential selection.
Affiliation(s)
- Sandra Smit
- Department of Chemistry and Biochemistry, Campus Box 215, University of Colorado at Boulder, Boulder, CO 80309, USA
49
Forsdyke DR. Chargaff's Cluster Rule. In: Evolutionary Bioinformatics. Springer; 2006. [DOI: 10.1007/978-0-387-33419-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
50
Rayment JH, Forsdyke DR. Amino acids as placeholders: base-composition pressures on protein length in malaria parasites and prokaryotes. Appl Bioinformatics 2005; 4:117-30. [PMID: 16128613 DOI: 10.2165/00822942-200504020-00005] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/02/2022]
Abstract
BACKGROUND The composition and sequence of amino acids in a protein may serve the underlying needs of the nucleic acids that encode the protein (the genome phenotype). In extreme form, amino acids become mere placeholders inserted between functional segments or domains, and, apart from increasing protein length, play no role in the specific function or structure of a protein (the conventional phenotype). METHODS We studied the genomes of two malarial parasites and 521 prokaryotes (144 complete) that differ widely in GC% and optimum growth temperature, comparing the base compositions of the protein coding regions and corresponding lengths (kilobases). RESULTS Malarial parasites show distinctive responses to base-compositional pressures that increase as protein lengths increase. A low-GC% species (Plasmodium falciparum) is likely to have more placeholder amino acids than an intermediate-GC% species (P. vivax), so that homologous proteins are longer. In prokaryotes, GC% is generally greater and AG% is generally less in open reading frames (ORFs) encoding long proteins. The increased GC% in long ORFs increases as species' GC% increases, and decreases as species' AG% increases. In low- and intermediate-GC% prokaryotic species, increases in ORF GC% as encoded proteins increase in length are largely accounted for by the base compositions of first and second (amino acid-determining) codon positions. In high-GC% prokaryotic species, first and third (non-amino acid-determining) codon positions play this role. CONCLUSION In low- and intermediate-GC% prokaryotes, placeholder amino acids are likely to be well defined, corresponding to codons enriched in G and/or C at first and second positions. In high-GC% prokaryotes, placeholder amino acids are likely to be less well defined. Increases in ORF GC% as encoded proteins increase in length are greater in mesophiles than in thermophiles, which are constrained from increasing protein lengths in response to base-composition pressures.
Affiliation(s)
- Jonathan H Rayment
- Department of Biochemistry, Queen's University, Kingston, Ontario, Canada