1
Atre M, Joshi B, Babu J, Sawant S, Sharma S, Sankar TS. Origin, evolution, and maintenance of gene-strand bias in bacteria. Nucleic Acids Res 2024; 52:3493-3509. [PMID: 38442257 DOI: 10.1093/nar/gkae155] [Received: 08/11/2023] [Revised: 02/06/2024] [Accepted: 02/19/2024]
Abstract
Gene-strand bias is a characteristic feature of bacterial genome organization in which genes are preferentially encoded on the leading strand of replication, promoting co-orientation of replication and transcription. This co-orientation bias has evolved to protect gene essentiality, expression, and genomic stability from the harmful effects of head-on replication-transcription collisions. However, the origin, variation, and maintenance of gene-strand bias remain elusive. Here, we reveal that the frequency of inversions that alter gene orientation varies widely across bacterial populations and correlates negatively with gene-strand bias. The density, distance, and distribution of inverted repeats show a similar negative relationship with gene-strand bias, explaining the heterogeneity in inversions. Importantly, these observations are broadly evident across the entire bacterial kingdom, revealing inversions and inverted repeats as primary factors underlying the variation in gene-strand bias and its maintenance. The distinct catalytic subunits of replicative DNA polymerase have co-evolved with gene-strand bias, suggesting a close link between replication and the origin of gene-strand bias. Congruently, inversion frequencies and inverted repeats vary among bacteria with different DNA polymerases. In summary, we propose that the nature of replication determines the fitness cost of replication-transcription collisions, establishing a selection gradient on gene-strand bias by fine-tuning DNA sequence repeats and, thereby, gene inversions.
Affiliation(s)
- Malhar Atre
- School of Biology, Indian Institute of Science Education and Research, Thiruvananthapuram, Kerala 695551, India
- Bharat Joshi
- School of Biology, Indian Institute of Science Education and Research, Thiruvananthapuram, Kerala 695551, India
- Jebin Babu
- School of Biology, Indian Institute of Science Education and Research, Thiruvananthapuram, Kerala 695551, India
- Shabduli Sawant
- School of Biology, Indian Institute of Science Education and Research, Thiruvananthapuram, Kerala 695551, India
- Shreya Sharma
- School of Biology, Indian Institute of Science Education and Research, Thiruvananthapuram, Kerala 695551, India
- T Sabari Sankar
- School of Biology, Indian Institute of Science Education and Research, Thiruvananthapuram, Kerala 695551, India
2
Ben-Elazar S, Chor B, Yakhini Z. The Functional 3D Organization of Unicellular Genomes. Sci Rep 2019; 9:12734. [PMID: 31484964 PMCID: PMC6726614 DOI: 10.1038/s41598-019-48798-7] [Received: 05/09/2019] [Accepted: 08/12/2019]
Abstract
Genome conformation capture techniques permit a systematic investigation into the functional spatial organization of genomes, including functional aspects such as the co-localization of sets of genomic elements, for example the co-localization of genes targeted by a transcription factor (TF) within a transcription factory. We quantify spatial co-localization using a rigorous statistical model that measures the enrichment of a subset of elements in neighbourhoods inferred from Hi-C data. We also control for co-localization that can be attributed to genomic order. We systematically apply our open-source framework, spatial-mHG, to search for spatial co-localization phenomena in multiple unicellular Hi-C datasets with corresponding genomic annotations. Our biological findings shed new light on the functional spatial organization of genomes, including the following. In C. crescentus, DNA replication genes reside in two genomic clusters that are spatially co-localized; furthermore, these clusters contain similar gene copies and lie in the genomic vicinity of the ori and ter sequences. In S. cerevisiae, Ty5 retrotransposon family elements spatially co-localize at a spatially adjacent subset of telomeres. In N. crassa, proteasome lid subcomplex genes and protein refolding genes jointly co-localize at a shared location. An implementation of our algorithms is available online.
3
Quan CL, Gao F. Quantitative analysis and assessment of base composition asymmetry and gene orientation bias in bacterial genomes. FEBS Lett 2019; 593:918-925. [PMID: 30941752 DOI: 10.1002/1873-3468.13374] [Received: 01/31/2019] [Revised: 03/28/2019] [Accepted: 03/31/2019]
Abstract
Base composition asymmetry and gene orientation bias are two common structural features of bacterial genomes. Here, correlation coefficients between nucleotide disparities and coding sequence (CDS) skew have been calculated, which provides insights into their relationship from an individual-genome perspective. We find that GC and RY disparities correlate significantly with CDS skew: around 60% of the bacterial genomes under study have correlation coefficients > 0.9. We then present a model for the quantitative assessment of nucleotide disparity and CDS skew, in which a numerical index, R², is used for evaluation. Skew curves with higher R² perform better at predicting replication origins in bacteria.
Affiliation(s)
- Chun-Lan Quan
- Department of Physics, School of Science, Tianjin University, China
- Feng Gao
- Department of Physics, School of Science, Tianjin University, China; Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), China
4
Seco EM, Ayora S. Bacillus subtilis DNA polymerases, PolC and DnaE, are required for both leading and lagging strand synthesis in SPP1 origin-dependent DNA replication. Nucleic Acids Res 2017; 45:8302-8313. [PMID: 28575448 PMCID: PMC5737612 DOI: 10.1093/nar/gkx493] [Received: 12/21/2016] [Accepted: 05/23/2017]
Abstract
Firmicutes have two distinct replicative DNA polymerases: PolC, the leading-strand polymerase, and DnaE, which together with PolC synthesizes the lagging strand. We have reconstituted in vitro the θ-type DNA replication of Bacillus subtilis bacteriophage SPP1, which initiates unidirectionally at oriL. With this system we show that, contrary to previous suggestions, DnaE is not restricted to lagging-strand synthesis. DnaG primase and DnaE polymerase are required for initiation of DNA replication on both strands. DnaE and DnaG together synthesize a hybrid RNA/DNA 'initiation primer' on both leading and lagging strands at the SPP1 oriL region, as the eukaryotic Pol α complex does. DnaE, an RNA-primed DNA polymerase, extends this initial primer in a reaction modulated by DnaG and one single-strand binding protein (SSB; SsbA or G36P), and hands off the initiation primer to PolC, a DNA-primed DNA polymerase. PolC, stimulated by DnaG and the SSBs, then performs the bulk of DNA chain elongation on both leading and lagging strands. Overall, these modulations by the SSBs and DnaG may contribute to the mechanism of polymerase switching at Firmicutes replisomes.
Affiliation(s)
- Elena M Seco
- Centro Nacional de Biotecnología (CNB-CSIC), 28049 Madrid, Spain
- Silvia Ayora
- Centro Nacional de Biotecnología (CNB-CSIC), 28049 Madrid, Spain
5
Selection for energy efficiency drives strand-biased gene distribution in prokaryotes. Sci Rep 2017; 7:10572. [PMID: 28874819 PMCID: PMC5585166 DOI: 10.1038/s41598-017-11159-3] [Received: 06/05/2017] [Accepted: 08/18/2017]
Abstract
Lagging-strand genes accumulate more deleterious mutations. Genes are thus preferentially located on the leading strand, an observation known as strand-biased gene distribution (SGD). Despite this mechanistic understanding, a satisfactory quantitative model is still lacking. Replication-transcription collisions stall the replication machinery, expose DNA to various attacks, and are followed by error-prone repair. We found that mutational biases in non-transcribed regions can explain ~71% of the variation in SGD across 1,552 genomes, supporting a mutagenic origin of SGD. Mutational biases introduce energetically cheaper nucleotides on the lagging strand and result in more expensive protein products; consistently, the cost difference between the two strands explains ~50% of the variance in SGD. Protein costs decrease with increasing gene expression. At similar expression levels, the protein products of leading-strand genes are generally cheaper than those of lagging-strand genes; however, highly expressed lagging-strand genes are still cheaper than lowly expressed leading-strand genes. Selection for energy efficiency thus drives some genes to the leading strand, especially highly expressed and essential ones, but certainly not all genes. Stronger mutational biases are often associated with low-GC genomes; because low-GC genes encode expensive proteins, low-GC genomes tend to have stronger SGD to alleviate the stronger pressure for efficient energy usage.
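As a rough illustration of how SGD is commonly quantified, the sketch below computes the fraction of genes on the leading strand of a circular chromosome. It assumes bidirectional replication from a single origin (ori) to a single terminus (ter), classifies each gene by its midpoint, and uses hypothetical function names and 0-based coordinates; the SGD measure used in the paper above may be defined differently.

```python
def on_leading_strand(midpoint: int, strand: str, ori: int, ter: int,
                      genome_length: int) -> bool:
    """True if a gene at `midpoint` on `strand` is co-oriented with replication.

    Assumes a circular chromosome replicated bidirectionally from `ori` to `ter`:
    in the replichore running clockwise from ori to ter the '+' strand is the
    leading strand; in the other replichore the '-' strand is. Genes that span
    coordinate 0 are not handled specially (their midpoint may be misplaced).
    """
    offset = (midpoint - ori) % genome_length    # clockwise distance from ori
    clockwise_arm = (ter - ori) % genome_length  # length of the ori -> ter arm
    in_clockwise_replichore = offset < clockwise_arm
    return (strand == "+") == in_clockwise_replichore


def leading_strand_fraction(genes, ori, ter, genome_length):
    """Fraction of genes (start, end, strand) whose midpoint lies on the leading strand."""
    if not genes:
        raise ValueError("no genes given")
    leading = sum(
        on_leading_strand((start + end) // 2, strand, ori, ter, genome_length)
        for start, end, strand in genes
    )
    return leading / len(genes)


# Hypothetical 100-bp circular genome with ori at 0 and ter at 50.
genes = [(10, 20, "+"), (60, 70, "-"), (30, 40, "-"), (80, 90, "+")]
print(leading_strand_fraction(genes, ori=0, ter=50, genome_length=100))  # 0.5
```

Classifying by midpoint keeps the example simple; weighting genes by length or expression would be a natural variant when testing energy-cost arguments like those in the abstract.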
6
Quantitative analysis of correlation between AT and GC biases among bacterial genomes. PLoS One 2017; 12:e0171408. [PMID: 28158313 PMCID: PMC5291525 DOI: 10.1371/journal.pone.0171408] [Received: 12/18/2016] [Accepted: 01/20/2017]
Abstract
Because the leading and lagging strands are replicated by different mechanisms, nucleotide composition asymmetries are widespread in bacterial genomes. In general, the leading strand is enriched in guanine (G) and thymine (T), whereas the lagging strand is rich in adenine (A) and cytosine (C). However, some bacteria, such as Bacillus subtilis, have been found to contain more A than T on the leading strand. To investigate this difference, we analyze nucleotide asymmetry in terms of the correlation between AT and GC biases. In this study, we propose a windowless method, the Z-curve Correlation Coefficient (ZCC) index, based on the Z-curve method, and analyze more than 2000 bacterial genomes. We find that the majority of bacteria show negative correlations between AT and GC biases, while most genomes in Firmicutes and Tenericutes have positive ZCC indexes. The presence of PolC, purine asymmetry and a stronger gene preference for the leading strand are not confined to Firmicutes, but are also likely to occur in other phyla dominated by positive ZCC indexes. This method also provides new insight into other relevant features, such as aerobism, and can be applied to analyze the correlation between RY (purine/pyrimidine) and MK (amino/keto) biases, among others.
7
Zheng WX, Luo CS, Deng YY, Guo FB. Essentiality drives the orientation bias of bacterial genes in a continuous manner. Sci Rep 2015; 5:16431. [PMID: 26560889 PMCID: PMC4642330 DOI: 10.1038/srep16431] [Received: 06/19/2015] [Accepted: 10/13/2015]
Abstract
Studies have found that bacterial genes are preferentially located on the leading strand. Subsequently, the orientation preferences of essential genes and highly expressed genes were compared by classifying all genes into four groups, which showed that essentiality, rather than expression level, exclusively influences orientation. However, only some functional classes of essential genes show this orientation bias. Moreover, previous studies only performed comparative analyses contrasting the extent of orientation bias between two types of genes, so it has remained unclear whether the influence of essentiality on strand bias acts continuously. Here, based on quantitative measurements of gene essentiality (or fitness), we found a significant correlation between essentiality and the extent of orientation bias in 19 of 21 analyzed bacterial genomes. The correlation coefficient was much higher than that derived from binary essentiality measures (essential or non-essential). This suggests that genes with relatively lower essentiality, i.e., conditionally essential genes, also have some orientation bias, although it is weaker than that of absolutely essential genes. The results demonstrate the continuous influence of essentiality on orientation bias and provide details on this visible structural feature of bacterial genomes. They also show that Geptop and IFIM can serve as useful resources on bacterial gene essentiality, particularly for quantitative analysis.
Affiliation(s)
- Wen-Xin Zheng
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Cheng-Si Luo
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 610054, China; Key Laboratory for Neuro Information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yan-Yan Deng
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 610054, China; Key Laboratory for Neuro Information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Feng-Biao Guo
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 610054, China; Key Laboratory for Neuro Information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, 610054, China
8
Multiple Factors Drive Replicating Strand Composition Bias in Bacterial Genomes. Int J Mol Sci 2015; 16:23111-26. [PMID: 26404268 PMCID: PMC4613354 DOI: 10.3390/ijms160923111] [Received: 07/27/2015] [Revised: 08/18/2015] [Accepted: 09/18/2015]
Abstract
Composition bias from Chargaff's second parity rule (PR2) has long been observed in sequenced genomes and is believed to be strongly related to the replication process in microbial genomes. However, some disagreement about the underlying cause of strand composition bias remains. We performed an integrative analysis of various genomic features that might influence composition bias using a large-scale dataset of 1111 genomes. Our results indicate that (1) the bias was stronger in obligate intracellular bacteria than in free-living species (p-value = 0.0305); (2) Fusobacteria and Firmicutes had the highest average bias among the 24 microbial phyla analyzed; (3) the strength of selected codon usage bias and generation times were not observably related to strand composition bias (p-value = 0.3247); (4) significant negative relationships were found between composition bias and GC content, genome size, rearrangement frequency, and Clusters of Orthologous Groups (COG) functional subcategories A, C, I and Q (p-values < 1.0 × 10^-8); (5) gene density and COG functional subcategories D, F, J, L and V were positively related to composition bias (p-value < 2.2 × 10^-16); and (6) gene density made the most important contribution to composition bias, indicating that transcriptional bias was strongly associated with strand composition bias. Therefore, strand composition bias was found to be influenced by multiple factors with varying weights.
9
Goswami A, Roy Chowdhury A, Sarkar M, Saha SK, Paul S, Dutta C. Strand-biased gene distribution, purine assymetry and environmental factors influence protein evolution in Bacillus. FEBS Lett 2015; 589:629-38. [PMID: 25639611 DOI: 10.1016/j.febslet.2015.01.028] [Received: 12/03/2014] [Revised: 01/16/2015] [Accepted: 01/18/2015]
Abstract
A strong purine asymmetry, along with strand-biased gene distribution and the presence of PolC, prevails in Bacillus and some other members of Firmicutes, Fusobacteria and Tenericutes. The analysis of protein features in 21 Bacillus species of diverse metabolic, virulence and ecological traits revealed that purine asymmetry in conjunction with lineage/niche specific constraints significantly influences protein evolution in Bacillus. All Bacillus species, except for Se-respiring Bacillus selenitireducens, display distinct strand-specific biases in amino acid usage, which may affect the isoelectric point or surface charge distribution of proteins with prevalence of acidic and basic residues in the leading and lagging strand proteins, respectively.
Affiliation(s)
- Aranyak Goswami
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Anindya Roy Chowdhury
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Munmun Sarkar
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Sanjoy Kumar Saha
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Sandip Paul
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Chitra Dutta
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
10
Jain K, Krause K, Grewe F, Nelson GF, Weber APM, Christensen AC, Mower JP. Extreme features of the Galdieria sulphuraria organellar genomes: a consequence of polyextremophily? Genome Biol Evol 2014; 7:367-80. [PMID: 25552531 PMCID: PMC4316638 DOI: 10.1093/gbe/evu290]
Abstract
Nuclear genome sequencing from extremophilic eukaryotes has revealed clues about the mechanisms of adaptation to extreme environments, but the functional consequences of extremophily on organellar genomes are unknown. To address this issue, we assembled the mitochondrial and plastid genomes from a polyextremophilic red alga, Galdieria sulphuraria strain 074 W, and performed a comparative genomic analysis with other red algae and more broadly across eukaryotes. The mitogenome is highly reduced in size and genetic content and exhibits the highest guanine–cytosine skew of any known genome and the fastest substitution rate among all red algae. The plastid genome contains a large number of intergenic stem-loop structures but is otherwise rather typical in size, structure, and content in comparison with other red algae. We suggest that these unique genomic modifications result not only from the harsh conditions in which Galdieria lives but also from its unusual capability to grow heterotrophically, endolithically, and in the dark. These conditions place additional mutational pressures on the mitogenome due to the increased reliance on the mitochondrion for energy production, whereas the decreased reliance on photosynthesis and the presence of numerous stem-loop structures may shield the plastome from similar genomic stress.
Affiliation(s)
- Kanika Jain
- Center for Plant Science Innovation, University of Nebraska-Lincoln; School of Biological Sciences, University of Nebraska-Lincoln
- Kirsten Krause
- Department of Arctic and Marine Biology, UiT-The Arctic University of Norway, Tromsø, Norway
- Felix Grewe
- Center for Plant Science Innovation, University of Nebraska-Lincoln; Department of Agronomy and Horticulture, University of Nebraska-Lincoln
- Gaven F Nelson
- Center for Plant Science Innovation, University of Nebraska-Lincoln; School of Biological Sciences, University of Nebraska-Lincoln
- Andreas P M Weber
- Institute of Plant Biochemistry, Cluster of Excellence on Plant Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
- Jeffrey P Mower
- Center for Plant Science Innovation, University of Nebraska-Lincoln; Department of Agronomy and Horticulture, University of Nebraska-Lincoln
11
Saha SK, Goswami A, Dutta C. Association of purine asymmetry, strand-biased gene distribution and PolC within Firmicutes and beyond: a new appraisal. BMC Genomics 2014; 15:430. [PMID: 24899249 PMCID: PMC4070872 DOI: 10.1186/1471-2164-15-430] [Received: 08/07/2013] [Accepted: 05/08/2014]
Abstract
Background: The Firmicutes often possess three conspicuous genome features: marked purine asymmetry (PAS) across the two strands of replication, strand-biased gene distribution (SGD), and the presence of two isoforms of the DNA polymerase III alpha subunit, PolC and DnaE. Despite considerable research effort, it is not clear whether the co-existence of PAS, PolC and/or SGD is an essential and exclusive characteristic of the Firmicutes. The nature of correlations, if any, between these three features within and beyond the Firmicutes lineages has also remained elusive. The present study was designed to address these issues.
Results: A large-scale analysis of diverse bacterial genomes indicates that PAS, PolC and SGD are neither essential nor exclusive features of the Firmicutes. PolC prevails in four bacterial phyla (Firmicutes, Fusobacteria, Tenericutes and Thermotogae), while PAS occurs only in subsets of Firmicutes, Fusobacteria and Tenericutes. There are five major compositional trends in Firmicutes: (I) an explicit PAS, or G + A dominance, along the entire leading strand; (II) G dominance only in the leading strand; (III) alternating stretches of purine-rich and pyrimidine-rich sequences; (IV) G + T dominance along the leading strand; and (V) no identifiable pattern in base usage. Strong SGD has been observed not only in genomes with PAS but also in genomes with G dominance along their leading strands, an observation that defies the notion that PAS and SGD co-occur in Firmicutes. PolC-containing non-Firmicutes organisms often have alternating stretches of R-dominant and Y-dominant sequences along their genomes, and most of them show relatively weak but significant SGD. Firmicutes with G + A dominance or G dominance along the leading strand usually show distinct base usage patterns at the three codon positions of genes. Probable molecular mechanisms that might have produced such usage patterns are proposed.
Conclusion: Co-occurrence of PAS, strong SGD and PolC should not be regarded as a genome signature of the Firmicutes. The presence of PAS in a species may warrant PolC and strong SGD, but PolC and/or SGD do not necessarily imply PAS.
Affiliation(s)
- Chitra Dutta
- Structural Biology & Bioinformatics Division, CSIR- Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
12
Zhang Z, Yu J. Does the genetic code have a eukaryotic origin? Genomics Proteomics Bioinformatics 2013; 11:41-55. [PMID: 23402863 PMCID: PMC4357656 DOI: 10.1016/j.gpb.2013.01.001] [Received: 11/15/2012] [Revised: 01/09/2013] [Accepted: 01/11/2013]
Abstract
In the RNA world, RNA is assumed to be the dominant macromolecule performing most, if not all, core “house-keeping” functions. The ribo-cell hypothesis suggests that the genetic code and the translation machinery may both have been born in the RNA world, and that DNA, once introduced into ribo-cells, may have gradually taken over the informational role of RNA, providing a mature genetic code and a mechanism enabling stable inheritance of sequence and its variation. In this context, we modeled the genetic code in terms of two content variables of protein-coding sequences, GC content and purine content, and measured the purine-content sensitivity of each codon when its sensitivity (% usage) is plotted as a function of GC content variation. The analysis reveals a new pattern, the symmetric pattern, in which the sensitivity to purine content variation shows diagonal symmetry in the codon table, most significantly in the two GC-content-invariable quarters; this adds to the two existing patterns, in which the table is divided either into four GC-content-sensitivity quarters or into two amino-acid-diversity halves. The most insensitive codon sets are GUN (valine) and CAN (CAR for glutamine and CAY for histidine), and the most biased amino acid is valine (always overestimated), followed by alanine (always underestimated). The unique position of valine and its codons suggests key roles in the final recruitment of the complete codon set of the canonical table. This distinct choice may be attributable only to sequence signatures or signals of splice sites for spliceosomal introns shared by all extant eukaryotes.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
13
Lin Q, Cui P, Ding F, Hu S, Yu J. Replication-Associated Mutational Pressure (RMP) Governs Strand-Biased Compositional Asymmetry (SCA) and Gene Organization in Animal Mitochondrial Genomes. Curr Genomics 2012; 13:28-36. [PMID: 22942673 PMCID: PMC3269014 DOI: 10.2174/138920212799034811] [Received: 08/11/2011] [Revised: 10/01/2011] [Accepted: 10/04/2011]
Abstract
The nucleotide composition of the light (L-) and heavy (H-) strands of animal mitochondrial genomes is known to exhibit strand-biased compositional asymmetry (SCA). One possible explanation is a replication-associated mutational pressure (RMP) that may introduce characteristic nucleotide changes among the mitochondrial genomes of different animal lineages. Here, we discuss the influence of RMP on nucleotide and amino acid compositions as well as on gene organization. Among animal mitochondrial genomes, RMP may represent the major force driving the evolution of mitochondrial protein-coding genes, coupled with other process-based selective pressures, such as those acting on components of the translation machinery (tRNAs and their anticodons). Through comparative analyses of sequenced mitochondrial genomes from diverse animal lineages and a review of the literature, we suggest that the stronger RMP effect observed among invertebrate mitochondrial genes, compared with those of vertebrates, results either from positive selection on the invertebrate genes or from relaxed selective pressure on the vertebrate mitochondrial genes.
Affiliation(s)
- Qiang Lin
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
14
Zhang Z, Yu J. The pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids. Genomics Proteomics Bioinformatics 2012; 10:175-80. [PMID: 23084772 PMCID: PMC5054704 DOI: 10.1016/j.gpb.2012.08.002] [Received: 07/31/2012] [Accepted: 08/02/2012]
Abstract
The genetic code serves as one of the natural links between life's two conceptual frameworks, the informational and operational tracks, bridging the nucleotide sequence of DNA and RNA to the amino acid sequence of proteins and thus to their structure and function. On the informational track, DNA and its four building blocks have four basic variables: order, length, GC content and purine content; the latter two exhibit unique characteristics in prokaryotic genomes, where protein-coding sequences dominate. Bridging the two tracks, tRNAs and the aminoacyl-tRNA synthetases that interpret each codon (nucleotide triplet), together with ribosomes, form a complex machinery that translates the genetic information encoded in messenger RNAs into proteins. On the operational track, proteins are constantly selected in the context of cellular and organismal functions. The principle of such functional selection is to minimize the damage caused by sequence alterations that arise in a seemingly random fashion at the nucleotide level, and their function-altering consequences at the protein level; the principle also suggests that there must be complex yet sophisticated mechanisms that protect the molecular interactions and cellular processes of cells and organisms from such damage, in addition to both immediate (short-term) elimination and long-term selection. Two centuries of studying selection at the species and population levels have led the way toward understanding the rules of inheritance and evolution at the molecular level along the informational track, while ribogenomics, epigenomics and other operationally defined omics (such as metabolite-centric metabolomics) have ushered biologists into the new millennium along the operational track.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
15
Wu H, Qu H, Wan N, Zhang Z, Hu S, Yu J. Strand-biased gene distribution in bacteria is related to both horizontal gene transfer and strand-biased nucleotide composition. Genomics Proteomics Bioinformatics 2012; 10:186-96. [PMID: 23084774 PMCID: PMC5054707 DOI: 10.1016/j.gpb.2012.08.001] [Received: 07/19/2012] [Accepted: 07/29/2012]
Abstract
Although strand-biased gene distribution (SGD) was described some two decades ago, the underlying molecular mechanisms and their relationships remain elusive. Its facets include, but are not limited to, the degree of bias, the strand preference of genes, and the influence of variation in background nucleotide composition. Using a dataset of 364 non-redundant bacterial genomes, we sought to illustrate our current understanding of SGD. First, when we divided the bacterial genomes into non-polC and polC groups according to the DnaE isoforms they possess, which correlate closely with taxonomy, SGD was significantly more pronounced in the polC group than in the non-polC group. Second, when examining horizontal gene transfer together with gene functional conservation (essentiality) and expressivity (level of expression), we found that all of them contribute to SGD. Third, we demonstrated a weaker G-dominance on the leading strand of the non-polC group but a strong purine dominance (both G and A) on the leading strand of the polC group. We propose that strand-biased nucleotide composition plays a decisive role in SGD, since the polC-bearing genomes are not only AT-rich but also have pronounced purine-rich leading strands, and we believe that both a special mutation spectrum, which leads to strong purine asymmetry and strongly strand-biased nucleotide composition, and functional selection on genes are at work.
16
Mao X, Zhang H, Yin Y, Xu Y. The percentage of bacterial genes on leading versus lagging strands is influenced by multiple balancing forces. Nucleic Acids Res 2012; 40:8210-8. [PMID: 22735706 PMCID: PMC3458553 DOI: 10.1093/nar/gks605]
Abstract
The majority of bacterial genes are located on the leading strand, and the percentage of such genes varies widely across bacteria. Although some explanations have been proposed, these are at most partial explanations, as they cover only small percentages of the genes and do not even consider the genes biased toward the lagging strand. We have carried out a computational study of 725 bacterial genomes, aiming to elucidate other factors that may have influenced the strand location of genes in a bacterium. Our analyses suggest that (i) genes of some functional categories, such as ribosome, have higher preferences for the leading strand; (ii) genes of some functional categories, such as transcription factor, have higher preferences for the lagging strand; (iii) there is a balancing force that tends to keep genes from all moving to the leading and more efficient strand; and (iv) the percentage of leading-strand genes in a bacterium can be accurately explained based on the numbers of genes in the functional categories outlined in (i) and (ii), genome size and gene density, indicating that these numbers implicitly contain the information about the percentage of genes on the leading versus lagging strand in a genome.
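The leading-strand percentage discussed in this abstract can be computed once the replication origin and terminus are known. The sketch below is an illustration only, not the authors' pipeline. It assumes a circular chromosome with known origin (`ori`) and terminus (`ter`) coordinates, assigns each gene to a replichore by its midpoint, and counts a gene as leading-strand when it is transcribed in the direction the replication fork travels through it. The `Gene` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    midpoint: int  # hypothetical field: 0-based coordinate of the gene midpoint
    strand: str    # "+" (forward) or "-" (reverse), as in a GFF file


def in_clockwise_replichore(pos: int, ori: int, ter: int, genome_len: int) -> bool:
    """True if pos lies on the arc running toward higher coordinates from ori to ter."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def leading_strand_fraction(genes: list[Gene], ori: int, ter: int, genome_len: int) -> float:
    """Fraction of genes co-oriented with the replication fork that copies them.

    In the clockwise replichore the fork moves toward higher coordinates, so
    "+" genes are on the leading strand; in the other replichore "-" genes are.
    """
    if not genes:
        return 0.0
    leading = 0
    for g in genes:
        clockwise = in_clockwise_replichore(g.midpoint, ori, ter, genome_len)
        if (clockwise and g.strand == "+") or (not clockwise and g.strand == "-"):
            leading += 1
    return leading / len(genes)


# Toy 100-bp "genome" with ori at 0 and ter at 50 (made-up coordinates).
toy_genes = [Gene(10, "+"), Gene(20, "-"), Gene(60, "-"), Gene(70, "+"), Gene(80, "-")]
print(leading_strand_fraction(toy_genes, ori=0, ter=50, genome_len=100))  # 0.6
```

The modular arithmetic lets `ori` sit anywhere on the circle, so the same function works when the origin is not at coordinate 0.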
Affiliation(s)
- Xizeng Mao
- Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, GA 30605, USA
17
Distinct co-evolution patterns of genes associated to DNA polymerase III DnaE and PolC. BMC Genomics 2012; 13:69. [PMID: 22333191 PMCID: PMC3814617 DOI: 10.1186/1471-2164-13-69] [Received: 06/20/2011] [Accepted: 02/14/2012]
Abstract
BACKGROUND Bacterial genomes displaying a strong bias between the leading and the lagging strand of DNA replication encode two DNA polymerases III, DnaE and PolC, rather than a single one. Replication is a highly asymmetric process, and the presence of two polymerases is therefore not unexpected. Using comparative genomics, we explored whether other processes have evolved in parallel with each polymerase. RESULTS Extending previous in silico heuristics for the analysis of gene co-evolution, we analyzed the function of genes clustering with dnaE and polC. Clusters were highly informative. DnaE co-evolves with the ribosome, the transcription machinery and the core enzymes of intermediary metabolism. It is also connected to the energy-saving enzyme necessary for RNA degradation, polynucleotide phosphorylase. Most of the proteins of this co-evolving set belong to the persistent set of bacterial proteomes, that is, proteins distributed fairly ubiquitously across bacteria. In contrast, PolC co-evolves with RNA degradation enzymes that are present only in the A+T-rich Firmicutes clade, suggesting at least two origins for the degradosome. CONCLUSION DNA replication involves two machineries, DnaE and PolC. DnaE co-evolves with the core functions of bacterial life. In contrast, PolC co-evolves with a set of RNA degradation enzymes that does not derive from the degradosome identified in gamma-Proteobacteria. This suggests that at least two independent RNA degradation pathways existed in the progenote community at the end of the RNA genome world.
18
Fang Y, Li Z, Liu J, Shu C, Wang X, Zhang X, Yu X, Zhao D, Liu G, Hu S, Zhang J, Al-Mssallem I, Yu J. A pangenomic study of Bacillus thuringiensis. J Genet Genomics 2011; 38:567-76. [PMID: 22196399 DOI: 10.1016/j.jgg.2011.11.001] [Received: 07/24/2011] [Revised: 10/25/2011] [Accepted: 11/09/2011]
Abstract
Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium whose plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates at both high coverage and high base quality using a next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared with the pangenomes of closely related species in the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis, which has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5%, and the hotspots for genome variation were discovered in the genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.
Affiliation(s)
- Yongjun Fang
- James D. Watson Institute of Genome Sciences, College of Life Science, Zhejiang University, Hangzhou 310058, China
19
Guo FB. [Strong strand specific composition bias-a genomic character of some obligate parasites or symbionts]. Yi Chuan (Hereditas) 2011; 33:1039-1047. [PMID: 21993278 DOI: 10.3724/sp.j.1005.2011.01039]
Abstract
DNA replication involves a set of asymmetric mechanisms, including the division into leading and lagging strands: the leading strand is synthesized continuously, whereas the lagging strand is synthesized discontinuously. This asymmetry leads to distinct nucleotide compositions on the two strands. Strand-specific nucleotide composition bias was originally found in echinoderm and vertebrate mitochondrial genomes and then in several bacterial genomes. With the rapid growth in the number of sequenced genomes, many bacteria and even eukaryotes have been found to have a consistent strand composition bias. In some bacteria, the strand-specific composition bias is so strong that genes on the two replicating strands can be separated according to their codon usage. To date, 11 obligate intracellular bacteria have been found to have distinct codon usage depending on whether genes are located on the leading or the lagging strand. However, there is still no well-accepted theory explaining why distinct codon usage occurs in some bacterial genomes and not in others. This paper reviews the related work and points out open problems.
Affiliation(s)
- Feng-Biao Guo
- University of Electronic Science and Technology of China, Chengdu, China.
20
Chen K, Wang L, Yang M, Liu J, Xin C, Hu S, Yu J. Sequence signatures of nucleosome positioning in Caenorhabditis elegans. Genomics Proteomics Bioinformatics 2010; 8:92-102. [PMID: 20691394 PMCID: PMC5054450 DOI: 10.1016/s1672-0229(10)60010-1]
Abstract
Our recent investigation in the protist Trichomonas vaginalis suggested a DNA sequence periodicity with a unit length of 120.9 nt, which represents a sequence signature for nucleosome positioning. We have now extended this observation to higher eukaryotes and identified a similar periodicity of 175 nt in Caenorhabditis elegans. In the process of defining the sequence compositional characteristics, we found that the 10.5-nt periodicity, the sequence signature of the DNA double helix, may not be sufficient for cross-nucleosome positioning but provides essential guiding rails that facilitate positioning. We further dissected nucleosome-protected sequences and identified a strong positive purine (AG) gradient from the 5′-end to the 3′-end; we also found that nucleosome-enriched regions are GC-rich compared with nucleosome-free sequences, as purine content is positively correlated with GC content. Sequence characterization allowed us to develop a hidden Markov model (HMM) algorithm for decoding nucleosome positioning computationally; based on a set of training data from the fifth chromosome of C. elegans, our algorithm predicted 60%-70% of the well-positioned nucleosomes, which is 15%-20% higher than random positioning. We concluded that nucleosomes are not randomly positioned on DNA sequences and yet bind to different genome regions with variable stability, that well-positioned nucleosomes leave sequence signatures on DNA, and that the statistical positioning of nucleosomes across a genome can be decoded computationally based on these sequence signatures.
Affiliation(s)
- Kaifu Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
21
Qu H, Wu H, Zhang T, Zhang Z, Hu S, Yu J. Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes. Res Microbiol 2010; 161:838-46. [PMID: 20868744 DOI: 10.1016/j.resmic.2010.09.015] [Received: 06/27/2010] [Accepted: 08/03/2010]
Abstract
Nucleotide compositional asymmetry (NCA) between leading and lagging strands (LeS and LaS) is dynamic and diverse among eubacterial genomes due to different mutation and selection forces. A thorough investigation is needed to study the relationship between nucleotide composition dynamics and gene distribution biases. Based on a collection of 364 eubacterial genomes grouped according to a DnaE-based scheme (DnaE1-DnaE1, DnaE2-DnaE1, and DnaE3-PolC), we investigated NCA and nucleotide composition gradients at the three codon positions and found universal G-enrichment on LeS among all groups. This was due to strong selection for G-heading (codon position 1, or cp1) codons and to mutation pressure that led to more G-ending (cp3) codons. Moreover, a slight T-enrichment of LeS due to cytosine-deamination mutations at cp3 was universal among DnaE1-DnaE1 and DnaE2-DnaE1 genomes but was not clearly seen among DnaE3-PolC genomes, in which A-enrichment of LeS was proposed to be the effect of selections unique to polC and of a mutation bias toward A-richness at cp1 that may result from transcription-coupled DNA repair mechanisms. Furthermore, strand-biased gene distribution enhances the purine-richness of LeS in DnaE3-PolC genomes and the T-richness of LeS in DnaE1-DnaE1 and DnaE2-DnaE1 genomes.
Affiliation(s)
- Hongzhu Qu
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
22
Powdel BR, Borah M, Ray SK. Strand-specific mutational bias influences codon usage of weakly expressed genes in Escherichia coli. Genes Cells 2010; 15:773-82. [PMID: 20545764 DOI: 10.1111/j.1365-2443.2010.01417.x]
Abstract
According to the selection-mutation-drift (SMD) theory of molecular evolution, mutation predominates in determining codon usage bias (CUB) in weakly expressed genes (WEG), whereas selection predominates in determining CUB in highly expressed genes (HEG). Strand-specific mutational bias causes compositional asymmetry of nucleotides between the leading and lagging strands (LaS) of bacterial chromosomes. In view of these points, CUB was compared between the strands of the Escherichia coli chromosome. Compared with HEG, the codon usage of WEG was more strongly biased by strand: G-ending codons were significantly more frequent on the leading strand than on the LaS, and the reverse was true for C-ending codons. For WEG, the GC(3) skews differed significantly between the strands. This suggests that strand-specific mutational bias influences the codon usage of WEG to a greater extent than that of HEG. The differential effect of strand-specific mutational bias in E. coli might be attributed to stronger purifying selection in HEG than in WEG. The observations here in E. coli support the SMD theory of molecular evolution.
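The strand comparison above rests on the GC3 skew. As a hedged sketch, assuming the common definition (G3 - C3) / (G3 + C3) pooled over all coding sequences of a group, the computation looks like this; the function name and toy sequences are hypothetical, and the study's exact statistic and significance testing are not reproduced here.

```python
def gc3_skew(cds_seqs):
    """GC skew at third codon positions, (G3 - C3) / (G3 + C3), pooled over CDSs.

    Each sequence is assumed to be a coding sequence read 5'->3' on its own
    sense strand and to start at the first base of a codon.
    """
    g3 = c3 = 0
    for seq in cds_seqs:
        third = seq.upper()[2::3]  # bases at codon position 3
        g3 += third.count("G")
        c3 += third.count("C")
    return (g3 - c3) / (g3 + c3) if g3 + c3 else 0.0


# Hypothetical toy CDSs, one list per replication strand.
leading_cds = ["ATGGCGAAG", "ATGCTG"]
lagging_cds = ["ATGAAC", "ATGTTC"]
print(gc3_skew(leading_cds), gc3_skew(lagging_cds))  # 1.0 0.0
```

In a real comparison, genes would first be split by strand (for example with their replichore assignment) and separately for highly and weakly expressed sets, so that the skews of WEG and HEG can be contrasted.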
Affiliation(s)
- Bhesh Raj Powdel
- Department of Mathematical Sciences, Tezpur University, Assam, India
23
Bohlin J, Hardy SP, Ussery DW. Stretches of alternating pyrimidine/purines and purines are respectively linked with pathogenicity and growth temperature in prokaryotes. BMC Genomics 2009; 10:346. [PMID: 19646265 PMCID: PMC2728739 DOI: 10.1186/1471-2164-10-346] [Received: 03/02/2009] [Accepted: 07/31/2009]
Abstract
Background The genomic fractions of purine (RR) and alternating pyrimidine/purine (YR) stretches of 10 base pairs or more have been linked to genomic AT content, the formation of different DNA helices, strand-biased gene distribution, DNA structure, and more. Although some of these factors are a consequence of the chemical properties of purines and pyrimidines, a thorough statistical examination of the distributions of YR/RR stretches in sequenced prokaryotic chromosomes has, to the best of our knowledge, not been undertaken. The aim of this study is to expand upon previous research by using regression analysis to investigate how AT content, habitat, growth temperature, pathogenicity, phylum, oxygen requirement and halotolerance correlate with the distribution of RR and YR stretches in prokaryotes. Results Our results indicate that RR and YR stretches are distributed differently across prokaryotic phyla. RR stretches are overrepresented in all phyla except the Actinobacteria and β-Proteobacteria. In contrast, YR tracts are underrepresented in all phyla except the β-Proteobacteria. YR stretches are associated with phylum, pathogenicity and habitat, whilst RR tracts are associated with phylum, AT content, oxygen requirement, growth temperature and halotolerance. All associations described were statistically significant with p < 0.001. Conclusion Analysis of the chromosomal distributions of RR/YR sequences in prokaryotes reveals a set of associations with environmental factors not observed with mono- and oligonucleotide frequencies. This implies that the distribution of RR/YR stretches carries important information that is more difficult to obtain from genomic mono- and oligonucleotide frequencies. The association between pathogenicity and the fraction of YR stretches is assumed to be linked to recombination and horizontal transfer.
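The genomic fractions of RR and YR stretches of 10 bp or more can be measured with a simple scan. The sketch below is an illustration, not the study's code, and its definitions are assumptions: `rr_fraction` counts runs of either purines or pyrimidines, on the reasoning that a purine run on one strand is a pyrimidine run on the other, and `yr_fraction` counts maximal runs that alternate between pyrimidine and purine regardless of which class starts the run. Ambiguous bases such as N break both kinds of run. The study's exact definitions may differ.

```python
import re


def rr_fraction(seq: str, min_len: int = 10) -> float:
    """Fraction of bases lying in homopurine/homopyrimidine tracts of >= min_len bp."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # Purine and pyrimidine alphabets are disjoint, so matches never overlap.
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)


def yr_fraction(seq: str, min_len: int = 10) -> float:
    """Fraction of bases lying in alternating pyrimidine/purine tracts of >= min_len bp."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = run = 0
    prev = None  # class of the previous base: "Y", "R" or None
    for base in seq:
        cls = "Y" if base in "CT" else "R" if base in "AG" else None
        if cls is not None and prev is not None and cls != prev:
            run += 1  # alternation continues
        else:
            if run >= min_len:
                covered += run  # close the finished tract
            run = 1 if cls is not None else 0
        prev = cls
    if run >= min_len:
        covered += run
    return covered / len(seq)


print(rr_fraction("AAAAAAAAAA"), yr_fraction("CACACACACA"))  # 1.0 1.0
```

Scanning a whole chromosome this way is linear in its length; per-genome fractions could then serve as response variables in a regression of the kind described in the abstract.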
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Oslo, Norway.
24
Chen K, Meng Q, Ma L, Liu Q, Tang P, Chiu C, Hu S, Yu J. A novel DNA sequence periodicity decodes nucleosome positioning. Nucleic Acids Res 2008; 36:6228-36. [PMID: 18829715 PMCID: PMC2577358 DOI: 10.1093/nar/gkn626]
Abstract
There have been two types of well-characterized DNA sequence periodicities; both are found to be associated with important molecular mechanisms. One is a 3-nt periodicity corresponding to codon triplets, the other is a 10.5-nt periodicity related to the structure of DNA helixes. In the process of analyzing the genome and transcriptome of Trichomonas vaginalis, we observed a 120.9-nt periodicity along DNA sequences. Different from the 3- and 10.5-nt periodicities, this novel periodicity originates near the 5′-end of transcripts, extends along the direction of transcription, and weakens gradually along transcripts. As a result, codon usage as well as amino acid composition is constrained by this periodicity. Similar periodicities were also identified in other organisms, but with variable length associated with the length of nucleosome units. We validated this association experimentally in T. vaginalis, and demonstrated that the periodicity manifests nucleotide variations between linker-DNA and wrapping-DNA along nucleosome array. We conclude that this novel DNA sequence periodicity is a signature of nucleosome organization suggesting that nucleosomes are well-positioned with regularity, especially near the 5′-end of transcripts.
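One generic way to look for a sequence periodicity like those described here is to autocorrelate a per-base indicator and pick the lag with the highest correlation. The sketch below uses a purine (A/G) indicator and Pearson autocorrelation at integer lags; this is a stand-in technique chosen for illustration, not the method used in the paper, and it cannot by itself resolve non-integer periods such as 120.9 nt. Function names and the toy sequence are hypothetical.

```python
def autocorrelation(x: list[float], lag: int) -> float:
    """Pearson correlation between x[i] and x[i + lag]."""
    n = len(x) - lag
    if n < 2:
        return 0.0
    a, b = x[:n], x[lag:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def dominant_period(seq: str, min_lag: int, max_lag: int) -> int:
    """Lag in [min_lag, max_lag] with the highest purine-indicator autocorrelation."""
    x = [1.0 if base in "AG" else 0.0 for base in seq.upper()]
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


# Toy sequence with a built-in 10-nt purine/pyrimidine pattern.
toy = "AAAAATTTTT" * 30
print(dominant_period(toy, 5, 15))  # 10
```

For real transcripts, the indicator would be aligned at transcript 5′-ends and averaged before searching, since the abstract reports that the signal originates near the 5′-end and weakens along the transcript.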
Affiliation(s)
- Kaifu Chen
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Graduate University of Chinese Academy of Sciences, Beijing, China
25
Xiao JF, Yu J. A scenario on the stepwise evolution of the genetic code. Genomics Proteomics Bioinformatics 2008; 5:143-51. [PMID: 18267295 PMCID: PMC5054201 DOI: 10.1016/s1672-0229(08)60001-7]
Abstract
It is believed that in the RNA world the operational (ribozymes) and the informational (riboscripts) RNA molecules were created with only three (adenosine, uridine, and guanosine) and two (adenosine and uridine) nucleosides, respectively, so that the genetic code started out simple. Ribozymes subsequently evolved to be able to cut and paste themselves, and riboscripts were amenable to extensive editing (adenosine to inosine); the intensive diversification of RNA molecules shaped novel cellular machineries capable of polymerizing amino acids, a new type of cellular building material for life. Initially, the genetic code, encoding seven amino acids, was created only to distinguish purines from pyrimidines; it was later expanded stepwise to encode 12, 15, and 20 amino acids through the relief of guanine from its roles as an operational signal and through the recruitment of cytosine. Therefore, the maturation of the genetic code also coincided with (1) the departure of aminoacyl-tRNA synthetases (AARSs) from the primordial translation machinery, (2) the replacement of informational RNA by DNA, and (3) the co-evolution of AARSs and their cognate tRNAs. This model predicts the gradual replacement of RNA-based molecular mechanisms and cellular processes by proteins, and of RNA-based information storage by DNA.
Affiliation(s)
- Jing-Fa Xiao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China