1
Forsdyke DR. Genomic compliance with Chargaff's second parity rule may have originated non-adaptively, but stem-loops now function adaptively. J Theor Biol 2024; 595:111943. [PMID: 39277166 DOI: 10.1016/j.jtbi.2024.111943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 07/06/2024] [Accepted: 09/07/2024] [Indexed: 09/17/2024]
Abstract
Of Chargaff's four rules on DNA base quantity, his second parity rule (PR-2) is the most contentious. Various biometricians (e.g., Sueoka, Lobry) regarded PR-2 compliance as a non-adaptive feature of modern genomes that could be modeled through interrelations among mutation rates. However, PR-2 compliance with stem-loop potential was considered adaptively relevant by biochemists familiar with analyses of nucleic acid structure (e.g., of Crick) and of meiotic recombination (e.g., of Kleckner). Meanwhile, other biometricians had shown that PR-2 complementarity extended beyond individual bases (1-mers) to oligonucleotides (k-mers), possibly reflecting "advantageous DNA structure" (Nussinov). An "introns early" hypothesis (Reanney, Forsdyke) had suggested a primordial nucleic acid world with recombination-mediated error-correction requiring genome-wide stem-loop potential to have evolved prior to localized intrusions of protein-encoding potential (exons). Thus, a primordial genome was equivalent to one long intron. Indeed, when assessed as the base order-dependent component (correcting for local influences of GC%), modern genes, especially when evolving rapidly under positive Darwinian selection, display high intronic stem-loop potential. This suggests forced migration from neighboring exons by competing protein-encoding potential. PR-2 compliance may have first arisen non-adaptively. Primary prototypic structures were later strengthened by their adaptive contribution to recombination. Thus, contentious views may actually be in harmony.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario K7L3N6, Canada.
2
Pflughaupt P, Sahakyan AB. Generalised interrelations among mutation rates drive the genomic compliance of Chargaff's second parity rule. Nucleic Acids Res 2023; 51:7409-7423. [PMID: 37293966 PMCID: PMC10415130 DOI: 10.1093/nar/gkad477] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/23/2022] [Revised: 05/05/2023] [Accepted: 05/17/2023] [Indexed: 06/10/2023] Open
Abstract
Chargaff's second parity rule (PR-2), where the complementary base and k-mer contents match within the same strand of a double-stranded DNA (dsDNA), is a phenomenon that has invited many explanations. The strict compliance of nearly all nuclear dsDNA with PR-2 implies that the explanation should also be similarly adamant. In this work, we revisited the possibility of mutation rates driving PR-2 compliance. Starting from an assumption-free approach, we constructed kinetic equations for unconstrained simulations. The results were analysed for their PR-2 compliance by employing symbolic regression and machine learning techniques. We arrived at a generalised set of mutation rate interrelations in place in most species that allow for their full PR-2 compliance. Importantly, our constraints explain PR-2 in genomes out of the scope of the prior explanations based on the equilibration under mutation rates with simpler no-strand-bias constraints. We thus reinstate the role of mutation rates in PR-2 through its molecular core, now shown, under our formulation, to be tolerant to previously noted strand biases and incomplete compositional equilibration. We further investigate the time for any genome to reach PR-2, showing that it is generally earlier than the compositional equilibrium, and well within the age of life on Earth.
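The PR-2 statement in this abstract (matching counts of a k-mer and its reverse complement within one strand) can be illustrated with a minimal Python sketch. This is not the authors' kinetic model or analysis pipeline; the function names and the normalised deviation score `pr2_deviation` are assumptions introduced here purely for illustration.

```python
from collections import Counter

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word (A<->T, C<->G, reversed)."""
    return word.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on the given single strand."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pr2_deviation(seq: str, k: int) -> float:
    """Hypothetical PR-2 score in [0, 1]; 0 means perfect k-mer parity.

    Sums |count(w) - count(rc(w))| over all observed words and their
    reverse complements, then divides by twice the number of k-mers
    because each word/reverse-complement pair is visited twice.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    return diff / (2 * total)
```

For example, `pr2_deviation("AACGTT", 2)` is 0.0 because every dinucleotide in that strand is balanced by its reverse complement, whereas a strand made only of A gives the maximum score for k = 1.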
Affiliation(s)
- Patrick Pflughaupt
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
- Aleksandr B Sahakyan
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
3
Affinity and Correlation in DNA. J 2022. [DOI: 10.3390/j5020016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
A statistical analysis of important DNA sequences and related proteins has been performed to study the relationships between monomers, and some general considerations about these macromolecules can be provided from the results. First, the most important relationship between sites in all the DNA sequences examined is that between two consecutive base pairs. This is an indication of an energetic stabilization due to the stacking interaction of these couples of base pairs. Secondly, the difference between human chromosome sequences and their coding parts is relevant both in the relationships between sites and in some specific compositional rules, such as the second Chargaff rule. Third, the evidence of the relationship in two successive triplets of DNA coding sequences generates a relationship between two successive amino acids in the proteins. This is obviously impossible if all the relationships between the sites are statistical evidence and do not involve causes; therefore, in this article, due to stacking interactions and this relationship in coding sequences, we will divide the concept of the relationship between sites into two concepts: affinity and correlation, the first with physical causes and the second without. Finally, from the statistical analyses carried out, it will emerge that the human genome is uniform, with the only significant exception being the Y chromosome.
4
Fimmel E, Gumbel M, Karpuzoglu A, Petoukhov S. On comparing composition principles of long DNA sequences with those of random ones. Biosystems 2019; 180:101-108. [DOI: 10.1016/j.biosystems.2019.04.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/11/2019] [Revised: 04/05/2019] [Accepted: 04/06/2019] [Indexed: 11/25/2022]
5
Huang B, Huang LF, Zhang SH. Evaluation of the Persistence of Higher-Order Strand Symmetry in Genomic Sequences by Novel Word Symmetry Distance Analysis. Front Genet 2019; 10:148. [PMID: 30899274 PMCID: PMC6416199 DOI: 10.3389/fgene.2019.00148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/08/2018] [Accepted: 02/12/2019] [Indexed: 11/13/2022] Open
Abstract
For the ubiquitous phenomenon of strand symmetry, it has been shown that it may persist for higher-order oligonucleotides. However, there is no consensus about to what extent (order of oligonucleotides or length of words) strand symmetry still persists. To determine the extent of strand symmetry in genomic sequences is critically important for the further understanding of the phenomenon. Based on previous studies, we have developed an algorithm for the novel word symmetry distance analysis. We applied it to evaluate the higher-order strand symmetry for 206 archaeal genomes and 2,659 bacterial genomes. Our results show that the new approach could provide a clear-cut criterion to determine the extent of strand symmetry for a group of genomes or individual genomes. According to the new measure, strand symmetry would tend to persist for up to 8-mers in archaeal genomes, and up to 9-mers in bacterial genomes. And the persistence may vary from 6- to 9-mers in individual genomes. Moreover, higher-order strand symmetry would tend to positively correlate with GC content and mononucleotide symmetry levels of genomic sequences. The variations of higher-order strand symmetry among genomes would indicate that strand symmetry itself may not be strictly relevant to biological functions, which would provide some insights into the origin and evolution of the phenomenon.
Affiliation(s)
- Bi Huang
- Key Laboratory of Gene Engineering of Ministry of Education, Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
- Li-Fang Huang
- Key Laboratory of Gene Engineering of Ministry of Education, Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
6
Cristadoro G, Degli Esposti M, Altmann EG. The common origin of symmetry and structure in genetic sequences. Sci Rep 2018; 8:15817. [PMID: 30361485 PMCID: PMC6202410 DOI: 10.1038/s41598-018-34136-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/17/2018] [Accepted: 10/09/2018] [Indexed: 12/20/2022] Open
Abstract
Biologists have long sought a way to explain how statistical properties of genetic sequences emerged and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are not considered or irrelevant, contrary to empirical evidence. Different studies investigated these two statistical features separately, reaching minimal consensus despite sustained efforts. Here we unravel previously unknown symmetries in genetic sequences, which are organized hierarchically through scales in which non-random structures are known to be present. These observations are confirmed through the statistical analysis of the human genome and explained through a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can explain simultaneously non-random structures and symmetries in genetic sequences.
Affiliation(s)
- Giampaolo Cristadoro
- Dipartimento di Matematica e Applicazioni, Università di Milano-Bicocca, 20125, Milano, Italy.
- Eduardo G Altmann
- School of Mathematics and Statistics, University of Sydney, Sydney, 2006, NSW, Australia
7
A stationary distribution associated to a set of laws whose initial states are grouped into classes. An application in genomics. J Appl Probab 2016. [DOI: 10.1017/jpr.2016.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Let I be a finite set and S be a nonempty strict subset of I which is partitioned into classes, and let C(s) be the class containing s ∈ S. Let (P_s : s ∈ S) be a family of distributions on I^ℕ, where each P_s applies to sequences starting with the symbol s. To this family, we associate a class of distributions P(π) on I^ℕ which depends on a probability vector π. Our main results assume that, for each s ∈ S, P_s regenerates with distribution P_{s'} when it encounters s' ∈ S ∖ C(s). From semiregenerative theory, we determine a simple condition on π for P(π) to be time stationary. We give a similar result for the following more complex model. Once a symbol s' ∈ S ∖ C(s) has been encountered, there is a decision to be made: either a new region of type C(s') governed by P_{s'} starts or the region continues to be a C(s) region. This decision is modeled as a random event and its probability depends on s and s'. The aim in studying these kinds of models is to attain a deeper statistical understanding of bacterial DNA sequences. Here I is the set of codons and the classes (C(s) : s ∈ S) identify codons that initiate similar genomic regions. In particular, there are two classes corresponding to the start and stop codons which delimit coding and noncoding regions in bacterial DNA sequences. In addition, the random decision to continue the current region or begin a new region of a different class reflects the well-known fact that not every appearance of a start codon marks the beginning of a new coding region.
8
Rosandić M, Vlahović I, Glunčić M, Paar V. Trinucleotide's quadruplet symmetries and natural symmetry law of DNA creation ensuing Chargaff's second parity rule. J Biomol Struct Dyn 2016; 34:1383-94. [PMID: 26524490 DOI: 10.1080/07391102.2015.1080628] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 10/22/2022]
Abstract
For almost 50 years, a conclusive explanation of Chargaff's second parity rule (CSPR), the equality of frequencies of nucleotides A=T and C=G or the equality of direct and reverse complement trinucleotides in the same DNA strand, has remained undetermined. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to the double-stranded genome. The symmetries of the Q-box corresponding to quadruplets can be obtained as a consequence of Watson-Crick base pairing and CSPR together. Alternatively, assuming a natural symmetry law for DNA creation, namely that each trinucleotide in one strand of DNA must simultaneously appear in the opposite strand, automatically leads to Q-box direct-reverse mirror symmetry, which in conjunction with Watson-Crick base pairing generates CSPR. We demonstrate the quadruplet symmetries in chromosomes of a wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D-diagrams with combined interstrand frequencies. These "landscapes" are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most of the older species. In human chromosomes 1-12, X, and Y the "landscapes" are almost identical, and they are slightly different in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a new robust tool for characterization and classification of genomes and their evolutionary trajectories.
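The count of 20 quadruplets quoted in this abstract can be checked by grouping the 64 trinucleotides under the four operations it names (direct, reverse complement, complement, reverse). The sketch below is an illustrative enumeration only, not the authors' Q-box construction; the function names are hypothetical.

```python
from itertools import product

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quadruplet(word: str) -> frozenset:
    """Group a word with its reverse complement, complement, and reverse."""
    comp = word.translate(COMPLEMENT)
    return frozenset({word, comp[::-1], comp, word[::-1]})


def trinucleotide_quadruplets() -> set:
    """Return the distinct quadruplets formed by all 64 trinucleotides."""
    words = ("".join(bases) for bases in product("ACGT", repeat=3))
    return {quadruplet(word) for word in words}


quads = trinucleotide_quadruplets()
print(len(quads))  # 20
```

Of these 20 groups, 12 contain four distinct trinucleotides and 8 contain only two, because a trinucleotide that reads the same in both directions (such as ACA) coincides with its own reverse.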
Affiliation(s)
- Marija Rosandić
- Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia
- Ines Vlahović
- Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Matko Glunčić
- Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Vladimir Paar
- Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia; Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
9
Afreixo V, Rodrigues JMOS, Bastos CAC, Silva RM. The exceptional genomic word symmetry along DNA sequences. BMC Bioinformatics 2016; 17:59. [PMID: 26842742 PMCID: PMC4738807 DOI: 10.1186/s12859-016-0905-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/21/2015] [Accepted: 01/19/2016] [Indexed: 01/05/2023] Open
Abstract
Background: Chargaff's second parity rule and its extensions are recognized as universal phenomena in DNA sequences. However, parity of the frequencies of reverse complementary oligonucleotides could be a mere consequence of the single nucleotide parity rule, if nucleotide independence is assumed. Exceptional symmetry (symmetry beyond that expected under an independent nucleotide assumption) was proposed previously as a meaningful measure of the extension of the second parity rule to oligonucleotides. Global exceptional symmetry was detected in long and short genomes. Results: To explore the exceptional genomic word symmetry along the genome sequences, we propose a sliding window method to extract the values of exceptional symmetry (for all words or by word groups). We compare the exceptional symmetry effect size distribution in all human chromosomes against control scenarios (positive and negative controls), testing the differences and performing a residual analysis. We explore local exceptional symmetry in equivalent composition word groups, and find that the behaviour of the local exceptional symmetry depends on the word group. Conclusions: We conclude that the exceptional symmetry is a local phenomenon in genome sequences, with distinct characteristics along the sequence of each chromosome. The local exceptional symmetry along the genomic sequences shows outlying segments, and those segments have high biological annotation density.
Affiliation(s)
- Vera Afreixo
- Department of Mathematics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal; Department of Medical Sciences and Institute of Biomedicine - iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal; IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal.
- João M O S Rodrigues
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal; IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal.
- Carlos A C Bastos
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal; IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal.
- Raquel M Silva
- Department of Medical Sciences and Institute of Biomedicine - iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal; IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal.
10
Stems and Loops. Evol Bioinform Online 2016. [DOI: 10.1007/978-3-319-28755-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022] Open
11
Zhang SH. Persistence and breakdown of strand symmetry in the human genome. J Theor Biol 2015; 370:202-4. [PMID: 25576243 DOI: 10.1016/j.jtbi.2014.12.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2014] [Revised: 12/26/2014] [Accepted: 12/29/2014] [Indexed: 10/24/2022]
Abstract
Afreixo et al. (Afreixo, V., Bastos, C.A.C., Garcia, S.P., Rodrigues, J.M.O.S., Pinho, A.J., Ferreira, P.J.S.G., 2013. The breakdown of the word symmetry in the human genome. J. Theor. Biol. 335, 153-159) analyzed word symmetry (strand symmetry, or the second parity rule) in the human genome. They concluded that strand symmetry holds for oligonucleotides up to 6 nt and is no longer statistically significant for oligonucleotides of higher orders. However, although they provided some new results on the issue, their interpretation may not be fully justified, and their conclusion needs to be further evaluated. Further analysis of their results, especially those of equivalence tests and word symmetry distance, shows that strand symmetry would persist for higher-order oligonucleotides up to 9 nt in the human genome, at least for its overall frequency framework (oligonucleotide frequency pattern).
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, China.
12
Wang S, Tu J, Jia Z, Lu Z. High order intra-strand partial symmetry increases with organismal complexity in animal evolution. Sci Rep 2014; 4:6400. [PMID: 25263801 PMCID: PMC4178289 DOI: 10.1038/srep06400] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/17/2014] [Accepted: 08/28/2014] [Indexed: 12/02/2022] Open
Abstract
For a sufficiently long genomic sequence, the frequency of any short nucleotide fragment on one strand is approximately equal to the frequency of its reverse complement on the same strand. Despite having been studied for over two decades, the precise mechanism involved has not yet been made clear. In this study, we calculated the high order intra-strand partial symmetry (IPS) for 14 animal species by using a fixed sliding window method to scan each genome sequence. The study showed that the IPS was positively associated with organismal complexity measured by the number of distinct cell types. The results indicated that the IPS might result from the increase in functional non-coding DNA and might play an important role in the evolution of complex body plans.
Affiliation(s)
- Shengqin Wang
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Jing Tu
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Zhongwei Jia
- National Institute of Drug Dependence, Peking University, Beijing 100191, China
- Zuhong Lu
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100781, China
13
Afreixo V, Rodrigues JMOS, Bastos CAC. Analysis of single-strand exceptional word symmetry in the human genome: new measures. Biostatistics 2014; 16:209-21. [PMID: 25190514 DOI: 10.1093/biostatistics/kxu041] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/05/2023] Open
Abstract
Some previous studies suggest the extension of Chargaff's second rule (the phenomenon of symmetry in a single DNA strand) to long DNA words. However, in random sequences generated under an independent symbol model where complementary nucleotides have equal occurrence probabilities, we expect the phenomenon of symmetry to hold for any word length. In this work, we develop new statistical methods to measure the exceptional symmetry. Exceptional symmetry is a refinement of Chargaff's second parity rule that highlights the words whose frequency of occurrence is similar to that of its reversed complement but dissimilar to the frequencies of occurrence of other words which contain the same number of nucleotides A or T. We analyze words of lengths up to 12 in the complete human genome and in each chromosome separately. We assess exceptional symmetry globally, by word group, and by word. We conclude that the global symmetry present in the human genome is clearly exceptional and significant. The chromosomes present distinct exceptional symmetry profiles. There are several exceptional word groups and exceptional words with a strong exceptional symmetry.
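The point that an independent symbol model already produces symmetry for any word length can be made concrete with a short Python sketch. It assumes P(A) = P(T) and P(C) = P(G), as the abstract describes; the parameter `p_at` (the combined probability of A and T) and the function names are hypothetical, and this is not the authors' statistical method.

```python
import math

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def at_count(word: str) -> int:
    """Number of A or T bases, which defines the word's composition group."""
    return sum(base in "AT" for base in word)


def word_probability(word: str, p_at: float) -> float:
    """Probability of a word under an independent symbol model.

    Assumes P(A) = P(T) = p_at / 2 and P(C) = P(G) = (1 - p_at) / 2,
    so the result depends only on how many A/T bases the word contains.
    """
    if not 0.0 <= p_at <= 1.0:
        raise ValueError("p_at must lie in [0, 1]")
    n_at = at_count(word)
    return (p_at / 2) ** n_at * ((1 - p_at) / 2) ** (len(word) - n_at)


word = "AAC"
same = math.isclose(
    word_probability(word, 0.6),
    word_probability(reverse_complement(word), 0.6),
)
print(same)  # True
```

Under this model a word, its reverse complement, and every other word with the same number of A or T bases share one expected frequency, which is why a measure of exceptional symmetry compares a word against the rest of its composition group rather than against its reverse complement alone.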
Affiliation(s)
- Vera Afreixo
- Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal; CIDMA, University of Aveiro, 3810-193 Aveiro, Portugal; IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
- João M O S Rodrigues
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal; IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
- Carlos A C Bastos
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal; IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
14
Afreixo V, Bastos CA, Garcia SP, Rodrigues JM, Pinho AJ, Ferreira PJ. The breakdown of the word symmetry in the human genome. J Theor Biol 2013; 335:153-9. [DOI: 10.1016/j.jtbi.2013.06.032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/22/2012] [Revised: 05/30/2013] [Accepted: 06/25/2013] [Indexed: 01/13/2023]
15
Zhang H, Li P, Zhong HS, Zhang SH. Conservation vs. variation of dinucleotide frequencies across bacterial and archaeal genomes: evolutionary implications. Front Microbiol 2013; 4:269. [PMID: 24046767 PMCID: PMC3764401 DOI: 10.3389/fmicb.2013.00269] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/17/2013] [Accepted: 08/19/2013] [Indexed: 11/13/2022] Open
Abstract
During the long history of biological evolution, genome structures have undergone enormous changes. Nevertheless, some traits or vestiges of the primordial genome (defined in this paper as the most primitive nucleic acid genome for life on earth) may remain in modern genetic systems. It is of great importance to find these traits or vestiges for the study of the origin and evolution of genomes. The shorter a sequence is, the less likely it is to be modified during genome evolution, and, if mutated, the more easily it can reappear at the same site or another site. Consequently, the genomic frequencies of very short nucleotide sequences, such as dinucleotides, would have a considerable chance of being conserved during billions of years of evolution. Prokaryotic genomes are very diverse and have a wide range of GC content. Therefore, in order to find traits or vestiges of the primordial genome remaining in modern genetic systems, we have studied the characteristics of dinucleotide frequencies across bacterial and archaeal genomes. We analyzed the dinucleotide frequency patterns of the whole-genome sequences from more than 1300 prokaryotic species (bacterial and archaeal genomes available as of December 2012). The results show that the frequencies of the dinucleotides AC, AG, CA, CT, GA, GT, TC, and TG are well-conserved across various genomes, while the frequencies of other dinucleotides vary considerably among species. The dinucleotide frequency conservation/variation pattern seems to correlate with the distributions of dinucleotides throughout a genome and across genomes. Further analysis indicates that the phenomenon would be determined by strand symmetry of genomic sequences (the second parity rule) and GC content variations among genomes. We discuss some possible origins of strand symmetry, and we propose that the phenomenon of frequency conservation of some dinucleotides may provide insights into the genomic composition of the primordial genetic system.
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
16
Patterns of nucleotide asymmetries in plant and animal genomes. Biosystems 2013; 111:181-9. [PMID: 23438636 DOI: 10.1016/j.biosystems.2013.02.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/28/2012] [Revised: 11/29/2012] [Accepted: 02/07/2013] [Indexed: 11/20/2022]
Abstract
Symmetry in biology provides many intriguing puzzles to the scientist's mind. Chargaff's second parity rule states a symmetric distribution of oligonucleotides within a single strand of double-stranded DNA. While this rule has been verified in a wide range of microbial genomes, it still awaits explanation. In our study, we inquired into patterns of mono- and trinucleotide intra-strand parity in complex plant genomic sequences that became available during the last few years, and compared these to equally complex animal genomes. The degree and patterns of deviation from Chargaff's second rule were different between plant and animal species. We observed a universal inter-chromosomal homogeneity of mononucleotide skews in coding sequences of plant chromosomes, while the base composition of animal coding sequences differed between chromosomes even within a single species. We also found differences in the base composition of dicot introns in comparison to those of monocots. These genome-wide patterns were limited to genic regions and were not encountered in inter-genic sequences. We discuss the implications of our findings in relation to hypotheses about functional correlations of intra-strand parity which have hitherto been put forward. Furthermore, we propose more recent polyploidization and subsequent homogenization of homoeologues as a possible reason for more homogeneous skew patterns in plants.
17
Zhang SH, Wang L. Two common profiles exist for genomic oligonucleotide frequencies. BMC Res Notes 2012; 5:639. [PMID: 23158698 PMCID: PMC3532236 DOI: 10.1186/1756-0500-5-639] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/24/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022] Open
Abstract
Background: It was reported that there is a majority profile for trinucleotide frequencies among genomes. And further study has revealed that two common profiles, rather than one majority profile, exist for genomic trinucleotide frequencies. However, the origins of the common/majority profile remain elusive. Moreover, it is not clear whether the features of common profile may be extended to oligonucleotides other than trinucleotides. Findings: We analyzed 571 prokaryotic genomes (chromosomes) and some selected eukaryotic nuclear genomes as well as other genetic systems to study their compositional features. We found that there are also two common profiles for genomic oligonucleotide frequencies: one is from low-GC content genomes, and the other is from high-GC content genomes. Furthermore, each common profile is highly correlated to the average profile of random sequences with corresponding GC content and generated according to first-order symmetry. Conclusions: The causes for the existence of two common profiles would mainly be GC content variations and strand symmetry of genomic sequences. Therefore, both GC content and strand symmetry would play important roles in genome evolution.
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou, 510275, China.
18
Sobottka M, Hart AG. A model capturing novel strand symmetries in bacterial DNA. Biochem Biophys Res Commun 2011; 410:823-8. [PMID: 21703245 DOI: 10.1016/j.bbrc.2011.06.072] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/06/2011] [Accepted: 06/09/2011] [Indexed: 11/24/2022]
Abstract
Chargaff's second parity rule for short oligonucleotides states that the frequency of any short nucleotide sequence on a strand is approximately equal to the frequency of its reverse complement on the same strand. Recent studies have shown that, with the exception of organellar DNA, this parity rule generally holds for double-stranded DNA genomes and fails to hold for single-stranded genomes. While Chargaff's first parity rule is fully explained by the Watson-Crick pairing in the DNA double helix, a definitive explanation for the second parity rule has not yet been determined. In this work, we propose a model based on a hidden Markov process for approximating the distributional structure of primitive DNA sequences. Then, we use the model to provide another possible theoretical explanation for Chargaff's second parity rule, and to predict novel distributional aspects of bacterial DNA sequences.
Affiliation(s)
- Marcelo Sobottka
- Departamento de Matemática, Universidade Federal de Santa Catarina, Brazil.
19
A novel common triplet profile for GC-rich prokaryotic genomes. Genomics 2011; 97:330-1. [DOI: 10.1016/j.ygeno.2011.02.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/18/2010] [Revised: 02/08/2011] [Accepted: 02/09/2011] [Indexed: 11/22/2022]
20