1
Matkarimov BT, Saparbaev MK. Chargaff's second parity rule lies at the origin of additive genetic interactions in quantitative traits to make omnigenic selection possible. PeerJ 2023; 11:e16671. [PMID: 38107580 PMCID: PMC10725672 DOI: 10.7717/peerj.16671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 11/22/2023] [Indexed: 12/19/2023] [Open access]
Abstract
Background: Francis Crick's central dogma provides a residue-by-residue mechanistic explanation of the flow of genetic information in living systems. However, this principle may not be sufficient to explain how random mutations cause continuous variation in quantitative, highly polygenic complex traits. Chargaff's second parity rule (CSPR), also referred to as intrastrand DNA symmetry, is a statistical property of cellular genomes; it is defined as the near-exact equalities G ≈ C and A ≈ T within a single DNA strand. The phenomenon of intrastrand DNA symmetry was discovered more than 50 years ago, yet it remains unclear what its biological role is, which mechanisms force cellular genomes to comply strictly with CSPR, and why the genomes of certain noncellular organisms have broken intrastrand DNA symmetry. The present work studies a possible link between intrastrand DNA symmetry and the origin of genetic interactions in quantitative traits. Methods: Computational analysis of single-nucleotide polymorphisms in human and mouse populations and of nucleotide composition biases at different codon positions in bacterial and human proteomes. Results: The analysis of mutation spectra inferred from single-nucleotide polymorphisms observed in murine and human populations revealed near-exact equalities between the numbers of reverse complementary mutations, indicating that random genetic variations obey CSPR. Furthermore, the nucleotide compositions of coding sequences proved to be statistically interwoven via CSPR, because the pyrimidine bias at the 3rd codon position compensates for the purine bias at the 1st and 2nd positions. Conclusions: Based on Fisher's infinitesimal model, we propose that the accumulation of reverse complementary mutations results in continuous phenotypic variation due to the small additive effects of statistically interwoven genetic variations.
Therefore, additive genetic interactions can be inferred as a statistical entanglement of the nucleotide compositions of separate genetic loci. CSPR challenges the neutral theory of molecular evolution, because all random mutations participate in the variation of a trait, and it provides an alternative solution to Haldane's dilemma by making gene function diffuse. We propose that CSPR is a symmetry of Fisher's infinitesimal model and that genetic information can be transferred in an implicit, contactless manner.
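The reverse-complement comparison described in the Results can be sketched in a few lines of Python. This is a minimal illustration that assumes each substitution is recorded as a string such as "A>G", read on one fixed strand. The function names and toy counts are hypothetical and do not reproduce the authors' pipeline or data. Each substitution is paired with its reverse complement (A>G with T>C, C>T with G>A, and so on). The sketch then reports the smaller count divided by the larger, so values close to 1 indicate CSPR-like equality.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_mutation(mutation: str) -> str:
    """Map a substitution such as 'A>G' to its reverse complement 'T>C'."""
    ref, alt = mutation.split(">")
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


def cspr_pair_ratios(mutations: list[str]) -> dict[tuple[str, str], float]:
    """For each mutation / reverse-complement pair, return min(count) / max(count).

    A ratio near 1.0 means both reverse complementary mutations occur about
    equally often on the same strand, as CSPR predicts.
    """
    counts = Counter(mutations)
    ratios = {}
    for mut in sorted(counts):
        pair = tuple(sorted((mut, reverse_complement_mutation(mut))))
        if pair in ratios:
            continue  # already handled via its partner
        a, b = counts[pair[0]], counts[pair[1]]  # Counter returns 0 if absent
        ratios[pair] = min(a, b) / max(a, b)
    return ratios


# Hypothetical toy spectrum (not data from the study).
spectrum = ["A>G"] * 98 + ["T>C"] * 100 + ["C>T"] * 50 + ["G>A"] * 50
print(cspr_pair_ratios(spectrum))
# -> {('A>G', 'T>C'): 0.98, ('C>T', 'G>A'): 1.0}
```

Pairing by reverse complement, rather than by simple complement, matters here. A mutation observed on the opposite strand appears on the reference strand as its reverse complement.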
Affiliation(s)
- Bakhyt T. Matkarimov
- National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- L.N.Gumilev Eurasian National University, Astana, Kazakhstan
- Murat K. Saparbaev
- Groupe «Mechanisms of DNA Repair and Carcinogenesis», CNRS UMR9019, Gustave Roussy Cancer Campus, Université Paris-Saclay, Villejuif, France
- Al-Farabi Kazakh National University, Almaty, Kazakhstan
2
Yıldırım B, Vogl C. Purifying selection against spurious splicing signals contributes to the base composition evolution of the polypyrimidine tract. J Evol Biol 2023; 36:1295-1312. [PMID: 37564008 PMCID: PMC10946897 DOI: 10.1111/jeb.14205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/31/2023] [Accepted: 06/15/2023] [Indexed: 08/12/2023]
Abstract
Among eukaryotes, the major spliceosomal pathway is highly conserved. While long introns may contain additional regulatory sequences, those in short introns seem to be related almost exclusively to splicing. Although the regulatory sequences involved in splicing are well characterized, little is known about their evolution. At the 3' end of introns, the splice signal nearly universally contains the purine dimer AG, and the polypyrimidine tract upstream of this 3' splice signal is characterized by an over-representation of pyrimidines. If the over-representation of pyrimidines in the polypyrimidine tract is also due to avoidance of a premature splicing signal, we hypothesize that AG should be the most under-represented dimer. Using DNA-strand asymmetry patterns, we confirm this prediction in fruit flies of the genus Drosophila. By comparing the asymmetry patterns with those of a presumably neutrally evolving region, we quantify the selection strength acting on each motif. Moreover, our inference and simulation method revealed that the best explanation for the base composition evolution of the polypyrimidine tract is the joint action of two forces: purifying selection against a spurious 3' splice signal and selection for pyrimidines. Patterns of asymmetry in other eukaryotes indicate that avoidance of premature splicing similarly affects the nucleotide composition of their polypyrimidine tracts.
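A common way to summarise under-representation of a dimer such as AG is the observed/expected ratio f(XY) / (f(X) f(Y)), where f denotes frequencies within the tract. The Python sketch below computes this generic composition statistic as an illustrative assumption. It is not the strand-asymmetry estimator of selection strength used in the study, and the toy tract is invented. Ratios below 1 mean a dimer is rarer than the base composition alone predicts.

```python
from collections import Counter
from itertools import product


def dimer_odds_ratios(seq: str) -> dict[str, float]:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for each of the 16 dimers.

    Dimers whose expected frequency is zero (a base is absent) are omitted.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        if expected > 0:
            ratios[x + y] = (di[x + y] / n_di) / expected
    return ratios


# Invented pyrimidine-rich tract: A and G both occur, but never as "AG".
tract = "TTTCCTTTGCTTATCTCGTTTACTGTTCCTTGATTTCA"
ratios = dimer_odds_ratios(tract)
print(ratios["AG"])  # -> 0.0
```

On real introns, the ratio would be computed over many aligned tracts. The study's comparison against a neutrally evolving region is what turns such depletion into a selection estimate.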
Affiliation(s)
- Burçin Yıldırım
- Department of Biomedical Sciences, Vetmeduni Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
- Claus Vogl
- Department of Biomedical Sciences, Vetmeduni Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
3
Pflughaupt P, Sahakyan AB. Generalised interrelations among mutation rates drive the genomic compliance of Chargaff's second parity rule. Nucleic Acids Res 2023; 51:7409-7423. [PMID: 37293966 PMCID: PMC10415130 DOI: 10.1093/nar/gkad477] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/23/2022] [Revised: 05/05/2023] [Accepted: 05/17/2023] [Indexed: 06/10/2023] [Open access]
Abstract
Chargaff's second parity rule (PR-2) states that complementary bases, and complementary (reverse-complement) k-mers, occur in matching amounts within the same strand of double-stranded DNA (dsDNA). It is a phenomenon that has invited many explanations. The strict compliance of nearly all nuclear dsDNA with PR-2 implies that the explanation should be equally strict. In this work, we revisited the possibility that mutation rates drive PR-2 compliance. Starting from an assumption-free approach, we constructed kinetic equations for unconstrained simulations. The results were analysed for PR-2 compliance using symbolic regression and machine learning techniques. We arrived at a generalised set of mutation rate interrelations, in place in most species, that allows for full PR-2 compliance. Importantly, our constraints explain PR-2 in genomes beyond the scope of prior explanations, which relied on equilibration under mutation rates with simpler no-strand-bias constraints. We thus reinstate the role of mutation rates in PR-2 through its molecular core. Under our formulation, this core is now shown to tolerate previously noted strand biases and incomplete compositional equilibration. We further investigate the time any genome needs to reach PR-2. PR-2 is generally reached earlier than compositional equilibrium, and well within the age of life on Earth.
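The simpler, earlier explanation mentioned above can be illustrated numerically. Consider equilibration under mutation rates with no strand bias. If every substitution rate equals the rate of its reverse complement (r(A>G) = r(T>C), r(C>T) = r(G>A), and so on), the equilibrium composition satisfies A = T and G = C. The Python sketch below assumes arbitrary illustrative rates and plain Euler integration. It does not implement the generalised mutation-rate interrelations derived in the paper.

```python
# Illustrative sketch of the simpler no-strand-bias model, NOT the generalised
# interrelations of the paper. All rate values are arbitrary assumptions.
BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def symmetric_rates(half: dict[str, float]) -> dict[str, float]:
    """Complete a rate table so that r(X>Y) == r(comp(X)>comp(Y))."""
    rates = dict(half)
    for mut, r in half.items():
        x, y = mut.split(">")
        rates[f"{COMP[x]}>{COMP[y]}"] = r
    return rates


def evolve(freqs: dict[str, float], rates: dict[str, float],
           dt: float = 0.01, steps: int = 5000) -> dict[str, float]:
    """Euler-integrate base frequencies under constant substitution rates."""
    p = dict(freqs)
    for _ in range(steps):
        delta = {b: 0.0 for b in BASES}
        for x in BASES:
            for y in BASES:
                if x != y:
                    moved = p[x] * rates.get(f"{x}>{y}", 0.0) * dt
                    delta[x] -= moved  # probability mass leaving x ...
                    delta[y] += moved  # ... arrives at y
        p = {b: p[b] + delta[b] for b in BASES}
    return p


rates = symmetric_rates({"A>C": 1.0, "A>G": 3.0, "A>T": 0.5,
                         "C>A": 2.0, "C>G": 0.7, "C>T": 4.0})
eq = evolve({"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}, rates)
print({b: round(v, 4) for b, v in eq.items()})
# -> {'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3}
```

The equalities follow from the symmetry alone. The rate matrix commutes with the complement permutation, so the unique stationary distribution is invariant under A<->T and C<->G. With a strand-biased rate (for example, A>G no longer equal to T>C), the equalities generally fail. The abstract describes more general rate interrelations under which PR-2 still holds, and this sketch does not attempt those.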
Affiliation(s)
- Patrick Pflughaupt
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
- Aleksandr B Sahakyan
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
4
Moeckel C, Zaravinos A, Georgakopoulos-Soares I. Strand Asymmetries Across Genomic Processes. Comput Struct Biotechnol J 2023; 21:2036-2047. [PMID: 36968020 PMCID: PMC10030826 DOI: 10.1016/j.csbj.2023.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 03/08/2023] [Accepted: 03/08/2023] [Indexed: 03/12/2023] [Open access]
Abstract
Across biological systems, a number of genomic processes, including transcription, replication, DNA repair, and transcription factor binding, display intrinsic directionalities. These directionalities are reflected in the asymmetric distribution of nucleotides, motifs, genes, transposon integration sites, and other functional elements across the two complementary strands. Strand asymmetries, including GC skews and mutational biases, have shaped the nucleotide composition of diverse organisms. The investigation of strand asymmetries often serves as a method to understand underlying biological mechanisms, including protein binding preferences, transcription factor interactions, retrotransposition, DNA damage and repair preferences, transcription-replication collisions, and mutagenesis mechanisms. Research into this subject also enables the identification of functional genomic sites, such as replication origins and transcription start sites. Improvements in our ability to detect and quantify DNA strand asymmetries will provide insights into diverse functionalities of the genome, the contribution of different mutational mechanisms in germline and somatic mutagenesis, and our knowledge of genome instability and evolution, which all have significant clinical implications in human disease, including cancer. In this review, we describe key developments that have been made across the field of genomic strand asymmetries, as well as the discovery of associated mechanisms.
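One classic use of strand asymmetry mentioned above is locating replication origins, often done with cumulative GC skew. In many bacterial chromosomes the skew (G - C)/(G + C) changes sign at the origin and terminus, so the minimum of the cumulative skew is a candidate origin. The Python sketch below assumes a single origin and non-overlapping windows. The window size and toy sequence are arbitrary choices for illustration.

```python
from itertools import accumulate


def gc_skew(seq: str, window: int) -> list[float]:
    """(G - C) / (G + C) for consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def candidate_origin_window(seq: str, window: int) -> int:
    """Index of the window at which the cumulative GC skew reaches its minimum."""
    cumulative = list(accumulate(gc_skew(seq, window)))
    return min(range(len(cumulative)), key=cumulative.__getitem__)


# Toy sequence: C-rich first half, G-rich second half (illustrative only).
toy = "CCCA" * 50 + "GGGA" * 50
print(gc_skew(toy, 100))                  # -> [-1.0, -1.0, 1.0, 1.0]
print(candidate_origin_window(toy, 100))  # -> 1
```

The candidate origin lies near the end of the returned window, where the skew switches from negative to positive. Real analyses usually smooth the signal and treat the chromosome as circular.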
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Apostolos Zaravinos
- Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus
- Cancer Genetics, Genomics and Systems Biology laboratory, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
- Corresponding author at: Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus.
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Corresponding author.
5
Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinformatics Advances 2022; 2:vbac055. [PMID: 35992043 PMCID: PMC9383262 DOI: 10.1093/bioadv/vbac055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 08/09/2022] [Indexed: 01/27/2023]
Abstract
While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods. For such data, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data. Availability and implementation: Our software is available open source at https://github.com/nishatbristy007/NSB. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Ahnaf Faisal
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
6
Almirantis Y, Provata A, Li W. Noether's Theorem as a Metaphor for Chargaff's 2nd Parity Rule in Genomics. J Mol Evol 2022; 90:231-238. [PMID: 35704064 DOI: 10.1007/s00239-022-10062-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 05/18/2022] [Indexed: 10/18/2022]
Abstract
In the present note, we consider the genomic compositional rule widely known as 'Chargaff's 2nd parity rule' in relation to Noether's theorem, which links symmetries with conservation laws in physics. The rule asserts the equimolarity of adenine and thymine, and of guanine and cytosine, within either of the two DNA strands. In the case of the genome, the strict physical and mathematical prerequisites of Noether's theorem do not hold. However, we conclude that a metaphor can be established with Noether's theorem, as inter-strand symmetry concerning DNA functionality engenders specific features in genome composition. Conversely, when inter-strand symmetry does not hold, the corresponding quantitative relations fail to appear. This association is also considered from the point of view of the existence of emergent laws and properties in evolutionary genomics.
Affiliation(s)
- Yannis Almirantis
- Theoretical Biology and Computational Genomics Laboratory, Institute of Bioscience and Applications, National Center for Scientific Research "Demokritos", 15341, Athens, Greece.
- Astero Provata
- Statistical Mechanics and Dynamical Systems Laboratory, Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15341, Athens, Greece
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
7
Affinity and Correlation in DNA. J 2022. [DOI: 10.3390/j5020016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] [Open access]
Abstract
A statistical analysis of important DNA sequences and related proteins was performed to study the relationships between monomers, and the results allow some general considerations about these macromolecules. First, in all the DNA sequences examined, the most important relationship between sites is that between two consecutive base pairs. This indicates energetic stabilization due to the stacking interaction of adjacent base pairs. Second, human chromosome sequences and their coding parts differ appreciably, both in the relationships between sites and in some specific compositional rules, such as the second Chargaff rule. Third, the relationship between two successive triplets in DNA coding sequences gives rise to a relationship between two successive amino acids in the proteins. This would be impossible if all relationships between sites were merely statistical and involved no causes. Therefore, because of stacking interactions and this relationship in coding sequences, we divide the concept of a relationship between sites into two concepts: affinity, which has physical causes, and correlation, which does not. Finally, the statistical analyses show that the human genome is uniform, the only significant exception being the Y chromosome.
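The abstract does not specify its statistic for relationships between sites. One common choice is the mutual information between bases d positions apart. The Python sketch below uses a plug-in mutual information estimate as an assumed stand-in. Note that it measures statistical dependence only, so on its own it cannot separate the paper's "affinity" (physically caused) from "correlation". The periodic toy sequence is chosen because its answer is known: each base fully determines the next, giving close to 2 bits.

```python
import math
from collections import Counter


def mutual_information(seq: str, d: int) -> float:
    """Plug-in mutual information (bits) between bases d positions apart."""
    pairs = [(seq[i], seq[i + d]) for i in range(len(seq) - d)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), nxy in joint.items():
        # p(xy) * log2( p(xy) / (p(x) p(y)) ), written with raw counts
        mi += (nxy / n) * math.log2(nxy * n / (left[x] * right[y]))
    return mi


print(round(mutual_information("ACGT" * 100, 1), 3))  # -> 2.0
```

Computing I(d) for d = 1, 2, 3, ... on a real sequence gives a dependence profile. The first-neighbour value can then be compared with longer distances.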
8
Rue CR, Selwyn JD, Cockett PM, Gillis B, Gurski L, Jose P, Kutil BL, Magnuson SF, Ángela López de Mesa L, Overath RD, Smee DL, Bird CE. Genetic diversity across the mitochondrial genome of eastern oysters (Crassostrea virginica) in the northern Gulf of Mexico. PeerJ 2021; 9:e12205. [PMID: 34692250 PMCID: PMC8485835 DOI: 10.7717/peerj.12205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Accepted: 09/03/2021] [Indexed: 11/20/2022] [Open access]
Abstract
The eastern oyster, Crassostrea virginica, is divided into four populations along the western North Atlantic; however, the only published mitochondrial genome sequence was assembled from a single individual from Delaware. This study aimed to (1) assemble C. virginica mitochondrial genomes from Texas with pooled restriction-site-associated DNA sequencing (ezRAD), (2) evaluate the validity of the mitochondrial genome assemblies, including comparison with Sanger sequencing data, and (3) evaluate genetic differentiation both between the Delaware and Texas genomes and among three bays in Texas. The pooled-genome-assembled genomes (PAGs) from Texas exhibited several characteristics indicating that they were valid:
- elevated nucleotide diversity in non-coding regions and at the third codon position;
- placement as the sister haplotype of the Delaware genome in a phylogenetic reconstruction of Crassostrea mitochondrial genomes;
- a lack of genetic structure in the ND4 gene among the three Texas bays, as was found with Sanger amplicons in samples from the same bays several years earlier.

In the comparison between the Delaware and Texas genomes, 27 of 38 coding regions exhibited variability between the two populations. These populations were differentiated by 273 mutations, versus 1-13 mutations among the Texas samples. Using the full PAGs, there was no additional evidence of population structure among the three Texas bays. Population genetics is rapidly moving towards larger, high-density datasets. Even so, studies of mitochondrial DNA (and genomes) can be particularly useful for comparison with historical data from before the modern era of genomics. As such, being able to reliably compile mitochondrial genomes from genomic data can improve the ability to compare results across studies.
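The codon-position pattern used above as a validity check can be computed as nucleotide diversity (pi), the average number of pairwise differences per site, separately for each codon position. The Python sketch below assumes individual, equal-length, gap-free aligned coding haplotypes that start at a codon boundary. This differs from the study's pooled sequencing, where allele frequencies come from read counts, so it is an illustration rather than their pipeline. The haplotypes are invented.

```python
from itertools import combinations


def diversity_by_codon_position(aligned: list[str]) -> dict[int, float]:
    """Average pairwise differences per site (pi) at codon positions 1, 2 and 3."""
    length = len(aligned[0])
    pairs = list(combinations(aligned, 2))
    result = {}
    for pos in (1, 2, 3):
        sites = range(pos - 1, length, 3)  # 0-based indices of this codon position
        diffs = sum(a[i] != b[i] for a, b in pairs for i in sites)
        result[pos] = diffs / (len(pairs) * len(sites))
    return result


# Invented haplotypes that differ only at third codon positions.
haplotypes = ["ATGAAACCC", "ATGAAGCCT", "ATGAAACCT"]
print(diversity_by_codon_position(haplotypes))
# -> {1: 0.0, 2: 0.0, 3: 0.4444444444444444}
```

Third positions are often synonymous, so elevated pi there is the expected signature of genuine coding sequence rather than assembly artefacts.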
Affiliation(s)
- Chani R Rue
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Jason D Selwyn
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Patricia M Cockett
- Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Bryan Gillis
- Conrad Blucher Institute, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Lauren Gurski
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Philip Jose
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Brandi L Kutil
- Department of Undergraduate Studies, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Sharon F Magnuson
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- Luz Ángela López de Mesa
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America
- R Deborah Overath
- Department of Mathematics and Sciences, Texas Southmost College, Brownsville, TX, United States of America
- Delbert Lee Smee
- Dauphin Island Sea Lab, Dauphin Island, AL, United States of America; Marine Sciences, University of South Alabama, Mobile, AL, United States of America
- Christopher E Bird
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, United States of America; Hawai'i Institute of Marine Biology, University of Hawaii at Mānoa, Kāne'ohe, Hawai'i, United States of America
9
Khoshbin Z, Abnous K, Taghdisi SM, Verdian A. A novel liquid crystal-based aptasensor for ultra-low detection of ochratoxin A using a π-shaped DNA structure: Promising for future on-site detection test strips. Biosens Bioelectron 2021; 191:113457. [PMID: 34175647 DOI: 10.1016/j.bios.2021.113457] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/20/2021] [Revised: 05/22/2021] [Accepted: 06/17/2021] [Indexed: 12/19/2022]
Abstract
Ochratoxin A (OTA), the most dangerous mycotoxin, is produced by Aspergillus ochraceus and Penicillium verrucosum. OTA occurs in beverages and foodstuffs and has teratogenic, nephrotoxic, carcinogenic, and immunosuppressive effects in humans. Hence, developing highly sensitive methods for its detection is of great importance. Herein, a novel aptasensor was designed for label-free monitoring of ultra-low OTA levels. It combines the advantages of aptamers with the long-range orientational order of liquid crystals (LCs). The aptasensing strategy was based on the conformational switch of the π-shaped DNA structure immobilized on the glass substrate in the presence of the target. A shift in the orientation of the LCs from a random to a homeotropic state changed the optical appearance of the aptasensor platform from bright to dark. The LC-based aptasensor specifically detects OTA at ultra-trace levels, as low as 0.63 aM, with comparable selectivity. The aptasensor detected OTA successfully in grape juice, coffee, and human serum samples. The LC-based aptasensor paves the way for portable, real-time, high-performance sensing probes for food safety control and clinical applications.
Affiliation(s)
- Zahra Khoshbin
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Medicinal Chemistry, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
- Khalil Abnous
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Medicinal Chemistry, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran.
- Seyed Mohammad Taghdisi
- Targeted Drug Delivery Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran.
- Asma Verdian
- Department of Food Safety and Quality Control, Research Institute of Food Science and Technology (RIFST), Mashhad, Iran
10
Fariselli P, Taccioli C, Pagani L, Maritan A. DNA sequence symmetries from randomness: the origin of the Chargaff's second parity rule. Brief Bioinform 2021; 22:2172-2181. [PMID: 32266404 PMCID: PMC7986665 DOI: 10.1093/bib/bbaa041] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/04/2019] [Revised: 02/27/2020] [Accepted: 03/05/2020] [Indexed: 01/13/2023] [Open access]
Abstract
Most living organisms rely on double-stranded DNA (dsDNA) to store their genetic information and perpetuate themselves. This biological information has been considered the main target of evolution. However, here we show that symmetries and patterns in the dsDNA sequence can emerge from two sources alone: the physical peculiarities of the dsDNA molecule itself and the maximum entropy principle. Biological or environmental evolutionary pressure is not required. This randomness accounts for human codon biases and context-dependent mutation patterns in human populations. Thus, the 'exceptional symmetries' of DNA, which emerge from randomness, have to be taken into account when looking for the information encoded in DNA. Our results suggest that the energy constraints of the double helix are the hard drivers of the overall DNA sequence architecture, as are, more generally, the physical properties of dsDNA. Selective biological processes act as soft drivers, which overtake the overall entropy content of the genome only under extraordinary circumstances.
Affiliation(s)
- Piero Fariselli
- Department of Medical Sciences of the University of Turin, Italy
- Luca Pagani
- Department of Biology of the University of Padova, Italy
- Amos Maritan
- Department of Physics of the University of Padova, Italy
11
Zhao Y, Dong L, Jiang C, Wang X, Xie J, Rashid MAR, Liu Y, Li M, Bu Z, Wang H, Ma X, Sun S, Wang X, Bo C, Zhou T, Kong L. Distinct nucleotide patterns among three subgenomes of bread wheat and their potential origins during domestication after allopolyploidization. BMC Biol 2020; 18:188. [PMID: 33267868 PMCID: PMC7713161 DOI: 10.1186/s12915-020-00917-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/18/2020] [Accepted: 11/05/2020] [Indexed: 11/13/2022] [Open access]
Abstract
BACKGROUND: The speciation and fast global domestication of bread wheat have had a great impact on its three subgenomes. DNA base composition is an essential genome feature. Among thousands of species, it follows the individual-strand base equality rule and the [AT]-increase pattern at the genome, chromosome, and polymorphic-site levels. Systematic analyses of the base composition of bread wheat and its wild progenitors could further our understanding of genome- and subgenome-wide base composition in allopolyploid species, including its evolutionary pattern and potential causes. RESULTS: Genome- and subgenome-wide base-composition patterns were investigated using polymorphic-site data and the corresponding reference genome sequences. The data came from 93 accessions from worldwide populations of bread wheat and its diploid and tetraploid progenitors. The individual-strand base equality rule and the [AT]-increase pattern persist in the recently formed hexaploid species bread wheat, at the genome, subgenome, chromosome, and polymorphic-site levels. However, the D subgenome showed a faster [AT]-increase at polymorphic sites (from Aegilops tauschii to bread wheat) than the A and B subgenomes did (from wild emmer to bread wheat). The fastest [AT]-increase was detected in almost all chromosome windows of the D subgenome, suggesting different mechanisms in the D subgenome and the other two subgenomes. Interestingly, the [AT]-increase is contributed mainly by intergenic regions outside selective sweeps; this holds especially for the fastest [AT]-increase, in the D subgenome. Further analysis of transition frequency and sequence context indicated that the three subgenomes share the same mutation types. However, the D subgenome has the highest mutation rate for the high-frequency mutation type. The highest mutation rate in the D subgenome was further confirmed using a bread-wheat-private SNP set.
The exploration of loci and genes related to the [AT] value of the D subgenome suggests that its fastest [AT]-increase could involve DNA repair systems distributed across the three subgenomes of bread wheat. CONCLUSIONS: The highest mutation rate is detected in the D subgenome of bread wheat during domestication after allopolyploidization, leading to the fastest [AT]-increase pattern in the D subgenome. This phenomenon may result from the joint action of multiple repair systems inherited from its wild progenitors.
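The abstract does not define its [AT]-increase statistic. The Python sketch below assumes a simple hypothetical version for illustration: given polarised SNPs as (ancestral, derived) allele pairs, it does two things. First, it counts strong-to-weak (G/C to A/T) and weak-to-strong changes. Second, it reports how much the A+T fraction of derived alleles exceeds that of ancestral alleles. Polarisation would in practice require an outgroup such as a progenitor genome. The SNP list is invented.

```python
WEAK = set("AT")  # A/T are "weak" bases; G/C are "strong"


def at_change_at_snps(snps: list[tuple[str, str]]) -> dict[str, float]:
    """Summarise A+T change across polarised (ancestral, derived) SNP alleles.

    Hypothetical statistic for illustration; not the paper's definition.
    """
    s_to_w = sum(anc not in WEAK and der in WEAK for anc, der in snps)
    w_to_s = sum(anc in WEAK and der not in WEAK for anc, der in snps)
    n = len(snps)
    at_anc = sum(anc in WEAK for anc, _ in snps) / n
    at_der = sum(der in WEAK for _, der in snps) / n
    return {"S->W": s_to_w, "W->S": w_to_s, "AT_increase": at_der - at_anc}


# Invented polarised SNPs: three G/C->A/T, one A->G, one A->T.
snps = [("G", "A"), ("C", "T"), ("G", "T"), ("A", "G"), ("A", "T")]
print(at_change_at_snps(snps))
# -> {'S->W': 3, 'W->S': 1, 'AT_increase': 0.4}
```

Weak-to-weak and strong-to-strong changes (such as A>T) leave the A+T fraction unchanged. The increase therefore equals (S->W minus W->S) divided by the number of sites.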
Affiliation(s)
- Yan Zhao
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Luhao Dong
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Conghui Jiang
- Key Laboratory of Crop Heterosis and Utilization, Ministry of Education, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, People's Republic of China
- Xueqiang Wang
- Key Laboratory of Crop Heterosis and Utilization, Ministry of Education, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, People's Republic of China
- Jianyin Xie
- Key Laboratory of Crop Heterosis and Utilization, Ministry of Education, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, People's Republic of China
- Yanhe Liu
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Mengyao Li
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Zhimu Bu
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Hongwei Wang
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Xin Ma
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Silong Sun
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Xiaoqian Wang
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Cunyao Bo
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Tingting Zhou
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China
- Lingrang Kong
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai'an, 271018, Shandong, People's Republic of China.
12
Revisiting the Relationships Between Genomic G + C Content, RNA Secondary Structures, and Optimal Growth Temperature. J Mol Evol 2020; 89:165-171. [PMID: 33216148 DOI: 10.1007/s00239-020-09974-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/16/2020] [Accepted: 11/09/2020] [Indexed: 10/23/2022]
Abstract
Over twenty years ago, Galtier and Lobry published a manuscript entitled "Relationships between Genomic G + C Content, RNA Secondary Structure, and Optimal Growth Temperature" in the Journal of Molecular Evolution that showcased the lack of a relationship between genomic G + C content and optimal growth temperature (OGT) in a set of about 200 prokaryotes. Galtier and Lobry also assessed the relationship between RNA secondary structures (rRNA stems, tRNAs) and OGT, and in this case a clear relationship emerged: increasing G + C content of structured RNAs (particularly in double-stranded regions) correlates with increased OGT. Both of these fundamental relationships have withstood the test of many additional sequences and have spawned a variety of applications, including prediction of OGT from rRNA sequence and computational ncRNA identification approaches. In this work, I present the motivation behind Galtier and Lobry's original paper and the larger questions addressed by the work, how these questions have evolved over the last two decades, and the impact of Galtier and Lobry's manuscript in fields beyond these questions.
13
Rosandić M, Vlahović I, Paar V. Novel look at DNA and life-Symmetry as evolutionary forcing. J Theor Biol 2019; 483:109985. [PMID: 31469987 DOI: 10.1016/j.jtbi.2019.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/08/2018] [Revised: 06/21/2018] [Accepted: 08/22/2019] [Indexed: 11/20/2022]
Abstract
After the explanation of Chargaff's first parity rule in terms of Watson-Crick base pairing between the two DNA strands, Chargaff's second parity rule for each strand of DNA (also named strand symmetry), which cannot be explained by Watson-Crick base pairing alone, has remained a challenging issue for fifty years. We show that during evolution DNA preserves its identity in the form of quadruplet A+T and C+G rich matrices based on purine-pyrimidine mirror symmetries of trinucleotides. Identical symmetries are present in our classification of trinucleotides and the genetic code table. All eukaryotes and almost all prokaryotes (bacteria and archaea) have quadruplet mirror symmetries in structural form and frequencies following the principle of Chargaff's second parity rule and the Natural symmetry law of DNA creation and conservation. Some rare symbionts have mirror symmetry only in their structural form within each DNA strand. Based on our matrix analysis of closely related species, humans and Neanderthals, we find that the circular cycle of inverse proportionality between trinucleotides preserves identical relative frequencies of trinucleotides in each quadruplet and in the whole genome. According to our calculations, a change in frequencies in quadruplet matrices could lead to the creation of new species. Violation of quadruplet symmetries is practically inconsistent with life. DNA symmetries provide a key for understanding the restriction of disorder (entropy) due to mutations in the evolution of DNA.
Affiliation(s)
- Marija Rosandić
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia; University hospital centre Zagreb (ret.), Zagreb, Croatia.
- Ines Vlahović
- Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia; Algebra University College, 10000 Zagreb, Croatia.
- Vladimir Paar
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia; Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia.
14
Bohlin J, Pettersson JHO. Evolution of Genomic Base Composition: From Single Cell Microbes to Multicellular Animals. Comput Struct Biotechnol J 2019; 17:362-370. [PMID: 30949307 PMCID: PMC6429543 DOI: 10.1016/j.csbj.2019.03.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/28/2018] [Revised: 02/28/2019] [Accepted: 03/01/2019] [Indexed: 01/07/2023] Open
Abstract
Whole genome sequencing (WGS) of thousands of microbial genomes has provided considerable insight into evolutionary mechanisms in the microbial world. While substantially fewer eukaryotic genomes are available for analyses, the number is rapidly increasing. This mini-review broadly summarizes the evolutionary dynamics of base composition in the different domains of life from the perspective of prokaryotes. Common and different evolutionary mechanisms influencing genomic base composition in eukaryotes and prokaryotes are discussed. The data currently available suggest that, while there are similarities, there are also striking differences in how genomic base composition has evolved within prokaryotes and eukaryotes. For instance, homologous recombination appears to increase GC content locally in eukaryotes due to a non-selective process termed GC-biased gene conversion (gBGC). For prokaryotes, on the other hand, an increase in genomic GC content seems to be driven by the environment and selection. We find that similar phenomena observed for some organisms in each respective domain may be caused by very different mechanisms: while gBGC and recombination rates appear to explain the negative correlation between GC3 (GC content at third codon positions) and genome size in some eukaryotes, uptake of AT-rich DNA sequences is the main reason for a similar negative correlation observed in prokaryotes. We provide further examples indicating that base composition in prokaryotes and eukaryotes has evolved under very different constraints.
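GC3 is the GC content at third codon positions. A minimal Python sketch of how it can be computed for one in-frame coding sequence follows; the function name `gc3` is hypothetical, and the code assumes an ungapped CDS that starts at the first codon and ignores a trailing partial codon. It illustrates the quantity only and is not the review's own pipeline.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon in an in-frame CDS."""
    seq = cds.upper()
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    third_positions = [seq[3 * i + 2] for i in range(n_codons)]
    return sum(base in "GC" for base in third_positions) / n_codons


# Codons ATG GCC AAA TTT GGG have third bases G, C, A, T, G, so 3 of 5 are G/C.
print(gc3("ATGGCCAAATTTGGG"))  # 0.6
```

Genome-level GC3 would aggregate this over all annotated coding sequences, which is outside the scope of this sketch.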
Affiliation(s)
- Jon Bohlin
- Norwegian Institute of Public Health, Division of Infection Control and Environmental Health, Department of Infectious Disease Epidemiology and Modelling, Lovisenberggata 8, 0456 Oslo, Norway; Centre for Fertility and Health, Norwegian Institute of Public Health, PO-Box 222 Skøyen, N-0213 Oslo, Norway; Norwegian University of Life Sciences, Faculty of Veterinary Sciences, Production Animal Clinical Sciences, Ullevålsveien 72, 0454 Oslo, Norway
- John H-O Pettersson
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, the University of Sydney, New South Wales 2006, Australia; Zoonosis Science Center, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Public Health Agency of Sweden, Nobels väg 18, SE-171 82 Solna, Sweden
15
Cristadoro G, Degli Esposti M, Altmann EG. The common origin of symmetry and structure in genetic sequences. Sci Rep 2018; 8:15817. [PMID: 30361485 PMCID: PMC6202410 DOI: 10.1038/s41598-018-34136-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/17/2018] [Accepted: 10/09/2018] [Indexed: 12/20/2022] Open
Abstract
Biologists have long sought a way to explain how statistical properties of genetic sequences emerged and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are not considered or irrelevant, contrary to empirical evidence. Different studies investigated these two statistical features separately, reaching minimal consensus despite sustained efforts. Here we unravel previously unknown symmetries in genetic sequences, which are organized hierarchically through scales in which non-random structures are known to be present. These observations are confirmed through the statistical analysis of the human genome and explained through a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can explain simultaneously non-random structures and symmetries in genetic sequences.
Affiliation(s)
- Giampaolo Cristadoro
- Dipartimento di Matematica e Applicazioni, Università di Milano-Bicocca, 20125, Milano, Italy.
- Eduardo G Altmann
- School of Mathematics and Statistics, University of Sydney, Sydney, 2006, NSW, Australia
16
Bergman J, Betancourt AJ, Vogl C. Transcription-Associated Compositional Skews in Drosophila Genes. Genome Biol Evol 2018; 10:269-275. [PMID: 29036491 PMCID: PMC5786239 DOI: 10.1093/gbe/evx200] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 09/25/2017] [Indexed: 12/23/2022] Open
Abstract
In many organisms, local deviations from Chargaff's second parity rule are observed around replication and transcription start sites and within intron sequences. Here, we use expression data as well as a whole-genome data set of nearly 200 haplotypes to investigate such compositional skews in Drosophila melanogaster genes. We find a positive correlation between compositional skew and gene expression, comparable in strength to similar correlations between expression levels and genome-wide sequence features. This correlation is relatively stronger for germline, compared with somatic expression, consistent with the process of transcription-associated mutation bias. We also inferred mutation rates from alleles segregating at low frequencies in short introns, and show that, whereas the overall GC content of short introns does not conform to the equilibrium expectation, the level of the observed deviation from the second parity rule is generally consistent with the inferred rates.
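Compositional skews quantify local departures from Chargaff's second parity rule on one strand. A common pair of definitions, assumed here and not necessarily the exact statistic used in this study, is AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); both equal zero when A = T and G = C. The helper `skews` below is hypothetical and shown only as a sketch.

```python
from collections import Counter


def skews(seq: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) for a single strand; both are 0 when A = T and G = C."""
    counts = Counter(seq.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


print(skews("AAATTGGGGC"))  # (0.2, 0.6): A=3, T=2, G=4, C=1
```

Applied in sliding windows along a gene, such values can be compared with expression levels, although this sketch does not reproduce the study's windowing or statistics.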
Affiliation(s)
- Juraj Bergman
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Wien, Austria
- Andrea J Betancourt
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Present address: Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Claus Vogl
- Institut für Tierzucht und Genetik, Vetmeduni Vienna, Wien, Austria
17
Exploring the Impact of Cleavage and Polyadenylation Factors on Pre-mRNA Splicing Across Eukaryotes. G3-GENES GENOMES GENETICS 2017; 7:2107-2114. [PMID: 28500052 PMCID: PMC5499120 DOI: 10.1534/g3.117.041483] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
In human, mouse, and Drosophila, the spliceosomal complex U1 snRNP (U1) protects transcripts from premature cleavage and polyadenylation at proximal intronic polyadenylation signals (PAS). These U1-mediated effects preserve transcription integrity, and are known as telescripting. The watchtower role of U1 throughout transcription is clear. What is less clear is whether cleavage and polyadenylation factors (CPFs) are simply patrolled or if they might actively antagonize U1 recruitment. In addressing this question, we found that, in the introns of human, mouse, and Drosophila, and of 14 other eukaryotes, including multi- and single-celled species, the conserved AATAAA PAS—a major target for CPFs—is selected against. This selective pressure, approximated using DNA strand asymmetry, is detected for peripheral and internal introns alike. Surprisingly, it is more pronounced within—rather than outside—the action range of telescripting, and particularly intense in the vicinity of weak 5′ splice sites. Our study uncovers a novel feature of eukaryotic genes: that the AATAAA PAS is universally counter-selected in spliceosomal introns. This pattern implies that CPFs may attempt to access introns at any time during transcription. However, natural selection operates to minimize this access. By corroborating and extending previous work, our study further indicates that CPF access to intronic PASs might perturb the recruitment of U1 to the adjacent 5′ splice sites. These results open the possibility that CPFs may impact the splicing process across eukaryotes.
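The abstract approximates selection on the AATAAA signal through DNA strand asymmetry. One simple proxy, assumed here for illustration and not necessarily the authors' statistic, compares occurrences of AATAAA on the sense strand of an intron with occurrences of its reverse complement TTTATT: under strand-symmetric mutation alone the two counts should be similar, so a ratio well below 1 would be consistent with counter-selection. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(seq[i:i + k] == motif for i in range(len(seq) - k + 1))


def motif_strand_ratio(intron: str, motif: str = "AATAAA") -> float:
    """(direct count + 1) / (reverse-complement count + 1); pseudocounts avoid division by zero."""
    s = intron.upper()
    return (count_overlapping(s, motif) + 1) / (count_overlapping(s, revcomp(motif)) + 1)


intron = "GTTTTATTCCTTTATTAATAAAAG"  # toy sequence, not real data
print(count_overlapping(intron, "AATAAA"), count_overlapping(intron, "TTTATT"))  # 1 2
print(motif_strand_ratio(intron))  # (1 + 1) / (2 + 1)
```

In practice such counts would be pooled over many introns before comparing them, since single short introns contain very few occurrences.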
18
The Matrix Method of Representation, Analysis and Classification of Long Genetic Sequences. INFORMATION 2017. [DOI: 10.3390/info8010012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022] Open
19
Shporer S, Chor B, Rosset S, Horn D. Inversion symmetry of DNA k-mer counts: validity and deviations. BMC Genomics 2016; 17:696. [PMID: 27580854 PMCID: PMC5006273 DOI: 10.1186/s12864-016-3012-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/13/2016] [Accepted: 08/11/2016] [Indexed: 01/25/2023] Open
Abstract
Background The generalization of the second Chargaff rule states that counts of any string of nucleotides of length k on a single chromosomal strand equal the counts of its inverse (reverse-complement) k-mer. This Inversion Symmetry (IS) holds for many species, both eukaryotes and prokaryotes, for ranges of k which may vary from 7 to 10 as chromosomal lengths vary from 2 Mbp to 200 Mbp. The existence of IS has been demonstrated in the literature, and other pair-wise candidate symmetries (e.g. reverse or complement) have been ruled out. Results Studying IS in the human genome, we find that IS holds up to k = 10. It holds for complete chromosomes, also after applying the low complexity mask. We introduce a numerical IS criterion, and define the k-limit, KL, as the highest k for which this criterion is valid. We demonstrate that chromosomes of different species, as well as different human chromosomal sections, follow a universal logarithmic dependence of KL ~ 0.7 ln(L), where L is the length of the chromosome. We introduce a statistical IS-Poisson model that allows us to apply confidence measures to our numerical findings. We find good agreement for large k, where the variance of the Poisson distribution determines the outcome of the analysis. This model predicts the observed logarithmic increase of KL with length. The model allows us to conclude that for low k, e.g. k = 1 where IS becomes the 2nd Chargaff rule, IS violation, although extremely small, is significant. Studying this violation we come up with an unexpected observation for human chromosomes, finding a meaningful correlation with the excess of genes on particular strands. Conclusions Our IS-Poisson model agrees well with genomic data, and accounts for the universal behavior of k-limits. For low k we point out minute, yet significant, deviations from the model, including excess of counts of nucleotides T vs A and G vs C on positive strands of human chromosomes. 
Interestingly, this correlates with a significant (but small) excess of genes on the same positive strands.
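Inversion symmetry can be checked by counting k-mers on one strand and comparing each count with that of its reverse complement. The sketch below uses an illustrative aggregate, the summed absolute difference between paired counts divided by their summed total; this measure and the function names are assumptions for demonstration and are not the numerical IS criterion or the IS-Poisson model defined in this paper. For k = 1 it reduces to comparing A with T and G with C, i.e. Chargaff's second parity rule.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def inversion_asymmetry(seq: str, k: int) -> float:
    """Illustrative measure: sum |N(w) - N(rc(w))| / sum (N(w) + N(rc(w))) over k-mer pairs.

    Returns 0.0 when every k-mer count equals its reverse complement's count.
    Reverse-complement palindromes (w == rc(w)) are counted once and add nothing to the difference.
    """
    counts = kmer_counts(seq, k)
    canonical = {min(w, revcomp(w)) for w in counts}  # one representative per pair
    diff = total = 0
    for w in canonical:
        rc = revcomp(w)
        n_w, n_rc = counts[w], counts[rc]
        diff += abs(n_w - n_rc)
        total += n_w if w == rc else n_w + n_rc
    return diff / total if total else 0.0


print(inversion_asymmetry("AAAA", 1))  # 1.0: four A, no T
s = "AACGTTTGCAGG"
print(inversion_asymmetry(s + revcomp(s), 3))  # 0.0: the string is its own reverse complement
```

A sequence concatenated with its own reverse complement is exactly inversion-symmetric for every k, which makes it a convenient sanity check for any implementation.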
Affiliation(s)
- Sagi Shporer
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Benny Chor
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Saharon Rosset
- Sackler School of Mathematical Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
- David Horn
- Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv, 69978, Israel.
20
Gouveia S, Scotto MG, Weiß CH, Ferreira PJSG. Binary auto-regressive geometric modelling in a DNA context. J R Stat Soc Ser C Appl Stat 2016. [DOI: 10.1111/rssc.12172] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
21
Apostolou-Karampelis K, Nikolaou C, Almirantis Y. A novel skew analysis reveals substitution asymmetries linked to genetic code GC-biases and PolIII α-subunit isoforms. DNA Res 2016; 23:353-63. [PMID: 27345720 PMCID: PMC4991834 DOI: 10.1093/dnares/dsw021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/11/2016] [Accepted: 05/09/2016] [Indexed: 11/30/2022] Open
Abstract
Strand biases reflect deviations from a null expectation of DNA evolution that assumes strand-symmetric substitution rates. Here, we present strong evidence that nearest-neighbour preferences are a strand-biased feature of bacterial genomes, indicating neighbour-dependent substitution asymmetries. To detect such asymmetries we introduce an alignment-free index (relative abundance skews). The profiles of relative abundance skews along coding sequences can trace the phylogenetic relations of bacteria, suggesting that the patterns of neighbour-dependent substitution strand-biases are not common among different lineages, but are rather species-specific. Analysis of neighbour-dependent and codon-site skews sheds light on the origins of substitution asymmetries. Via a simple model we argue that the structure of the genetic code imposes position-dependent substitution strand-biases along coding sequences, as a response to GC mutation pressure. Thus, the organization of the genetic code per se can lead to an uneven distribution of nucleotides among different codon sites, even when requirements for specific codons and amino acids are not accounted for. Moreover, our results suggest that strand-biases in replication fidelity of PolIII α-subunit induce substitution asymmetries, both neighbour-dependent and independent, on a genome scale. The role of DNA repair systems, such as transcription-coupled repair, is also considered.
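For orientation, a single-strand version of the Karlin dinucleotide relative abundance, ρ_XY = f_XY / (f_X f_Y), can be computed and compared with the value for the reverse-complementary dinucleotide. The normalized difference below is a hypothetical stand-in chosen for illustration; it is not the relative abundance skew index defined in this paper, whose formula the abstract does not give, and the function names are likewise hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def relative_abundance(seq: str) -> dict[str, float]:
    """Single-strand dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y).

    Assumes an ACGT-only sequence of length >= 2.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    return {
        xy: (count / n_di) / ((mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono))
        for xy, count in di.items()
    }


def abundance_skew(seq: str, xy: str) -> float:
    """Hypothetical skew: (rho_XY - rho_rc) / (rho_XY + rho_rc), rc = reverse complement of XY."""
    rho = relative_abundance(seq)
    a, b = rho.get(xy, 0.0), rho.get(revcomp(xy), 0.0)
    return (a - b) / (a + b) if a + b else 0.0


print(abundance_skew("ACACACAC", "AC"))  # 1.0: GT never occurs on this strand
```

Under exact strand symmetry, the value for every dinucleotide is zero; nonzero values flag neighbour-dependent strand bias in this simplified sense.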
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Department of Biology, University of Crete, 71409 Heraklion, Greece
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece
22
mRNA-Associated Processes and Their Influence on Exon-Intron Structure in Drosophila melanogaster. G3-GENES GENOMES GENETICS 2016; 6:1617-26. [PMID: 27172210 PMCID: PMC4889658 DOI: 10.1534/g3.116.029231] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
Abstract
mRNA-associated processes and gene structure in eukaryotes are typically treated as separate research subjects. Here, we bridge this separation and leverage the extensive multidisciplinary work on Drosophila melanogaster to examine the roles that capping, splicing, cleavage/polyadenylation, and telescripting (i.e., the protection of nascent transcripts from premature cleavage/polyadenylation by the splicing factor U1) might play in shaping exon-intron architecture in protein-coding genes. Our findings suggest that the distance between subsequent internal 5′ splice sites (5′ss) in Drosophila genes is constrained such that telescripting effects are maximized, in theory, and thus nascent transcripts are less vulnerable to premature termination. Exceptionally weak 5′ss and constraints on intron-exon size at the gene 5′ end also indicate that capping might enhance the recruitment of U1 and, in turn, promote telescripting at this location. Finally, a positive correlation between last exon length and last 5′ss strength suggests that optimal donor splice sites in the proximity of the pre-mRNA tail may inhibit the processing of downstream polyadenylation signals more than weak donor splice sites do. These findings corroborate and build upon previous experimental and computational studies on Drosophila genes. They support the possibility, hitherto scantly explored, that mRNA-associated processes impose significant constraints on the evolution of eukaryotic gene structure.
23
A stationary distribution associated to a set of laws whose initial states are grouped into classes. An application in genomics. J Appl Probab 2016. [DOI: 10.1017/jpr.2016.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Let I be a finite set and S be a nonempty strict subset of I which is partitioned into classes, and let C(s) be the class containing s ∈ S. Let (P_s : s ∈ S) be a family of distributions on I^ℕ, where each P_s applies to sequences starting with the symbol s. To this family, we associate a class of distributions P(π) on I^ℕ which depends on a probability vector π. Our main results assume that, for each s ∈ S, P_s regenerates with distribution P_{s'} when it encounters s' ∈ S ∖ C(s). From semiregenerative theory, we determine a simple condition on π for P(π) to be time stationary. We give a similar result for the following more complex model. Once a symbol s' ∈ S ∖ C(s) has been encountered, there is a decision to be made: either a new region of type C(s') governed by P_{s'} starts or the region continues to be a C(s) region. This decision is modeled as a random event and its probability depends on s and s'. The aim in studying these kinds of models is to attain a deeper statistical understanding of bacterial DNA sequences. Here I is the set of codons and the classes (C(s) : s ∈ S) identify codons that initiate similar genomic regions. In particular, there are two classes corresponding to the start and stop codons which delimit coding and noncoding regions in bacterial DNA sequences. In addition, the random decision to continue the current region or begin a new region of a different class reflects the well-known fact that not every appearance of a start codon marks the beginning of a new coding region.
24
Rosandić M, Vlahović I, Glunčić M, Paar V. Trinucleotide's quadruplet symmetries and natural symmetry law of DNA creation ensuing Chargaff's second parity rule. J Biomol Struct Dyn 2016; 34:1383-94. [PMID: 26524490 DOI: 10.1080/07391102.2015.1080628] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 10/22/2022]
Abstract
For almost 50 years, no conclusive explanation has been found for Chargaff's second parity rule (CSPR), the equality of frequencies of nucleotides A=T and C=G or the equality of direct and reverse complement trinucleotides in the same DNA strand. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to the double-stranded genome. The symmetries of the Q-box corresponding to quadruplets can be obtained as a consequence of Watson-Crick base pairing and CSPR together. Alternatively, assuming a Natural symmetry law for DNA creation, under which each trinucleotide in one strand of DNA must simultaneously appear also in the opposite strand, automatically leads to Q-box direct-reverse mirror symmetry, which in conjunction with Watson-Crick base pairing generates CSPR. We demonstrate quadruplet symmetries in chromosomes of a wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D diagrams with combined interstrand frequencies. These "landscapes" are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most older species. In human chromosomes 1-12, X, and Y, the "landscapes" are almost identical, and they are slightly different in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a new robust tool for characterization and classification of genomes and their evolutionary trajectories.
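The four members of a trinucleotide quadruplet named in this abstract (direct, reverse complement, complement, reverse) can be generated mechanically. The sketch below, with hypothetical function names, builds them and groups all 64 trinucleotides into such sets; it yields 20 distinct groups, matching the 20 quadruplets mentioned above. Under this grouping, 8 of the 20 sets contain only two distinct trinucleotides, because a trinucleotide whose first and third bases are equal (for example ACA) is its own reverse. Whether the authors treat those degenerate sets in the same way is an assumption here.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quadruplet(trinuc: str) -> tuple[str, str, str, str]:
    """Return (direct, reverse complement, complement, reverse) of a trinucleotide."""
    direct = trinuc.upper()
    complement = direct.translate(COMPLEMENT)
    return direct, complement[::-1], complement, direct[::-1]


# Each of the 64 trinucleotides maps to the unordered set of its four variants;
# members of the same set produce the same frozenset, so duplicates collapse.
groups = {frozenset(quadruplet("".join(t))) for t in product("ACGT", repeat=3)}

print(quadruplet("ACG"))  # ('ACG', 'CGT', 'TGC', 'GCA')
print(len(groups))  # 20
print(sum(len(g) == 2 for g in groups))  # 8
```

The count of 20 follows from Burnside's lemma: of the four operations, the identity fixes 64 trinucleotides, reversal fixes the 16 with equal first and third bases, and complement and reverse complement fix none, giving (64 + 16) / 4 = 20.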
Affiliation(s)
- Marija Rosandić
- Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia
- Ines Vlahović
- Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Matko Glunčić
- Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Vladimir Paar
- Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia; Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
25
Sorimachi K, Okayasu T, Ohhira S. Normalization of Complete Genome Characteristics: Application to Evolution from Primitive Organisms to Homo sapiens. Curr Genomics 2015; 16:99-106. [PMID: 26085808 PMCID: PMC4467310 DOI: 10.2174/1389202916666150119215716] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/19/2014] [Revised: 01/09/2015] [Accepted: 01/19/2015] [Indexed: 11/22/2022] Open
Abstract
Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.
Affiliation(s)
- Kenji Sorimachi
- Educational Support Center, Dokkyo Medical University, Mibu, Tochigi 321-0293, Japan
- Life Science Research Center, Higashi-Kaizawa, Takasaki, Gunma 370-0041, Japan
- Teiji Okayasu
- Center for Medical Informatics, Dokkyo Medical University, Mibu, Tochigi 321-0293, Japan
- Shuji Ohhira
- Laboratory for International Environmental Health, Dokkyo Medical University, Tochigi 321-0293, Japan
26
Li X, Scanlon MJ, Yu J. Evolutionary patterns of DNA base composition and correlation to polymorphisms in DNA repair systems. Nucleic Acids Res 2015; 43:3614-25. [PMID: 25765652 PMCID: PMC4402523 DOI: 10.1093/nar/gkv197] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/16/2014] [Accepted: 02/24/2015] [Indexed: 11/15/2022] Open
Abstract
DNA base composition is a fundamental genome feature. However, the evolutionary pattern of base composition and its potential causes have not been well understood. Here, we report findings from comparative analysis of base composition at the whole-genome level across 2210 species, the polymorphic-site level across eight population comparison sets, and the mutation-site level in 12 mutation-tracking experiments. We first demonstrate that base composition follows the individual-strand base equality rule at the genome, chromosome and polymorphic-site levels. More intriguingly, clear separation of base-composition values calculated across polymorphic sites was consistently observed between basal and derived groups, suggesting common underlying mechanisms. Individuals in the derived groups show an A&T-increase/G&C-decrease pattern compared with the basal groups. Spontaneous and induced mutation experiments indicated these patterns of base composition change can emerge across mutation sites. With base-composition across polymorphic sites as a genome phenotype, genome scans with human 1000 Genomes and HapMap3 data identified a set of significant genomic regions enriched with Gene Ontology terms for DNA repair. For three DNA repair genes (BRIP1, PMS2P3 and TTDN), ENCODE data provided evidence for interaction between genomic regions containing these genes and regions containing the significant SNPs. Our findings provide insights into the mechanisms of genome evolution.
Affiliation(s)
- Xianran Li
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Michael J Scanlon
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
27
Sorimachi K, Okayasu T. Evidence for Natural Selection in Nucleotide Content Relationships Based on Complete Mitochondrial Genomes: Strong Effect of Guanine Content on Separation between Terrestrial and Aquatic Vertebrates. Curr Chem Genom Transl Med 2015; 9:1-5. [PMID: 25853054 PMCID: PMC4382559 DOI: 10.2174/2213988501509010001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2014] [Revised: 12/31/2014] [Accepted: 01/10/2015] [Indexed: 12/02/2022] Open
Abstract
The complete vertebrate mitochondrial genome contains 13 protein-coding genes. We used this genome to investigate the existence of natural selection in vertebrate evolution. From the complete mitochondrial genomes, we predicted nucleotide contents and then separated these values into coding and non-coding regions. When nucleotide contents of a coding or non-coding region were plotted against the nucleotide content of the complete mitochondrial genomes, we obtained linear regression lines only between homonucleotides and their analogs. On every plot of purine (G or A) content, G content in aquatic vertebrates was higher than that in terrestrial vertebrates, while A content in aquatic vertebrates was lower than that in terrestrial vertebrates. Based on these relationships, vertebrates were separated into two groups, terrestrial and aquatic. However, plots of pyrimidine (C or T) content did not clearly separate these two groups. The hagfish (Eptatretus burgeri) was further separated from both terrestrial and aquatic vertebrates. Based on these results, nucleotide content relationships predicted from the complete vertebrate mitochondrial genomes reveal the existence of natural selection based on evolutionary separation between terrestrial and aquatic vertebrate groups. In addition, we propose that separation of the two groups might be linked to ammonia detoxification based on high G and low A contents, which encode Glu-rich and Lys-poor proteins.
Collapse
Affiliation(s)
- Kenji Sorimachi
- Educational Support Center, Dokkyo Medical University, Mibu, Tochigi 321-0293, Japan
- Life Science Research Center, Higashi-Kaizawa, Takasaki, Gunma 370-0041, Japan
- Teiji Okayasu
- Center for Medical Informatics, Dokkyo Medical University, Tochigi 321-0293, Japan
28
Zhang SH. Persistence and breakdown of strand symmetry in the human genome. J Theor Biol 2015; 370:202-4. [PMID: 25576243 DOI: 10.1016/j.jtbi.2014.12.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2014] [Revised: 12/26/2014] [Accepted: 12/29/2014] [Indexed: 10/24/2022]
Abstract
Afreixo, V., Bastos, C.A.C., Garcia, S.P., Rodrigues, J.M.O.S., Pinho, A.J., Ferreira, P.J.S.G., 2013 (The breakdown of the word symmetry in the human genome. J. Theor. Biol. 335, 153-159) analyzed the word symmetry (strand symmetry, or the second parity rule) in the human genome. They concluded that strand symmetry holds for oligonucleotides up to 6 nt and is no longer statistically significant for oligonucleotides of higher orders. However, although they provided some new results on the issue, their interpretation may not be fully justified, and their conclusion needs further evaluation. Further analysis of their results, especially those of the equivalence tests and the word symmetry distance, shows that strand symmetry would persist for higher-order oligonucleotides up to 9 nt in the human genome, at least for its overall frequency framework (oligonucleotide frequency pattern).
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, China.
29
Rapoport AE, Trifonov EN. Compensatory nature of Chargaff’s second parity rule. J Biomol Struct Dyn 2013; 31:1324-36. [DOI: 10.1080/07391102.2012.736757] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]
30
Afreixo V, Bastos CA, Garcia SP, Rodrigues JM, Pinho AJ, Ferreira PJ. The breakdown of the word symmetry in the human genome. J Theor Biol 2013; 335:153-9. [DOI: 10.1016/j.jtbi.2013.06.032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/22/2012] [Revised: 05/30/2013] [Accepted: 06/25/2013] [Indexed: 01/13/2023]
31
On the fractal geometry of DNA by the binary image analysis. Bull Math Biol 2013; 75:1544-70. [PMID: 23760660 DOI: 10.1007/s11538-013-9859-9] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 10/09/2012] [Accepted: 05/21/2013] [Indexed: 12/15/2022]
Abstract
The multifractal analysis of binary images of DNA is studied in order to define a methodological approach to the classification of DNA sequences. This method is based on the computation of some multifractality parameters on a suitable binary image of DNA, which takes into account the nucleotide distribution. The binary image of DNA is obtained by a dot-plot (recurrence plot) of the indicator matrix. The fractal geometry of these images is characterized by fractal dimension (FD), lacunarity, and succolarity. These parameters are compared with some other coefficients such as complexity and Shannon information entropy. It will be shown that the complexity parameters are more or less equivalent to FD, while the parameters of multifractality have different values in the sense that sequences with higher FD might have lower lacunarity and/or succolarity. In particular, the genome of Drosophila melanogaster has been considered by focusing on the chromosome 3r, which shows the highest fractality with a corresponding higher level of complexity. We will single out some results on the nucleotide distribution in 3r with respect to complexity and fractality. In particular, we will show that sequences with higher FD also have a higher frequency distribution of guanine, while low FD is characterized by the higher presence of adenine.
32
Patterns of nucleotide asymmetries in plant and animal genomes. Biosystems 2013; 111:181-9. [PMID: 23438636 DOI: 10.1016/j.biosystems.2013.02.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/28/2012] [Revised: 11/29/2012] [Accepted: 02/07/2013] [Indexed: 11/20/2022]
Abstract
Symmetry in biology provides many intriguing puzzles to the scientist's mind. Chargaff's second parity rule states a symmetric distribution of oligonucleotides within a single strand of double-stranded DNA. While this rule has been verified in a wide range of microbial genomes, it still awaits explanation. In our study, we inquired into patterns of mono- and trinucleotide intra-strand parity in complex plant genomic sequences that became available during the last few years, and compared these to equally complex animal genomes. The degree and patterns of deviation from Chargaff's second rule were different between plant and animal species. We observed a universal inter-chromosomal homogeneity of mononucleotide skews in coding sequences of plant chromosomes, while the base composition of animal coding sequences differed between chromosomes even within a single species. We also found differences in the base composition of dicot introns in comparison to those of monocots. These genome-wide patterns were limited to genic regions and were not encountered in inter-genic sequences. We discuss the implications of our findings in relation to hypotheses about functional correlations of intra-strand parity which have hitherto been put forward. Furthermore, we propose more recent polyploidization and subsequent homogenization of homoeologues as a possible reason for more homogeneous skew patterns in plants.
33
Zhang SH, Wang L. Two common profiles exist for genomic oligonucleotide frequencies. BMC Res Notes 2012; 5:639. [PMID: 23158698 PMCID: PMC3532236 DOI: 10.1186/1756-0500-5-639] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/24/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022]
Abstract
Background It was reported that there is a majority profile for trinucleotide frequencies among genomes. And further study has revealed that two common profiles, rather than one majority profile, exist for genomic trinucleotide frequencies. However, the origins of the common/majority profile remain elusive. Moreover, it is not clear whether the features of common profile may be extended to oligonucleotides other than trinucleotides. Findings We analyzed 571 prokaryotic genomes (chromosomes) and some selected eukaryotic nuclear genomes as well as other genetic systems to study their compositional features. We found that there are also two common profiles for genomic oligonucleotide frequencies: one is from low-GC content genomes, and the other is from high-GC content genomes. Furthermore, each common profile is highly correlated to the average profile of random sequences with corresponding GC content and generated according to first-order symmetry. Conclusions The causes for the existence of two common profiles would mainly be GC content variations and strand symmetry of genomic sequences. Therefore, both GC content and strand symmetry would play important roles in genome evolution.
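The comparison described in the Findings can be sketched as follows: generate random sequences whose base composition obeys first-order strand symmetry (p(A) = p(T), p(C) = p(G)) at a chosen GC content, compute their trinucleotide profile, and correlate it with a genome's profile. This is a minimal sketch assuming independent, identically distributed bases and overlapping trinucleotide counts; the function names are hypothetical and the paper's exact procedure may differ.

```python
import random
from itertools import product

BASES = "ACGT"
TRINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=3)]

def symmetric_random_sequence(length, gc, rng):
    """i.i.d. sequence obeying first-order strand symmetry: p(A) = p(T), p(C) = p(G)."""
    at = (1.0 - gc) / 2
    return "".join(rng.choices(BASES, weights=[at, gc / 2, gc / 2, at], k=length))

def trinucleotide_profile(seq):
    """Relative frequencies of the 64 overlapping trinucleotides, in a fixed order."""
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        word = seq[i:i + 3]
        if word in counts:  # skip words containing N or other non-ACGT symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total for w in TRINUCLEOTIDES]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5
```

Given a genome string `genome` (hypothetical input), `pearson(trinucleotide_profile(genome), trinucleotide_profile(symmetric_random_sequence(100_000, gc, random.Random(0))))` compares it with a symmetric random reference of matching GC content.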
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou, 510275, China.
34
Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012; 13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022]
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B1 index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
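As a baseline for the more refined measures reviewed here, the plain GC skew graph can be sketched as (G - C) / (G + C) in non-overlapping windows plus its running sum. This is a minimal sketch, not an implementation of the ∆GC skew, B1 index or GC skew index; window size and function names are illustrative assumptions. In many bacterial chromosomes the minimum and maximum of the cumulative curve are commonly used as rough estimates of the replication origin and terminus.

```python
def gc_skew_windows(seq, window):
    """(G - C) / (G + C) in consecutive non-overlapping windows (0.0 if a window has no G or C)."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        region = seq[start:start + window]
        g, c = region.count("G"), region.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def cumulative_gc_skew(seq, window):
    """Running sum of the window GC skews."""
    total, curve = 0.0, []
    for skew in gc_skew_windows(seq, window):
        total += skew
        curve.append(total)
    return curve

# Toy example: a C-rich half followed by a G-rich half.
seq = "CCCA" * 50 + "GGGA" * 50
curve = cumulative_gc_skew(seq, 100)        # [-1.0, -2.0, -1.0, 0.0]
candidate_origin = curve.index(min(curve))  # window index 1
```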
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
35
Mahale KN, Kempraj V, Dasgupta D. Does the growth temperature of a prokaryote influence the purine content of its mRNAs? Gene 2012; 497:83-9. [PMID: 22305982 DOI: 10.1016/j.gene.2012.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/22/2011] [Accepted: 01/19/2012] [Indexed: 11/20/2022]
Abstract
The formation and breaking of hydrogen bonds between nucleic acid bases are dependent on temperature. The high G+C content of organisms was surmised to be an adaptation for high-temperature survival because of the thermal stability of G:C pairs. However, a survey of genomic GC% and optimum growth temperature (OGT) of several prokaryotes ruled out any direct relation between them. Significantly high purine (R = A or G) content in mRNAs is also seen as a selective response for survival among thermophiles. Nevertheless, the biological relevance of thermophiles loading their unstable mRNAs with excess purines (purine-loading or R-loading) is not persuasive. Here, we analysed the mRNA sequences from the genomes of 168 prokaryotes (as obtained from the NCBI Genome database) with OGTs ranging from -5 °C to 100 °C to verify the relation between R-loading and OGT. Our analysis fails to demonstrate any correlation between R-loading of the mRNA pool and the OGT of a prokaryote. The percentage of purine-loaded mRNAs in prokaryotes is found to be in a rough negative correlation with genomic GC% (r² = 0.655, slope = -1.478, P < 0.001). We conclude that genomic GC% and bias against certain combinations of nucleotides drive the mRNA-synonymous (sense) strands of DNA towards variations in R-loading.
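The quantity being correlated, the percentage of purine-loaded mRNAs, can be sketched as below. This is a minimal sketch assuming that an mRNA counts as "purine-loaded" when its purine fraction (A + G) / (A + C + G + T) exceeds 0.5; that threshold and the function names are assumptions, and the paper may define R-loading differently.

```python
def purine_fraction(seq):
    """(A + G) / (A + C + G + T) on the sense strand; U is read as T, other symbols ignored."""
    seq = seq.upper().replace("U", "T")
    counts = {b: seq.count(b) for b in "ACGT"}
    total = sum(counts.values())
    return (counts["A"] + counts["G"]) / total if total else 0.0

def percent_purine_loaded(mrnas, threshold=0.5):
    """Percentage of sequences whose purine fraction exceeds `threshold` (assumed cut-off)."""
    loaded = sum(purine_fraction(s) > threshold for s in mrnas)
    return 100.0 * loaded / len(mrnas)
```

Computed per genome, this percentage could then be regressed against genomic GC% or OGT.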
36
Nakashima H, Kuroda Y. Differences in dinucleotide frequencies of thermophilic genes encoding water soluble and membrane proteins. J Zhejiang Univ Sci B 2011; 12:419-27. [PMID: 21634034 DOI: 10.1631/jzus.b1000331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
The occurrence frequencies of the dinucleotides of genes of three thermophilic and three mesophilic species from both archaea and eubacteria were investigated in this study. The genes encoding water soluble proteins were rich in the dinucleotides of purine dimers, whereas the genes encoding membrane proteins were rich in pyrimidine dimers. The dinucleotides of purine dimers are the counterparts of pyrimidine dimers in a double-stranded DNA. The purine/pyrimidine dimers were favored in the thermophiles but not in the mesophiles, based on comparisons of observed and expected frequencies. This finding is in agreement with our previous study which showed that purine/pyrimidine dimers are positive factors that increase the thermal stability of DNA. The dinucleotides AA, AG, and GA are components of the codons of charged residues of Glu, Asp, Lys, and Arg, and the dinucleotides TT, CT, and TC are components of the codons of hydrophobic residues of Leu, Ile, and Phe. This is consistent with the suitabilities of the different amino acid residues for water soluble and membrane proteins. Our analysis provides a picture of how thermophilic species produce water soluble and membrane proteins with distinctive characters: the genes encoding water soluble proteins use DNA sequences rich in purine dimers, and the genes encoding membrane proteins use DNA sequences rich in pyrimidine dimers on the opposite strand.
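The "observed versus expected" comparison mentioned above is commonly expressed as the ratio f(XY) / (f(X) f(Y)). A minimal sketch of that ratio, together with the fraction of purine (RR) and pyrimidine (YY) dimers, might look like this; the exact normalization used by the authors is not given here, so treat these definitions and names as assumptions.

```python
from collections import Counter

PURINES, PYRIMIDINES = set("AG"), set("CT")

def dinucleotide_oe(seq):
    """Observed/expected ratio f(XY) / (f(X) f(Y)) for each dinucleotide whose bases both occur."""
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx, fy = mono[x] / n, mono[y] / n
            if fx and fy:
                ratios[x + y] = (di[x + y] / (n - 1)) / (fx * fy)
    return ratios

def dimer_class_fractions(seq):
    """Fractions of overlapping dinucleotides that are purine dimers (RR) and pyrimidine dimers (YY)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    rr = sum(p[0] in PURINES and p[1] in PURINES for p in pairs)
    yy = sum(p[0] in PYRIMIDINES and p[1] in PYRIMIDINES for p in pairs)
    return rr / len(pairs), yy / len(pairs)
```

Because the reverse complement of a purine dimer is a pyrimidine dimer, an RR-rich sense strand implies a YY-rich opposite strand, which is the duality the abstract relies on.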
Affiliation(s)
- Hiroshi Nakashima
- Department of Clinical Laboratory Science, Graduate Course of Medical Science and Technology, School of Health Sciences, Kanazawa University, 5-11-80 Kodatsuno, Kanazawa 920-0942, Japan.
37
Farlow A, Dolezal M, Hua L, Schlötterer C. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol 2011; 29:21-4. [PMID: 21878685 PMCID: PMC3245539 DOI: 10.1093/molbev/msr201] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022]
Abstract
Understanding the function of noncoding regions in the genome, such as introns, is of central importance to evolutionary biology. One approach is to assay for the targets of natural selection. On the one hand, the sequence of introns, especially short introns, appears to evolve in an almost neutral manner. On the other hand, a large proportion of intronic sequence is under selective constraint. This discrepancy depends largely on intron length and on differences in the methods used to infer selection. To detect selection within introns, we have used a method based on DNA strand asymmetry that requires neither comparison with any putatively neutrally evolving sequence nor sequence conservation between species. The strongest signal we identify is associated with short introns. This signal comes from a family of motifs that could act as cryptic 5′ splice sites during mRNA processing, suggesting a mechanistic justification underlying this signal of selection. Together with an analysis of intron length and splice site strength, we observe that the genomic signature of splicing-coupled selection differs between long and short introns.
Affiliation(s)
- Ashley Farlow
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Present address: Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
- Marlies Dolezal
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Liushuai Hua
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Present address: College of Animal Science and Technology, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- Christian Schlötterer
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Corresponding author
38
Sobottka M, Hart AG. A model capturing novel strand symmetries in bacterial DNA. Biochem Biophys Res Commun 2011; 410:823-8. [PMID: 21703245 DOI: 10.1016/j.bbrc.2011.06.072] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/06/2011] [Accepted: 06/09/2011] [Indexed: 11/24/2022]
Abstract
Chargaff's second parity rule for short oligonucleotides states that the frequency of any short nucleotide sequence on a strand is approximately equal to the frequency of its reverse complement on the same strand. Recent studies have shown that, with the exception of organellar DNA, this parity rule generally holds for double-stranded DNA genomes and fails to hold for single-stranded genomes. While Chargaff's first parity rule is fully explained by the Watson-Crick pairing in the DNA double helix, a definitive explanation for the second parity rule has not yet been determined. In this work, we propose a model based on a hidden Markov process for approximating the distributional structure of primitive DNA sequences. Then, we use the model to provide another possible theoretical explanation for Chargaff's second parity rule, and to predict novel distributional aspects of bacterial DNA sequences.
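To make the statement of the rule concrete, here is a minimal Python sketch that counts overlapping k-mers on one strand and measures how far each word's count departs from that of its reverse complement. The deviation measure (mean of |n(w) - n(w')| / (n(w) + n(w')) over reverse-complement pairs w, w') is an illustrative choice, not the statistic used by Sobottka and Hart or by the other studies listed here; function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]

def kmer_counts(seq, k):
    """Count overlapping k-mers on a single strand, skipping words with non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts

def parity_deviation(seq, k):
    """Mean |n(w) - n(rc(w))| / (n(w) + n(rc(w))) over observed reverse-complement pairs.

    0.0 means perfect intrastrand (second-parity) symmetry; 1.0 means no observed
    word has its reverse complement on the same strand.
    """
    counts = kmer_counts(seq, k)
    seen, diffs = set(), []
    for word in counts:
        rc = reverse_complement(word)
        pair = tuple(sorted((word, rc)))
        if pair in seen:
            continue
        seen.add(pair)
        n, m = counts[word], counts[rc]  # Counter returns 0 for absent words
        diffs.append(abs(n - m) / (n + m))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

A string concatenated with its own reverse complement is trivially symmetric, so `parity_deviation(s + reverse_complement(s), k)` is 0.0 for every k, while a poly-A strand gives 1.0 at k = 1. On real genomes the value is expected to be small but nonzero, and growing k lets one probe the "breakdown" of word symmetry discussed in entries 28 and 30.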
Affiliation(s)
- Marcelo Sobottka
- Departamento de Matemática, Universidade Federal de Santa Catarina, Brazil.
39
40
Zhang R. A rebuttal to the comments on the genome order index and the Z-curve. Biol Direct 2011; 6:10. [PMID: 21324187 PMCID: PMC3046898 DOI: 10.1186/1745-6150-6-10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/31/2010] [Accepted: 02/16/2011] [Indexed: 11/15/2022]
Abstract
Background Elhaik, Graur and Josić recently commented on the genome order index (S) and the Z-curve (Elhaik et al. Biol Direct 2010, 5: 10). S is defined as $S = a^2 + c^2 + g^2 + t^2$, where a, c, g and t denote the corresponding base frequencies. The Z-curve is a three-dimensional curve that represents a DNA sequence in such a way that each can be uniquely reconstructed from the other. Elhaik et al. made four major claims. 1) In the previous mapping system with the regular tetrahedron, the calculation of the radius of the inscribed sphere is "a mathematical error". 2) S follows an exponential distribution and is narrowly distributed within the range (0.25 - 0.33). 3) Based on Chargaff's second parity rule (PR2), "S is equivalent to H [Shannon entropy]" and the two are derivable from each other. 4) The Z-curve "suffers from over dimensionality", because, based on an analysis of 235 bacterial genomes, the x and y components contributed less than 1% of the variance and therefore "would be of little use". Results 1) Elhaik et al. mistakenly neglected the parameter 4/3 when calculating the radius of the inscribed sphere. 2) The exponential distribution of S is a restatement of our previous conclusion, and the range (0.25 - 0.33) only paraphrases the previously suggested S range (0.25 - 1/3). 3) Elhaik et al. incorrectly disregard deviations from PR2 by setting them all to 0, reduce S and H, both functions of four variables (a, c, g and t), to functions of a single variable, a, and apply this treatment to all DNA sequences as the basis of their "demonstration", which is therefore invalid. 4) Elhaik et al. confuse numerical smallness with biological insignificance and disregard the distributions of purine/pyrimidine and amino/keto bases (the x and y components), whose variations, although they can be smaller than that of GC content, contain rich and useful information, for example for locating replication origins in bacterial and archaeal genomes and for gene recognition in various species. Conclusion Elhaik et al. confuse S (a single number) with the Z-curve (a series of 3D coordinates), which are distinct; using S as a case study of the Z-curve is by itself invalid. S and H are neither equivalent nor derivable from each other. The criticisms of Elhaik, Graur and Josić are wrong. Reviewers This article was reviewed by Erik van Nimwegen.
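The reconstruction property and the meaning of the x and y components can be illustrated with a small sketch. It uses the usual Z-curve component definitions, $x_n = (A_n + G_n) - (C_n + T_n)$, $y_n = (A_n + C_n) - (G_n + T_n)$, $z_n = (A_n + T_n) - (G_n + C_n)$, where $A_n, \dots$ are cumulative base counts; the function names are hypothetical, and this is an illustration rather than the authors' software.

```python
# Unit step added to (x, y, z) for each base:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
STEP = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}
BASE_OF_STEP = {v: k for k, v in STEP.items()}

def z_curve(seq):
    """Return the Z-curve points (x_n, y_n, z_n) for n = 0..len(seq)."""
    x = y = z = 0
    points = [(0, 0, 0)]
    for base in seq:
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points

def counts_from_point(n, point):
    """Invert the component definitions to recover base counts after n bases."""
    x, y, z = point
    return {"A": (n + x + y + z) // 4, "C": (n - x + y - z) // 4,
            "G": (n + x - y - z) // 4, "T": (n - x - y + z) // 4}

def sequence_from_z_curve(points):
    """Each step between consecutive points identifies exactly one base."""
    return "".join(BASE_OF_STEP[tuple(b - a for a, b in zip(p, q))]
                   for p, q in zip(points, points[1:]))
```

Because every step is distinct, `sequence_from_z_curve(z_curve(s)) == s`, which is the one-to-one correspondence the abstract refers to; the x and y coordinates of the final point carry the purine/pyrimidine and amino/keto balance that a single GC-content number discards.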
Affiliation(s)
- Ren Zhang
- Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, Tianjin 300060, PR China.
41
Medvedeva YA, Kulakovskii IV, Oparina NY, Favorov AV, Makeev VY. The GC skew near Pol II start sites and its association with SP1-binding site variants. Biophysics 2010. [DOI: 10.1134/s0006350910060023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
42
Elhaik E, Graur D, Josić K. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences--a case study of the Z-curve. Biol Direct 2010; 5:10. [PMID: 20158921 PMCID: PMC2841071 DOI: 10.1186/1745-6150-5-10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/07/2010] [Accepted: 02/17/2010] [Indexed: 12/03/2022]
Abstract
Background The Z-curve is a three-dimensional representation of DNA sequences proposed over a decade ago and has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a "genome order index" was proposed, defined as $S = a^2 + c^2 + t^2 + g^2$, where a, c, t, and g are the frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested: each genome was represented by a point P whose distances from the four faces of a regular tetrahedron were given by the frequencies a, c, t, and g. It was claimed that an inscribed sphere of radius $r = 1/\sqrt{3}$ contains almost all points corresponding to various genomes, implying that $S < r^2$. The distribution of the points P obtained by S was studied using the Z-curve. Results In this work, we studied the basic properties of the Z-curve using the "genome order index" as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over-dimensionality, and the dimension standing for GC content alone suffices to represent any given genome. Conclusion The "genome order index" S does not represent a constraint on nucleotide composition. Moreover, S can easily be computed from the Gini-Simpson index, can be derived directly from entropy, and is redundant. Overall, the Z-curve and S are over-complicated counterparts of GC content and the Shannon H index, respectively. Reviewers This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich (nominated by Itai Yanai).
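Point (3) of this abstract and the counter-argument in Zhang's rebuttal (entry 40) can both be checked numerically. Under exact PR2 (a = t, c = g) the composition depends on AT content alone, so S and H become monotone functions of one variable; once PR2 is relaxed, two compositions can share the same S while having different H. This is a minimal sketch with hypothetical function names and arbitrarily chosen example compositions.

```python
import math

def order_index(freqs):
    """Genome order index S = a^2 + c^2 + g^2 + t^2."""
    return sum(p * p for p in freqs)

def shannon_entropy(freqs):
    """Shannon entropy H in bits."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)

def pr2_composition(at_content):
    """(a, c, g, t) under exact PR2: a == t and c == g."""
    a = at_content / 2
    c = (1 - at_content) / 2
    return (a, c, c, a)

# Under PR2, moving AT content away from 0.5 raises S and lowers H in lockstep.
pr2 = [pr2_composition(x) for x in (0.5, 0.6, 0.7, 0.8)]
S_pr2 = [order_index(f) for f in pr2]
H_pr2 = [shannon_entropy(f) for f in pr2]

# Two compositions violating PR2 with equal S (0.34) but different H.
p, q = (0.4, 0.4, 0.1, 0.1), (0.5, 0.2, 0.2, 0.1)
```

With these inputs S rises monotonically (0.25, 0.26, 0.29, 0.34) while H falls, whereas `p` and `q` share S but differ in H by about 0.04 bits, so S determines H only when deviations from PR2 are ignored.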
Affiliation(s)
- Eran Elhaik
- McKusick - Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
43
Zhang SH, Huang YZ. Limited contribution of stem-loop potential to symmetry of single-stranded genomic DNA. Bioinformatics 2009; 26:478-85. [PMID: 20031973 DOI: 10.1093/bioinformatics/btp703] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023]
Abstract
MOTIVATION The phenomenon of strand symmetry, which may provide clues to genome evolution, exists in all prokaryotic and eukaryotic genomes studied. Several possible mechanisms for its origins have been proposed, including: no strand biases for mutation and selection, strand inversion and selection of stem-loop structures. However, the relative contributions of these mechanisms to strand symmetry are not clear. In this article, we studied specifically the role of stem-loop potential of single-stranded DNA in strand symmetry. RESULTS We analyzed the complete genomes of 90 prokaryotes. We found that most oligonucleotides (pentanucleotides and higher) do not have a reverse complement in close proximity in the genomic sequences. Combined with further analysis, we conclude that the contribution of the widespread stem-loop potential of single-stranded genomic DNA to the formation and maintenance of strand symmetry would be very limited, at least for higher-order oligonucleotides. Therefore, other possible causes for strand symmetry must be taken into account to a deeper degree.
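The proximity question in the Results ("do oligonucleotides have a reverse complement in close proximity?") can be sketched as the fraction of k-mer occurrences whose reverse complement starts within a short distance downstream, a crude proxy for the potential to close a stem-loop. This is a simplified illustration under stated assumptions: the `max_gap` window, the no-overlap rule and the function names are hypothetical, and the authors' actual criterion may differ.

```python
from bisect import bisect_left

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]

def fraction_with_nearby_rc(seq, k, max_gap):
    """Fraction of k-mer occurrences whose reverse complement starts at most `max_gap`
    bases after the occurrence ends (non-overlapping, downstream only)."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)  # start positions, ascending
    hits = total = 0
    for word, starts in positions.items():
        rc_starts = positions.get(reverse_complement(word), [])
        for i in starts:
            total += 1
            j = bisect_left(rc_starts, i + k)  # first rc copy starting after this word ends
            if j < len(rc_starts) and rc_starts[j] <= i + k + max_gap:
                hits += 1
    return hits / total if total else 0.0
```

In the toy hairpin `"GGGAAAACCC"` only the `GGG` stem finds its partner `CCC` within a 4-base loop, so one of eight 3-mer occurrences qualifies (0.125).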
Affiliation(s)
- Shang-Hong Zhang
- The Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, China.
44
Powdel BR, Satapathy SS, Kumar A, Jha PK, Buragohain AK, Borah M, Ray SK. A study in entire chromosomes of violations of the intra-strand parity of complementary nucleotides (Chargaff's second parity rule). DNA Res 2009; 16:325-43. [PMID: 19861381 PMCID: PMC2780954 DOI: 10.1093/dnares/dsp021] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
Abstract
Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers citing and discussing ISP have been published in the genomic era, scientists have yet to identify all the factors responsible for such a universal phenomenon in chromosomes. In the present work, we have tried to address the issue from a new perspective, a feature parallel to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of a specific size, and the frequency distributions of the mono/oligonucleotides among the regions were compared using the Kolmogorov–Smirnov test. Interestingly, the frequency distributions of complementary mono/oligonucleotides revealed statistical similarity, which we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes. Violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a species in bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa etc.) and chromosomes of a eukaryote were found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and composition of forward-encoded sequences between the strands in chromosomes were found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.
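The ISFDP comparison can be sketched in two steps: compute a word's relative abundance in each non-overlapping window, then compare the resulting distributions for a word and its complement with the two-sample Kolmogorov-Smirnov statistic D. This minimal sketch computes only D; for a p-value one could use `scipy.stats.ks_2samp`. The window handling and function names are assumptions, not the authors' code.

```python
from bisect import bisect_right

def window_abundance(seq, word, window):
    """Relative abundance of `word` (overlapping matches) in each non-overlapping window."""
    k = len(word)
    slots = window - k + 1
    values = []
    for start in range(0, len(seq) - window + 1, window):
        region = seq[start:start + window]
        hits = sum(region[i:i + k] == word for i in range(slots))
        values.append(hits / slots)
    return values

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```

In `"AAAATTTT"` with 4-base windows, A and T occupy different windows but have identical abundance distributions ([1.0, 0.0] versus [0.0, 1.0]), so D = 0.0. This is why ISFDP is a property of distributions, distinct from positional or count parity.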
Affiliation(s)
- B R Powdel
- Department of Mathematical Sciences, Tezpur University, Tezpur, Assam 784 028, India
45
Sorimachi K, Okayasu T. Codon evolution is governed by linear formulas. Amino Acids 2008; 34:661-8. [PMID: 18180868 DOI: 10.1007/s00726-007-0024-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 11/26/2007] [Accepted: 12/17/2007] [Indexed: 10/22/2022]
Abstract
When nucleotide (G, C, T and A) contents were plotted against each nucleotide, their relationships were clearly expressed by a linear formula, $y = \alpha x + \beta$, in the coding and non-coding regions. This linear relationship was obtained from the complete single-stranded DNA. Similarly, nucleotide contents at all three codon positions were expressed by linear regression lines based on the content of each nucleotide. In addition, the usages of all 64 codons were also expressed by linear formulas against nucleotide content. Thus, the nucleotide content not only in coding sequence but also in non-coding sequence can be expressed by a linear formula, $y = \alpha x + \beta$, in 145 organisms (112 bacteria, 15 archaea and 18 eukaryotes). Based on these results, from the ratio C/T, G/T, C/A or G/A one can essentially estimate all four nucleotide contents in the complete single-stranded DNA; more generally, determining the ratio of any two kinds of nucleotides essentially allows estimation of the four nucleotide contents, the nucleotide contents at the three codon positions, and the distribution of the 64 codons in the coding region. The maximum and minimum values of G content were approximately 0.35 and approximately 0.15, respectively, among the organisms examined. Codon evolution occurs according to linear formulas between these two values.
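The relationships above are ordinary least-squares lines $y = \alpha x + \beta$ fitted across organisms, one point per genome. A minimal sketch of that fit, plus a helper for per-genome nucleotide content, is given below; the function names are hypothetical, and the sketch makes no claim about the exact regression settings used in the paper.

```python
def nucleotide_content(seq, base):
    """Fraction of `base` among A, C, G and T in a single-stranded sequence."""
    acgt = sum(seq.count(b) for b in "ACGT")
    return seq.count(base) / acgt

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = alpha * x + beta; returns (alpha, beta)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    alpha = sxy / sxx
    return alpha, my - alpha * mx
```

For a list of genome strings `genomes` (hypothetical input), `fit_line([nucleotide_content(g, "G") for g in genomes], [nucleotide_content(g, "C") for g in genomes])` would give the slope and intercept of C content against G content.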
Affiliation(s)
- K Sorimachi
- Educational Support Center, Dokkyo Medical University, Mibu, Tochigi 321-0293, Japan.
46
Albrecht-Buehler G. Inversions and inverted transpositions as the basis for an almost universal "format" of genome sequences. Genomics 2007; 90:297-305. [PMID: 17582735 DOI: 10.1016/j.ygeno.2007.05.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 03/21/2007] [Revised: 05/11/2007] [Accepted: 05/21/2007] [Indexed: 11/23/2022]
Abstract
In genome duplexes that exceed 100 kb the frequency distributions of their trinucleotides (triplet profiles) are the same in both strands. This remarkable symmetry, sometimes called Chargaff's second parity rule, is not the result of base pairing, but can be explained as the result of countless inversions and inverted transpositions that occurred throughout evolution (G. Albrecht-Buehler, 2006, Proc. Natl. Acad. Sci. USA 103, 17828-17833). Furthermore, comparing the triplet profiles of genomes from a large number of different taxa and species revealed that they were not only strand-symmetrical, but even surprisingly similar to one another (majority profile; G. Albrecht-Buehler, 2007, Genomics 89, 596-601). The present article proposes that the same inversion/transposition mechanism(s) that created the strand symmetry may also explain the existence of the majority profile. Thus they may be key factors in the creation of an almost universal "format" in which genome sequences are written. One may speculate that this universality of genome format may facilitate horizontal gene transfer and, thus, accelerate evolution.
Affiliation(s)
- Guenter Albrecht-Buehler
- Department of Cell and Molecular Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
47
Evolutionary implications of inversions that have caused intra-strand parity in DNA. BMC Genomics 2007; 8:160. [PMID: 17562011 PMCID: PMC1913523 DOI: 10.1186/1471-2164-8-160] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 01/25/2007] [Accepted: 06/11/2007] [Indexed: 11/22/2022]
Abstract
Background Chargaff's rule of DNA base composition, stating that DNA comprises equal amounts of adenine and thymine (%A = %T) and of guanine and cytosine (%C = %G), is well known because it was fundamental to the conception of the Watson-Crick model of DNA structure. His second parity rule stating that the base proportions of double-stranded DNA are also reflected in single-stranded DNA (%A = %T, %C = %G) is more obscure, likely because its biological basis and significance are still unresolved. Within each strand, the symmetry of single nucleotide composition extends even further, being demonstrated in the balance of di-, tri-, and multi-nucleotides with their respective complementary oligonucleotides. Results Here, we propose that inversions are sufficient to account for the symmetry within each single-stranded DNA. Human mitochondrial DNA does not demonstrate such intra-strand parity, and we consider how its different functional drivers may relate to our theory. This concept is supported by the recent observation that inversions occur frequently. Conclusion Along with chromosomal duplications, inversions must have been shaping the architecture of genomes since the origin of life.
48
Mitchell D. GC content and genome length in Chargaff compliant genomes. Biochem Biophys Res Commun 2006; 353:207-10. [PMID: 17173863 DOI: 10.1016/j.bbrc.2006.12.008] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 11/26/2006] [Accepted: 12/03/2006] [Indexed: 10/23/2022]
Abstract
Musto et al. [H. Musto, H. Naya, A. Zavala, H. Romero, F. Alvarez-Valin, G. Bernardi, Genomic GC level, optimal growth temperature, and genome size in prokaryotes, Biochem. Biophys. Res. Commun. 347 (2006) 1-3] recently reported a linear correlation between GC content and genome length. The regression model was heteroscedastic, which suggested that the relationship could be defined more precisely. Alternative regression models (R² > 0.95) were fitted to a set of over 900 sequences compliant with Chargaff's second parity rule. The new models suggest that the relationship between GC content and genome length is more complex than originally suggested. While similar models can be derived for non-Chargaff-compliant genomes, their interpretation is likely to be more difficult.
Affiliation(s)
- David Mitchell
- Vice Deanery of Genetics and Microbiology, Trinity College, Dublin, Ireland.
49
Albrecht-Buehler G. Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions. Proc Natl Acad Sci U S A 2006; 103:17828-33. [PMID: 17093051 PMCID: PMC1635160 DOI: 10.1073/pnas.0605553103] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
Abstract
Chargaff's second parity rules for mononucleotides and oligonucleotides (CIImono and CIIoligo rules) state that a sufficiently long (> 100 kb) strand of genomic DNA that contains N copies of a mono- or oligonucleotide also contains N copies of its reverse complementary mono- or oligonucleotide on the same strand. There is very strong support in the literature for the validity of the rules in coding and noncoding regions, especially for the CIImono rule. Because the experimental support for the CIIoligo rule is much less complete, the present article, focusing on the special case of trinucleotides (triplets), examined several gigabases of genome sequences from a wide range of species and kingdoms, including organelles such as mitochondria and chloroplasts. I found that all genomes, with the sole exception of certain mitochondria, complied with the CIItriplet rule to a very high level of accuracy in coding and noncoding regions alike. Based on the growing evidence that genomes may contain up to millions of copies of interspersed repetitive elements, I propose in this article a quantitative formulation of the hypothesis that inversions and inverted transpositions could be a major, if not the dominant, factor in the almost universal validity of the rules.
Affiliation(s)
- Guenter Albrecht-Buehler
- Department of Cell and Molecular Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
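The CIIoligo rule as stated above (a strand with N copies of a word also carries N copies of its reverse complement) can be checked directly by counting. The following is a minimal sketch, not the article's procedure: it assumes overlapping k-mer counting on a single strand and uses a relative-difference score chosen here for illustration; `kmer_counts` and `parity_deviations` are hypothetical names.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-mer counts on one strand; words with non-ACGT symbols are skipped."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def parity_deviations(seq: str, k: int = 3) -> dict[str, float]:
    """Relative difference |N(w) - N(rc(w))| / (N(w) + N(rc(w))) for each k-mer pair.

    Each pair {w, rc(w)} is reported once, keyed by its alphabetically smaller
    member; reverse-complement palindromes (w == rc(w), possible only for even k)
    are skipped because they satisfy the rule trivially. Pairs absent from the
    strand are omitted.
    """
    counts = kmer_counts(seq, k)
    deviations = {}
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        rc = reverse_complement(w)
        if rc <= w:  # palindrome, or the second member of an already-seen pair
            continue
        total = counts[w] + counts[rc]
        if total:
            deviations[w] = abs(counts[w] - counts[rc]) / total
    return deviations


if __name__ == "__main__":
    strand = "ATGCGTACGTTAGC"
    # A strand joined to its reverse complement by an 'N' spacer obeys the rule
    # exactly: every k-mer w in one half has a partner rc(w) in the other half.
    print(max(parity_deviations(strand + "N" + reverse_complement(strand)).values()))  # 0.0
    print(parity_deviations("AAAAAAAAAA"))  # {'AAA': 1.0}
```

For triplets (k = 3) there are 32 reverse-complement pairs; a compliant strand should give scores near 0 for all of them, with the tolerance depending on strand length.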
50
Nikolaou C, Almirantis Y. Deviations from Chargaff's second parity rule in organellar DNA. Insights into the evolution of organellar genomes. Gene 2006; 381:34-41. [PMID: 16893615 DOI: 10.1016/j.gene.2006.06.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 03/07/2006] [Revised: 04/18/2006] [Accepted: 06/13/2006] [Indexed: 10/24/2022]
Abstract
Chargaff's second parity rule (PR2) states that complementary nucleotides occur with almost equal frequencies in single-stranded DNA. This is indeed the case for all bacterial and eukaryotic genomes studied, although genomic patterns may differ among genomes in terms of local deviations. The behaviour of organellar genomes with respect to the second parity rule has not been studied in detail until now. We tested all available organellar genomes and found that a large number of mitochondrial genomes deviate significantly from the second parity rule, in contrast to eubacterial genomes, although mitochondria are believed to have evolved from proteobacteria. Moreover, mitochondria may be divided into three distinct subgroups according to their overall deviation from the parity rule. Chloroplast genomes, on the other hand, share the pattern of eubacterial genomes and, interestingly, so do mitochondrial genomes originating from plants and some fungi. The deviation from the second parity rule is found to be weakly correlated with the overall excess of purines over pyrimidines. The behaviour of the large majority of mitochondrial genomes may be attributed to their distinct mode of replication, which is fundamentally different from that of eubacteria. Differences between chloroplast and mitochondrial genomes might also be explained by different replication mechanisms and correlated with differences in genome size and compaction. The results presented here may provide some insight into the different modes of genome-structure evolution in chloroplasts and mitochondria.
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Institute of Biology, NCSR Demokritos, 15310 Athens, Greece.
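The abstract relates PR2 deviation to purine excess without giving formulas. The sketch below uses two simple scores chosen here as assumptions (not necessarily the measures used in the article): PR2 deviation = (|A − T| + |G − C|) / (A + C + G + T) and purine excess = (A + G − C − T) / (A + C + G + T). With these definitions, purine excess equals ((A − T) + (G − C)) divided by the total, so its absolute value can never exceed the PR2 deviation, and the two coincide only when the A/T and G/C imbalances share a sign. This is a property of the assumed scores, not a result reported in the article; the function names are hypothetical.

```python
from collections import Counter


def base_counts(seq: str) -> Counter:
    """Counts of A, C, G, T on one strand; any other symbol is ignored."""
    return Counter(b for b in seq.upper() if b in "ACGT")


def pr2_deviation(seq: str) -> float:
    """Assumed PR2 deviation score: (|A - T| + |G - C|) / (A + C + G + T).

    0 for a strand that obeys the second parity rule exactly, 1 when no base
    has its complement on the same strand.
    """
    n = base_counts(seq)
    total = sum(n.values())
    return (abs(n["A"] - n["T"]) + abs(n["G"] - n["C"])) / total if total else 0.0


def purine_excess(seq: str) -> float:
    """(A + G - C - T) / (A + C + G + T): positive when purines outnumber pyrimidines."""
    n = base_counts(seq)
    total = sum(n.values())
    return (n["A"] + n["G"] - n["C"] - n["T"]) / total if total else 0.0


if __name__ == "__main__":
    for s in ("ACGT", "AAGG", "AACC"):
        print(s, pr2_deviation(s), purine_excess(s))
    # ACGT 0.0 0.0  balanced strand
    # AAGG 1.0 1.0  purine-rich: both imbalances point the same way
    # AACC 1.0 0.0  A excess and C excess cancel in the purine score
```

The "AACC" case shows how a strand can deviate strongly from PR2 while having no net purine excess, which is one way such a correlation across genomes could remain weak.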