1. Yıldırım B, Vogl C. Purifying selection against spurious splicing signals contributes to the base composition evolution of the polypyrimidine tract. J Evol Biol 2023; 36:1295-1312. [PMID: 37564008 PMCID: PMC10946897 DOI: 10.1111/jeb.14205] [Received: 03/28/2023] [Revised: 05/31/2023] [Accepted: 06/15/2023] [Indexed: 08/12/2023]
Abstract
Among eukaryotes, the major spliceosomal pathway is highly conserved. While long introns may contain additional regulatory sequences, those in short introns seem to be related almost exclusively to splicing. Although the regulatory sequences involved in splicing are well characterized, little is known about their evolution. At the 3' end of introns, the splice signal nearly universally contains the dimer AG, which consists of purines, and the polypyrimidine tract upstream of this 3' splice signal is characterized by an over-representation of pyrimidines. We hypothesize that if the over-representation of pyrimidines in the polypyrimidine tract is also due to avoidance of a premature splicing signal, AG should be the most under-represented dimer. Using DNA-strand asymmetry patterns, we confirm this prediction in fruit flies of the genus Drosophila, and by comparing the asymmetry patterns with those of a presumably neutrally evolving region, we quantify the selection strength acting on each motif. Moreover, our inference and simulation method revealed that the base composition evolution of the polypyrimidine tract is best explained by the joint action of purifying selection against a spurious 3' splice signal and selection for pyrimidines. Patterns of asymmetry in other eukaryotes indicate that avoidance of premature splicing similarly affects the nucleotide composition of their polypyrimidine tracts.
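The idea that a dimer can be "under-represented" relative to base composition can be made concrete with the classic observed/expected ratio rho(XY) = f(XY) / (f(X) f(Y)). The sketch below is a swapped-in, standard genomic-signature measure, not the strand-asymmetry inference method used by Yıldırım and Vogl; the function name `dimer_oe_ratios` is hypothetical.

```python
from collections import Counter
from itertools import product


def dimer_oe_ratios(seq: str) -> dict[str, float]:
    """Observed/expected ratio for each of the 16 DNA dimers.

    rho(XY) = f(XY) / (f(X) * f(Y)), where f(X) is the mononucleotide
    frequency and f(XY) the frequency among overlapping dimers.
    Values well below 1 suggest under-representation.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    n_mono = sum(mono.values())
    # Overlapping dimers; windows containing non-ACGT symbols (e.g. N) are skipped.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    n_di = sum(di.values())
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        if expected == 0 or n_di == 0:
            ratios[x + y] = float("nan")  # undefined when a base is absent
        else:
            ratios[x + y] = (di[x + y] / n_di) / expected
    return ratios
```

For a toy sequence such as `"GATTCCGATTCA"`, where A and G both occur but never as the dimer AG, `dimer_oe_ratios(...)["AG"]` is `0.0`. On real data one would compare such ratios between polypyrimidine tracts and a presumably neutral region, which is the spirit (though not the statistics) of the comparison described above.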
Affiliation(s)
- Burçin Yıldırım
- Department of Biomedical Sciences, Vetmeduni Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
- Claus Vogl
- Department of Biomedical Sciences, Vetmeduni Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
2. Rosandić M, Vlahović I, Pilaš I, Glunčić M, Paar V. An Explanation of Exceptions from Chargaff's Second Parity Rule/Strand Symmetry of DNA Molecules. Genes (Basel) 2022; 13:1929. [PMID: 36360166 PMCID: PMC9689577 DOI: 10.3390/genes13111929] [Received: 09/18/2022] [Revised: 10/12/2022] [Accepted: 10/17/2022] [Indexed: 11/04/2022]
Abstract
In this article, we show that mono/oligonucleotide quadruplets, as basic structures of DNA, together with our classification of trinucleotides, reveal an organization of genomes based on purine-pyrimidine symmetry. Moreover, the structure and stability of DNA are influenced by Watson-Crick pairing and by the natural law of DNA creation and conservation, according to which the same mono- or oligonucleotide must be inserted simultaneously into both strands of DNA. Taken together, these lead to quadruplets with central mirror symmetry and bidirectional DNA strand orientation, which are incorporated into Chargaff's second parity rule (CSPR). By performing our quadruplet frequency analysis of all human chromosomes and of Neuroblastoma BreakPoint Family (NBPF) genes, which code for Olduvai protein domains in the human genome, we show that the coding part of DNA violates CSPR. This may shed new light on DNA creation and its evolution and give rise to a novel hypothesis about them. In this framework, the logarithmic relationship between oligonucleotide order and the minimal DNA sequence length required for CSPR to hold follows automatically from the quadruplet structure of the genomic sequence. The violation of CSPR in rare symbionts is also discussed.
Affiliation(s)
- Marija Rosandić
- University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Ines Vlahović
- Faculty of Science, Algebra University College, 10000 Zagreb, Croatia
- Ivan Pilaš
- Forest Research Institute, 10450 Jastrebarsko, Croatia
- Matko Glunčić
- Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Vladimir Paar
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
3. Almirantis Y, Provata A, Li W. Noether's Theorem as a Metaphor for Chargaff's 2nd Parity Rule in Genomics. J Mol Evol 2022; 90:231-238. [PMID: 35704064 DOI: 10.1007/s00239-022-10062-4] [Received: 01/26/2022] [Accepted: 05/18/2022] [Indexed: 10/18/2022]
Abstract
In the present note, the genomic compositional rule widely known as 'Chargaff's 2nd parity rule' (asserting equimolarity between adenine and thymine, and between guanine and cytosine, within either of the two DNA strands) is considered in association with Noether's theorem, which links symmetries with conservation laws in physics. In the case of the genome, the strict physical and mathematical prerequisites of Noether's theorem do not hold. However, we conclude that a metaphor can be established with Noether's theorem, as inter-strand symmetry concerning DNA functionality engenders specific features in genome composition. Conversely, when inter-strand symmetry does not hold, the corresponding quantitative relations fail to appear. This association is also considered from the point of view of the existence of emergent laws and properties in evolutionary genomics.
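At the mononucleotide level, the rule stated above says that on one strand the count of A is close to that of T and the count of G close to that of C. A minimal way to quantify departures is the pair of skews (A - T)/(A + T) and (G - C)/(G + C), both near zero when the rule holds. This is a sketch for illustration; the function name `strand_skews` is hypothetical.

```python
from collections import Counter


def strand_skews(seq: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) for a single strand.

    Both values are close to 0 when Chargaff's second parity rule holds;
    positive values mean an excess of A over T (or G over C).
    """
    counts = Counter(seq.upper())  # missing bases count as 0
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew
```

For example, `strand_skews("AATTGGCC")` gives `(0.0, 0.0)`, while `strand_skews("AAAT")` gives `(0.5, 0.0)`.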
Affiliation(s)
- Yannis Almirantis
- Theoretical Biology and Computational Genomics Laboratory, Institute of Bioscience and Applications, National Center for Scientific Research "Demokritos", 15341 Athens, Greece
- Astero Provata
- Statistical Mechanics and Dynamical Systems Laboratory, Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15341 Athens, Greece
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
4. Rosandić M, Vlahović I, Paar V. Novel look at DNA and life-Symmetry as evolutionary forcing. J Theor Biol 2019; 483:109985. [PMID: 31469987 DOI: 10.1016/j.jtbi.2019.08.016] [Received: 02/08/2018] [Revised: 06/21/2018] [Accepted: 08/22/2019] [Indexed: 11/20/2022]
Abstract
After Chargaff's first parity rule was explained in terms of Watson-Crick base pairing between the two DNA strands, Chargaff's second parity rule for each strand of DNA (also called strand symmetry), which cannot be explained by Watson-Crick base pairing alone, has remained a challenging issue for fifty years. We show that during evolution DNA preserves its identity in the form of A+T-rich and C+G-rich quadruplet matrices based on purine-pyrimidine mirror symmetries of trinucleotides. Identical symmetries are present in our classification of trinucleotides and in the genetic code table. All eukaryotes and almost all prokaryotes (bacteria and archaea) have quadruplet mirror symmetries in structural form and frequencies, following the principle of Chargaff's second parity rule and the natural symmetry law of DNA creation and conservation. Some rare symbionts have mirror symmetry only in their structural form within each DNA strand. Based on our matrix analysis of closely related species, humans and Neanderthals, we find that the circular cycle of inverse proportionality between trinucleotides preserves identical relative frequencies of trinucleotides in each quadruplet and in the whole genome. According to our calculations, a change in frequencies in quadruplet matrices could lead to the creation of new species. Violation of quadruplet symmetries is practically inconsistent with life. DNA symmetries provide a key to understanding the restriction of disorder (entropy) due to mutations in the evolution of DNA.
Affiliation(s)
- Marija Rosandić
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia; University Hospital Centre Zagreb (Ret.), Zagreb, Croatia
- Ines Vlahović
- Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia; Algebra University College, 10000 Zagreb, Croatia
- Vladimir Paar
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia; Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia.
5. Huang B, Huang LF, Zhang SH. Evaluation of the Persistence of Higher-Order Strand Symmetry in Genomic Sequences by Novel Word Symmetry Distance Analysis. Front Genet 2019; 10:148. [PMID: 30899274 PMCID: PMC6416199 DOI: 10.3389/fgene.2019.00148] [Received: 11/08/2018] [Accepted: 02/12/2019] [Indexed: 11/13/2022]
Abstract
It has been shown that the ubiquitous phenomenon of strand symmetry may persist for higher-order oligonucleotides. However, there is no consensus on the extent (order of oligonucleotides or length of words) to which strand symmetry persists. Determining the extent of strand symmetry in genomic sequences is critically important for further understanding of the phenomenon. Building on previous studies, we have developed an algorithm for a novel word symmetry distance analysis. We applied it to evaluate higher-order strand symmetry in 206 archaeal and 2,659 bacterial genomes. Our results show that the new approach could provide a clear-cut criterion to determine the extent of strand symmetry for a group of genomes or for individual genomes. According to the new measure, strand symmetry tends to persist up to 8-mers in archaeal genomes and up to 9-mers in bacterial genomes, and the persistence may vary from 6- to 9-mers in individual genomes. Moreover, higher-order strand symmetry tends to correlate positively with the GC content and mononucleotide symmetry levels of genomic sequences. The variation of higher-order strand symmetry among genomes suggests that strand symmetry itself may not be strictly relevant to biological functions, which would provide some insights into the origin and evolution of the phenomenon.
Affiliation(s)
- Bi Huang
- Key Laboratory of Gene Engineering of Ministry of Education, Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
- Li-Fang Huang
- Key Laboratory of Gene Engineering of Ministry of Education, Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
6. Cristadoro G, Degli Esposti M, Altmann EG. The common origin of symmetry and structure in genetic sequences. Sci Rep 2018; 8:15817. [PMID: 30361485 PMCID: PMC6202410 DOI: 10.1038/s41598-018-34136-w] [Received: 04/17/2018] [Accepted: 10/09/2018] [Indexed: 12/20/2022]
Abstract
Biologists have long sought to explain how statistical properties of genetic sequences emerged and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are either not considered or treated as irrelevant, contrary to empirical evidence. Different studies have investigated these two statistical features separately, reaching minimal consensus despite sustained efforts. Here we unravel previously unknown symmetries in genetic sequences, which are organised hierarchically across the scales at which non-random structures are known to be present. These observations are confirmed through statistical analysis of the human genome and explained by a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can simultaneously explain non-random structures and symmetries in genetic sequences.
Affiliation(s)
- Giampaolo Cristadoro
- Dipartimento di Matematica e Applicazioni, Università di Milano-Bicocca, 20125, Milano, Italy.
- Eduardo G Altmann
- School of Mathematics and Statistics, University of Sydney, Sydney, 2006, NSW, Australia
7. Tavares AH, Raymaekers J, Rousseeuw PJ, Silva RM, Bastos CAC, Pinho A, Brito P, Afreixo V. Comparing Reverse Complementary Genomic Words Based on Their Distance Distributions and Frequencies. Interdiscip Sci 2018; 10:1-11. [PMID: 29214497 DOI: 10.1007/s12539-017-0273-0] [Received: 08/24/2017] [Revised: 10/04/2017] [Accepted: 11/08/2017] [Indexed: 06/07/2023]
Abstract
In this work, we study reverse complementary genomic word pairs in the human DNA, by comparing both the distance distribution and the frequency of a word to those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and it is found that the peak dissimilarity works best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is also explored, and it is speculated that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.
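A word's distance distribution, as compared in this study, is built from the gaps between successive occurrences of that word along the sequence. The sketch below assumes overlapping occurrences and measures gaps between start positions; the exact conventions in Tavares et al. may differ, and the helper names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement each base, then reverse the string."""
    return word.upper().translate(_COMPLEMENT)[::-1]


def occurrence_positions(seq: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word."""
    positions, start = [], seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)  # step by 1 to allow overlaps
    return positions


def distance_distribution(seq: str, word: str) -> Counter:
    """Histogram of distances between consecutive occurrence starts."""
    pos = occurrence_positions(seq.upper(), word.upper())
    return Counter(b - a for a, b in zip(pos, pos[1:]))
```

For the toy sequence `"ACGTTACGTACGAACGT"`, the word `ACG` gives `Counter({4: 2, 5: 1})` while its reverse complement `CGT` gives `Counter({5: 1, 8: 1})`: the two words have the same count order of magnitude but different distance profiles, which is the kind of discrepancy a dissimilarity measure between distributions is meant to capture.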
Affiliation(s)
- Ana Helena Tavares
- Department of Mathematics and CIDMA and iBiMED, University of Aveiro, Aveiro, Portugal.
- Raquel M Silva
- Department of Medical Sciences and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
- Carlos A C Bastos
- Department of Electronics, Telecommunications and Informatics and IEETA, University of Aveiro, Aveiro, Portugal
- Armando Pinho
- Department of Electronics, Telecommunications and Informatics and IEETA, University of Aveiro, Aveiro, Portugal
- Paula Brito
- Faculty of Economics and LIAAD-INESC TEC, University of Porto, Porto, Portugal
- Vera Afreixo
- Department of Mathematics and CIDMA and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
8. Shporer S, Chor B, Rosset S, Horn D. Inversion symmetry of DNA k-mer counts: validity and deviations. BMC Genomics 2016; 17:696. [PMID: 27580854 PMCID: PMC5006273 DOI: 10.1186/s12864-016-3012-8] [Received: 04/13/2016] [Accepted: 08/11/2016] [Indexed: 01/25/2023]
Abstract
Background: The generalization of Chargaff's second rule states that the count of any string of nucleotides of length k on a single chromosomal strand equals the count of its inverse (reverse-complement) k-mer. This Inversion Symmetry (IS) holds for many species, both eukaryotes and prokaryotes, for ranges of k that may vary from 7 to 10 as chromosomal lengths vary from 2 Mbp to 200 Mbp. The existence of IS has been demonstrated in the literature, and other pairwise candidate symmetries (e.g. reverse or complement) have been ruled out.
Results: Studying IS in the human genome, we find that IS holds up to k = 10. It holds for complete chromosomes, including after applying the low-complexity mask. We introduce a numerical IS criterion and define the k-limit, KL, as the highest k for which this criterion is valid. We demonstrate that chromosomes of different species, as well as different human chromosomal sections, follow a universal logarithmic dependence KL ~ 0.7 ln(L), where L is the length of the chromosome. We introduce a statistical IS-Poisson model that allows us to apply confidence measures to our numerical findings. We find good agreement for large k, where the variance of the Poisson distribution determines the outcome of the analysis. This model predicts the observed logarithmic increase of KL with length. The model allows us to conclude that for low k, e.g. k = 1, where IS reduces to Chargaff's second rule, IS violation, although extremely small, is significant. Studying this violation, we come up with an unexpected observation for human chromosomes: a meaningful correlation with the excess of genes on particular strands.
Conclusions: Our IS-Poisson model agrees well with genomic data and accounts for the universal behavior of k-limits. For low k we point out minute, yet significant, deviations from the model, including an excess of counts of nucleotides T vs A and G vs C on the positive strands of human chromosomes. Interestingly, this correlates with a significant (but small) excess of genes on the same positive strands.
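The inversion-symmetry statement, count(w) = count(reverse complement of w) on one strand, can be checked directly by counting overlapping k-mers. The sketch below reports the mean relative difference |n(w) - n(rc(w))| / (n(w) + n(rc(w))) over word pairs; this simple summary is an illustrative stand-in, not the numerical IS criterion or the IS-Poisson model of Shporer et al., and all function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Reverse complement of an uppercase ACGT word."""
    return word.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-mer counts on one strand, skipping windows with non-ACGT."""
    seq = seq.upper()
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )


def inversion_asymmetry(seq: str, k: int) -> float:
    """Mean relative count difference between words and their reverse complements.

    0.0 means perfect inversion symmetry at this k; 1.0 means no observed
    word is ever matched by its reverse complement.
    """
    counts = kmer_counts(seq, k)
    seen, scores = set(), []
    for w in counts:
        rc = reverse_complement(w)
        if w in seen or rc == w:
            continue  # pair already scored, or w is its own reverse complement
        seen.update((w, rc))
        n_w, n_rc = counts[w], counts.get(rc, 0)
        scores.append(abs(n_w - n_rc) / (n_w + n_rc))
    return sum(scores) / len(scores) if scores else 0.0
```

A sequence joined to its own reverse complement is itself reverse-complement palindromic, so every window has a partner window and `inversion_asymmetry(s + reverse_complement(s), k)` is `0.0` for every k; a homopolymer such as `"AAAA"` gives `1.0`. On real chromosomes the value would be expected to rise with k as counts thin out, which is the regime that a k-limit describes.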
Affiliation(s)
- Sagi Shporer
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Benny Chor
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Saharon Rosset
- Sackler School of Mathematical Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
- David Horn
- Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv, 69978, Israel.
9. Rosandić M, Vlahović I, Glunčić M, Paar V. Trinucleotide's quadruplet symmetries and natural symmetry law of DNA creation ensuing Chargaff's second parity rule. J Biomol Struct Dyn 2016; 34:1383-94. [PMID: 26524490 DOI: 10.1080/07391102.2015.1080628] [Indexed: 10/22/2022]
Abstract
For almost 50 years, no conclusive explanation has been found for Chargaff's second parity rule (CSPR): the equality of the frequencies of nucleotides A = T and C = G, or the equality of direct and reverse-complement trinucleotides, in the same DNA strand. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to the double-stranded genome. The symmetries of the Q-box corresponding to quadruplets can be obtained as a consequence of Watson-Crick base pairing and CSPR together. Alternatively, assuming the natural symmetry law of DNA creation, according to which each trinucleotide in one strand of DNA must simultaneously appear in the opposite strand, leads automatically to Q-box direct-reverse mirror symmetry, which in conjunction with Watson-Crick base pairing generates CSPR. We demonstrate quadruplet symmetries in chromosomes of a wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D diagrams with combined interstrand frequencies. These "landscapes" are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most older species. In human chromosomes 1-12, X, and Y the "landscapes" are almost identical, and they differ slightly in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a robust new tool for characterizing and classifying genomes and their evolutionary trajectories.
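Each trinucleotide w generates a quadruplet (direct, reverse complement, complement, reverse). Grouping all 64 trinucleotides by the set of their four transforms gives exactly 20 classes, consistent with the 20 symbolic quadruplets mentioned above: the 16 trinucleotides whose first and last bases coincide (e.g. ACA) equal their own reverse, so their class collapses to two words (8 classes), and the remaining 48 form 12 classes of four. The sketch below verifies this count; that these classes correspond one-to-one to the authors' Q-box construction is an assumption, and the function names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quadruplet(tri: str) -> tuple[str, str, str, str]:
    """(direct, reverse complement, complement, reverse) of a trinucleotide."""
    comp = tri.translate(_COMPLEMENT)
    return tri, comp[::-1], comp, tri[::-1]


def distinct_quadruplets() -> list[frozenset[str]]:
    """Group all 64 trinucleotides into classes closed under the four transforms."""
    classes = {
        frozenset(quadruplet("".join(t))) for t in product("ACGT", repeat=3)
    }
    # Sort for a stable, readable order.
    return sorted(classes, key=lambda s: sorted(s))
```

For example, `quadruplet("ACG")` returns `("ACG", "CGT", "TGC", "GCA")`, and `len(distinct_quadruplets())` is `20`.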
Affiliation(s)
- Marija Rosandić
- Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia
- Ines Vlahović
- Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Matko Glunčić
- Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Vladimir Paar
- Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia; Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
10. Zhang SH. Persistence and breakdown of strand symmetry in the human genome. J Theor Biol 2015; 370:202-4. [PMID: 25576243 DOI: 10.1016/j.jtbi.2014.12.014] [Received: 08/01/2014] [Revised: 12/26/2014] [Accepted: 12/29/2014] [Indexed: 10/24/2022]
Abstract
Afreixo, V., Bastos, C.A.C., Garcia, S.P., Rodrigues, J.M.O.S., Pinho, A.J., Ferreira, P.J.S.G., 2013. The breakdown of the word symmetry in the human genome. J. Theor. Biol. 335, 153-159, analyzed word symmetry (strand symmetry, or the second parity rule) in the human genome. They concluded that strand symmetry holds for oligonucleotides up to 6 nt and is no longer statistically significant for oligonucleotides of higher orders. However, although they provided some new results on the issue, their interpretation may not be fully justified, and their conclusion needs further evaluation. Further analysis of their results, especially those of the equivalence tests and word symmetry distance, shows that strand symmetry would persist for higher-order oligonucleotides up to 9 nt in the human genome, at least in its overall frequency framework (oligonucleotide frequency pattern).
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, China.
11. Afreixo V, Rodrigues JMOS, Bastos CAC. Analysis of single-strand exceptional word symmetry in the human genome: new measures. Biostatistics 2014; 16:209-21. [PMID: 25190514 DOI: 10.1093/biostatistics/kxu041] [Indexed: 01/05/2023]
Abstract
Some previous studies suggest extending Chargaff's second rule (the phenomenon of symmetry in a single DNA strand) to long DNA words. However, in random sequences generated under an independent symbol model in which complementary nucleotides have equal occurrence probabilities, we expect the symmetry to hold for any word length. In this work, we develop new statistical methods to measure exceptional symmetry. Exceptional symmetry is a refinement of Chargaff's second parity rule that highlights words whose frequency of occurrence is similar to that of their reverse complement but dissimilar to the frequencies of other words containing the same number of A or T nucleotides. We analyze words of length up to 12 in the complete human genome and in each chromosome separately. We assess exceptional symmetry globally, by word group, and by word. We conclude that the global symmetry present in the human genome is clearly exceptional and significant. The chromosomes present distinct exceptional symmetry profiles, and there are several exceptional word groups and exceptional words with strong exceptional symmetry.
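The baseline argument above can be checked exactly. Under an independent symbol model, a word's probability is the product of its letters' probabilities; reverse-complementing a word reverses the letter order and swaps A with T and C with G. So if p(A) = p(T) and p(C) = p(G), the multiset of factors is unchanged and every word is exactly as probable as its reverse complement, at any length. The sketch below uses exact rational arithmetic; the probability values are arbitrary illustrative choices and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def word_probability(word: str, p: dict[str, Fraction]) -> Fraction:
    """Probability of a word under an independent (i.i.d.) symbol model."""
    return prod((p[b] for b in word), start=Fraction(1))


def symmetric_for_all_words(p: dict[str, Fraction], k: int) -> bool:
    """True if every k-mer is exactly as probable as its reverse complement."""
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        rc = w.translate(_COMPLEMENT)[::-1]
        if word_probability(w, p) != word_probability(rc, p):
            return False
    return True


balanced = {"A": Fraction(3, 10), "T": Fraction(3, 10),
            "C": Fraction(1, 5), "G": Fraction(1, 5)}
skewed = {"A": Fraction(2, 5), "T": Fraction(1, 5),
          "C": Fraction(1, 5), "G": Fraction(1, 5)}
```

Here `symmetric_for_all_words(balanced, 6)` is `True`, while `symmetric_for_all_words(skewed, 1)` is `False`. Because such symmetry comes for free under a balanced independent model, observed symmetry only becomes informative when it exceeds this baseline, which is the motivation for measuring exceptional symmetry.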
Affiliation(s)
- Vera Afreixo
- Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal; CIDMA, University of Aveiro, 3810-193 Aveiro, Portugal; IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
- João M O S Rodrigues
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal; IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
- Carlos A C Bastos
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal; IEETA, University of Aveiro, 3810-193 Aveiro, Portugal