1
Forsdyke DR. Genomic compliance with Chargaff's second parity rule may have originated non-adaptively, but stem-loops now function adaptively. J Theor Biol 2024; 595:111943. [PMID: 39277166 DOI: 10.1016/j.jtbi.2024.111943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 07/06/2024] [Accepted: 09/07/2024] [Indexed: 09/17/2024]
Abstract
Of Chargaff's four rules on DNA base quantity, his second parity rule (PR-2) is the most contentious. Various biometricians (e.g., Sueoka, Lobry) regarded PR-2 compliance as a non-adaptive feature of modern genomes that could be modeled through interrelations among mutation rates. However, PR-2 compliance with stem-loop potential was considered adaptively relevant by biochemists familiar with analyses of nucleic acid structure (e.g., of Crick) and of meiotic recombination (e.g., of Kleckner). Meanwhile, other biometricians had shown that PR-2 complementarity extended beyond individual bases (1-mers) to oligonucleotides (k-mers), possibly reflecting "advantageous DNA structure" (Nussinov). An "introns early" hypothesis (Reanney, Forsdyke) had suggested a primordial nucleic acid world with recombination-mediated error-correction requiring genome-wide stem-loop potential to have evolved prior to localized intrusions of protein-encoding potential (exons). Thus, a primordial genome was equivalent to one long intron. Indeed, when assessed as the base order-dependent component (correcting for local influences of GC%), modern genes, especially when evolving rapidly under positive Darwinian selection, display high intronic stem-loop potential. This suggests forced migration from neighboring exons by competing protein-encoding potential. PR-2 compliance may have first arisen non-adaptively. Primary prototypic structures were later strengthened by their adaptive contribution to recombination. Thus, contentious views may actually be in harmony.
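PR-2 compliance at the 1-mer and k-mer levels can be measured on a single strand by comparing each k-mer's count with that of its reverse complement. The Python sketch below is illustrative only: the score (a mean relative difference over reverse-complement pairs) and all function names are assumptions, not the measure used in the paper.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def pr2_deviation(seq: str, k: int) -> float:
    """Mean of |n(w) - n(rc w)| / (n(w) + n(rc w)) over reverse-complement pairs.

    0.0 means perfect intra-strand parity; 1.0 means no k-mer co-occurs with
    its reverse complement. Palindromic k-mers (equal to their own reverse
    complement) are trivially compliant and are skipped.
    """
    counts = kmer_counts(seq, k)
    seen = set()
    diffs = []
    for word in counts:
        rc = reverse_complement(word)
        if word == rc or word in seen:
            continue
        seen.update((word, rc))
        a, b = counts[word], counts[rc]  # Counter returns 0 for absent keys
        diffs.append(abs(a - b) / (a + b))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

A strand concatenated with its own reverse complement is PR-2 compliant by construction for every k, which makes it a convenient sanity check before applying the score to real genomes.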
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario K7L3N6, Canada.
2
Jacquat AG, Theumer MG, Dambolena JS. Selective and non-selective evolutionary signatures found in the simplest replicative biological entities. J Evol Biol 2024; 37:862-876. [PMID: 38822575 DOI: 10.1093/jeb/voae070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 05/30/2024] [Indexed: 06/03/2024]
Abstract
Mitoviruses, which are considered evolutionary relics of extinct alpha-proteobacteria RNA phages, represent one of the simplest self-replicating biological systems. This study aims to quantitatively describe genomes and identify potential genomic signatures that support the protein phylogenetic-based classification criterion. Genomic variables, such as mononucleotide and dinucleotide composition, codon usage bias, and minimal free energy derived from optimized predicted RNA secondary structure, were analyzed. From the values obtained, the main evolutionary pressures were discussed, indicating that natural selection plays a significant role in shaping mitovirus genomes. However, neutral evolution also makes a significant contribution. This study reveals significant structural divergence in Kvaramitovirus. The energy minimization approach used here to study 2D folding reveals a distinct spatial organization of their genomes, providing evidence for the hypothesis of a single evolutionary event of circularization in the most recent common ancestor of the lineage. This hypothesis was discussed in light of recent discoveries by other researchers that partially support the existence of mitoviruses with circular genomes. Finally, this study represents a significant advance in the understanding of mitoviruses, as it quantitatively describes the nucleotide sequence at the family and genus taxonomic levels. Additionally, we provide hypotheses that can be experimentally validated to inspire new research and address the gaps in knowledge of this fascinating, basally divergent RNA virus lineage.
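Codon usage bias, one of the genomic variables listed above, is commonly summarised as relative synonymous codon usage (RSCU). The Python sketch below is a generic illustration, not the pipeline used in the study; it assumes in-frame input and the standard genetic code (NCBI translation table 1). Mitoviruses are often read with a mitochondrial code in which UGA encodes tryptophan, so the `AMINO` table (a name introduced here) may need adjusting for real mitovirus data.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated
# TTT, TTC, TTA, TTG, TCT, ... GGG, with '*' marking stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for an in-frame coding sequence.

    RSCU = observed codon count / mean count over its synonymous family,
    so 1.0 means no preference. Stop codons, an incomplete trailing codon and
    codons containing non-ACGT characters are ignored.
    """
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    families: dict[str, list[str]] = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    result = {}
    for aa, members in families.items():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(members)
        for c in members:
            result[c] = counts[c] / expected
    return result
```

For example, two TTT codons and one TTC codon give RSCU values of about 1.33 and 0.67 for the phenylalanine family.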
Affiliation(s)
- Andrés Gustavo Jacquat
- Facultad de Ciencias Exactas Físicas y Naturales (FCEFyN), Universidad Nacional de Córdoba (UNC), Córdoba, Argentina
- Instituto Multidisciplinario de Biología Vegetal (IMBIV), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Córdoba, Argentina
- Martín Gustavo Theumer
- Departamento de Bioquímica Clínica, Facultad de Ciencias Químicas (FCQ), Universidad Nacional de Córdoba (UNC), Córdoba, Argentina
- Centro de Investigaciones en Bioquímica Clínica e Inmunología (CIBICI), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Córdoba, Argentina
- José Sebastián Dambolena
- Facultad de Ciencias Exactas Físicas y Naturales (FCEFyN), Universidad Nacional de Córdoba (UNC), Córdoba, Argentina
- Instituto Multidisciplinario de Biología Vegetal (IMBIV), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Córdoba, Argentina
3
de la Fuente R, Díaz-Villanueva W, Arnau V, Moya A. Genomic Signature in Evolutionary Biology: A Review. BIOLOGY 2023; 12:biology12020322. [PMID: 36829597 PMCID: PMC9953303 DOI: 10.3390/biology12020322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]
Abstract
Organisms are unique physical entities in which information is stored and continuously processed. The digital nature of DNA sequences enables the construction of a dynamic information reservoir. However, the distinction between the hardware and software components in the information flow is crucial to identify the mechanisms generating specific genomic signatures. In this work, we perform a bibliometric analysis to identify the different purposes of looking for particular patterns in DNA sequences associated with a given phenotype. This study has enabled us to make a conceptual breakdown of the genomic signature and differentiate the leading applications. On the one hand, it refers to gene expression profiling associated with a biological function, which may be shared across taxa. This signature is the focus of study in precision medicine. On the other hand, it also refers to characteristic patterns in species-specific DNA sequences. This interpretation plays a key role in comparative genomics, identifying evolutionary relationships. Looking at the relevant studies in our bibliographic database, we highlight the main factors causing heterogeneities in genome composition and how they can be quantified. All these findings lead us to reformulate some questions relevant to evolutionary biology.
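For the second, species-specific sense of "genomic signature", a widely used formulation is Karlin's dinucleotide relative abundance, rho_XY = f_XY / (f_X f_Y), computed on a sequence plus its reverse complement, with two genomes compared by the mean absolute difference over the 16 dinucleotides (often written δ*). The sketch below assumes that formulation; it is only one of the measures such a review covers, and the function names are illustrative.

```python
from collections import Counter
from itertools import product

COMP = str.maketrans("ACGT", "TGCA")


def symmetrized(seq: str) -> str:
    """Sequence joined to its reverse complement so both strands are counted.

    An 'N' spacer keeps the junction from creating a spurious dinucleotide.
    """
    seq = seq.upper()
    return seq + "N" + seq.translate(COMP)[::-1]


def rho(seq: str) -> dict[str, float]:
    """Dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y)."""
    s = symmetrized(seq)
    mono = Counter(b for b in s if b in "ACGT")
    di = Counter(s[i:i + 2] for i in range(len(s) - 1)
                 if set(s[i:i + 2]) <= set("ACGT"))
    n1, n2 = sum(mono.values()), sum(di.values())
    out = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n1, mono[y] / n1
        out[x + y] = (di[x + y] / n2) / (fx * fy) if fx and fy else 0.0
    return out


def delta_star(a: str, b: str) -> float:
    """Mean absolute rho difference over the 16 dinucleotides (0 = same signature)."""
    ra, rb = rho(a), rho(b)
    return sum(abs(ra[k] - rb[k]) for k in ra) / 16
```

Because both strands are pooled, a sequence and its reverse complement have identical signatures, which matches the strand-independent way the signature is usually interpreted.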
Affiliation(s)
- Rebeca de la Fuente
- Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Wladimiro Díaz-Villanueva
- Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Vicente Arnau
- Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Andrés Moya
- Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Foundation for the Promotion of Sanitary and Biomedical Research of the Valencian Community (FISABIO), 46020 Valencia, Spain
- CIBER in Epidemiology and Public Health (CIBEResp), 28029 Madrid, Spain
|
4
Neutralism versus selectionism: Chargaff's second parity rule, revisited. Genetica 2021; 149:81-88. [PMID: 33880685 PMCID: PMC8057000 DOI: 10.1007/s10709-021-00119-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/04/2021] [Accepted: 04/09/2021] [Indexed: 11/03/2022]
Abstract
Of Chargaff's four "rules" on DNA base frequencies, the functional interpretation of his second parity rule (PR2) is the most contentious. Thermophile base compositions (GC%) were taken by Galtier and Lobry (1997) as favoring Sueoka's neutral PR2 hypothesis over Forsdyke's selective PR2 hypothesis, namely that mutations improving local within-species recombination efficiency had generated a genome-wide potential for the strands of duplex DNA to separate and initiate recombination through the "kissing" of the tips of stem-loops. However, following Chargaff's GC rule, base composition mainly reflects a species-specific, genome-wide, evolutionary pressure. GC% could not have consistently followed the dictates of temperature, since it plays fundamental roles in both sustaining species integrity and, through primarily neutral genome-wide mutation, fostering speciation. Evidence for a local within-species recombination-initiating role of base order was obtained with a novel technology that masked the contribution of base composition to nucleic acid folding energy. Forsdyke's results were consistent with his PR2 hypothesis, appeared to resolve some root problems in biology and provided a theoretical underpinning for alignment-free taxonomic analyses using relative oligonucleotide frequencies (k-mer analysis). Moreover, consistent with Chargaff's cluster rule, discovery of the thermoadaptive role of the "purine-loading" of open reading frames made less tenable the Galtier-Lobry anti-selectionist arguments.
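The technology that masks base composition can be sketched as follows, assuming it follows the FORS-D approach associated with Forsdyke's work: fold the natural sequence (FONS), fold several base-shuffled copies with the same composition (their mean is FORS-M), and take FORS-D = FONS - FORS-M as the base order-dependent component. A real analysis would plug in an RNA folding program; `toy_energy` below is a hypothetical stand-in that only counts reverse-complementary 4-mer pairs and is not a thermodynamic model.

```python
import random
from statistics import mean
from typing import Callable


def fors_d(seq: str, fold_energy: Callable[[str], float],
           n_shuffles: int = 20, seed: int = 0) -> float:
    """Base order-dependent folding energy: FONS minus mean shuffled energy.

    fold_energy is supplied by the caller (e.g. a wrapper around an RNA
    folding program returning kcal/mol, more negative = more stable).
    Shuffling preserves base composition, so the shuffled mean (FORS-M)
    estimates the composition-dependent part of the energy.
    """
    rng = random.Random(seed)
    fons = fold_energy(seq)
    shuffled = []
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)
        shuffled.append(fold_energy("".join(bases)))
    return fons - mean(shuffled)


def toy_energy(seq: str) -> float:
    """HYPOTHETICAL stand-in, not a thermodynamic model: -1 for each distinct
    4-mer whose reverse complement also occurs in seq (crude stem potential)."""
    comp = str.maketrans("ACGT", "TGCA")
    words = {seq[i:i + 4] for i in range(len(seq) - 3)}
    return -float(sum(1 for w in words if w.translate(comp)[::-1] in words))
```

The key design point is that any energy depending only on composition cancels exactly: an energy function that just counts G and C gives FORS-D = 0 for every sequence, so whatever remains reflects base order.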
5
Ou Z, Ouzounis C, Wang D, Sun W, Li J, Chen W, Marlière P, Danchin A. A Path toward SARS-CoV-2 Attenuation: Metabolic Pressure on CTP Synthesis Rules the Virus Evolution. Genome Biol Evol 2020; 12:2467-2485. [PMID: 33125064 PMCID: PMC7665462 DOI: 10.1093/gbe/evaa229] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Accepted: 10/23/2020] [Indexed: 02/06/2023]
Abstract
In the context of the COVID-19 pandemic, we describe here the singular metabolic background that constrains enveloped RNA viruses to evolve toward likely attenuation in the long term, possibly after a step of increased pathogenicity. Cytidine triphosphate (CTP) is at the crossroads of the processes allowing SARS-CoV-2 to multiply, because CTP is in demand for four essential metabolic steps. It is a building block of the virus genome, it is required for synthesis of the cytosine-based liponucleotide precursors of the viral envelope, it is a critical building block for host transfer RNA synthesis and it is required for synthesis of dolichol-phosphate, a precursor of viral protein glycosylation. The CCA 3'-end of all the transfer RNAs required to translate the RNA genome and further transcripts into the proteins used to build active virus copies is not coded in the human genome. It must be synthesized de novo from CTP and ATP. Furthermore, intermediary metabolism is built on compulsory steps of synthesis and salvage of cytosine-based metabolites via uridine triphosphate that keep limiting CTP availability. As a consequence, accidental replication errors tend to replace cytosine by uracil in the genome, unless recombination events allow the sequence to return to its ancestral sequence. We document some of the consequences of this situation in the function of viral proteins. This unique metabolic setup allowed us to highlight and provide a raison d'être to viperin, an enzyme of innate antiviral immunity, which synthesizes 3'-deoxy-3',4'-didehydro-CTP as an extremely efficient antiviral nucleotide.
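The prediction that replication errors tend to replace C by U can be examined by tallying directional substitutions between an ancestral (reference) genome and later isolates. The sketch below assumes the two sequences are already aligned to equal length; alignment and the choice of reference are outside its scope, and the function names are hypothetical.

```python
from collections import Counter


def substitution_spectrum(ancestor: str, derived: str) -> Counter:
    """Count directional substitutions (e.g. 'C>U') between two aligned,
    equal-length sequences. T is read as U; gaps and ambiguous bases are skipped."""
    if len(ancestor) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    a_seq = ancestor.upper().replace("T", "U")
    d_seq = derived.upper().replace("T", "U")
    spectrum = Counter()
    for a, d in zip(a_seq, d_seq):
        if a != d and a in "ACGU" and d in "ACGU":
            spectrum[f"{a}>{d}"] += 1
    return spectrum


def c_to_u_fraction(ancestor: str, derived: str) -> float:
    """Share of all observed substitutions that are C>U (0.0 if none)."""
    spectrum = substitution_spectrum(ancestor, derived)
    total = sum(spectrum.values())
    return spectrum["C>U"] / total if total else 0.0
```

With 12 possible substitution types, a C>U share far above what base composition alone would predict is the kind of signal the abstract describes; a proper test would compare against a composition-aware expectation.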
Affiliation(s)
- Zhihua Ou
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China
- Christos Ouzounis
- Biological Computation and Process Laboratory, Centre for Research and Technology Hellas, Chemical Process and Energy Resources Institute, Thessalonica, Greece
- Daxi Wang
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China
- Wanying Sun
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China; BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
- Junhua Li
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China
- Weijun Chen
- Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen, China
- Philippe Marlière
- TESSSI, The European Syndicate of Synthetic Scientists and Industrialists, Paris, France
- Antoine Danchin
- Kodikos Labs, Institut Cochin, Paris, France; School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, Hong Kong University, Pokfulam, Hong Kong
6
Comparative Genomics Unveils Regionalized Evolution of the Faustovirus Genomes. Viruses 2020; 12:v12050577. [PMID: 32456325 PMCID: PMC7290515 DOI: 10.3390/v12050577] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/10/2020] [Revised: 05/19/2020] [Accepted: 05/22/2020] [Indexed: 11/17/2022]
Abstract
Faustovirus is a recently discovered genus of large DNA viruses that infect the amoeba Vermamoeba vermiformis and are phylogenetically related to the Asfarviridae. To better understand the diversity and evolution of this viral group, we sequenced six novel Faustovirus strains, mined published metagenomic datasets and performed a comparative genomic analysis. Genomic sequences revealed three consistent phylogenetic groups, within which genetic diversity was moderate. The comparison of the major capsid protein (MCP) genes unveiled between 13 and 18 type-I introns that likely evolved through a still-active birth and death process mediated by intron-encoded homing endonucleases that began before the Faustovirus radiation. Genome-wide alignments indicated that, although the genomes retain high levels of gene collinearity, the central region containing the MCP gene, together with the extremities of the chromosomes, evolved at a faster rate owing to increased indel accumulation and local rearrangements. The fluctuation of the nucleotide composition along the Faustovirus (FV) genomes is mostly imprinted by the consistent nucleotide bias of coding sequences and provided no evidence for a single DNA replication origin, as seen in circular bacterial genomes.
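Compositional fluctuation along a genome is commonly examined with GC skew, (G - C)/(G + C), computed in windows, together with its cumulative sum; in many circular bacterial chromosomes the cumulative curve has a single minimum near the replication origin. The sketch below illustrates that standard approach, assuming non-overlapping windows; it is not necessarily how the Faustovirus profiles were computed, and the function names are illustrative.

```python
from itertools import accumulate
from typing import Optional


def gc_skew(seq: str, window: int = 1000) -> list[float]:
    """(G - C) / (G + C) in consecutive, non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of window skews along the sequence."""
    return list(accumulate(gc_skew(seq, window)))


def min_skew_window(seq: str, window: int = 1000) -> Optional[int]:
    """Index of the window with the lowest cumulative skew (a putative origin
    in many circular bacterial chromosomes), or None if seq is shorter than
    one window."""
    cum = cumulative_gc_skew(seq, window)
    if not cum:
        return None
    return min(range(len(cum)), key=cum.__getitem__)
```

In a typical circular bacterial chromosome the cumulative curve shows one clear minimum and one clear maximum; the abstract's conclusion would correspond to the absence of such a single dominant turning point once coding-strand bias is taken into account.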
7
Danchin A, Marlière P. Cytosine drives evolution of SARS-CoV-2. Environ Microbiol 2020; 22:1977-1985. [PMID: 32291894 PMCID: PMC7262064 DOI: 10.1111/1462-2920.15025] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Antoine Danchin
- Kodikos Labs, 24 rue Jean Baldassini, 69007 Lyon/Institut Cochin, 75013 Paris, France
- Philippe Marlière
- TESSSI, The European Syndicate of Synthetic Scientists and Industrialists, 81 rue Réaumur, 75002, Paris, France
8
Forsdyke DR. Success of alignment-free oligonucleotide (k-mer) analysis confirms relative importance of genomes not genes in speciation and phylogeny. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
The utility of DNA sequence substrings (k-mers) in alignment-free phylogenetic classification, including that of bacteria and viruses, is increasingly recognized. However, its biological basis eludes many 21st century practitioners. A path from the 19th century recognition of the informational basis of heredity to the modern era can be discerned. Crick’s DNA ‘unpairing postulate’ predicted that recombinational pairing of homologous DNAs during meiosis would be mediated by short k-mers in the loops of stem-loop structures extruded from classical duplex helices. The complementary ‘kissing’ duplex loops – like tRNA anticodon–codon k-mer duplexes – would seed a more extensive pairing that would then extend until limited by lack of homology or other factors. Indeed, this became the principle behind alignment-based methods that assessed similarity by degree of DNA–DNA reassociation in vitro. These are now seen as less sensitive than alignment-free methods that are closely consistent, both theoretically and mechanistically, with chromosomal anti-recombination models for the initiation of divergence into new species. The analytical power of k-mer differences supports the theses that evolutionary advance sometimes serves the needs of nucleic acids (genomes) rather than proteins (genes), and that such differences can play a role in early speciation events.
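Alignment-free comparison reduces each genome to a vector of k-mer frequencies and compares the vectors directly. The sketch below uses Euclidean distance between relative k-mer frequencies purely as an example; published tools use a variety of distances and normalisations, and `distance_matrix` is an illustrative helper whose output could feed a tree-building method such as neighbour joining.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic order.

    k-mers containing non-ACGT characters are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words) or 1
    return [counts[w] / total for w in words]


def kmer_distance(a: str, b: str, k: int = 4) -> float:
    """Euclidean distance between k-mer frequency profiles; no alignment needed."""
    return math.dist(kmer_profile(a, k), kmer_profile(b, k))


def distance_matrix(seqs: dict[str, str], k: int = 4) -> dict[tuple[str, str], float]:
    """Pairwise distances for every unordered pair of named sequences."""
    names = sorted(seqs)
    return {(x, y): kmer_distance(seqs[x], seqs[y], k)
            for i, x in enumerate(names) for y in names[i + 1:]}
```

Because no alignment is computed, the cost grows linearly with sequence length, which is one practical reason k-mer methods scale to whole genomes.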
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, Ontario, Canada
9
Li W, Thanos D, Provata A. Quantifying local randomness in human DNA and RNA sequences using Erdös motifs. J Theor Biol 2018; 461:41-50. [PMID: 30336158 DOI: 10.1016/j.jtbi.2018.09.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2018] [Revised: 08/14/2018] [Accepted: 09/25/2018] [Indexed: 10/28/2022]
Abstract
In 1932, Paul Erdös asked whether a random walk constructed from a binary sequence can achieve the lowest possible deviation (lowest discrepancy), for the sequence itself and for all its subsequences formed by homogeneous arithmetic progressions. Although keeping the discrepancy bounded is impossible for infinite sequences, as recently proven by Terence Tao, attempts were made to construct such sequences with finite lengths. We recognize that such constructed sequences (we call these "Erdös sequences") exhibit certain hallmarks of randomness at the local level: they show roughly equal frequencies of short subsequences, and at the same time exclude trivial periodic patterns. For the human DNA we examine the frequency of a set of Erdös motifs of length 10 using three nucleotide-to-binary mappings. The particular length-10 Erdös sequence is derived from the length-11 Mathias sequence and is identical to the first 10 digits of the Thue-Morse sequence, underscoring the fact that both are deficient in periodicities. Our calculations indicate that: (1) the purine (A and G)/pyrimidine (C and T) based Erdös motifs are greatly underrepresented in the human genome, (2) the strong (G and C)/weak (A and T) based Erdös motifs are slightly overrepresented, (3) the densities of the two are negatively correlated, (4) the Erdös motifs based on all three mappings combined are slightly underrepresented, and (5) the strong/weak based Erdös motifs are greatly overrepresented in the human messenger RNA sequences.
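The ingredients named in this abstract can be sketched directly: a nucleotide-to-binary mapping; the Thue-Morse sequence, whose digit n is the parity of the number of 1 bits in n; the discrepancy of a finite sequence mapped to ±1, i.e. the largest |x_d + x_2d + ... + x_kd| over all steps d and lengths k; and a motif count. Two assumptions are made here: the third mapping is taken to be amino (A, C) versus keto (G, T), and the "set of Erdös motifs" is approximated by the length-10 Thue-Morse prefix plus its bitwise complement. Under independent, equiprobable bits each 10-bit motif is expected about (L - 9)/1024 times in a binary string of length L, which gives a baseline for judging over- or under-representation.

```python
MAPPINGS = {
    # Bases in each set map to "1", the other two bases to "0".
    "purine/pyrimidine": set("AG"),
    "strong/weak": set("GC"),
    "amino/keto": set("AC"),  # assumed third mapping
}


def to_binary(seq: str, mapping: str) -> str:
    """Map A/C/G/T to a 0/1 string; other characters are dropped."""
    ones = MAPPINGS[mapping]
    return "".join("1" if b in ones else "0" for b in seq.upper() if b in "ACGT")


def thue_morse(n: int) -> str:
    """First n Thue-Morse digits: digit i is the parity of the 1 bits in i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def discrepancy(bits: str) -> int:
    """Max |x_d + x_2d + ... + x_kd| over all d, k, with '0' -> +1, '1' -> -1."""
    x = [1 if b == "0" else -1 for b in bits]
    n = len(x)
    worst = 0
    for d in range(1, n + 1):
        partial = 0
        for i in range(d, n + 1, d):  # 1-based positions d, 2d, 3d, ...
            partial += x[i - 1]
            worst = max(worst, abs(partial))
    return worst


def count_motif(binary: str, motif: str) -> int:
    """Number of (overlapping) occurrences of motif in binary."""
    return sum(1 for i in range(len(binary) - len(motif) + 1)
               if binary.startswith(motif, i))


def erdos_motif_counts(seq: str, length: int = 10) -> dict[str, int]:
    """Occurrences of the Thue-Morse prefix and its complement, per mapping."""
    motif = thue_morse(length)
    flipped = motif.translate(str.maketrans("01", "10"))
    counts = {}
    for name in MAPPINGS:
        binary = to_binary(seq, name)
        counts[name] = count_motif(binary, motif) + count_motif(binary, flipped)
    return counts
```

As a check on the abstract's statement, the length-10 Thue-Morse prefix 0110100110 has discrepancy 1, the smallest possible value for a nonempty sequence.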
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, USA.
- Dimitrios Thanos
- Department of Mathematics, National and Kapodistrian University of Athens, Athens GR-15784, Greece; Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", Athens GR-15341, Greece
- Astero Provata
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", Athens GR-15341, Greece
10
Tavares AH, Raymaekers J, Rousseeuw PJ, Silva RM, Bastos CAC, Pinho A, Brito P, Afreixo V. Comparing Reverse Complementary Genomic Words Based on Their Distance Distributions and Frequencies. Interdiscip Sci 2018; 10:1-11. [PMID: 29214497 DOI: 10.1007/s12539-017-0273-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/24/2017] [Revised: 10/04/2017] [Accepted: 11/08/2017] [Indexed: 06/07/2023]
Abstract
In this work, we study reverse complementary genomic word pairs in the human DNA, by comparing both the distance distribution and the frequency of a word to those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and it is found that the peak dissimilarity works best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is also explored, and it is speculated that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.
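Comparing a word with its reverse complement by distance distribution can be sketched as follows: record the gaps between consecutive occurrences of each word, normalise them into histograms, and measure how far apart the two histograms are. The abstract's "peak dissimilarity" is not defined here, so this sketch swaps in total variation distance, a plainly different and simpler measure; all names are illustrative.

```python
from collections import Counter

COMP = str.maketrans("ACGT", "TGCA")


def occurrences(seq: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word."""
    found, i = [], seq.find(word)
    while i != -1:
        found.append(i)
        i = seq.find(word, i + 1)
    return found


def distance_distribution(seq: str, word: str) -> dict[int, float]:
    """Normalised histogram of distances between consecutive occurrences."""
    pos = occurrences(seq.upper(), word)
    gaps = Counter(b - a for a, b in zip(pos, pos[1:]))
    total = sum(gaps.values())
    return {d: c / total for d, c in gaps.items()} if total else {}


def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def rc_pair_dissimilarity(seq: str, word: str) -> float:
    """Compare the distance distribution of word with that of its reverse complement."""
    word = word.upper()
    rc = word.translate(COMP)[::-1]
    return total_variation(distance_distribution(seq, word),
                           distance_distribution(seq, rc))
```

A frequency discrepancy for the same pair, such as |n(w) - n(w')| / (n(w) + n(w')), could be computed alongside to explore the association the authors describe.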
Affiliation(s)
- Ana Helena Tavares
- Department of Mathematics and CIDMA and iBiMED, University of Aveiro, Aveiro, Portugal.
- Raquel M Silva
- Department of Medical Sciences and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
- Carlos A C Bastos
- Department of Electronics Telecommunications and Informatics and IEETA, University of Aveiro, Aveiro, Portugal
- Armando Pinho
- Department of Electronics Telecommunications and Informatics and IEETA, University of Aveiro, Aveiro, Portugal
- Paula Brito
- Faculty of Economics and LIAAD-INESC TEC, University of Porto, Porto, Portugal
- Vera Afreixo
- Department of Mathematics and CIDMA and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
11
Tavares AHMP, Pinho AJ, Silva RM, Rodrigues JMOS, Bastos CAC, Ferreira PJSG, Afreixo V. DNA word analysis based on the distribution of the distances between symmetric words. Sci Rep 2017; 7:728. [PMID: 28389642 PMCID: PMC5428789 DOI: 10.1038/s41598-017-00646-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/11/2016] [Accepted: 03/02/2017] [Indexed: 02/01/2023]
Abstract
We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We report several well-defined features in the distributions of distances, which can be classified into three different profiles showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.
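One way to read "distances between symmetric words" is the spacing from an occurrence of a word to a later occurrence of its reversed complement; short, recurrent spacings are what a hairpin or cruciform (stem, loop, complementary stem) would leave behind. The sketch below assumes that start-to-start definition and a user-chosen maximum distance; judging overrepresentation would additionally need a null model, for example counts from shuffled sequences, which is not shown. All names are illustrative.

```python
from bisect import bisect_right
from collections import Counter

COMP = str.maketrans("ACGT", "TGCA")


def positions(seq: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word."""
    found, i = [], seq.find(word)
    while i != -1:
        found.append(i)
        i = seq.find(word, i + 1)
    return found


def symmetric_distances(seq: str, word: str, max_distance: int = 100) -> Counter:
    """Histogram of start-to-start distances from each occurrence of word to
    every later occurrence of its reversed complement, up to max_distance."""
    seq, word = seq.upper(), word.upper()
    rc = word.translate(COMP)[::-1]
    starts_w, starts_rc = positions(seq, word), positions(seq, rc)  # both sorted
    hist = Counter()
    for i in starts_w:
        lo = bisect_right(starts_rc, i)  # strictly later occurrences only
        hi = bisect_right(starts_rc, i + max_distance)
        for j in starts_rc[lo:hi]:
            hist[j - i] += 1
    return hist
```

In the usage below, the 7-mer-style stem AACCT, a GGG loop and the complementary stem AGGTT give a single distance of 8; `hist.most_common()` then lists the most frequent spacings for inspection.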
Affiliation(s)
- Ana H M P Tavares
- Department of Mathematics & CIDMA, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences & iBiMED, University of Aveiro, Aveiro, Portugal
- Armando J Pinho
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal; IEETA, University of Aveiro, Aveiro, Portugal
- Raquel M Silva
- Department of Medical Sciences & iBiMED, University of Aveiro, Aveiro, Portugal; IEETA, University of Aveiro, Aveiro, Portugal
- João M O S Rodrigues
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal; IEETA, University of Aveiro, Aveiro, Portugal
- Carlos A C Bastos
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal; IEETA, University of Aveiro, Aveiro, Portugal
- Paulo J S G Ferreira
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal; IEETA, University of Aveiro, Aveiro, Portugal
- Vera Afreixo
- Department of Mathematics & CIDMA, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences & iBiMED, University of Aveiro, Aveiro, Portugal; IEETA, University of Aveiro, Aveiro, Portugal
12
Gouveia S, Scotto MG, Weiß CH, Ferreira PJSG. Binary auto-regressive geometric modelling in a DNA context. J R Stat Soc Ser C Appl Stat 2016. [DOI: 10.1111/rssc.12172] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
13
Crucial steps to life: From chemical reactions to code using agents. Biosystems 2016; 140:49-57. [DOI: 10.1016/j.biosystems.2015.12.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 11/18/2015] [Revised: 12/05/2015] [Accepted: 12/07/2015] [Indexed: 01/21/2023]
14
Forsdyke DR. Homostability. Evol Bioinform Online 2016. [DOI: 10.1007/978-3-319-28755-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
15
Lymphocyte repertoire selection and intracellular self/non-self-discrimination: historical overview. Immunol Cell Biol 2014; 93:297-304. [PMID: 25385066 DOI: 10.1038/icb.2014.96] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 08/11/2014] [Revised: 09/19/2014] [Accepted: 10/15/2014] [Indexed: 02/07/2023]
Abstract
Immunological self/non-self-discrimination is conventionally seen as an extracellular event, involving interactions between receptors on T cells pre-educated to discriminate and peptides bound to major histocompatibility complex proteins (pMHCs). Mechanisms by which non-self peptides might first be sorted intracellularly to distinguish them from the vast excess of self-peptides have long been called for. Recent demonstrations of endogenous peptide-specific clustering of pMHCs on membrane rafts are indicative of intracellular enrichment before surface display. The clustering could follow the specific aggregation of a foreign protein that exceeded its solubility limit in the crowded intracellular environment. Predominantly entropy-driven, this homoaggregation would colocalize identical peptides, thus facilitating their collective presentation. Concentrations of self-proteins are fine-tuned over evolutionary time to avoid this. Disparate observations, such as pyrexia and female susceptibility to autoimmune disease, can be explained in terms of the need to cosegregate cognate pMHC complexes internally before extracellular display.
16
Borzov EA, Marakhonov AV, Ivanov MV, Drozdova PB, Baranova AV, Skoblov MY. RANDTRAN: Random transcriptome sequence generator that accounts for partition specific features in eukaryotic mRNA datasets. Mol Biol 2014. [DOI: 10.1134/s0026893314050021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
17
Wang S, Tu J, Jia Z, Lu Z. High order intra-strand partial symmetry increases with organismal complexity in animal evolution. Sci Rep 2014; 4:6400. [PMID: 25263801 PMCID: PMC4178289 DOI: 10.1038/srep06400] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/17/2014] [Accepted: 08/28/2014] [Indexed: 12/02/2022]
Abstract
For sufficiently long genomic sequences, the frequency of any short nucleotide fragment on one strand is approximately equal to the frequency of its reverse complement on the same strand. Although this has been studied for over two decades, the precise mechanism involved remains unclear. In this study, we calculated the high order intra-strand partial symmetry (IPS) for 14 animal species by using a fixed sliding window method to scan each genome sequence. The IPS was positively associated with organismal complexity, measured by the number of distinct cell types. The results indicate that the IPS might result from an increase in functional non-coding DNA, and that it may play an important role in the evolution of complex body plans.
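A fixed sliding-window scan of intra-strand symmetry can be sketched as below. The index used here, 1 - Σ|n(w) - n(w')| / Σ(n(w) + n(w')) over reverse-complement pairs (w, w'), is an assumption chosen for clarity and is not claimed to be the IPS formula of the paper; the window size, step and k are likewise placeholders.

```python
from collections import Counter

COMP = str.maketrans("ACGT", "TGCA")


def symmetry_index(seq: str, k: int) -> float:
    """1 - sum|n(w) - n(rc w)| / sum(n(w) + n(rc w)) over non-palindromic k-mers.

    1.0 = perfect intra-strand parity; 0.0 = no k-mer co-occurs with its
    reverse complement. k-mers with non-ACGT characters are ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    diff = total = 0
    for w in list(counts):
        if not set(w) <= set("ACGT"):
            continue
        rc = w.translate(COMP)[::-1]
        if rc == w or (rc in counts and rc < w):
            continue  # skip palindromes; count each pair only once
        diff += abs(counts[w] - counts[rc])
        total += counts[w] + counts[rc]
    return 1 - diff / total if total else 1.0


def windowed_symmetry(seq: str, k: int = 6, window: int = 100_000,
                      step: int = 50_000) -> list[float]:
    """symmetry_index in fixed-size sliding windows along seq."""
    seq = seq.upper()
    last_start = max(len(seq) - window, 0)
    return [symmetry_index(seq[s:s + window], k)
            for s in range(0, last_start + 1, step)]
```

Summarising the per-window values (for example by their mean) would give one number per genome, which could then be set against a complexity measure such as cell-type count.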
Affiliation(s)
- Shengqin Wang
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Jing Tu
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Zhongwei Jia
- National Institute of Drug Dependence, Peking University, Beijing 100191, China
- Zuhong Lu
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100781, China
18
Satapathy SS, Powdel BR, Dutta M, Buragohain AK, Ray SK. Constraint on di-nucleotides by codon usage bias in bacterial genomes. Gene 2013; 536:18-28. [PMID: 24333347 DOI: 10.1016/j.gene.2013.11.098] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/04/2013] [Revised: 11/18/2013] [Accepted: 11/25/2013] [Indexed: 10/25/2022]
Abstract
It has been reported that the relative di-nucleotide frequency (RDF) is similar in different parts of a genome but varies among genomes; RDF is therefore termed the genome signature in bacteria. It is not known whether the constancy in RDF is governed by genome-wide mutational bias or by selection. Here we compared RDF between the inter-genic and the coding sequences of seventeen bacterial genomes for which gene expression data were available. Within a genome, the constraint on di-nucleotides was found to be higher in the coding sequences than in the inter-genic regions, and higher at the 2nd codon position than at the 3rd. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the high expression genes (HEG) than in the whole genome as well as in the low expression genes (LEG). We analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences that were computationally generated with the codon usage bias (CUB) set according to genome G+C composition and with the amino acid sequence unaltered. In the simulated coding sequences, the constraint observed was significantly lower, and no significant difference was observed between the HEG and the LEG in terms of di-nucleotide constraint. This indicated that the greater constraint on di-nucleotides in the HEG was due to stronger selection on CUB in these genes than in the LEG within a genome. Further, we compared RDF in the HEG rpoB and rpoC of 199 bacteria, which revealed a common pattern of constraints on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB in di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. The analysis revealed that selection on CUB is an important determinant of the constraint on di-nucleotides at these positions in bacterial genomes. We believe that this study provides major findings on the role of CUB in di-nucleotide constraint in bacterial genomes.
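The abstract does not state how RDF is computed. A common choice in the genome-signature literature is Karlin's relative abundance, ρ(XY) = f(XY) / (f(X) f(Y)), and the sketch below assumes that definition. It is a minimal illustration, not the authors' pipeline: it counts overlapping dinucleotides on one strand of an A/C/G/T-only sequence and does not restrict counting to particular codon positions.

```python
from collections import Counter
from itertools import product


def relative_dinucleotide_frequency(seq: str) -> dict:
    """Relative dinucleotide abundance rho(XY) = f(XY) / (f(X) * f(Y)).

    Assumptions (not taken from the paper): the sequence contains only
    A, C, G, T, and overlapping dinucleotides are counted on one strand.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        if fx == 0 or fy == 0:
            continue  # rho is undefined when either base is absent
        rho[x + y] = (di[x + y] / n_di) / (fx * fy)
    return rho


# A value near 1 means XY occurs about as often as base composition predicts.
rho = relative_dinucleotide_frequency("ACGTACGTACGT")
print(round(rho["CG"], 3), rho["AA"])
```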
Affiliation(s)
- Bhes Raj Powdel
- Department of Statistics, Darrang College, Tezpur, Assam 784001, India
- Malay Dutta
- Department of Computer Science and Engineering, Tezpur University, Tezpur, Assam 784 028, India
- Alak Kumar Buragohain
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784 028, India; Dibrugarh University, Dibrugarh, Assam 786004, India
- Suvendra Kumar Ray
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784 028, India.
19
Forsdyke DR. Implications of HIV RNA structure for recombination, speciation, and the neutralism-selectionism controversy. Microbes Infect 2013; 16:96-103. [PMID: 24211872 DOI: 10.1016/j.micinf.2013.10.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/22/2013] [Revised: 10/24/2013] [Accepted: 10/24/2013] [Indexed: 11/29/2022]
Abstract
The conflict between the needs to encode both a protein (impaired by non-synonymous mutation), and nucleic acid structure (impaired by synonymous or non-synonymous mutation), can sometimes be resolved in favour of the nucleic acid because its structure is critical for a selectively advantageous genome-wide activity--recombination. However, above a sequence difference threshold, recombination is impaired. It may then be advantageous for new species to arise. Building on the work of Grantham and others critical of the neutralist viewpoint, heuristic support for this hypothesis emerged from studies of the base composition and structure of retroviral genomes. The extreme enrichment in the purine A of the RNA of human immunodeficiency virus (HIV-1) parallels the mild purine-loading of the RNAs of most organisms, for which there is an adaptive explanation--immune evasion. However, human T cell leukaemia virus (HTLV-1), with the potential to invade the same host cell, shows extreme enrichment in the pyrimidine C. Assuming the low GC% HIV and the high GC% HTLV-1 to share a common ancestor, it was postulated that differences in GC% had arisen to prevent homologous recombination between these emerging lentiviral species. Sympatrically isolated by this intracellular reproductive barrier, prototypic HIV-1 seized the AU-rich (low GC%) high ground (thus committing to purine A rather than purine G). Prototypic HTLV-1 forwent this advantage and evolved an independent evolutionary strategy--similar to that of the GC%-rich Epstein-Barr virus--profound latency maintained by transcription of one purine-rich mRNA. The evidence supporting these interpretations is reviewed.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON K7L3N6, Canada.
20
Zhang H, Li P, Zhong HS, Zhang SH. Conservation vs. variation of dinucleotide frequencies across bacterial and archaeal genomes: evolutionary implications. Front Microbiol 2013; 4:269. [PMID: 24046767 PMCID: PMC3764401 DOI: 10.3389/fmicb.2013.00269] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/17/2013] [Accepted: 08/19/2013] [Indexed: 11/13/2022]
Abstract
During the long history of biological evolution, genome structures have undergone enormous changes. Nevertheless, some traits or vestiges of the primordial genome (defined in this paper as the most primitive nucleic acid genome for life on earth) may remain in modern genetic systems. Finding these traits or vestiges is of great importance for the study of the origin and evolution of genomes. The shorter a sequence, the less likely it is to be modified during genome evolution; and if it is mutated, the more easily it can reappear at the same site or at another site. Consequently, the genomic frequencies of very short nucleotide sequences, such as dinucleotides, would have a considerable chance of being conserved over billions of years of evolution. Prokaryotic genomes are very diverse and span a wide range of GC content. Therefore, to find traits or vestiges of the primordial genome that remain in modern genetic systems, we studied the characteristics of dinucleotide frequencies across bacterial and archaeal genomes. We analyzed the dinucleotide frequency patterns of the whole-genome sequences of more than 1300 prokaryotic species (bacterial and archaeal genomes available as of December 2012). The results show that the frequencies of the dinucleotides AC, AG, CA, CT, GA, GT, TC, and TG are well conserved across genomes, while the frequencies of the other dinucleotides vary considerably among species. This conservation/variation pattern seems to correlate with the distributions of dinucleotides within and across genomes. Further analysis indicates that the phenomenon would be determined by the strand symmetry of genomic sequences (the second parity rule) and by GC content variation among genomes. We discuss some possible origins of strand symmetry, and we propose that the frequency conservation of some dinucleotides may provide insights into the genomic composition of the primordial genetic system.
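A toy calculation helps show why strand symmetry plus GC content could separate the eight conserved dinucleotides from the rest. Under a simplifying assumption (ours, not the paper's) of a random sequence with independent neighbouring bases that obeys PR-2 at the 1-mer level (A = T, G = C), each of AC, AG, CA, CT, GA, GT, TC and TG pairs one strong (G/C) base with one weak (A/T) base, so its expected frequency is (gc/2)((1 − gc)/2), which is unchanged when gc is replaced by 1 − gc. The other eight dinucleotides scale with gc² or (1 − gc)² and swing widely.

```python
from itertools import product


def expected_dinucleotide_freqs(gc: float) -> dict:
    """Expected dinucleotide frequencies for a random sequence that obeys
    PR-2 at the 1-mer level (A = T, G = C), has GC fraction `gc`, and has
    independent neighbouring bases (a simplifying assumption)."""
    base = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    return {x + y: base[x] * base[y] for x, y in product("ACGT", repeat=2)}


low, high = expected_dinucleotide_freqs(0.3), expected_dinucleotide_freqs(0.7)
for dn in sorted(low):
    print(dn, round(low[dn], 4), round(high[dn], 4))
```

The mixed strong/weak dinucleotides come out identical (up to floating-point rounding) at GC fractions 0.3 and 0.7, while the remaining ones differ several-fold.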
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou, China
21
Zhang SH, Wang L. Two common profiles exist for genomic oligonucleotide frequencies. BMC Res Notes 2012; 5:639. [PMID: 23158698 PMCID: PMC3532236 DOI: 10.1186/1756-0500-5-639] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/24/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022]
Abstract
Background: A majority profile for trinucleotide frequencies among genomes has been reported, and a further study revealed that two common profiles, rather than one majority profile, exist for genomic trinucleotide frequencies. However, the origins of the common/majority profile remain elusive. Moreover, it is not clear whether the features of the common profile extend to oligonucleotides other than trinucleotides. Findings: We analyzed 571 prokaryotic genomes (chromosomes) and some selected eukaryotic nuclear genomes, as well as other genetic systems, to study their compositional features. We found that there are also two common profiles for genomic oligonucleotide frequencies: one from low-GC-content genomes and the other from high-GC-content genomes. Furthermore, each common profile is highly correlated with the average profile of random sequences that have the corresponding GC content and are generated according to first-order symmetry. Conclusions: The two common profiles would mainly be caused by GC content variation and the strand symmetry of genomic sequences. Therefore, both GC content and strand symmetry would play important roles in genome evolution.
Affiliation(s)
- Shang-Hong Zhang
- Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou, 510275, China.
22
Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012; 13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022]
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B1 index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
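Of the measures listed, the windowed GC skew is the simplest to show. The sketch below computes (G − C)/(G + C) in non-overlapping windows and its running (cumulative) sum; the window size is a free parameter, and the ΔGC skew, B1 index, LDA predictability score, oligonucleotide signal-to-noise ratio and GC skew index reviewed in the paper are not implemented here.

```python
from itertools import accumulate


def gc_skew(seq: str, window: int) -> list:
    """(G - C) / (G + C) in consecutive non-overlapping windows.

    A trailing partial window is ignored; a window with no G or C gets 0.0
    (an assumption, to avoid dividing by zero).
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq: str, window: int) -> list:
    """Running sum of the windowed GC skew."""
    return list(accumulate(gc_skew(seq, window)))


print(gc_skew("GGGCCCCGATAT", 4))
print(cumulative_gc_skew("GGGCCCCGATAT", 4))
```

In bacterial chromosomes, the extremes of the cumulative skew are commonly used as rough markers of the replication origin (minimum) and terminus (maximum).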
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
23
Mahale KN, Kempraj V, Dasgupta D. Does the growth temperature of a prokaryote influence the purine content of its mRNAs? Gene 2012; 497:83-9. [PMID: 22305982 DOI: 10.1016/j.gene.2012.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/22/2011] [Accepted: 01/19/2012] [Indexed: 11/20/2022]
Abstract
The formation and breaking of hydrogen bonds between nucleic acid bases are dependent on temperature. The high G+C content of organisms was surmised to be an adaptation for high temperature survival because of the thermal stability of G:C pairs. However, a survey of genomic GC% and optimum growth temperature (OGT) of several prokaryotes refuted any direct relation between them. Significantly high purine (R = A or G) content in mRNAs is also seen as a selective response for survival among thermophiles. Nevertheless, the biological relevance of thermophiles loading their unstable mRNAs with excess purines (purine-loading or R-loading) is not persuasive. Here, we analysed the mRNA sequences from the genomes of 168 prokaryotes (obtained from the NCBI Genome database), with OGTs ranging from -5 °C to 100 °C, to verify the relation between R-loading and OGT. Our analysis fails to demonstrate any correlation between R-loading of the mRNA pool and OGT of a prokaryote. The percentage of purine-loaded mRNAs in prokaryotes is found to be in a rough negative correlation with the genomic GC% (r² = 0.655, slope = -1.478, P < 0.001). We conclude that genomic GC% and bias against certain combinations of nucleotides drive the mRNA-synonymous (sense) strands of DNA towards variations in R-loading.
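The abstract does not define when an mRNA counts as purine-loaded. One formulation used in this literature, assumed here, is a purine-loading index expressed as purine excess per kilobase, 1000 × ((G − C) + (A − T)) / N, with an mRNA treated as purine-loaded when the index is positive. The sketch below computes both quantities; the zero threshold is an assumption, not necessarily the cut-off Mahale et al. used.

```python
def purine_loading_index(seq: str) -> float:
    """Purine excess per kilobase: 1000 * ((G - C) + (A - T)) / N.

    Assumed formulation; U is counted as T so RNA can be passed directly.
    """
    s = seq.upper().replace("U", "T")
    a, c, g, t = (s.count(b) for b in "ACGT")
    n = a + c + g + t
    if n == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    return 1000 * ((g - c) + (a - t)) / n


def percent_purine_loaded(seqs) -> float:
    """Percentage of sequences whose purine-loading index is positive
    (the > 0 threshold is a hypothetical choice)."""
    seqs = list(seqs)
    loaded = sum(1 for s in seqs if purine_loading_index(s) > 0)
    return 100 * loaded / len(seqs)


print(percent_purine_loaded(["AAGG", "CCTT", "ACGT", "GAAU"]))
```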
24
Valdivia-Granda WA. Biodefense Oriented Genomic-Based Pathogen Classification Systems: Challenges and Opportunities. J Bioterror Biodef 2012; 3:1000113. [PMID: 25587492 PMCID: PMC4289626 DOI: 10.4172/2157-2526.1000113] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
Abstract
Countermeasures that will effectively prevent or diminish the impact of a biological attack will depend on the rapid and accurate generation and analysis of genomic information. Because of their increasing sensitivity, rapidly decreasing cost, and ability to effectively interrogate the genomes of previously unknown organisms, Next Generation Sequencing (NGS) technologies are revolutionizing the biological sciences. However, the exponential accumulation of microbial data is outpacing the computational performance of existing analytical tools in translating DNA information into reliable detection, prophylactic and therapeutic countermeasures. It is now evident that the bottleneck for next-generation sequence data analysis will not be solved simply by scaling up computational resources, but rather by implementing novel biodefense-oriented algorithms that overcome existing vulnerabilities of speed, sensitivity and accuracy. Considering these circumstances, this document highlights the challenges and opportunities that biodefense stakeholders must consider in order to exploit genomic information more efficiently and translate these data into integrated countermeasures. The document reviews different genome analysis methods and explains the concepts of DNA fingerprints, motif fingerprints, genomic barcodes and genomic signatures. A series of recommendations is discussed to promote genomics and bioinformatics as an effective form of deterrence and a valuable scientific platform for rapid technological insertion of detection, prophylactic and therapeutic countermeasures.
25
Abstract
Among species within a phylogenetic group, genomic GC% values can cover a wide range that is particularly evident at third codon positions. However, among genes within a genome, genic GC% values can also cover a wide range that is, again, particularly evident at third codon positions. Individual genes and genomes each have a "homostabilizing propensity" to adopt a relatively uniform GC%. Each gene (a "microisochore") occupies a discrete GC% niche of relatively uniform base composition amongst its fellow genes, which can collectively span a wide GC% range. Homostabilization serves to recombinationally isolate both genome sectors (facilitating gene duplication and differentiation) and genomes (facilitating genome duplication and differentiation; e.g., speciation). Although they may sometimes be in conflict, the individualities of genomes, and of genes within those genomes, are separately sustained by a common mechanism, uniformity of GC%. The protection against inadvertent recombination afforded by GC% differentiation is, in the general case, a prerequisite for phenotypic differentiation.
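Because much of the GC% range discussed above is concentrated at third codon positions, GC% is often reported separately for each codon position (GC1, GC2, GC3). The minimal sketch below assumes an in-frame coding sequence that starts at the first base of a codon; comparing GC3 across the genes of one genome is one simple way to see the spread of gene-level GC% niches.

```python
def gc_by_codon_position(cds: str) -> list:
    """GC% at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Assumes the sequence starts at the first base of a codon; a trailing
    partial codon is ignored.
    """
    s = cds.upper()
    usable = len(s) - len(s) % 3
    result = []
    for pos in range(3):
        bases = s[pos:usable:3]  # every third base, starting at this position
        gc = sum(b in "GC" for b in bases)
        result.append(100 * gc / len(bases) if bases else 0.0)
    return result  # [GC1%, GC2%, GC3%]


print(gc_by_codon_position("ATGGCCGCGA"))
```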
Affiliation(s)
- D. R. Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada
26
Abstract
To detect positive Darwinian selection it is thought essential to compare two sequences. Despite its defects, "the comparative method rules." However, genes evolving rapidly under positive selection conflict more with internal forces (the genome phenotype) than genes evolving slowly under negative selection. In particular, there is conflict with stem-loop potential. The conflict between protein-encoding potential (primary information) and stem-loop potential (secondary information) permits detection of positive selection in a single sequence. The degree to which secondary information is compromised provides a measure of the speed of transmission of primary information. Thus, the sovereignty of the comparative method is challenged not only by its own defects, but also by the availability of a single-sequence method. However, while of limited utility for positive selection, the comparative method casts new light on Darwin's great question — the origin of species. Comparison of rates of synonymous and non-synonymous mutation suggests that branching into new species begins with synonymous mutations.
Affiliation(s)
- Donald R. Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada
27
Abstract
Sometimes a cross between two individuals that appear to belong to the same species produces a sterile offspring (i.e., their hybrid is sterile). Thus, the two individuals appear reproductively isolated from each other. If each could find a compatible mate, then new species might emerge. At issue is whether the form of hybrid sterility that precedes sympatric differentiation into species is, in the general case, of genic or non-genic origin. Several recent papers lend the authority of William Bateson to the genic hypothesis, referring to the "Bateson–Dobzhansky–Muller hypothesis". All these papers cite a 1996 paper that, in turn, cites a 1909 paper of Bateson. However, from 1902 until 1926 the latter espoused a non-genic hypothesis that today would be classified as "chromosomal". Analysis of Bateson's 1909 text reveals no recantation. Bateson's non-genic view was similar to that advanced by Richard Goldschmidt in the 1940s. However, Bateson proposed a contribution from parents of abstract factors that, together in their hybrids, complement to bring about a negative effect (hybrid sterility). In contrast, Goldschmidt proposed that normally parents contribute complementary factors making parental chromosomes compatible at meiosis in their hybrids, which hence are fertile (i.e., the parental factors work together to produce a positive effect). When the factors are not sufficiently complementary the parental chromosomes are incompatible in their hybrids, which hence are sterile. The non-genic Batesonian–Goldschmidtian abstractions are now being fleshed-out chemically in terms of DNA base-composition differences.
Affiliation(s)
- D. R. Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L 3N6, Canada
28
Nakashima H, Kuroda Y. Differences in dinucleotide frequencies of thermophilic genes encoding water soluble and membrane proteins. J Zhejiang Univ Sci B 2011; 12:419-27. [PMID: 21634034 DOI: 10.1631/jzus.b1000331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
The occurrence frequencies of the dinucleotides of genes of three thermophilic and three mesophilic species from both archaea and eubacteria were investigated in this study. The genes encoding water soluble proteins were rich in the dinucleotides of purine dimers, whereas the genes encoding membrane proteins were rich in pyrimidine dimers. The dinucleotides of purine dimers are the counterparts of pyrimidine dimers in a double-stranded DNA. The purine/pyrimidine dimers were favored in the thermophiles but not in the mesophiles, based on comparisons of observed and expected frequencies. This finding is in agreement with our previous study which showed that purine/pyrimidine dimers are positive factors that increase the thermal stability of DNA. The dinucleotides AA, AG, and GA are components of the codons of charged residues of Glu, Asp, Lys, and Arg, and the dinucleotides TT, CT, and TC are components of the codons of hydrophobic residues of Leu, Ile, and Phe. This is consistent with the suitabilities of the different amino acid residues for water soluble and membrane proteins. Our analysis provides a picture of how thermophilic species produce water soluble and membrane proteins with distinctive characters: the genes encoding water soluble proteins use DNA sequences rich in purine dimers, and the genes encoding membrane proteins use DNA sequences rich in pyrimidine dimers on the opposite strand.
Affiliation(s)
- Hiroshi Nakashima
- Department of Clinical Laboratory Science, Graduate Course of Medical Science and Technology, School of Health Sciences, Kanazawa University, 5-11-80 Kodatsuno, Kanazawa 920-0942, Japan.
29
Yu JF, Xiao K, Jiang DK, Guo J, Wang JH, Sun X. An integrative method for identifying the over-annotated protein-coding genes in microbial genomes. DNA Res 2011; 18:435-49. [PMID: 21903723 PMCID: PMC3223076 DOI: 10.1093/dnares/dsr030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Falsely annotated protein-coding genes are considered one of the major causes of annotation errors in public databases. Although many filtering approaches have been designed for over-annotated protein-coding genes, some are questionable because they increase false negatives. Furthermore, no webserver or software has been devised specifically for the problem of over-annotation. In this study, we propose an integrative algorithm for detecting over-annotated protein-coding genes in microorganisms. Overall, an average accuracy of 99.94% is achieved over 61 microbial genomes. This extremely high accuracy indicates that the algorithm efficiently differentiates protein-coding genes from non-coding open reading frames. Extensive analyses show that the prediction results are reliable and that the integrative algorithm is robust and convenient. Our analysis also indicates that over-annotated protein-coding genes can cause false positives in horizontal gene transfer detection. The webserver for the proposed algorithm is freely accessible at www.cbi.seu.edu.cn/RPGM.
Affiliation(s)
- Jia-Feng Yu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
30
Qu H, Wu H, Zhang T, Zhang Z, Hu S, Yu J. Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes. Res Microbiol 2010; 161:838-46. [PMID: 20868744 DOI: 10.1016/j.resmic.2010.09.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 06/27/2010] [Accepted: 08/03/2010] [Indexed: 11/15/2022]
Abstract
Nucleotide compositional asymmetry (NCA) between leading and lagging strands (LeS and LaS) is dynamic and diverse among eubacterial genomes owing to different mutation and selection forces. A thorough investigation is needed to study the relationship between nucleotide composition dynamics and gene distribution biases. Based on a collection of 364 eubacterial genomes grouped according to a DnaE-based scheme (DnaE1-DnaE1, DnaE2-DnaE1, and DnaE3-PolC), we investigated NCA and nucleotide composition gradients at the three codon positions and found universal G-enrichment on LeS among all groups. This was due to strong selection for G-heading (codon position 1, or cp1) codons and to mutation pressure that led to more G-ending (cp3) codons. Moreover, a slight T-enrichment of LeS, due to cytosine deamination at cp3, was universal among DnaE1-DnaE1 and DnaE2-DnaE1 genomes but was not clearly seen among DnaE3-PolC genomes, in which A-enrichment of LeS was proposed to be the effect of selection unique to polC and of a mutation bias toward A-richness at cp1 that may result from transcription-coupled DNA repair mechanisms. Furthermore, strand-biased gene distribution enhances the purine-richness of LeS in DnaE3-PolC genomes and the T-richness of LeS in DnaE1-DnaE1 and DnaE2-DnaE1 genomes.
Affiliation(s)
- Hongzhu Qu
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
31
Zhang SH, Huang YZ. Limited contribution of stem-loop potential to symmetry of single-stranded genomic DNA. Bioinformatics 2009; 26:478-85. [PMID: 20031973 DOI: 10.1093/bioinformatics/btp703] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023]
Abstract
MOTIVATION: The phenomenon of strand symmetry, which may provide clues to genome evolution, exists in all prokaryotic and eukaryotic genomes studied. Several possible mechanisms for its origin have been proposed, including the absence of strand biases in mutation and selection, strand inversion, and selection of stem-loop structures. However, the relative contributions of these mechanisms to strand symmetry are not clear. In this article, we specifically studied the role of the stem-loop potential of single-stranded DNA in strand symmetry. RESULTS: We analyzed the complete genomes of 90 prokaryotes. We found that most oligonucleotides (pentanucleotides and longer) do not have a reverse complement in close proximity in the genomic sequences. Combined with further analysis, we conclude that the contribution of the widespread stem-loop potential of single-stranded genomic DNA to the formation and maintenance of strand symmetry would be very limited, at least for higher-order oligonucleotides. Therefore, other possible causes of strand symmetry must be considered more thoroughly.
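The paper's exact proximity criterion is not given in the abstract. As a hypothetical illustration of the kind of test involved, the sketch below reports the fraction of k-mer occurrences whose reverse complement begins within a chosen distance downstream, the arrangement a stem-loop would need. The `max_gap` cut-off and the downstream-only search are assumptions made for brevity.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def fraction_with_nearby_revcomp(seq: str, k: int, max_gap: int) -> float:
    """Fraction of k-mer occurrences whose reverse complement starts within
    max_gap bases downstream of the k-mer's end (a crude stem-loop proxy).

    max_gap is a hypothetical parameter; only the downstream side is searched.
    """
    seq = seq.upper()
    starts = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts[seq[i:i + k]].append(i)
    total = len(seq) - k + 1
    if total <= 0:
        return 0.0
    hits = 0
    for i in range(total):
        rc = reverse_complement(seq[i:i + k])
        # A partner starting at j would close a loop of j - (i + k) bases.
        if any(i + k <= j <= i + k + max_gap for j in starts.get(rc, ())):
            hits += 1
    return hits / total


# GGG ... CCC can close a 4-base loop, so only the GGG occurrence counts.
print(fraction_with_nearby_revcomp("GGGAAAACCC", k=3, max_gap=4))
```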
Affiliation(s)
- Shang-Hong Zhang
- The Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, China.
32
Powdel BR, Satapathy SS, Kumar A, Jha PK, Buragohain AK, Borah M, Ray SK. A study in entire chromosomes of violations of the intra-strand parity of complementary nucleotides (Chargaff's second parity rule). DNA Res 2009; 16:325-43. [PMID: 19861381 PMCID: PMC2780954 DOI: 10.1093/dnares/dsp021] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
Abstract
Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers citing and discussing ISP have been published in the genomic era, scientists have yet to find all the factors responsible for this universal phenomenon in chromosomes. In the present work, we have tried to address the issue from a new perspective, a feature parallel to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of a specific size, and the frequency distributions of the mono/oligonucleotides among the regions were compared using the Kolmogorov–Smirnov test. Interestingly, the frequency distributions of complementary mono/oligonucleotides revealed statistical similarity, which we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes. Violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a species in bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa etc.) and chromosomes of a eukaryote were found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and of the composition of forward-encoded sequences between the strands in chromosomes was found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.
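The ISFDP procedure described above (count a mono/oligonucleotide in each non-overlapping region, then compare the distributions for a complementary pair with a Kolmogorov–Smirnov test) can be sketched as follows. Only the two-sample KS statistic D is computed here; if SciPy is available, `scipy.stats.ks_2samp` returns D together with a p-value. The region size and the toy sequence are illustrative assumptions, not values from the paper.

```python
from bisect import bisect_right


def window_counts(seq: str, motif: str, window: int) -> list:
    """Occurrences of `motif` (overlapping) in each non-overlapping window.

    Occurrences that straddle a window boundary are not counted, and a
    trailing partial window is dropped.
    """
    seq, motif = seq.upper(), motif.upper()
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        counts.append(sum(1 for i in range(len(w) - len(motif) + 1)
                          if w.startswith(motif, i)))
    return counts


def ks_statistic(x: list, y: list) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between
    the two empirical cumulative distribution functions."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Toy check: A and T have identical per-window counts here, so D is 0.
toy = "ACGTTGCA" * 50
d = ks_statistic(window_counts(toy, "A", 40), window_counts(toy, "T", 40))
print(d)
```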
Affiliation(s)
- B R Powdel
- Department of Mathematical Sciences, Tezpur University, Tezpur, Assam 784 028, India
33
Scherrer and Jost’s symposium: the gene concept in 2008. Theory Biosci 2009; 128:157-61. [DOI: 10.1007/s12064-009-0071-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/16/2008] [Accepted: 02/03/2009] [Indexed: 10/20/2022]
34
Wang Y, Leung FCC. Comparative genomic study reveals a transition from TA richness in invertebrates to GC richness in vertebrates at CpG flanking sites: an indication for context-dependent mutagenicity of methylated CpG sites. Genomics Proteomics Bioinformatics 2009; 6:144-54. [PMID: 19329065 PMCID: PMC5054122 DOI: 10.1016/s1672-0229(09)60002-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Abstract
Vertebrate genomes are characterized by CpG deficiency, particularly in GC-poor regions. The GC content-related CpG deficiency is probably caused by context-dependent deamination of methylated CpG sites. This hypothesis was examined in this study by comparing nucleotide frequencies at CpG flanking positions among invertebrate and vertebrate genomes. The finding is a transition in nucleotide preference from 5′ T to 5′ A at the invertebrate-vertebrate boundary, indicating that a large number of CpG sites with 5′ Ts were depleted because of the global DNA methylation that developed in vertebrates. At the genome level, we investigated CpG observed/expected (obs/exp) values in 500 bp fragments and found that higher CpG obs/exp values occur in GC-poor regions of invertebrate genomes (except sea urchin) but in GC-rich sequences of vertebrate genomes. We next compared GC content at CpG flanking positions with the genomic average, showing that it is lower than the average in invertebrate genomes but higher than the average in vertebrate genomes. These results indicate that although 5′ T and 5′ A differ in inducing deamination of methylated CpG sites, GC content is even more important in affecting the deamination rate. In all the tests, the results for sea urchin are similar to those for vertebrates, perhaps owing to its fractional DNA methylation. CpG deficiency is therefore suggested to be mainly a result of high mutation rates of methylated CpG sites in GC-poor regions.
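The abstract does not spell out the obs/exp formula. The widely used Gardiner-Garden and Frommer ratio, (CpG count × length) / (C count × G count), is assumed in the sketch below, applied to consecutive non-overlapping fragments (500 bp by default, following the abstract).

```python
from typing import List, Optional


def cpg_obs_exp(seq: str) -> Optional[float]:
    """CpG observed/expected = (CpG count * length) / (C count * G count).

    Returns None when C or G is absent, since the ratio is then undefined.
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return None
    # str.count is safe here because "CG" cannot overlap with itself.
    return s.count("CG") * len(s) / (c * g)


def fragment_obs_exp(seq: str, size: int = 500) -> List[Optional[float]]:
    """CpG obs/exp for consecutive non-overlapping fragments (tail dropped)."""
    return [cpg_obs_exp(seq[i:i + size])
            for i in range(0, len(seq) - size + 1, size)]


print(fragment_obs_exp("CGCGCCGG", size=4))
```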
Affiliation(s)
- Yong Wang
- School of Biological Sciences and Genome Research Centre, The University of Hong Kong, Pokfulam, Hong Kong, China
35
Zhang Y. Relations between Shannon entropy and genome order index in segmenting DNA sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2009; 79:041918. [PMID: 19518267 DOI: 10.1103/physreve.79.041918] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/13/2008] [Revised: 03/14/2009] [Indexed: 05/27/2023]
Abstract
Shannon entropy H and the genome order index S are used in segmenting DNA sequences. Zhang [Phys. Rev. E 72, 041917 (2005)] found that the two schemes are equivalent when a DNA sequence is converted to a binary sequence of S (strong H bond) and W (weak H bond), but left the mathematical proof to interested mathematicians. In this paper, a possible mathematical explanation is given. Moreover, we find that Chargaff's parity rule 2 is a necessary condition for the equivalence, and that the equivalence disappears when a DNA sequence is regarded as a four-symbol sequence. Finally, we propose that S - 2^(-H) may be related to species evolution.
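The abstract gives neither definition. As a minimal sketch under stated assumptions, take a binary S/W sequence with strong fraction p and weak fraction q = 1 − p, the binary Shannon entropy H = −p log2 p − q log2 q, and (our assumption for the binary case) the order index S = p² + q². Both depend only on how far p is from 1/2, with H rising and S falling as the composition becomes more balanced, so ranking segments by one is equivalent to ranking them by the other in reverse. This illustrates the binary-case relationship only; it is not the paper's proof and does not address the four-symbol case.

```python
from math import log2


def strong_fraction(seq: str) -> float:
    """Fraction of S (G or C) among S and W (A or T) bases."""
    s = seq.upper()
    strong = sum(b in "GC" for b in s)
    weak = sum(b in "AT" for b in s)
    return strong / (strong + weak)


def shannon_entropy(p: float) -> float:
    """Binary Shannon entropy H (bits) of an S/W sequence with S fraction p."""
    return -sum(x * log2(x) for x in (p, 1 - p) if x > 0)


def order_index(p: float) -> float:
    """Assumed binary form of the genome order index: S = p**2 + q**2."""
    return p ** 2 + (1 - p) ** 2


ps = [0.0, 0.1, 0.25, 0.5, 0.8, 1.0]
hs = [shannon_entropy(p) for p in ps]
ss = [order_index(p) for p in ps]
for p, h, s in zip(ps, hs, ss):
    print(p, round(h, 3), round(s, 3))
```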
Affiliation(s)
- Yi Zhang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China.
36
Microsatellites that violate Chargaff's second parity rule have base order-dependent asymmetries in the folding energies of complementary DNA strands and may not drive speciation. J Theor Biol 2008; 254:168-77. [DOI: 10.1016/j.jtbi.2008.05.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/07/2008] [Revised: 05/16/2008] [Accepted: 05/16/2008] [Indexed: 11/21/2022]
37
Baudouin-Cornu P. [Stoichiometric, my dear Watson!]. Med Sci (Paris) 2008; 24:483-9. [PMID: 18466725 DOI: 10.1051/medsci/2008245483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Living organisms can be seen as complex chemical systems interacting with their environment through chemical reactions. As such, they are subject to the laws of stoichiometry: their constituent elements (atoms) can be neither created (they must be obtained from the environment) nor destroyed. Acknowledging these rules led ecologists to the concept of "biological stoichiometry". In this review, I aim to show that combining (1) the study of the elemental composition of biopolymers with (2) the ecologist's point of view, particularly the concept of biological stoichiometry, benefits molecular biology. In particular, this combined approach reveals parts of the history of organisms, helps interpret transcriptional profiles, and sheds new light on the growth of carcinogenic tumors.
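As a small illustration of elemental bookkeeping applied to a biopolymer (the example is ours, not taken from the review), the nitrogen content of a DNA duplex depends only on its base composition. The nucleobase formulas are adenine C5H5N5, guanine C5H5N5O, cytosine C4H5N3O and thymine C5H6N2O2, and the deoxyribose-phosphate backbone contains no nitrogen, so each G:C pair carries one more nitrogen atom than an A:T pair.

```python
# Nitrogen atoms per nucleobase (adenine C5H5N5, guanine C5H5N5O,
# cytosine C4H5N3O, thymine C5H6N2O2); the sugar-phosphate backbone has no N.
NITROGEN_PER_BASE = {"A": 5, "G": 5, "C": 3, "T": 2}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def nitrogen_atoms(strand: str) -> int:
    """Total nitrogen atoms in one DNA strand (unambiguous ACGT only)."""
    return sum(NITROGEN_PER_BASE[b] for b in strand.upper())


def nitrogen_per_bp(strand: str) -> float:
    """Mean nitrogen atoms per base pair of the duplex: 7 for A:T, 8 for G:C."""
    partner = strand.upper().translate(COMPLEMENT)
    return (nitrogen_atoms(strand) + nitrogen_atoms(partner)) / len(strand)


print(nitrogen_per_bp("ATATATAT"), nitrogen_per_bp("GCGCGCGC"), nitrogen_per_bp("ACGT"))  # 7.0 8.0 7.5
```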
Affiliation(s)
- Peggy Baudouin-Cornu
- CEA, iBiTecS, SBIGeM, LBI, Bâtiment 142, CEA Saclay, 91191 Gif-sur-Yvette, France.
38
Hu J, Zhao X, Yu J. Replication-associated purine asymmetry may contribute to strand-biased gene distribution. Genomics 2007; 90:186-94. [PMID: 17532183 DOI: 10.1016/j.ygeno.2007.04.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 10/22/2006] [Revised: 03/09/2007] [Accepted: 04/02/2007] [Indexed: 11/19/2022]
Abstract
Among prokaryotic genomes, the distribution of genes between the leading and lagging strands of the replication fork is known to be biased. Several hypotheses explaining this strand-biased gene distribution (SGD) have been proposed, but none has been tested or supported by sufficient data analysis. In this work we analyzed 211 prokaryotic genomes in terms of compositional strand asymmetries and the presence or absence of polC, and found that SGD correlates not only with polC but also with purine asymmetry (PAS). Furthermore, SGD, PAS, and polC are all features associated with a group of low-GC, gram-positive bacteria (Firmicutes). We conclude that PAS is a characteristic of organisms whose DNA polymerase III alpha-subunit is a heterodimer formed by polC and dnaE, which may play a direct role in maintaining SGD.
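Both quantities can be summarized from annotated genes once each gene's strand relative to the replication fork is known. A minimal sketch, assuming a hypothetical input of (coding sequence, strand) pairs and taking the fraction of leading-strand genes as a simple SGD measure and the mean purine skew (R − Y)/(R + Y) per strand as a simple PAS measure; the paper's precise definitions may differ.

```python
from statistics import mean


def purine_skew(seq: str) -> float:
    """(R - Y) / (R + Y) on the given strand, with R = A + G and Y = C + T."""
    s = seq.upper()
    r = s.count("A") + s.count("G")
    y = s.count("C") + s.count("T")
    return (r - y) / (r + y) if r + y else 0.0


def strand_bias_summary(genes):
    """genes: (coding sequence, strand) pairs, strand in {"leading", "lagging"}.

    Returns the fraction of genes on the leading strand and the mean
    purine skew of the genes on each strand.
    """
    genes = list(genes)
    by_strand = {st: [s for s, g_st in genes if g_st == st] for st in ("leading", "lagging")}
    return {
        "leading_fraction": len(by_strand["leading"]) / len(genes),
        "mean_skew": {
            st: mean(map(purine_skew, seqs)) if seqs else float("nan")
            for st, seqs in by_strand.items()
        },
    }
```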
Affiliation(s)
- Jianfei Hu
- College of Life Sciences, Peking University, Beijing 100871, China.
39
Forsdyke DR. Calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues. J Theor Biol 2007; 248:745-53. [PMID: 17698086 DOI: 10.1016/j.jtbi.2007.07.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 05/30/2007] [Revised: 07/05/2007] [Accepted: 07/09/2007] [Indexed: 12/16/2022]
Abstract
The stability of a folded single-stranded nucleic acid depends on the composition and order of its constituent bases and may be assessed by taking into account the pairing energies of its constituent dinucleotides. To assess the possible biological significance of a computed structure, Maizel and coworkers in the 1980s compared the folding energy of a natural single-stranded RNA sequence with the energies of several versions of the same sequence produced by shuffling base order. However, in the 2000s many took it as self-evident that shuffling at the mononucleotide level (single bases) was conceptually wrong and should be replaced by shuffling at the dinucleotide level (retaining pairs of adjacent bases). Folding energies then became indistinguishable from those of the corresponding shuffled sequences, and doubt was cast on the importance of secondary structures. Nevertheless, some continued productively to employ the single-base shuffling approach, the justification for which is the topic of this paper. That dinucleotide pairing energies are needed to calculate structure does not imply that shuffling must preserve dinucleotides. Base shuffling allows determination of the relative contributions of base composition and base order to total folding energy. The potential for secondary structure arises from pressures acting at both DNA and RNA levels and is abundant throughout genomes, with a probable primary role in recombination. Within a gene the potential can often be accommodated, and base order and composition work together (their values have the same negative sign) in contributing to total folding energy. But sometimes protein-coding pressure on base order conflicts with the pressure for secondary structure, and the values have opposite signs. Total folding energy can be deemed of potential biological significance when the average of several readings is significantly less than zero.
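The single-base shuffling logic can be sketched as follows. This is a minimal Python illustration, not the author's software: the folding-energy function is supplied by the caller (real analyses would use a folding program that applies dinucleotide stacking energies), and `toy_hairpin_score` is a hypothetical placeholder included only so the example runs. Because shuffling keeps composition but destroys order, the mean shuffled energy estimates the composition-dependent part, and the remainder is attributed to base order.

```python
import random
from statistics import mean


def decompose_folding_energy(seq, fold_energy, n_shuffles=10, seed=0):
    """Split a folding energy into base composition- and base order-dependent parts.

    fold_energy: caller-supplied callable returning a folding free energy
    (more negative = more stable), e.g. a wrapper around an external folding program.
    """
    rng = random.Random(seed)
    total = fold_energy(seq)
    shuffled = []
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)  # mononucleotide shuffle: same composition, new order
        shuffled.append(fold_energy("".join(bases)))
    composition = mean(shuffled)
    return {"total": total, "composition": composition, "order": total - composition}


WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def toy_hairpin_score(seq):
    """Hypothetical stand-in, NOT a thermodynamic model: -1 per Watson-Crick pair
    when the sequence is folded back on itself at its midpoint."""
    n = len(seq)
    return -float(sum((seq[i], seq[n - 1 - i]) in WATSON_CRICK for i in range(n // 2)))


print(decompose_folding_energy("GGGGAAAACCCC", toy_hairpin_score))
```

A negative "order" value indicates that the actual base order favors structure beyond what composition alone provides.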
Affiliation(s)
- Donald R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario, Canada K7L3N6.
40
Evolutionary implications of inversions that have caused intra-strand parity in DNA. BMC Genomics 2007; 8:160. [PMID: 17562011 PMCID: PMC1913523 DOI: 10.1186/1471-2164-8-160] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 01/25/2007] [Accepted: 06/11/2007] [Indexed: 11/22/2022]
Abstract
Background Chargaff's rule of DNA base composition, stating that DNA comprises equal amounts of adenine and thymine (%A = %T) and of guanine and cytosine (%C = %G), is well known because it was fundamental to the conception of the Watson-Crick model of DNA structure. His second parity rule, stating that the base proportions of double-stranded DNA also hold within single-stranded DNA (%A = %T, %C = %G), is more obscure, likely because its biological basis and significance remain unresolved. Within each strand, the symmetry of single-nucleotide composition extends even further, to the balance of di-, tri-, and longer oligonucleotides with their respective reverse-complementary oligonucleotides. Results Here, we propose that inversions are sufficient to account for the symmetry within each DNA strand. Human mitochondrial DNA does not show such intra-strand parity, and we consider how its different functional drivers may relate to our theory. This concept is supported by the recent observation that inversions occur frequently. Conclusion Along with chromosomal duplications, inversions must have been shaping the architecture of genomes since the origin of life.
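The proposal can be explored with a toy simulation (ours, not the authors'): score intra-strand parity as the normalized difference between each k-mer count and the count of its reverse complement, then repeatedly invert random segments (replacing each by its reverse complement) and watch the score fall. The parity score and the uniform choice of breakpoints are assumptions made for illustration.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def parity_deviation(seq: str, k: int = 1) -> float:
    """0 = each k-mer is exactly as frequent as its reverse complement; 1 = maximal deviation."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    words = set(counts) | {revcomp(w) for w in counts}  # closed under reverse complement
    return sum(abs(counts[w] - counts[revcomp(w)]) for w in words) / (2 * total)


def apply_random_inversions(seq: str, n: int, seed: int = 0) -> str:
    """Replace n randomly chosen segments, one after another, by their reverse complements."""
    rng = random.Random(seed)
    for _ in range(n):
        i, j = sorted(rng.sample(range(len(seq) + 1), 2))
        seq = seq[:i] + revcomp(seq[i:j]) + seq[j:]
    return seq


skewed = "A" * 1500 + "C" * 500          # maximally non-compliant strand
print(parity_deviation(skewed))          # 1.0
inverted = apply_random_inversions(skewed, 2000)
print(parity_deviation(inverted), parity_deviation(inverted, k=2))
```

A strand concatenated with its own reverse complement scores exactly 0 for every k, which is a convenient sanity check.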
41
Abstract
The unprecedented availability of genome sequences, coupled with user-friendly, web-enabled search and analysis tools allows practitioners to locate interesting genome features or sequence tracts with relative ease. Although many public model organism- and genome-mapping resources offer pre-mapped genome browsing, biologists also still need to perform de novo mapping analyses. Correct interpretation of the results in genome annotation databases or the results of one's individual analyses requires at least a conceptual understanding of the statistics and mechanics of genome searches, the expected results from statistical considerations, as well as the algorithms used by different search tools. This chapter introduces the basic statistical results that underlie mapping of nucleotide sequences to genomes and briefly surveys the common programs and algorithms that are used to perform genome mapping, all available via public hosted web sites. Selection of the appropriate sequence search and mapping tool will often demand tradeoffs in sensitivity and specificity relating to the statistics of the search.
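A back-of-envelope example of the statistics such a chapter introduces: under the simplifying assumption of a random genome with independent, equally frequent bases (real genomes, with repeats and compositional bias, deviate from this), the expected number of chance matches to a fixed k-mer on both strands of a genome of length G is 2(G − k + 1)/4^k. The sketch below finds the shortest k for which that expectation drops below one; the function names are ours.

```python
def expected_random_hits(genome_length: int, k: int, both_strands: bool = True) -> float:
    """Expected chance occurrences of one fixed k-mer in a genome of i.i.d., equiprobable bases."""
    positions = max(genome_length - k + 1, 0)
    return (2 if both_strands else 1) * positions / 4 ** k


def min_unique_length(genome_length: int, both_strands: bool = True) -> int:
    """Smallest k for which a fixed k-mer is expected to occur by chance fewer than once."""
    k = 1
    while expected_random_hits(genome_length, k, both_strands) >= 1:
        k += 1
    return k


print(min_unique_length(3_100_000_000))  # roughly human genome size -> 17
```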
Affiliation(s)
- Josyf C Mychaleckyj
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
42
Fournier PE, Suhre K, Fournous G, Raoult D. Estimation of prokaryote genomic DNA G+C content by sequencing universally conserved genes. Int J Syst Evol Microbiol 2006; 56:1025-9. [PMID: 16627649 DOI: 10.1099/ijs.0.63903-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
Determination of the DNA G+C content of prokaryotic genomes using traditional methods is time-consuming, and results may vary from laboratory to laboratory depending on the technique used. We explored the possibility of extrapolating the genomic DNA G+C content of prokaryotes from gene sequences. For this, 127 universally conserved genes were studied from 50 prokaryotic genomes in the Clusters of Orthologous Groups database. Of these, 57 genes were present as a single copy in the genomes of 157 different prokaryote species available in GenBank. There was a strong correlation [coefficient of determination (r²) > 0.95] between the DNA G+C contents of 20 genes and their corresponding genomes. For each of the 157 prokaryotic genomes studied, the DNA G+C content of the 20 genes was used to determine a 'calculated' genome DNA G+C content (CGC), and this value was compared with the 'real' genome DNA G+C content (RGC). To select the most suitable gene for determining CGC values, we compared the r² and the median mol% difference between CGC and RGC, as well as the sensitivity of each gene, i.e. its ability to give CGC values that differ by less than 5 mol% from the RGC. The highly conserved ftsY gene (median size 1144 nucleotides), a vertically inherited member of the GTPase superfamily, showed the highest r² value (0.98), the smallest median difference between CGC and RGC (1.06 mol%) and a sensitivity of 100 %. Using ftsY DNA G+C content values, the CGC values of 100 genomes not included in the calculation of r² differed by less than 5 mol% from their RGC values. These data suggest that the genomic DNA G+C content of prokaryotes may be estimated easily and reliably from the ftsY gene sequence.
Affiliation(s)
- Pierre-Edouard Fournier
- Information Génomique et Structurale, CNRS UPR2589, Case 934, 163 Avenue de Luminy, 13288 Marseille cedex 09, France
- Karsten Suhre
- Information Génomique et Structurale, CNRS UPR2589, Case 934, 163 Avenue de Luminy, 13288 Marseille cedex 09, France
- Ghislain Fournous
- Unité des rickettsies, IFR 48, CNRS UMR 6020, Faculté de Médecine, Université de la Méditerranée, 27 Boulevard Jean Moulin, 13385 Marseille cedex 05, France
- Didier Raoult
- Unité des rickettsies, IFR 48, CNRS UMR 6020, Faculté de Médecine, Université de la Méditerranée, 27 Boulevard Jean Moulin, 13385 Marseille cedex 05, France
43
Lin FH, Forsdyke DR. Prokaryotes that grow optimally in acid have purine-poor codons in long open reading frames. Extremophiles 2006; 11:9-18. [PMID: 16957882 DOI: 10.1007/s00792-006-0005-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/20/2006] [Accepted: 03/29/2006] [Indexed: 10/24/2022]
Abstract
In nucleic acids, the N-glycosyl bonds between purines and their ribose sugar moieties are broken under acid conditions. If one strand of a duplex DNA segment were more vulnerable to mutation than the other, then the archaeon Picrophilus torridus, with an optimum growth pH near zero, could have adapted by decreasing the purine content of that strand. Yet P. torridus has an optimum growth temperature near 60 degrees C, and thermophiles prefer purine-rich codons. We found that, as in other thermophiles, high growth temperature correlates with the use of purine-rich codons. The extra purines are often at third (non-amino acid-determining) codon positions. However, as in other acidophiles, as open reading frame length increases there is increased use of purine-poor codons, particularly those without purines at second (amino acid-determining) codon positions. Thus, P. torridus can be seen as adapting (a) to temperature by increasing its purines in all open reading frames without greatly affecting protein amino acid compositions, and (b) to pH by decreasing purines in longer open reading frames, thereby potentially affecting protein amino acid compositions. It is proposed that longer open reading frames, being larger mutational targets, have become less vulnerable to depurination through pyrimidine-for-purine substitutions.
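The codon-position bookkeeping behind these comparisons can be sketched as below. This is an illustration, not the authors' method: ORFs are assumed to be in frame and to contain only A, C, G and T, and the 1500-nt cutoff separating "short" from "long" ORFs is a hypothetical choice (the study related purine use to ORF length, not necessarily through a single cutoff).

```python
from statistics import mean


def codons(orf: str) -> list[str]:
    """In-frame codons of an ORF; a trailing partial codon is ignored."""
    s = orf.upper()
    return [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]


def purine_fraction_by_position(orf: str) -> tuple[float, float, float]:
    """Fraction of A/G at codon positions 1, 2 and 3."""
    cs = codons(orf)
    if not cs:
        raise ValueError("ORF shorter than one codon")
    return tuple(sum(c[p] in "AG" for c in cs) / len(cs) for p in range(3))


def compare_by_length(orfs, threshold_nt=1500):
    """Mean position-2 purine fraction for short vs long ORFs (hypothetical threshold)."""
    groups = {"short": [], "long": []}
    for orf in orfs:
        key = "long" if len(orf) >= threshold_nt else "short"
        groups[key].append(purine_fraction_by_position(orf)[1])
    return {k: mean(v) if v else float("nan") for k, v in groups.items()}
```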
Affiliation(s)
- Feng-Hsu Lin
- Department of Biochemistry, Queen's University, K7L3N6, Kingston, ON, Canada
44
Nikolaou C, Almirantis Y. Deviations from Chargaff's second parity rule in organellar DNA. Insights into the evolution of organellar genomes. Gene 2006; 381:34-41. [PMID: 16893615 DOI: 10.1016/j.gene.2006.06.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 03/07/2006] [Revised: 04/18/2006] [Accepted: 06/13/2006] [Indexed: 10/24/2022]
Abstract
Chargaff's second parity rule (PR2) states that complementary nucleotides occur with almost equal frequencies in single-stranded DNA. This is indeed the case for all bacterial and eukaryotic genomes studied, although genomes may differ in their patterns of local deviation. The behaviour of organellar genomes with respect to the second parity rule had not previously been studied in detail. We tested all available organellar genomes and found that a large number of mitochondrial genomes deviate significantly from PR2, in contrast to eubacterial genomes, although mitochondria are believed to have evolved from proteobacteria. Moreover, mitochondria may be divided into three distinct sub-groups according to their overall deviation from PR2. Chloroplast genomes, on the other hand, share the pattern of eubacterial genomes and, interestingly, so do mitochondrial genomes from plants and some fungi. The deviation from the second parity rule is weakly correlated with the overall excess of purines over pyrimidines. The behaviour of the large majority of mitochondrial genomes may be attributed to their distinct mode of replication, which is fundamentally different from that of eubacteria. Differences between chloroplast and mitochondrial genomes might also be explained by different replication mechanisms and related to differences in genome size and compaction. These results may provide some insight into the different modes of genome-structure evolution in chloroplasts and mitochondria.
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Institute of Biology, NCSR Demokritos, 15310 Athens, Greece.
45
Dalevi D, Dubhashi D, Hermansson M. Bayesian classifiers for detecting HGT using fixed and variable order Markov models of genomic signatures. Bioinformatics 2006; 22:517-22. [PMID: 16403797 DOI: 10.1093/bioinformatics/btk029] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Analyses of genomic signatures are gaining attention because they allow species-specific relationships to be studied without aligning homologous sequences. A naïve Bayesian classifier was built to discriminate between the bacterial compositions of short oligomers, also known as DNA words, and has proven successful in identifying foreign genes in Neisseria meningitidis. In this study we extend the classifier approach using either a fixed higher-order Markov model (Mk) or a variable-length Markov model (VLMk). RESULTS We propose a simple algorithm to constrain a variable-length Markov model to a given number of parameters, and show that Markov models greatly increase the flexibility and accuracy of prediction compared with a naïve model. We also test the integrity of the classifiers in terms of false negatives and estimate the minimal size of training data. Finally, we propose a method to reject a false hypothesis of horizontal gene transfer. AVAILABILITY Software and supplementary information are available at www.cs.chalmers.se/~dalevi/genetic_sign_classifiers/.
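A fixed-order Markov classifier of the kind extended here can be sketched in a few lines. This is a simplified illustration, not the authors' software: it implements only the fixed-order model (Mk), assumes an A/C/G/T alphabet with add-one pseudocounts and equal priors by default, and the class and function names are ours.

```python
import math
from collections import Counter


class MarkovModel:
    """Fixed-order (k) Markov chain over A, C, G, T with add-one (Laplace) pseudocounts."""

    def __init__(self, training_seqs, k=2):
        self.k = k
        self.transitions = Counter()  # (context, next base) -> count
        self.contexts = Counter()     # context -> count
        for seq in training_seqs:
            s = seq.upper()
            for i in range(k, len(s)):
                self.transitions[s[i - k:i], s[i]] += 1
                self.contexts[s[i - k:i]] += 1

    def log_likelihood(self, seq):
        """log P(seq[k:] | seq[:k]); the first k bases are not scored."""
        s = seq.upper()
        return sum(
            math.log((self.transitions[s[i - self.k:i], s[i]] + 1)
                     / (self.contexts[s[i - self.k:i]] + 4))
            for i in range(self.k, len(s))
        )


def classify(seq, models, log_priors=None):
    """Bayes rule: the label with the largest log prior + log likelihood (equal priors by default)."""
    log_priors = log_priors or {label: 0.0 for label in models}
    return max(models, key=lambda label: log_priors[label] + models[label].log_likelihood(seq))
```

In an HGT screen, a gene from the host genome that scores higher under another species' model than under the host model would be flagged as a candidate for transfer.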
Affiliation(s)
- Daniel Dalevi
- Department of Computing Science, Chalmers University, SE 412 96 Göteborg, Sweden.
46
Lee SJ, Mortimer JR, Forsdyke DR. Genomic conflict settled in favour of the species rather than the gene at extreme GC percentage values. ACTA ACUST UNITED AC 2005; 3:219-28. [PMID: 15702952 DOI: 10.2165/00822942-200403040-00003] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/02/2022]
Abstract
Wada and colleagues have shown that, whether prokaryotic or eukaryotic, each gene has a "homostabilising propensity" to adopt a relatively uniform GC percentage (GC%). Accordingly, each gene can be viewed as a "microisochore" occupying a discrete GC% niche of relatively uniform base composition amongst its fellow genes. Although first, second and third codon positions usually differ in GC%, each position tends to maintain a uniform, gene-specific GC% value. Thus, within a genome, genic GC% values can cover a wide range. This is most evident at third codon positions, which are least constrained by amino acid encoding needs. In 1991, Wada and colleagues further noted that, within a phylogenetic group, genomic GC% values can also cover a wide range. This is again most evident at third codon positions. Thus, the dispersion of GC% values among genes within a genome matches the dispersion of GC% values among genomes within a phylogenetic group. Wada described the context-independence of plots of different codon position GC% values against total GC% as a "universal" characteristic. Several studies relate this to recombination. We have confirmed that third codon positions usually relate more to the genes that contain them than to the species. However, in genomes with extreme GC% values (low or high), third codon positions tend to maintain a constant GC%, thus relating more to the species than to the genes that contain them. Genes in an extreme-GC% genome collectively span a smaller GC% range, and mainly rely on first and second codon positions for differentiation as "microisochores". Our results are consistent with the view that differences in GC% serve to recombinationally isolate both genome sectors (facilitating gene duplication) and genomes (facilitating genome duplication, e.g. speciation). In intermediate-GC% genomes, conflict between the needs of the species and the needs of individual genes within that species is minimal. 
However, in extreme-GC% genomes there is a conflict, which is settled in favour of the species (i.e. group selection) rather than in favour of the gene (genic selection).
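Per-gene GC1, GC2 and GC3 values, and their spread across a genome's genes, can be computed as sketched below. Assumptions: in-frame coding sequences containing only A, C, G and T, and the population standard deviation as the dispersion measure, which may differ from the authors' choice.

```python
from statistics import pstdev


def gc_by_codon_position(cds: str) -> tuple[float, float, float, float]:
    """(GC1, GC2, GC3, overall GC) in percent for an in-frame coding sequence."""
    s = cds.upper()
    s = s[: len(s) - len(s) % 3]  # drop any trailing partial codon
    n_codons = len(s) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    per_pos = tuple(
        100.0 * sum(s[i] in "GC" for i in range(p, len(s), 3)) / n_codons for p in range(3)
    )
    overall = 100.0 * sum(b in "GC" for b in s) / len(s)
    return per_pos + (overall,)


def spread_among_genes(cds_list):
    """Population standard deviation of GC1, GC2 and GC3 across a genome's genes."""
    values = [gc_by_codon_position(c) for c in cds_list]
    return tuple(pstdev([v[p] for v in values]) for p in range(3))
```

In an intermediate-GC% genome one would expect the third value (GC3 spread) to be the largest; the abstract reports that extreme-GC% genomes instead keep GC3 nearly constant.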
Affiliation(s)
- Shang-Jung Lee
- Genetics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
47
Mitchell D, Bridge R. A test of Chargaff's second rule. Biochem Biophys Res Commun 2005; 340:90-4. [PMID: 16364245 DOI: 10.1016/j.bbrc.2005.11.160] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 11/21/2005] [Accepted: 11/22/2005] [Indexed: 10/25/2022]
Abstract
In 1968, Chargaff and his colleagues discovered a rule in Bacillus subtilis: in single stranded DNA, A=T and C=G. This rule has since been confirmed many times in other bacterial and eukaryotic genomes. To the best of our knowledge, this rule has not been tested before in either single stranded DNA or RNA genomes. Over 3400 genomic sequences were examined here and included for the first time both double and single stranded DNA and RNA genomes. We found that: (1) with the exception of the organellar DNA, this parity rule holds for all types of double stranded DNA genomes and (2) that this rule fails to hold for other types of genomes. The parity rule appears to be a selective force on genome evolution and codon use.
Affiliation(s)
- David Mitchell
- Vice Deanery of Genetics and Microbiology, Trinity College, Dublin, Ireland.
48
Rayment JH, Forsdyke DR. Amino acids as placeholders: base-composition pressures on protein length in malaria parasites and prokaryotes. ACTA ACUST UNITED AC 2005; 4:117-30. [PMID: 16128613 DOI: 10.2165/00822942-200504020-00005] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/02/2022]
Abstract
BACKGROUND The composition and sequence of amino acids in a protein may serve the underlying needs of the nucleic acids that encode the protein (the genome phenotype). In extreme form, amino acids become mere placeholders inserted between functional segments or domains; apart from increasing protein length, they play no role in the specific function or structure of a protein (the conventional phenotype). METHODS We studied the genomes of two malarial parasites and 521 prokaryotes (144 complete) that differ widely in GC% and optimum growth temperature, comparing the base compositions of protein-coding regions with their lengths (kilobases). RESULTS Malarial parasites show distinctive responses to base-compositional pressures that increase as protein lengths increase. A low-GC% species (Plasmodium falciparum) is likely to have more placeholder amino acids than an intermediate-GC% species (P. vivax), so that its homologous proteins are longer. In prokaryotes, GC% is generally greater and AG% generally less in open reading frames (ORFs) encoding long proteins. The increase in GC% in long ORFs grows as species' GC% increases, and shrinks as species' AG% increases. In low- and intermediate-GC% prokaryotic species, increases in ORF GC% as encoded proteins increase in length are largely accounted for by the base compositions of first and second (amino acid-determining) codon positions. In high-GC% prokaryotic species, first and third (non-amino acid-determining) codon positions play this role. CONCLUSION In low- and intermediate-GC% prokaryotes, placeholder amino acids are likely to be well defined, corresponding to codons enriched in G and/or C at first and second positions. In high-GC% prokaryotes, placeholder amino acids are likely to be less well defined.
Increases in ORF GC% as encoded proteins increase in length are greater in mesophiles than in thermophiles, which are constrained from increasing protein lengths in response to base-composition pressures.
Affiliation(s)
- Jonathan H Rayment
- Department of Biochemistry, Queen's University, Kingston, Ontario, Canada
49
Paz A, Kirzhner V, Nevo E, Korol A. Coevolution of DNA-interacting proteins and genome "dialect". Mol Biol Evol 2005; 23:56-64. [PMID: 16151189 DOI: 10.1093/molbev/msj007] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Several species-specific characteristics of genome organization that are superimposed on its coding aspects have been proposed, including genome signature (GS), genome accent, and compositional spectrum (CS). These notions can be considered representatives of a genome dialect (GD). Within the Proteobacteria we measured several GD representatives: the relative abundance of dinucleotides (GS), the profiles of occurrence of 10-nucleotide words (CS), and the profiles of occurrence of 20-nucleotide words using a degenerate two-letter alphabet (purine-pyrimidine compositional spectra [PPCS]). Here, we show that the evolutionary distances between DNA repair and recombination orthologous enzymes (especially those of the nucleotide excision repair system) are highly correlated with PPCS and GS distances. Orthologous proteins involved in structural or metabolic processes (control group) show significantly lower correlations of their evolutionary distances with the PPCS and GS distances. We hypothesize that the high correlation of the evolutionary distances of the DNA repair orthologous enzymes with their GD results from the coevolution of the DNA repair enzymes' structures and GDs. Species GDs could be substantially influenced by the function of DNA polymerase I (the major bacterial DNA repair polymerase). This might cause the differentiation of species GDs to correlate with evolutionary changes in species DNA polymerase I. Simultaneously, the structures of DNA repair-recombination enzymes might be evolutionarily sensitive and responsive to changes in the structure of their substrate, the DNA (including changes represented by GD differentiation). We further discuss the rationale and mechanisms of the hypothesized coevolution. We suggest that stress might be an important cause of changes in the repair-recombination genes and the GD, and the trigger of the aforementioned coevolution process.
Other triggers might be massive horizontal gene transfer and ecological selection.
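Of the genome-dialect measures listed, the genome signature is the simplest to compute. A sketch, assuming Karlin's symmetrized definitions (the abstract does not give the formulas): relative abundance ρ*(XY) = f*(XY) / (f*(X) f*(Y)) with frequencies counted on the sequence and its reverse complement together, and the distance δ* as the mean absolute difference of ρ* over the 16 dinucleotides. The CS and PPCS profiles are not covered.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def rho_star(seq: str) -> dict[str, float]:
    """Symmetrized dinucleotide relative abundance f*(XY) / (f*(X) f*(Y))."""
    s = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (s, s.translate(COMPLEMENT)[::-1]):  # sequence plus reverse complement
        mono.update(strand)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))
    n1 = sum(mono[b] for b in "ACGT")
    n2 = sum(di[d] for d in DINUCLEOTIDES)
    rho = {}
    for d in DINUCLEOTIDES:
        fx, fy = mono[d[0]] / n1, mono[d[1]] / n1
        rho[d] = (di[d] / n2) / (fx * fy) if fx and fy else 0.0
    return rho


def delta_star(seq_f: str, seq_g: str) -> float:
    """Genome-signature distance: mean absolute difference of rho* over the 16 dinucleotides."""
    rf, rg = rho_star(seq_f), rho_star(seq_g)
    return sum(abs(rf[d] - rg[d]) for d in DINUCLEOTIDES) / 16
```

Because both strands are counted, a sequence and its reverse complement have identical signatures, which suits unannotated genomic fragments of unknown orientation.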
Affiliation(s)
- A Paz
- Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel
50
Guy L, Roten CAH. Genometric analyses of the organization of circular chromosomes: a universal pressure determines the direction of ribosomal RNA genes transcription relative to chromosome replication. Gene 2004; 340:45-52. [PMID: 15556293 DOI: 10.1016/j.gene.2004.06.056] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 03/24/2004] [Revised: 06/08/2004] [Accepted: 06/29/2004] [Indexed: 10/26/2022]
Abstract
Selective pressures related to gene function and chromosomal architecture act on genome sequences and can be revealed, for instance, by appropriate genometric methods. Cumulative nucleotide skew analyses, i.e., GC, TA, and ORF orientation skews, predict the location of the origin of DNA replication for 88 out of 100 completely sequenced bacterial chromosomes. These methods appear fully reliable for proteobacteria, Gram-positives, and spirochetes as well as for euryarchaeotes. Based on this genome architecture information, coorientation analyses reveal that in prokaryotes, ribosomal RNA (rRNA) genes encoding the small and large ribosomal subunits are all transcribed in the same direction as DNA replication; that is, they are located on the leading strand. This result offers a simple and reliable method for circumscribing the region containing the origin of DNA replication and reveals a strong selective pressure acting on the orientation of rRNA genes, similar to the weaker one acting on the orientation of ORFs. The rate of coorientation of transfer RNA (tRNA) genes with DNA replication appears to be taxon-specific. Analyzing nucleotide biases such as the GC and TA skews of genes and plotting one against the other reveals a taxonomic clustering of species. All ribosomal RNA genes are enriched in Gs and depleted in Cs, the only exception known so far being the rRNA genes of deuterostomian mitochondria. However, this exception can be explained by the fact that in the chromosome of the human mitochondrion, the model of the deuterostomian organelle genome, DNA replication and rRNA transcription proceed in opposite directions. A general rule is deduced from prokaryotic and mitochondrial genomes: ribosomal RNA genes that are transcribed in the same direction as DNA replication are enriched in Gs, and those transcribed in the opposite direction are depleted in Gs.
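The cumulative GC-skew part of such an analysis can be sketched as follows. This is a minimal illustration under stated assumptions (fixed, non-overlapping windows; a leading strand enriched in G relative to C, as is typical of many bacteria), not the authors' genometric software, and it omits the TA and ORF-orientation skews. Under that assumption the cumulative skew falls to its minimum at the origin and rises to its maximum at the terminus.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Cumulative sum of per-window GC skew (G - C)/(G + C).

    Element j is the sum over windows 0..j-1, i.e. the value at position j * window.
    """
    s = seq.upper()
    cum = [0.0]
    for start in range(0, len(s), window):
        w = s[start:start + window]
        g, c = w.count("G"), w.count("C")
        cum.append(cum[-1] + ((g - c) / (g + c) if g + c else 0.0))
    return cum


def predict_origin_terminus(seq: str, window: int = 1000) -> tuple[int, int]:
    """Origin at the cumulative-skew minimum, terminus at its maximum
    (0-based bp positions at window resolution; assumes a G-rich leading strand)."""
    cum = cumulative_gc_skew(seq, window)
    origin = min(range(len(cum)), key=cum.__getitem__)
    terminus = max(range(len(cum)), key=cum.__getitem__)
    return min(origin * window, len(seq)), min(terminus * window, len(seq))
```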
MESH Headings
- Base Composition/genetics
- Chromosomes, Archaeal/genetics
- Chromosomes, Bacterial/genetics
- DNA Replication/genetics
- DNA, Circular/genetics
- DNA, Mitochondrial/genetics
- Databases, Nucleic Acid
- Genome, Archaeal
- Genome, Bacterial
- Humans
- Models, Genetic
- Phylogeny
- RNA, Ribosomal/genetics
- Replication Origin/genetics
- Transcription, Genetic/genetics
Affiliation(s)
- Lionel Guy
- Département de Microbiologie Fondamentale, Faculté de Biologie et de Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland