1
Bohlin J, Pettersson JHO. Evolution of Genomic Base Composition: From Single Cell Microbes to Multicellular Animals. Comput Struct Biotechnol J 2019; 17:362-370. [PMID: 30949307 PMCID: PMC6429543 DOI: 10.1016/j.csbj.2019.03.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/28/2018] [Revised: 02/28/2019] [Accepted: 03/01/2019] [Indexed: 01/07/2023]
Abstract
Whole genome sequencing (WGS) of thousands of microbial genomes has provided considerable insight into evolutionary mechanisms in the microbial world. While substantially fewer eukaryotic genomes are available for analysis, the number is rapidly increasing. This mini-review broadly summarizes the evolutionary dynamics of base composition in the different domains of life from the perspective of prokaryotes. Common and different evolutionary mechanisms influencing genomic base composition in eukaryotes and prokaryotes are discussed. The data currently available suggest that, while there are similarities, there are also striking differences in how genomic base composition has evolved within prokaryotes and eukaryotes. For instance, homologous recombination appears to increase GC content locally in eukaryotes due to a non-selective process termed GC-biased gene conversion (gBGC). For prokaryotes, on the other hand, the increase in genomic GC content seems to be driven by the environment and selection. We find that similar phenomena observed for some organisms in each respective domain may be caused by very different mechanisms: while gBGC and recombination rates appear to explain the negative correlation between GC3 (GC content based on the third codon nucleotides) and genome size in some eukaryotes, uptake of AT-rich DNA sequences is the main reason for a similar negative correlation observed in prokaryotes. We provide further examples that indicate that base composition in prokaryotes and eukaryotes has evolved under very different constraints.
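As an illustration of the GC3 measure defined in this abstract (GC content at third codon positions), the following minimal Python sketch computes it for a single in-frame coding sequence. The function name `gc3` and the handling of ambiguous bases and trailing partial codons are assumptions made for this example, not details taken from the review.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions of an in-frame CDS.

    Assumptions for this sketch: the sequence starts in frame, a trailing
    partial codon is ignored, and non-ACGT symbols are skipped.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3          # drop a trailing partial codon
    third_positions = seq[2:usable:3]         # every third base, starting at index 2
    counted = [base for base in third_positions if base in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous third-position bases")
    return sum(base in "GC" for base in counted) / len(counted)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA have third positions G, C, A.
    print(gc3("ATGGCCAAA"))
```

A genome-wide value would typically pool third positions over all annotated coding sequences before dividing; that pooling step is left out here.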
Affiliation(s)
- Jon Bohlin
  - Norwegian Institute of Public Health, Division of Infection Control and Environmental Health, Department of Infectious Disease Epidemiology and Modelling, Lovisenberggata 8, 0456 Oslo, Norway; Centre for Fertility and Health, Norwegian Institute of Public Health, PO-Box 222 Skøyen, N-0213 Oslo, Norway; Norwegian University of Life Sciences, Faculty of Veterinary Sciences, Production Animal Clinical Sciences, Ullevålsveien 72, 0454 Oslo, Norway
- John H-O Pettersson
  - Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, the University of Sydney, New South Wales 2006, Australia; Zoonosis Science Center, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Public Health Agency of Sweden, Nobels väg 18, SE-171 82 Solna, Sweden
2
Błażej P, Mackiewicz D, Grabińska M, Wnętrzak M, Mackiewicz P. Optimization of amino acid replacement costs by mutational pressure in bacterial genomes. Sci Rep 2017; 7:1061. [PMID: 28432324 PMCID: PMC5430830 DOI: 10.1038/s41598-017-01130-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/13/2016] [Accepted: 03/27/2017] [Indexed: 12/17/2022]
Abstract
Mutations are considered a spontaneous and random process, which is an important component of evolution because it generates genetic variation. On the other hand, mutations are deleterious, leading to non-functional genes and energetically costly repairs. Therefore, one can expect that the mutational pressure is optimized to simultaneously generate genetic diversity and preserve genetic information. To check if empirical mutational pressures are optimized in these ways, we compared matrices of nucleotide mutation rates derived from bacterial genomes with their best possible alternatives that minimized or maximized costs of amino acid replacements associated with differences in their physicochemical properties (e.g. hydropathy and polarity). It should be noted that the studied empirical nucleotide substitution matrices and the costs of amino acid replacements are independent, because these matrices were derived from sites free of selection on amino acid properties and the amino acid costs assumed only amino acid physicochemical properties without any information about mutation at the nucleotide level. The results obtained indicate that the empirical mutational matrices show a tendency to minimize costs of amino acid replacements. This implies that bacterial mutational pressures can evolve to decrease the consequences of amino acid substitutions. However, the optimization is not complete, which enables generation of some genetic variability.
Affiliation(s)
- Paweł Błażej
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
- Dorota Mackiewicz
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
- Małgorzata Grabińska
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
- Małgorzata Wnętrzak
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
- Paweł Mackiewicz
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
3
Dia N, Lavie L, Faye N, Méténier G, Yeramian E, Duroure C, Toguebaye BS, Frutos R, Niang MN, Vivarès CP, Ben Mamoun C, Cornillot E. Subtelomere organization in the genome of the microsporidian Encephalitozoon cuniculi: patterns of repeated sequences and physicochemical signatures. BMC Genomics 2016; 17:34. [PMID: 26744270 PMCID: PMC4704409 DOI: 10.1186/s12864-015-1920-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/28/2015] [Accepted: 09/11/2015] [Indexed: 12/23/2022]
Abstract
Background: The microsporidian Encephalitozoon cuniculi is an obligate intracellular eukaryotic pathogen with a small nuclear genome (2.9 Mbp) consisting of 11 chromosomes. Although each chromosome end is known to contain a single rDNA unit, the incomplete assembly of subtelomeric regions following sequencing of the genome identified only 3 of the 22 expected rDNA units. While chromosome end assembly remains a difficult process in most eukaryotic genomes, it is of significant importance for pathogens because these regions encode factors important for virulence and host evasion. Results: Here we report the first complete assembly of E. cuniculi chromosome ends, and describe a novel mosaic structure of segmental duplications (EXT repeats) in these regions. EXT repeats range in size between 3.5 and 23.8 kbp and contain four multigene families encoding membrane-associated proteins. Twenty-one recombination sites were identified in the sub-terminal region of E. cuniculi chromosomes. Our analysis suggests that these sites contribute to the diversity of chromosome end organization through double-strand break repair mechanisms. The region containing EXT repeats at chromosome extremities can be differentiated based on gene composition, GC content, recombination site density and chromosome landscape. Conclusion: Together, this study provides the complete structure of the chromosome ends of E. cuniculi GB-M1, and identifies important factors that could play a major role in parasite diversity and host-parasite interactions. Comparison with other eukaryotic genomes suggests that terminal regions could be distinguished precisely based on gene content, genetic instability and base composition bias. The diversity of processes associated with chromosome extremities and their biological consequences, as presented in this study, emphasizes that great effort will be necessary in the future to characterize these regions more carefully during whole genome sequencing efforts. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1920-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ndongo Dia
  - Unité de Virologie Médicale, Institut Pasteur de Dakar, 36 Avenue Pasteur, B.P. 220, Dakar, Sénégal
- Laurence Lavie
  - Clermont Université, Université Blaise Pascal, Laboratoire Microorganismes, Génome et Environnement, UMR 6023, CNRS, 63177, Aubière, France
- Ngor Faye
  - Laboratoire de Parasitologie Générale, Département de Biologie Animale, Faculté des Sciences et Technologies, Université Cheikh Anta Diop, Dakar, Sénégal
- Guy Méténier
  - Clermont Université, Université Blaise Pascal, Laboratoire Microorganismes, Génome et Environnement, UMR 6023, CNRS, 63177, Aubière, France
- Edouard Yeramian
  - Unité de Bioinformatique Structurale, UMR 3528 CNRS, Institut Pasteur, 25-28, rue du Dr Roux, 75015, Paris, France
- Christophe Duroure
  - Laboratoire de Météorologie Physique, OPGC UMR 6016 CNRS-Université Blaise Pascal, 24 Avenue des Landais, 63177, Aubière Cedex, France
- Bhen S Toguebaye
  - Laboratoire de Parasitologie Générale, Département de Biologie Animale, Faculté des Sciences et Technologies, Université Cheikh Anta Diop, Dakar, Sénégal
- Roger Frutos
  - CIRAD, UMR 17, Cirad-Ird, TA-A17/G, Campus International de Baillarguet, 34398, Montpellier, France
- Mbayame N Niang
  - Unité de Virologie Médicale, Institut Pasteur de Dakar, 36 Avenue Pasteur, B.P. 220, Dakar, Sénégal
- Christian P Vivarès
  - Clermont Université, Université Blaise Pascal, Laboratoire Microorganismes, Génome et Environnement, UMR 6023, CNRS, 63177, Aubière, France
- Choukri Ben Mamoun
  - Section of Infectious Disease and Department of Microbial Pathogenesis, Winchester Building WWW403D, Yale School of Medicine, 15 York St., New Haven, CT, 06520, USA
- Emmanuel Cornillot
  - Institut de Recherche en Cancérologie de Montpellier, IRCM - INSERM U1194 & Université de Montpellier & ICM, Institut régional du Cancer Montpellier, Campus Val d'Aurelle, 34298, Montpellier cedex 5, France; Institut de Biologie Computationnelle, IBC, Campus Saint Priest, 34090, Montpellier, France
4
Abstract
Mutational heterogeneity must be taken into account when reconstructing evolutionary histories, calibrating molecular clocks, and predicting links between genes and disease. Selective pressures and various DNA transactions have been invoked to explain the heterogeneous distribution of genetic variation between species, within populations, and in tissue-specific tumors. To examine relationships between such heterogeneity and variations in leading- and lagging-strand replication fidelity and mismatch repair, we accumulated 40,000 spontaneous mutations in eight diploid yeast strains in the absence of selective pressure. We found that replicase error rates vary by fork direction, coding state, nucleosome proximity, and sequence context. Further, error rates and DNA mismatch repair efficiency both vary by mismatch type, responsible polymerase, replication time, and replication origin proximity. Mutation patterns implicate replication infidelity as one driver of variation in somatic and germline evolution, suggest mechanisms of mutual modulation of genome stability and composition, and predict future observations in specific cancers.
5
Evertts AG, Coller HA. Back to the origin: reconsidering replication, transcription, epigenetics, and cell cycle control. Genes Cancer 2013; 3:678-96. [PMID: 23634256 DOI: 10.1177/1947601912474891] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]
Abstract
In bacteria, replication is a carefully orchestrated event that unfolds the same way for each bacterium and each cell division. The process of DNA replication in bacteria optimizes cell growth and coordinates high levels of simultaneous replication and transcription. In metazoans, the organization of replication is more enigmatic. The lack of a specific sequence that defines origins of replication has, until recently, severely limited our ability to define the organizing principles of DNA replication. This question is of particular importance as emerging data suggest that replication stress is an important contributor to inherited genetic damage and the genomic instability in tumors. We consider here the replication program in several different organisms including recent genome-wide analyses of replication origins in humans. We review recent studies on the role of cytosine methylation in replication origins, the role of transcriptional looping and gene gating in DNA replication, and the role of chromatin's 3-dimensional structure in DNA replication. We use these new findings to consider several questions surrounding DNA replication in metazoans: How are origins selected? What is the relationship between replication and transcription? How do checkpoints inhibit origin firing? Why are there early and late firing origins? We then discuss whether oncogenes promote cancer through a role in DNA replication and whether errors in DNA replication are important contributors to the genomic alterations and gene fusion events observed in cancer. We conclude with some important areas for future experimentation.
6
Audit B, Zaghloul L, Baker A, Arneodo A, Chen CL, d'Aubenton-Carafa Y, Thermes C. Megabase replication domains along the human genome: relation to chromatin structure and genome organisation. Subcell Biochem 2013; 61:57-80. [PMID: 23150246 DOI: 10.1007/978-94-007-4525-4_3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/03/2023]
Abstract
In higher eukaryotes, the absence of specific sequence motifs marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origin activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabase-sized replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.
7
Marsolier-Kergoat MC. Asymmetry indices for analysis and prediction of replication origins in eukaryotic genomes. PLoS One 2012; 7:e45050. [PMID: 23028755 PMCID: PMC3459929 DOI: 10.1371/journal.pone.0045050] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 05/15/2012] [Accepted: 08/15/2012] [Indexed: 01/15/2023]
Abstract
DNA replication was recently shown to induce the formation of compositional skews in the genomes of the yeasts Saccharomyces cerevisiae and Kluyveromyces lactis. In this work, I have characterized further GC and TA skew variations in the vicinity of S. cerevisiae replication origins and termination sites, and defined asymmetry indices for origin analysis and prediction. The presence of skew jumps at some termination sites in the S. cerevisiae genome was established. The majority of S. cerevisiae replication origins are marked by an oriented consensus sequence called ACS, but no evidence could be found for asymmetric origin firing that would be linked to ACS orientation. Asymmetry indices related to GC and TA skews were defined, and a global asymmetry index IGC,TA was described. IGC,TA was found to strongly correlate with origin efficiency in S. cerevisiae and to allow the determination of sets of intergenes significantly enriched in origin loci. The generalized use of asymmetry indices for origin prediction in naive genomes implies the determination of the direction of the skews, i.e. the identification of which strand, leading or lagging, is enriched in G and which one is enriched in T. Recent work indicates that in Candida albicans and in several related species, centromeres contain early and efficient replication origins. It has been proposed that the skew jumps observed at these positions would reflect the activity of these origins, thus allowing to determine the direction of the skews in these genomes. However, I show here that the skew jumps at C. albicans centromeres are not related to replication and that replication-associated GC and TA skews in C. albicans have in fact the opposite directions of what was proposed.
8
Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012; 13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022]
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B1 index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
Affiliation(s)
- Kazuharu Arakawa
  - Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
9
Baker A, Julienne H, Chen CL, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Linking the DNA strand asymmetry to the spatio-temporal replication program. I. About the role of the replication fork polarity in genome evolution. Eur Phys J E Soft Matter 2012; 35:92. [PMID: 23001787 DOI: 10.1140/epje/i2012-12092-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2012] [Revised: 08/08/2012] [Accepted: 08/21/2012] [Indexed: 06/01/2023]
Abstract
Two key cellular processes, namely transcription and replication, require the opening of the DNA double helix and act differently on the two DNA strands, generating different mutational patterns (mutational asymmetry) that may result, after long evolutionary time, in different nucleotide compositions on the two DNA strands (compositional asymmetry). We elaborate on the simplest model of neutral substitution rates that takes into account the strand asymmetries generated by the transcription and replication processes. Using perturbation theory, we then solve the time evolution of the DNA composition under strand-asymmetric substitution rates. In our minimal model, the compositional and substitutional asymmetries are predicted to decompose into a transcription-associated and a replication-associated component. The transcription-associated asymmetry increases in magnitude with transcription rate and changes sign with gene orientation, while the replication-associated asymmetry is proportional to the replication fork polarity. These results are confirmed experimentally in the human genome, using substitution rates obtained by aligning the human and chimpanzee genomes with macaque and orangutan as outgroups, and replication fork polarity determined in the HeLa cell line as estimated from the derivative of the mean replication timing. When further investigating the dynamics of compositional skew evolution, we show that it is not yet at equilibrium and that its evolution is an extremely slow process with characteristic time scales of several hundred Myrs.
Affiliation(s)
- A Baker
  - Université de Lyon, Lyon, France
10
Agier N, Fischer G. The Mutational Profile of the Yeast Genome Is Shaped by Replication. Mol Biol Evol 2011; 29:905-13. [DOI: 10.1093/molbev/msr280] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 12/13/2022]
11
Marsolier-Kergoat MC, Goldar A. DNA replication induces compositional biases in yeast. Mol Biol Evol 2011; 29:893-904. [PMID: 21948086 DOI: 10.1093/molbev/msr240] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 02/06/2023]
Abstract
Asymmetries intrinsic to the process of DNA replication are expected to cause differences in the substitution patterns of the leading and the lagging strands and to induce compositional biases. These biases have been detected in the majority of eubacterial genomes but rarely in eukaryotes. Only in the human genome, the activity of a minority of replication origins seems to generate compositional biases. In this work, we provide evidence for replication-associated GC and TA skews in the genomes of two yeast species, Saccharomyces cerevisiae and Kluyveromyces lactis, whereas the data for the Schizosaccharomyces pombe genome are less conclusive. In contrast with the genomes of Homo sapiens and of the majority of eubacteria, the leading strand is enriched in cytosine and adenine in both S. cerevisiae and K. lactis. We observed significant variations across the interorigin intervals of several substitution rates in the S. cerevisiae lineage since its divergence from S. paradoxus. We also found that the S. cerevisiae genome is far from compositional equilibrium and that its present compositional biases are due to substitution rates operating before its divergence from S. paradoxus. Finally, we observed that replication and transcription tend to be cooriented in the S. cerevisiae genome, especially for genes encoding subunits of protein complexes. Taken together, our results suggest that replication-related compositional biases may be a feature of many eukaryotic genomes despite the stochastic nature of the firing of replication origins in these genomes.
12
de Moura APS, Retkute R, Hawkins M, Nieduszynski CA. Mathematical modelling of whole chromosome replication. Nucleic Acids Res 2010; 38:5623-33. [PMID: 20457753 PMCID: PMC2943597 DOI: 10.1093/nar/gkq343] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Indexed: 01/29/2023]
Abstract
All chromosomes must be completely replicated prior to cell division, a requirement that demands the activation of a sufficient number of appropriately distributed DNA replication origins. Here we investigate how the activity of multiple origins on each chromosome is coordinated to ensure successful replication. We present a stochastic model for whole chromosome replication where the dynamics are based upon the parameters of individual origins. Using this model we demonstrate that mean replication time at any given chromosome position is determined collectively by the parameters of all origins. Combining parameter estimation with extensive simulations we show that there is a range of model parameters consistent with mean replication data, emphasising the need for caution in interpreting such data. In contrast, the replicated-fraction at time points through S phase contains more information than mean replication time data and allowed us to use our model to uniquely estimate many origin parameters. These estimated parameters enable us to make a number of predictions that showed agreement with independent experimental data, confirming that our model has predictive power. In summary, we demonstrate that a stochastic model can recapitulate experimental observations, including those that might be interpreted as deterministic such as ordered origin activation times.
Affiliation(s)
- Alessandro P S de Moura
  - Department of Physics, University of Aberdeen, Aberdeen AB24 3UE and School of Biology, University of Nottingham, Nottingham NG7 2UH, UK
13
Necsulea A, Guillet C, Cadoret JC, Prioleau MN, Duret L. The relationship between DNA replication and human genome organization. Mol Biol Evol 2009; 26:729-41. [PMID: 19126867 DOI: 10.1093/molbev/msn303] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 12/23/2022]
Abstract
Assessment of the impact of DNA replication on genome architecture in Eukaryotes has long been hampered by the scarcity of experimental data. Recent work, relying on computational predictions of origins of replication, suggested that replication might be a major determinant of gene organization in human (Huvet et al. 2007. Human gene organization driven by the coordination of replication and transcription. Genome Res. 17:1278-1285). Here, we address this question by analyzing the first large-scale data set of experimentally determined origins of replication in human: 283 origins identified in HeLa cells, in 1% of the genome covered by ENCODE regions (Cadoret et al. 2008. Genome-wide studies highlight indirect links between human replication origins and gene regulation. Proc Natl Acad Sci USA. 105:15837-15842). We show that origins of replication are not randomly distributed as they display significant overlap with promoter regions and CpG islands. The hypothesis of a selective pressure to avoid frontal collisions between replication and transcription polymerases is not supported by experimental data as we find no evidence for gene orientation bias in the proximity of origins of replication. The lack of a significant orientation bias remains manifest even when considering only genes expressed at a high rate, or in a wide number of tissues, and is not affected by the regional replication timing. Gene expression breadth does not appear to be correlated with the distance from the origins of replication. We conclude that the impact of DNA replication on human genome organization is considerably weaker than previously proposed.
14
Mackiewicz P, Biecek P, Mackiewicz D, Kiraga J, Baczkowski K, Sobczynski M, Cebrat S. Optimisation of Asymmetric Mutational Pressure and Selection Pressure Around the Universal Genetic Code. Computational Science – ICCS 2008. 2008. [DOI: 10.1007/978-3-540-69389-5_13] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/11/2023]
15
Touchon M, Rocha EPC. From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data. Biochimie 2007; 90:648-59. [PMID: 17988781 DOI: 10.1016/j.biochi.2007.09.015] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 06/18/2007] [Accepted: 09/21/2007] [Indexed: 12/29/2022]
Abstract
Compositional asymmetries are pervasive in DNA sequences. They are the result of the asymmetric interactions between DNA and cellular mechanisms such as replication and transcription. Here, we review many of the methods that have been proposed over the years to analyse compositional asymmetries in DNA sequences. Among these we list GC skews, oligonucleotide skews and wavelets, which among other uses have been extensively employed to delimitate origins and termini of replication in genomes. We also review the use of multivariate methods, such as factorial correspondence analysis, discriminant analysis and analysis of variance, which allow assigning compositional strand asymmetries to the different biological processes shaping sequence composition. Finally, we review methods that have been used to infer substitution matrices and allow understanding the mutational processes underlying strand asymmetry. We focus on replication asymmetries because they have been more thoroughly studied, but the methods may be adapted, and often are, to other problems. Although strand asymmetry has been studied more frequently through compositional skews of nucleotides or oligonucleotides, we recall that, depending on the goal of the analysis, other methods may be more appropriate to answer certain biological questions. We also refer to programs freely available to analyse strand asymmetry.
Affiliation(s)
- Marie Touchon
  - Atelier de Bioinformatique, Université Pierre et Marie Curie-Paris 6, Paris, France
16
Wang HF, Hou WR, Niu DK. Strand compositional asymmetries in vertebrate large genes. Mol Biol Rep 2007; 35:163-9. [PMID: 17420956 DOI: 10.1007/s11033-007-9066-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/08/2006] [Accepted: 02/26/2007] [Indexed: 10/23/2022]
Abstract
Both transcription-associated and replication-associated strand compositional asymmetries have recently been shown in vertebrate genomes. In this paper, we illustrate that transcription-associated strand compositional asymmetries and replication-associated ones coexist in most large vertebrate genes, although in most cases the former conceal the latter. Furthermore, we found that the transcription-associated strand compositional asymmetries of housekeeping genes are stronger than those of genes expressed in somatic cells. Together with other evidence, we suggest that germline transcription-associated strand asymmetric mutations may be the main cause of the transcription-associated strand compositional asymmetries.
Affiliation(s)
- Hai-Fang Wang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
|
17
|
Hou WR, Wang HF, Niu DK. Replication-associated strand asymmetries in vertebrate genomes and implications for replicon size, DNA replication origin, and termination. Biochem Biophys Res Commun 2006; 344:1258-62. [PMID: 16650814 DOI: 10.1016/j.bbrc.2006.04.039] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 03/29/2006] [Accepted: 04/17/2006] [Indexed: 11/16/2022]
Abstract
Strand compositional asymmetry has been observed in prokaryotes and used in predicting prokaryotic DNA replication origins and termini. However, it was not found in eukaryotic genomes by the same methods. We propose that transcription-associated strand asymmetries mask the replication-associated ones. By analyzing the nucleotide composition of intergenic sequences larger than 50 kb by cumulative skew diagrams (CSD), we found replication-associated strand asymmetry in vertebrate genomes. Furthermore, we found that the most common replicon sizes in vertebrates are 50-100 kb, and show evidence that the replication origin and termination regions of vertebrate genomes range from a discrete site to a broad zone.
Affiliation(s)
- Wen-Ru Hou
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
|
18
|
Touchon M, Nicolay S, Audit B, Brodie of Brodie EB, d'Aubenton-Carafa Y, Arneodo A, Thermes C. Replication-associated strand asymmetries in mammalian genomes: toward detection of replication origins. Proc Natl Acad Sci U S A 2005; 102:9836-41. [PMID: 15985556 PMCID: PMC1174978 DOI: 10.1073/pnas.0500577102] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.1] [Received: 01/24/2005] [Indexed: 12/25/2022] Open
Abstract
In the course of evolution, mutations do not affect both strands of genomic DNA equally. This imbalance mainly results from asymmetric DNA mutation and repair processes associated with replication and transcription. In prokaryotes, prevalence of G over C and T over A is frequently observed in the leading strand. The sign of the resulting TA and GC skews changes abruptly when crossing replication-origin and termination sites, producing characteristic step-like transitions. In mammals, transcription-coupled skews have been detected, but so far, no bias has been associated with replication. Here, analysis of intergenic and transcribed regions flanking experimentally identified human replication origins and the corresponding mouse and dog homologous regions demonstrates the existence of compositional strand asymmetries associated with replication. Multiscale analysis of human genome skew profiles reveals numerous transitions that allow us to identify a set of 1,000 putative replication initiation zones. Around these putative origins, the skew profile displays a characteristic jagged pattern also observed in mouse and dog genomes. We therefore propose that in mammalian cells, replication termination sites are randomly distributed between adjacent origins. Taken together, these analyses constitute a step toward genome-wide studies of replication mechanisms.
Affiliation(s)
- Marie Touchon
- Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique, Allée de la Terrasse, 91198 Gif-sur-Yvette, France
|
19
|
Brodie of Brodie EB, Nicolay S, Touchon M, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A. From DNA sequence analysis to modeling replication in the human genome. Phys Rev Lett 2005; 94:248103. [PMID: 16090582 DOI: 10.1103/physrevlett.94.248103] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Received: 11/19/2004] [Indexed: 05/03/2023]
Abstract
We explore the large-scale behavior of nucleotide compositional strand asymmetries along human chromosomes. As we observe for 7 of 9 origins of replication experimentally identified so far, the (TA+GC) skew displays rather sharp upward jumps, with a linearly decreasing profile in between two successive jumps. We present a model of replication with well positioned replication origins and random terminations that accounts for the observed characteristic serrated skew profiles. We succeed in identifying 287 pairs of putative adjacent replication origins with an origin spacing of approximately 1-2 Mbp that are likely to correspond to replication foci observed in interphase nuclei and recognized as stable structures that persist throughout subsequent cell generations.
Affiliation(s)
- E B Brodie of Brodie
- Laboratoire Joliot-Curie (CNRS), Ecole Normale Supérieure de Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, France
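The link between fixed origins, random terminations and a serrated skew profile can be checked with a small simulation. This is a hedged sketch under simplifying assumptions of my own, not the authors' model code: two adjacent origins sit at the ends of an interval, the termination point is drawn uniformly between them in each replication event, and a site contributes a skew of +s when copied by the fork coming from the left origin and -s otherwise. All names and parameter values are hypothetical.

```python
import random


def mean_skew_profile(n_bins=100, n_events=5000, s=1.0, seed=0):
    """Average skew per bin between two adjacent origins with random termination."""
    rng = random.Random(seed)
    totals = [0.0] * n_bins
    for _ in range(n_events):
        # Termination point, uniform between the left origin (0) and the right origin (n_bins).
        termination = rng.random() * n_bins
        for i in range(n_bins):
            x = i + 0.5  # bin centre
            # Left of the termination point the site is copied by the rightward fork.
            totals[i] += s if x < termination else -s
    return [value / n_events for value in totals]


profile = mean_skew_profile()
print(round(profile[0], 2), round(profile[len(profile) // 2], 2), round(profile[-1], 2))
```

Under these assumptions the expected skew at relative position x/L is s(1 - 2x/L): it falls linearly from +s just after one origin to -s just before the next, then jumps back up at that origin, which gives the sawtooth shape described in the abstract.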
|
20
|
Dudkiewicz M, Mackiewicz P, Mackiewicz D, Kowalczuk M, Nowicka A, Polak N, Smolarczyk K, Banaszak J, Dudek MR, Cebrat S. Higher mutation rate helps to rescue genes from the elimination by selection. Biosystems 2004; 80:193-9. [PMID: 15823418 DOI: 10.1016/j.biosystems.2004.11.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/26/2004] [Revised: 06/17/2004] [Accepted: 11/23/2004] [Indexed: 11/26/2022]
Abstract
Directional mutation pressure associated with replication processes is the main cause of the asymmetry between the leading and lagging DNA strands in bacterial genomes. On the other hand, the asymmetry between sense and antisense strands of protein coding sequences is a result of both mutation and selection pressures. Thus, there are two different ways of superposition of the sense strand, on the leading or lagging strand. Besides many other implications of these two possible situations, one seems to be very important: because of the asymmetric replication-associated mutation pressure, the mutation rate of genes depends on their location. Using Monte Carlo methods, we have simulated, under experimentally determined directional mutation pressure, the divergence rate and the elimination rate of genes depending on their location with respect to the leading/lagging DNA strands in the asymmetric prokaryotic genome. We have found that the best survival strategy for the majority of genes is to sometimes switch between DNA strands. Paradoxically, this strategy results in higher substitution rates but remains in agreement with observations in bacterial genomes that such inversions are very frequent and that the divergence rate between homologs lying on different DNA strands is very high.
Affiliation(s)
- Malgorzata Dudkiewicz
- Institute of Genetics and Microbiology, University of Wrocław, ul. Przybyszewskiego, Wrocław, Poland
|
21
|
Touchon M, Arneodo A, d'Aubenton-Carafa Y, Thermes C. Transcription-coupled and splicing-coupled strand asymmetries in eukaryotic genomes. Nucleic Acids Res 2004; 32:4969-78. [PMID: 15388799 PMCID: PMC521644 DOI: 10.1093/nar/gkh823] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022] Open
Abstract
Under no-strand bias conditions, each genomic DNA strand should present equimolarities of A and T and of G and C. Deviations from these rules are attributed to asymmetric properties intrinsic to DNA mutation-repair processes. In bacteria, strand biases are associated with replication or transcription. In eukaryotes, recent studies demonstrate that human genes present transcription-coupled biases that might reflect transcription-coupled repair processes. Here, we study strand asymmetries in intron sequences of evolutionarily distant eukaryotes, and show that two superimposed intron biases can be distinguished. (i) Biases that are maximum at intron extremities and decrease over large distances to zero values in internal regions, possibly reflecting interactions between pre-mRNA and splicing machinery; these extend over approximately 0.5 kb in mammals and Arabidopsis thaliana, and over 1 kb in Caenorhabditis elegans and Drosophila melanogaster. (ii) Biases that are constant along introns, possibly associated with transcription. Strikingly, in C. elegans, these latter biases extend over intergenic regions that separate co-oriented genes. When appropriately examined, all genomes present a transcription-coupled excess of T over A in the coding strand. In contrast, GC skews are either positive (mammals, plants) or negative (invertebrates). These results suggest that transcription-coupled asymmetries result from mutation-repair mechanisms that differ between vertebrates and invertebrates.
Affiliation(s)
- Marie Touchon
- Centre de Génétique Moléculaire (CNRS), Allée de la Terrasse, 91198 Gif-sur-Yvette, France
|
22
|
Abstract
The replication of the chromosome is among the most essential functions of the bacterial cell and influences many other cellular mechanisms, from gene expression to cell division. Yet the way it impacts on the bacterial chromosome was not fully acknowledged until the availability of complete genomes allowed one to look upon genomes as more than bags of genes. Chromosomal replication includes a set of asymmetric mechanisms, among which are a division into a lagging and a leading strand and a gradient between early and late replicating regions. These differences are the causes of many of the organizational features observed in bacterial genomes, in terms of both gene distribution and sequence composition along the chromosome. When asymmetries or gradients increase in some genomes, e.g. due to a different composition of the DNA polymerase or to a higher growth rate, so do the corresponding biases. As some of the features of the chromosome structure seem to be under strong selection, understanding such biases is important for the understanding of chromosome organization and adaptation. Conversely, understanding chromosome organization may shed further light on questions relating to replication and cell division. Ultimately, the understanding of the interplay between these different elements will allow a better understanding of bacterial genetics and evolution.
Affiliation(s)
- Eduardo P C Rocha
- Atelier de Bioinformatique, Université Pierre et Marie Curie, 12, Rue Cuvier, 75005 Paris, and Unité Génétique des Génomes Bactériens, Institut Pasteur, 28 rue du Dr Roux, 75724 Paris Cedex 15, France
|
23
|
Touchon M, Nicolay S, Arneodo A, d'Aubenton-Carafa Y, Thermes C. Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett 2004; 555:579-82. [PMID: 14675777 DOI: 10.1016/s0014-5793(03)01306-1] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Analysis of the whole set of human genes reveals that most of them present TA and GC skews, that these biases are correlated with each other and are specific to gene sequences, exhibiting sharp transitions between transcribed and non-transcribed regions. The GC asymmetries cannot be explained solely by a model previously proposed for (G+T) skew based on transitions measured in a small set of human genes. We propose that the GC skew results from an additional transcription-coupled mutation process that would include transversions. During evolution, both processes acting on a large majority of genes in germline cells would have produced these transcription-coupled strand asymmetries.
Affiliation(s)
- M Touchon
- Centre de Génétique Moléculaire, CNRS, Allée de la Terrasse, 91198, Gif-sur-Yvette, France
|
24
|
Abstract
Changes in technology in the past decade have had such an impact on the way that molecular evolution research is done that it is difficult now to imagine working in a world without genomics or the Internet. In 1992, GenBank was less than a hundredth of its current size and was updated every three months on a huge spool of tape. Homology searches took 30 minutes and rarely found a hit. Now it is difficult to find sequences with only a few homologs to use as examples for teaching bioinformatics. For molecular evolution researchers, the genomics revolution has showered us with raw data and the information revolution has given us the wherewithal to analyze it. In broad terms, the most significant outcome from these changes has been our newfound ability to examine the evolution of genomes as a whole, enabling us to infer genome-wide evolutionary patterns and to identify subsets of genes whose evolution has been in some way atypical.
Affiliation(s)
- Kenneth H Wolfe
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.
|
25
|
Li W, Bernaola-Galván P, Haghighi F, Grosse I. Applications of recursive segmentation to the analysis of DNA sequences. Comput Chem 2002; 26:491-510. [PMID: 12144178 DOI: 10.1016/s0097-8485(02)00010-4] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Indexed: 11/23/2022]
Abstract
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
Affiliation(s)
- Wentian Li
- Center for Genomics and Human Genetics, North Shore-LIJ Research Institute, Manhasset, NY 11030, USA.
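The recursive segmentation idea in this abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the split criterion is the Jensen-Shannon divergence between the base compositions of the two candidate halves, and it uses a fixed divergence threshold as the stopping rule, whereas published procedures typically use a statistical significance or model-selection criterion. The function names, `threshold` and `min_len` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a Counter of symbol counts."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0)


def best_split(seq, min_len=10):
    """Return (position, divergence) of the split maximising Jensen-Shannon divergence."""
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total)
    left = Counter()
    best_pos, best_js = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        if i < min_len or n - i < min_len:
            continue  # keep both halves at least min_len long
        right = total - left
        js = h_all - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


def segment(seq, threshold=0.1, min_len=10, offset=0):
    """Recursively split seq; return a list of (start, end) domains."""
    pos, js = best_split(seq, min_len)
    if pos is None or js < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:pos], threshold, min_len, offset)
            + segment(seq[pos:], threshold, min_len, offset + pos))


# Toy sequence (an assumption): an A/T-only domain followed by a G/C-only domain.
toy = "AT" * 50 + "GC" * 50
print(segment(toy))
```

The same routine can be applied to a recoded sequence, for example a binary strong/weak string, simply by passing the recoded string as `seq`.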
|
26
|
Wang J, Guo FB. Base frequencies at the second codon position of Vibrio cholerae genes connect with protein function. Biochem Biophys Res Commun 2002; 290:81-4. [PMID: 11779136 DOI: 10.1006/bbrc.2001.6174] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
Abstract
In this paper, the base frequency at the second codon position of the 3839 open reading frames (ORFs) in the Vibrio cholerae genome is analyzed. It is shown that according to the base content at this codon site, the ORFs can be divided into two clusters, containing 673 and 3166 ORFs, respectively. ORFs in the smaller cluster usually have a significantly higher frequency of T than of A at the second codon position. For the two clusters of ORFs, there are significant differences in the frequencies of 18 of the 20 amino acids in the encoded proteins. The two clusters of ORFs are also significantly different in their functions. More than half of the known genes involved in transport and binding are included in the smaller cluster, while few genes involved in amino acid biosynthesis, protein synthesis, and so on are included in this cluster.
Affiliation(s)
- Ju Wang
- Department of Physics, Tianjin University, Tianjin 300072, China.
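The quantity this abstract is built on, the base frequency at one codon position of an ORF, is easy to compute. The following Python sketch is only an illustration and is not taken from the paper: the function name and the toy ORF are hypothetical, and the abstract does not say how the two clusters were derived, so the sketch stops at the per-ORF frequencies that such a clustering would start from.

```python
from collections import Counter


def codon_position_freqs(orf, position=2):
    """Frequencies of A, C, G, T at one codon position (1, 2 or 3) of an in-frame ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # ignore a trailing partial codon
    bases = orf[position - 1:usable:3]        # every third base, starting at the chosen position
    counts = Counter(b for b in bases if b in "ACGT")
    n = sum(counts.values())
    return {b: (counts[b] / n if n else 0.0) for b in "ACGT"}


# Toy ORF (an assumption): codons ATG CTG GTT TAA, second positions T, T, T, A.
freqs = codon_position_freqs("ATGCTGGTTTAA", position=2)
print(freqs)
```

Comparing `freqs["T"]` with `freqs["A"]` for each ORF gives the kind of contrast the abstract reports for the smaller cluster.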
|
27
|
Lobry JR, Sueoka N. Asymmetric directional mutation pressures in bacteria. Genome Biol 2002; 3:RESEARCH0058. [PMID: 12372146 PMCID: PMC134625 DOI: 10.1186/gb-2002-3-10-research0058] [Citation(s) in RCA: 127] [Impact Index Per Article: 5.8] [Received: 11/13/2001] [Revised: 06/18/2002] [Accepted: 08/15/2002] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: When there are no strand-specific biases in mutation and selection rates (that is, in the substitution rates) between the two strands of DNA, the average nucleotide composition is theoretically expected to be A = T and G = C within each strand. Deviations from these equalities are therefore evidence for an asymmetry in selection and/or mutation between the two strands. By focusing on weakly selected regions that could be oriented with respect to replication in 43 out of 51 completely sequenced bacterial chromosomes, we have been able to detect asymmetric directional mutation pressures. RESULTS: Most of the 43 chromosomes were found to be relatively enriched in G over C and T over A, and slightly depleted in G+C, in their weakly selected positions (intergenic regions and third codon positions) in the leading strand compared with the lagging strand. Deviations from A = T and G = C were highly correlated between third codon positions and intergenic regions, with a lower degree of deviation in intergenic regions, and were not correlated with overall genomic G+C content. CONCLUSIONS: During the course of bacterial chromosome evolution, the effects of asymmetric directional mutation pressures are commonly observed in weakly selected positions. The degree of deviation from equality is highly variable among species, and within species is higher in third codon positions than in intergenic regions. The orientation of these effects is almost universal and is compatible in most cases with the hypothesis of an excess of cytosine deamination in the single-stranded state during DNA replication. However, the variation in G+C content between species is influenced by factors other than asymmetric mutation pressure.
Affiliation(s)
- Jean R Lobry
- Laboratoire BBE CNRS UMR 5558, Université Claude Bernard, 43 Bd du 11 Novembre 1918, F-69622 Villeurbanne cedex, France.
|
28
|
Abstract
The codon usage in the Vibrio cholerae genome is analyzed in this paper. Although there are many more genes on chromosome 1 than on chromosome 2, the codon usage patterns of genes on the two chromosomes are quite similar, indicating that the two chromosomes may have coexisted in the same cell for a very long time. Unlike the base frequency pattern observed in other genomes, the G+C content at the third codon position of the V. cholerae genome varies within a rather small interval. The most notable feature of codon usage in the V. cholerae genome is that a fraction of genes shows significant bias in base choice at the second codon position. The 2,006 known genes can be classified into two clusters according to the base frequencies at this position. The smaller cluster contains 227 genes, most of which code for proteins involved in transport and binding functions. The products encoded by these genes have a significant bias in amino acid composition compared with other genes. The codon usage patterns of the 1,836 ORFs of unknown function are also analyzed, which is useful for studying their functions.
Affiliation(s)
- J Wang
- Department of Physics, Tianjin University, China
|
29
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447185 DOI: 10.1002/cfg.55] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
|
30
|
|