1
Li W, Almirantis Y, Provata A. Revisiting the neutral dynamics derived limiting guanine-cytosine content using human de novo point mutation data. Meta Gene 2022. [DOI: 10.1016/j.mgene.2021.100994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
2
Mugal CF, Weber CC, Ellegren H. GC-biased gene conversion links the recombination landscape and demography to genomic base composition. Bioessays 2015; 37:1317-26. [DOI: 10.1002/bies.201500058] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 12/22/2022]
Affiliation(s)
- Carina F. Mugal
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Claudia C. Weber
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Department of Biology; Center for Computational Genetics and Genomics; Temple University; Philadelphia PA USA
- Hans Ellegren
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
3
Costantini M. An overview on genome organization of marine organisms. Mar Genomics 2015; 24 Pt 1:3-9. [PMID: 25899406 DOI: 10.1016/j.margen.2015.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2015] [Revised: 03/17/2015] [Accepted: 03/17/2015] [Indexed: 11/16/2022]
Abstract
In this review we will concentrate on some general genome features of marine organisms and their evolution, ranging from vertebrates to invertebrates to unicellular organisms. Before genome sequencing, ultracentrifugation in CsCl led to high resolution of mammalian DNA (without looking at the sequence). The analytical profile of human DNA showed that the vertebrate genome is a mosaic of isochores, typically megabase-size DNA segments that belong to a small number of families characterized by different GC levels. The recent availability of a number of fully sequenced genomes allowed mapping the isochores very precisely, based on DNA sequences. Since isochores are tightly linked to biological properties such as gene density, replication timing and recombination, the new level of detail provided by the isochore map helped the understanding of genome structure, function and evolution. This led to the current level of knowledge and to further insights.
Affiliation(s)
- Maria Costantini
- Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy.
4
Evolutionary consequences of DNA methylation on the GC content in vertebrate genomes. G3-GENES GENOMES GENETICS 2015; 5:441-7. [PMID: 25591920 PMCID: PMC4349097 DOI: 10.1534/g3.114.015545] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 12/29/2022]
Abstract
The genomes of many vertebrates show a characteristic variation in GC content. To explain its origin and evolution, mainly three mechanisms have been proposed: selection for GC content, mutation bias, and GC-biased gene conversion. At present, the mechanism of GC-biased gene conversion, i.e., short-scale, unidirectional exchanges between homologous chromosomes in the neighborhood of recombination-initiating double-strand breaks in favor for GC nucleotides, is the most widely accepted hypothesis. We here suggest that DNA methylation also plays an important role in the evolution of GC content in vertebrate genomes. To test this hypothesis, we investigated one mammalian (human) and one avian (chicken) genome. We used bisulfite sequencing to generate a whole-genome methylation map of chicken sperm and made use of a publicly available whole-genome methylation map of human sperm. Inclusion of these methylation maps into a model of GC content evolution provided significant support for the impact of DNA methylation on the local equilibrium GC content. Moreover, two different estimates of equilibrium GC content, one that neglects and one that incorporates the impact of DNA methylation and the concomitant CpG hypermutability, give estimates that differ by approximately 15% in both genomes, arguing for a strong impact of DNA methylation on the evolution of GC content. Thus, our results put forward that previous estimates of equilibrium GC content, which neglect the hypermutability of CpG dinucleotides, need to be reevaluated.
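For orientation on what "equilibrium GC content" means here, the simplest two-state neutral model gives GC* = u / (u + v), where u is the AT→GC substitution rate per AT site and v is the GC→AT rate per GC site. The sketch below illustrates only that baseline; it is not the methylation-aware model of the abstract, and the function names and rate values are hypothetical.

```python
def equilibrium_gc(u, v):
    """Stationary GC fraction of a two-state (AT <-> GC) model.

    u: AT->GC substitution rate per AT site (assumed constant)
    v: GC->AT substitution rate per GC site (assumed constant)
    """
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return u / (u + v)


def evolve_gc(gc0, u, v, steps):
    """Iterate the discrete-time update gc <- gc + u*(1 - gc) - v*gc."""
    gc = gc0
    for _ in range(steps):
        gc += u * (1.0 - gc) - v * gc
    return gc


if __name__ == "__main__":
    # Hypothetical rates: GC->AT twice as frequent as AT->GC.
    print(equilibrium_gc(1e-3, 2e-3))
    print(evolve_gc(0.6, 1e-3, 2e-3, 20000))
```

In this baseline, any process that raises the effective GC→AT rate lowers the equilibrium value, which is the direction in which neglecting CpG hypermutability would bias an estimate.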
5
Zaghloul L, Drillon G, Boulos RE, Argoul F, Thermes C, Arneodo A, Audit B. Large replication skew domains delimit GC-poor gene deserts in human. Comput Biol Chem 2014; 53 Pt A:153-65. [PMID: 25224847 DOI: 10.1016/j.compbiolchem.2014.08.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 07/11/2014] [Indexed: 01/25/2023]
Abstract
Besides their large-scale organization in isochores, mammalian genomes display megabase-sized regions, spanning both genes and intergenes, where the strand nucleotide composition asymmetry decreases linearly, possibly due to replication activity. These so-called skew-N domains cover about a third of the human genome and are bordered by two skew upward jumps that were hypothesized to compose a subset of "master" replication origins active in the germline. Skew-N domains were shown to exhibit a particular gene organization. Genes with CpG-rich promoters likely expressed in the germline are overrepresented near the master replication origins, with large genes being co-oriented with replication fork progression, which suggests some coordination of replication and transcription. In this study, we describe another skew structure that covers ∼13% of the human genome and that is bordered by putative master replication origins similar to the ones flanking skew-N domains. These skew-split-N domains have a shape reminiscent of an N, but split in half, leaving in the center a region of null skew whose length increases with domain size. These central regions (median size ∼860 kb) have a homogeneous composition, i.e. both a null and constant skew and a constant and low GC content. They correspond to heterochromatin gene deserts found in low-GC isochores with an average gene density of 0.81 promoters/Mb as compared to 7.73 promoters/Mb genome wide. The analysis of epigenetic marks and replication timing data confirms that, in these late replicating heterochromatic regions, the initiation of replication is likely to be random. This contrasts with the transcriptionally active euchromatin state found around the bordering well positioned master replication origins. Altogether skew-N domains and skew-split-N domains cover about 50% of the human genome.
Affiliation(s)
- Lamia Zaghloul
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
- Guénola Drillon
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
- Rasha E Boulos
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
- Françoise Argoul
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
- Claude Thermes
- Centre de Génétique Moléculaire, CNRS UPR 3404, Gif-sur-Yvette, France
- Alain Arneodo
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
- Benjamin Audit
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France.
6
Li W, Sosa D, Jose MV. Human repetitive sequence densities are mostly negatively correlated with R/Y-based nucleosome-positioning motifs and positively correlated with W/S-based motifs. Genomics 2013; 101:125-33. [DOI: 10.1016/j.ygeno.2012.10.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/09/2012] [Revised: 10/28/2012] [Accepted: 10/29/2012] [Indexed: 01/25/2023]
7
Frenkel S, Kirzhner V, Korol A. Organizational heterogeneity of vertebrate genomes. PLoS One 2012; 7:e32076. [PMID: 22384143 PMCID: PMC3288070 DOI: 10.1371/journal.pone.0032076] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/15/2011] [Accepted: 01/23/2012] [Indexed: 01/06/2023]
Abstract
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
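The core idea of scoring fixed-length oligonucleotide occurrences can be sketched as a normalized k-mer count. This is a simplified illustration using exact-match counting only; the published Compositional Spectra procedure may differ in word selection and matching rules, and the function names and the L1 distance are assumptions made for the example.

```python
from collections import Counter


def kmer_spectrum(seq, k):
    """Relative frequencies of all k-mers in seq (exact matches only).

    Windows containing anything other than A, C, G, T are skipped.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: c / total for word, c in counts.items()}


def spectrum_distance(p, q):
    """L1 distance between two spectra (an illustrative choice of metric)."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


if __name__ == "__main__":
    a = kmer_spectrum("ACGTACGTACGT", 2)
    b = kmer_spectrum("AAAATTTTAAAA", 2)
    print(spectrum_distance(a, b))
```

Comparing the spectrum of a region against that of a reshuffled copy of the same region is one way to separate organizational from purely compositional differences, since shuffling preserves base composition but not word order.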
Affiliation(s)
- Abraham Korol
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel
8
On parameters of the human genome. J Theor Biol 2011; 288:92-104. [DOI: 10.1016/j.jtbi.2011.07.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 04/13/2011] [Revised: 06/28/2011] [Accepted: 07/21/2011] [Indexed: 02/06/2023]
9
Provata A, Beck C. Multifractal analysis of nonhyperbolic coupled map lattices: application to genomic sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 83:066210. [PMID: 21797464 DOI: 10.1103/physreve.83.066210] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/01/2011] [Indexed: 05/31/2023]
Abstract
Symbolic sequences generated by coupled map lattices (CMLs) can be used to model the chaotic-like structure of genomic sequences. In this study it is shown that diffusively coupled Chebyshev maps of order 4 (corresponding to a shift of four symbols) very closely reproduce the multifractal spectrum D(q) of human genomic sequences for coupling constant α = 0.35 ± 0.01 if q > 0. The presence of rare configurations causes deviations for q < 0, which disappear if the rare event statistics of the CML is modified. Such rare configurations are known to play specific functional roles in genomic sequences serving as promoters or regulatory elements.
Affiliation(s)
- A Provata
- Institute of Physical Chemistry, National Center for Scientific Research Demokritos, GR-15310 Athens, Greece
10
Provata A, Katsaloulis P. Hierarchical multifractal representation of symbolic sequences and application to human chromosomes. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2010; 81:026102. [PMID: 20365626 DOI: 10.1103/physreve.81.026102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2009] [Indexed: 05/29/2023]
Abstract
The two-dimensional density correlation matrix is constructed for symbolic sequences using contiguous segments of arbitrary size. The multifractal spectrum obtained from this matrix motif is shown to characterize the correlations in the symbolic sequences. This method is applied to entire human chromosomes, shuffled human chromosomes, reconstructed human genomic sequences and to artificial random sequences. It is shown that all human chromosomes have common characteristics in their multifractal spectrum and deviate substantially from random and uncorrelated sequences of the same size. Small deviations are observed between the longer and the shorter chromosomes, especially for the higher (in absolute values) statistical moments. The correlations are crucial for the form of the multifractal spectrum; surrogate shuffled chromosomes present randomlike spectrum, distinctly different from the actual chromosomes. Analytical approaches based on hierarchical superposition of tensor products show that retaining pair correlations in the sequences leads to a closer representation of the genomic multifractal spectra, especially in the region of negative exponents, due to the underrepresentation of various functional units (such as the cytosine-guanine CG combination and its complementary GC complex). Retaining higher-order correlations in the construction of the tensor products is a way to approach closer the structure of the multifractal spectra of the actual genomic sequences. This hierarchical approach is generic and is applicable to other correlated symbolic sequences.
Affiliation(s)
- A Provata
- Institute of Physical Chemistry, National Center for Scientific Research Demokritos, 15310 Athens, Greece
11
Elhaik E, Graur D, Josic K. Comparative testing of DNA segmentation algorithms using benchmark simulations. Mol Biol Evol 2009; 27:1015-24. [PMID: 20018981 DOI: 10.1093/molbev/msp307] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
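The single step at the heart of recursive Jensen-Shannon segmentation is choosing the cut point that maximizes the divergence between the left and right parts of a sequence. The sketch below shows only that step, under the assumption of a four-letter alphabet and length-weighted Shannon entropies; the function names are hypothetical, and no stopping criterion is included, which is exactly the part the abstract identifies as problematic.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of a symbol-count table."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_split(seq):
    """Return (position, divergence) of the JS-divergence-maximizing cut.

    The cut at `position` separates seq[:position] from seq[position:].
    """
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two symbols to split")
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter()
    best_pos, best_div = None, -1.0
    for pos in range(1, n):
        left[seq[pos - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        div = (h_all
               - (pos / n) * entropy(left, pos)
               - ((n - pos) / n) * entropy(right, n - pos))
        if div > best_div:
            best_pos, best_div = pos, div
    return best_pos, best_div


if __name__ == "__main__":
    # Toy sequence: an AT-only block followed by a GC-only block.
    print(best_split("AT" * 20 + "GC" * 20))
```

A full recursive procedure would apply `best_split` again to each half and stop when the divergence fails a significance or threshold test; the choice of that test is left open here.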
Affiliation(s)
- Eran Elhaik
- Department of Biology & Biochemistry, University of Houston, TX, USA.
12
Knoch TA, Göker M, Lohner R, Abuseiris A, Grosveld FG. Fine-structured multi-scaling long-range correlations in completely sequenced genomes--features, origin, and classification. EUROPEAN BIOPHYSICS JOURNAL: EBJ 2009; 38:757-79. [PMID: 19533117 PMCID: PMC2701493 DOI: 10.1007/s00249-009-0489-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 02/20/2009] [Revised: 05/05/2009] [Accepted: 05/13/2009] [Indexed: 11/26/2022]
Abstract
The sequential organization of genomes, i.e. the relations between distant base pairs and regions within sequences, and its connection to the three-dimensional organization of genomes is still a largely unresolved problem. Long-range power-law correlations were found using correlation analysis on almost the entire observable scale of 132 completely sequenced chromosomes of 0.5 × 10^6 to 3.0 × 10^7 bp from Archaea, Bacteria, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens. The local correlation coefficients show a species-specific multi-scaling behaviour: close to random correlations on the scale of a few base pairs, a first maximum from 40 to 3,400 bp (for Arabidopsis thaliana and Drosophila melanogaster divided in two submaxima), and often a region of one or more second maxima from 10^5 to 3 × 10^5 bp. Within this multi-scaling behaviour, an additional fine-structure is present and attributable to codon usage in all except the human sequences, where it is related to nucleosomal binding. Computer-generated random sequences assuming a block organization of genomes, the codon usage, and nucleosomal binding explain these results. Mutation by sequence reshuffling destroyed all correlations. Thus, the stability of correlations seems to be evolutionarily tightly controlled and connected to the spatial genome organization, especially on large scales. In summary, genomes show a complex sequential organization related closely to their three-dimensional organization.
MESH Headings
- Algorithms
- Animals
- Arabidopsis/genetics
- Chromosomes/chemistry
- Chromosomes/genetics
- Chromosomes/ultrastructure
- Chromosomes, Fungal/chemistry
- Chromosomes, Fungal/genetics
- Chromosomes, Fungal/ultrastructure
- Chromosomes, Human/chemistry
- Chromosomes, Human/genetics
- Chromosomes, Human/ultrastructure
- Chromosomes, Plant/chemistry
- Chromosomes, Plant/genetics
- Chromosomes, Plant/ultrastructure
- Codon/chemistry
- Computer Simulation
- DNA/chemistry
- Drosophila melanogaster/genetics
- Genome
- Humans
- Models, Genetic
- Mutation
- Nucleosomes/chemistry
- Saccharomyces cerevisiae/genetics
- Schizosaccharomyces/genetics
- Sequence Analysis, DNA
Affiliation(s)
- Tobias A Knoch
- Biophysical Genomics, Cell Biology and Genetics, Erasmus Medical Center, Rotterdam, The Netherlands.
13
Li W, Lee A, Gregersen PK. Copy-number-variation and copy-number-alteration region detection by cumulative plots. BMC Bioinformatics 2009; 10 Suppl 1:S67. [PMID: 19208171 PMCID: PMC2648736 DOI: 10.1186/1471-2105-10-s1-s67] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022]
Abstract
Background: Regions with copy number variations (in germline cells) or copy number alteration (in somatic cells) are of great interest for human disease gene mapping and cancer studies. They represent a new type of mutation and are larger-scaled than the single nucleotide polymorphisms. Using genotyping microarray for copy number variation detection has become standard, and there is a need for improving analysis methods. Results: We apply the cumulative plot to the detection of regions with copy number variation/alteration, on samples taken from a chronic lymphocytic leukemia patient. Two sets of whole-genome genotyping of 317 k single nucleotide polymorphisms, one from the normal cell and another from the cancer cell, are analyzed. We demonstrate the utility of cumulative plot in detecting a 9 Mb (9 × 10^6 bases) hemizygous deletion and 1 Mb homozygous deletion on chromosome 13. We also show the possibility to detect smaller copy number variation/alteration regions below the 100 kb range. Conclusion: As a graphic tool, the cumulative plot is an intuitive and a scale-free (window-less) way for detecting copy number variation/alteration regions, especially when such regions are small.
Affiliation(s)
- Wentian Li
- The Robert S Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY 11030, USA.
14
Freudenberg J, Wang M, Yang Y, Li W. Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome. BMC Bioinformatics 2009; 10 Suppl 1:S66. [PMID: 19208170 PMCID: PMC2648766 DOI: 10.1186/1471-2105-10-s1-s66] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and effects. RESULTS We use partial correlations to construct partially directed graphs for the following four variables: GC-content, recombination rate, exon density and distance-to-telomere. Recombination rate and exon density are unconditionally uncorrelated, but become inversely correlated by conditioning on GC-content. This pattern indicates a model where recombination rate and exon density are two independent causes of GC-content variation. CONCLUSION Causal inference and graphical models are useful methods to understand genome evolution and the mechanisms of isochore evolution in the human genome.
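The pattern described in the results (two variables that are uncorrelated on their own but become inversely correlated once a third is conditioned on) can be reproduced with a first-order partial correlation. The sketch below is a toy illustration with made-up numbers, not the paper's four-variable analysis; `x`, `y`, and `z` stand in for two independent causes and their common effect, and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on z (first order)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


if __name__ == "__main__":
    # Toy data: z = x + y + e, with e orthogonal to both x and y.
    x = [1, -1, 1, -1]
    y = [1, 1, -1, -1]
    z = [3, -1, -1, -1]
    print(pearson(x, y))          # marginal correlation of the two causes
    print(partial_corr(x, y, z))  # correlation after conditioning on z
```

In this toy case the marginal correlation is zero while the partial correlation is negative, which is the signature the abstract interprets as two independent causes of a shared effect.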
Affiliation(s)
- Jan Freudenberg
- The Robert S Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY 11030, USA.
15
Haiminen N, Mannila H. Discovering isochores by least-squares optimal segmentation. Gene 2007; 394:53-60. [PMID: 17389148 DOI: 10.1016/j.gene.2007.01.028] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/21/2006] [Revised: 01/16/2007] [Accepted: 01/22/2007] [Indexed: 10/23/2022]
Abstract
The isochore structure of a genome is observable by variation in the G+C (guanine and cytosine) content within and between the chromosomes. Describing the isochore structure of vertebrate genomes is a challenging task, and many computational methods have been developed and applied to it. Here we apply a well-known least-squares optimal segmentation algorithm to isochore discovery. The algorithm finds the best division of the sequence into k pieces, such that the segments are internally as homogeneous as possible. We show how this simple segmentation method can be applied to isochore discovery using as input the G+C content of sliding windows on the sequence. To evaluate the performance of this segmentation technique on isochore detection, we present results from segmenting previously studied isochore regions of the human genome. Detailed results on the MHC locus, on parts of chromosomes 21 and 22, and on a 100 Mb region from chromosome 1 are similar to previously suggested isochore structures. We also give results on segmenting all 22 autosomal human chromosomes. An advantage of this technique is that oversegmentation of G+C rich regions can generally be avoided. This is because the technique concentrates on greater global, instead of smaller local, differences in the sequence composition. The effect is further emphasized by a log-transformation of the data that lowers the high variance that is observed in G+C rich regions. We conclude that the least-squares optimal segmentation method is computationally efficient and yields results close to previous biologically motivated isochore structures.
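Finding the best division of a series into k internally homogeneous pieces is a classic dynamic-programming problem. The sketch below is one textbook formulation, assuming the input is a list of per-window G+C fractions and that homogeneity is measured by the within-segment sum of squared deviations from the mean; it omits the log-transformation mentioned in the abstract, runs in O(k·n²) time, and uses hypothetical function names.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean for values[i:j]."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def least_squares_segmentation(values, k):
    """Split values into k contiguous segments minimizing total squared error.

    Returns (segments, cost), where segments is a list of (start, end)
    half-open index pairs.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums make each segment cost an O(1) lookup.
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    inf = float("inf")
    # best[m][j]: minimal cost of covering values[:j] with m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cost = best[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    back[m][j] = i
    # Walk the back-pointers to recover the segment boundaries.
    segments, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments, best[k][n]


if __name__ == "__main__":
    # Hypothetical window GC fractions with two composition changes.
    gc = [0.35] * 4 + [0.50] * 3 + [0.40] * 5
    print(least_squares_segmentation(gc, 3))
```

Because the number of segments k is an input, the method needs an outside rule for choosing it, for example inspecting how the total cost falls as k grows.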
MESH Headings
- Algorithms
- Chromosomes, Human/genetics
- Chromosomes, Human, Pair 1/genetics
- Chromosomes, Human, Pair 21/genetics
- Chromosomes, Human, Pair 22/genetics
- Chromosomes, Human, Pair 6/genetics
- GC Rich Sequence
- Genome, Human
- Genomics/statistics & numerical data
- Humans
- Isochores/chemistry
- Isochores/genetics
- Least-Squares Analysis
- Major Histocompatibility Complex
Affiliation(s)
- Niina Haiminen
- HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Finland.
16
Schmegner C, Hameister H, Vogel W, Assum G. Isochores and replication time zones: a perfect match. Cytogenet Genome Res 2007; 116:167-72. [PMID: 17317955 DOI: 10.1159/000098182] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 09/05/2006] [Accepted: 11/10/2006] [Indexed: 11/19/2022]
Abstract
The mammalian genome is not a random sequence but shows a specific, evolutionarily conserved structure that becomes manifest in its isochore pattern. Isochores, i.e. stretches of DNA with a distinct sequence composition and thus a specific GC content, cause the chromosomal banding pattern. This fundamental level of genome organization is related to several functional features like the replication timing of a DNA sequence. GC richness of genomic regions generally corresponds to an early replication time during S phase. Recently, we demonstrated this interdependency on a molecular level for an abrupt transition from a GC-poor isochore to a GC-rich one in the NF1 gene region; this isochore boundary also separates late from early replicating chromatin. Now, we analyzed another genomic region containing four isochores separated by three sharp isochore transitions. Again, the GC-rich isochores were found to be replicating early, the GC-poor isochores late in S phase; one of the replication time zones was discovered to consist of one single replicon. At the boundaries between isochores, that all show no special sequence elements, the replication machinery stopped for several hours. Thus, our results emphasize the importance of isochores as functional genomic units, and of isochore transitions as genomic landmarks with a key function for chromosome organization and basic biological properties.
Affiliation(s)
- C Schmegner
- Institut für Humangenetik, Universität Ulm, Ulm, Germany.
17
Schmegner C, Hoegel J, Vogel W, Assum G. The rate, not the spectrum, of base pair substitutions changes at a GC-content transition in the human NF1 gene region: implications for the evolution of the mammalian genome structure. Genetics 2006; 175:421-8. [PMID: 17057231 PMCID: PMC1775011 DOI: 10.1534/genetics.106.064386] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
The human genome is composed of long stretches of DNA with distinct GC contents, called isochores or GC-content domains. A boundary between two GC-content domains in the human NF1 gene region is also a boundary between domains of early- and late-replicating sequences and of regions with high and low recombination frequencies. The perfect conservation of the GC-content distribution in this region between human and mouse demonstrates that GC-content stabilizing forces must act regionally on a fine scale at this locus. To further elucidate the nature of these forces, we report here on the spectrum of human SNPs and base pair substitutions between human and chimpanzee. The results show that the mutation rate changes exactly at the GC-content transition zone from low values in the GC-poor sequences to high values in GC-rich ones. The GC content of the GC-poor sequences can be explained by a bias in favor of GC > AT mutations, whereas the GC content of the GC-rich segment may result from a fixation bias in favor of AT > GC substitutions. This fixation bias may be explained by direct selection by the GC content or by biased gene conversion.
18
Abstract
Isochores are large DNA segments (>>300 kb on average) that are characterized by an internal variation in GC well below the full variation seen in the mammalian genome. Precisely defining in terms of size and composition as well as mapping the isochores on human chromosomes have, however, remained largely unsolved problems. Here we used a very simple approach to segment the human chromosomes de novo, based on assessments of GC and its variation within and between adjacent regions. We obtain a complete coverage of the human genome (neglecting the remaining gaps) by approximately 3200 isochores, which may be visualized as the ultimate chromosomal bands. Isochores visibly belong to five families characterized by different GC levels, as expected from previous investigations. Since we previously showed that isochores are tightly linked to basic biological properties such as gene density, replication timing, and recombination, the new level of detail provided by the isochore map will help the understanding of genome structure, function, and evolution.
Affiliation(s)
- Maria Costantini
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
- Oliver Clay
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
- Fabio Auletta
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
- Giorgio Bernardi
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
- Corresponding author. Fax 39 081 2455807
19
Li W, Holste D. Universal 1/f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in DNA sequences of the human genome. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005; 71:041910. [PMID: 15903704 DOI: 10.1103/physreve.71.041910] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/28/2004] [Revised: 10/28/2004] [Indexed: 05/02/2023]
Abstract
Spatial fluctuations of guanine and cytosine base content (GC%) are studied by spectral analysis for the complete set of human genomic DNA sequences. We find that (i) 1/f^α decay is universally observed in the power spectra of all 24 chromosomes, and (ii) the exponent α ≈ 1 extends to about 10^7 bases, one order of magnitude longer than has previously been observed. We further find that (iii) almost all human chromosomes exhibit a crossover from α₁ ≈ 1 (1/f^α₁) at lower frequency to α₂ < 1 (1/f^α₂) at higher frequency, typically occurring at around 30,000-100,000 bases, while (iv) the crossover in this frequency range is virtually absent in human chromosome 22. In addition to the universal 1/f^α noise in power spectra, we find (v) several lines of evidence for chromosome-specific correlation structures, including a 500,000 base long oscillation in human chromosome 21. The universal 1/f^α spectrum in the human genome is further substantiated by a resistance to reduction in variance of guanine and cytosine content when the window size is increased.
Collapse
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, New York 10030, USA.
|
20
|
Cohen N, Dagan T, Stone L, Graur D. GC composition of the human genome: in search of isochores. Mol Biol Evol 2005; 22:1260-72. [PMID: 15728737 DOI: 10.1093/molbev/msi115] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022]
Abstract
The isochore theory, proposed nearly three decades ago, depicts the mammalian genome as a mosaic of long, fairly homogeneous genomic regions that are characterized by their guanine and cytosine (GC) content. The human genome, for instance, was claimed to consist of five distinct isochore families: L1, L2, H1, H2, and H3, with GC contents of <37%, 37%-42%, 42%-47%, 47%-52%, and >52%, respectively. In this paper, we address the question of the validity of the isochore theory through a rigorous sequence-based analysis of the human genome. Toward this end, we adopt a set of six attributes that are generally claimed to characterize isochores and statistically test their veracity against the available draft sequence of the complete human genome. By the selection criteria used in this study: distinctiveness, homogeneity, and minimal length of 300 kb, we identify 1,857 genomic segments that warrant the label "isochore." These putative isochores are nonuniformly scattered throughout the genome and cover about 41% of the human genome. We found that a four-family model of putative isochores is the most parsimonious multi-Gaussian model that can be fitted to the empirical data. These families, however, are GC poor, with mean GC contents of 35%, 38%, 41%, and 48% and do not resemble the five isochore families in the literature. Moreover, due to large overlaps among the families, it is impossible to classify genomic segments into isochore families reliably, according to compositional properties alone. These findings undermine the utility of the isochore theory and seem to indicate that the theory may have reached the limits of its usefulness as a description of genomic compositional structures.
Affiliation(s)
- Netta Cohen
- School of Computing, University of Leeds, Leeds, United Kingdom
|
21
|
Zhang CT, Zhang R. Isochore structures in the mouse genome. Genomics 2004; 83:384-94. [PMID: 14962664 DOI: 10.1016/j.ygeno.2003.09.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 05/28/2003] [Accepted: 09/04/2003] [Indexed: 10/26/2022]
Abstract
The distribution of the G+C content in the mouse genome has been studied using a windowless technique. We have found that (i) abrupt variations of the G+C content from a GC-rich region to a GC-poor region, and vice versa, occur frequently at some sites along the sequence of the mouse genome, and (ii) long domains with relatively homogeneous G+C content (isochores) exist, which usually have sharp boundaries. Consequently, 28 isochores longer than 1 Mb have been identified in the mouse genome. A homogeneity index was used to quantify the variations of the G+C content within isochores. The precise boundaries, sizes, and G+C contents of these isochores have been determined. The windowless technique for the G+C content computation was also used to analyze the DNA sequence containing the mouse MHC region, which has a GC-poor isochore. This isochore is located at the central part of the sequence with boundaries at 468459 and 812716 bp, where the sequence is extended from the centromeric end to the telomeric end. In addition, the analysis of a segment of the rat genome shows that the rat genome also has clear isochore structures.
Affiliation(s)
- Chun-Ting Zhang
- Department of Physics, Tianjin University, Tianjin 300072, China.
|
22
|
Mallon AM, Wilming L, Weekes J, Gilbert JGR, Ashurst J, Peyrefitte S, Matthews L, Cadman M, McKeone R, Sellick CA, Arkell R, Botcherby MRM, Strivens MA, Campbell RD, Gregory S, Denny P, Hancock JM, Rogers J, Brown SDM. Organization and evolution of a gene-rich region of the mouse genome: a 12.7-Mb region deleted in the Del(13)Svea36H mouse. Genome Res 2004; 14:1888-901. [PMID: 15364904 PMCID: PMC524412 DOI: 10.1101/gr.2478604] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/24/2022]
Abstract
Del(13)Svea36H (Del36H) is a deletion of approximately 20% of mouse chromosome 13 showing conserved synteny with human chromosome 6p22.1-6p22.3/6p25. The human region is lost in some deletion syndromes and is the site of several disease loci. Heterozygous Del36H mice show numerous phenotypes and may model aspects of human genetic disease. We describe 12.7 Mb of finished, annotated sequence from Del36H. Del36H has a higher gene density than the draft mouse genome, reflecting high local densities of three gene families (vomeronasal receptors, serpins, and prolactins) which are greatly expanded relative to human. Transposable elements are concentrated near these gene families. We therefore suggest that their neighborhoods are gene factories, regions of frequent recombination in which gene duplication is more frequent. The gene families show different proportions of pseudogenes, likely reflecting different strengths of purifying selection and/or gene conversion. They are also associated with relatively low simple sequence concentrations, which vary across the region with a periodicity of approximately 5 Mb. Del36H contains numerous evolutionarily conserved regions (ECRs). Many lie in noncoding regions, are detectable in species as distant as Ciona intestinalis, and therefore are candidate regulatory sequences. This analysis will facilitate functional genomic analysis of Del36H and provides insights into mouse genome evolution.
Affiliation(s)
- Ann-Marie Mallon
- Medical Research Council Mammalian Genetics Unit, Harwell, Oxfordshire, United Kingdom
|
23
|
Holste D, Grosse I, Beirer S, Schieg P, Herzel H. Repeats and correlations in human DNA sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2003; 67:061913. [PMID: 16241267 DOI: 10.1103/physreve.67.061913] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Received: 01/22/2003] [Indexed: 05/04/2023]
Abstract
We study the nucleotide-nucleotide mutual information function I(k) of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. We find in each human chromosome (i) the absence of the k=3 base pair (bp) sequence periodicity characteristic for protein coding regions, (ii) the absence of the k=10-11 bp sequence periodicity characteristic for both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at about k=135 bp and at about k=165 bp. We investigate to which degree the density and composition of interspersed repeats might explain these observed statistical patterns in all three human chromosomes. We use simple stochastic models to substitute known interspersed repeats and find by numerical studies that (iv) the presence of interspersed repeats dominates short-range correlations as measured by I(k) on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
Affiliation(s)
- Dirk Holste
- Department of Biology, Massachusetts Institute of Technology, Cambridge 02139, USA.
|