1
Vogl C, Karapetiants M, Yıldırım B, Kjartansdóttir H, Kosiol C, Bergman J, Majka M, Mikula LC. Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed). BMC Bioinformatics 2024; 25:151. [PMID: 38627634 PMCID: PMC11021005 DOI: 10.1186/s12859-024-05751-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 03/18/2024] [Indexed: 04/19/2024]
Abstract
BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). They identify the number of comparably homogeneous regions within autocorrelated observed sequences. These are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred. RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. In regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based on isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of Poisson-gamma distributions) along human chromosome 1, and of their correlations. CONCLUSIONS: Our algorithms provide a biologically assumption-free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.
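The ordering constraint described in this abstract (a hidden state may only be followed by itself or by a state adjacent to it in the ordering of emission means) can be illustrated with a small sketch. This is not the oHMMed package's algorithm: the abstract says the parameters are inferred, whereas the sketch below fixes them by hand and only decodes a state path with the standard Viterbi algorithm under normal emissions. The function names, the `p_stay` value, and the example means are all hypothetical.

```python
import math

def banded_transitions(n_states, p_stay=0.9):
    """Tridiagonal transition matrix: stay, or move to an adjacent state only.

    p_stay is an illustrative assumption, not a value from the paper.
    """
    rows = []
    for i in range(n_states):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        row = [0.0] * n_states
        row[i] = p_stay if neighbours else 1.0
        for j in neighbours:
            row[j] = (1.0 - p_stay) / len(neighbours)
        rows.append(row)
    return rows

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(obs, means, sd, trans):
    """Most probable state path; `means` must be sorted in increasing order."""
    n = len(means)
    log_t = [[math.log(p) if p > 0 else -math.inf for p in row] for row in trans]
    # Uniform initial distribution over states (an assumption of this sketch).
    score = [math.log(1.0 / n) + normal_logpdf(obs[0], means[k], sd) for k in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for k in range(n):
            best = max(range(n), key=lambda j: prev[j] + log_t[j][k])
            ptr.append(best)
            score.append(prev[best] + log_t[best][k] + normal_logpdf(x, means[k], sd))
        back.append(ptr)
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical windowed GC proportions and hand-picked state means.
gc_windows = [0.34, 0.36, 0.35, 0.43, 0.41, 0.50, 0.51, 0.49]
state_means = [0.35, 0.42, 0.50]
transitions = banded_transitions(len(state_means))
state_path = viterbi(gc_windows, state_means, 0.02, transitions)
```

Because forbidden transitions have log-probability minus infinity, consecutive entries of `state_path` can never differ by more than one state, which is the property the abstract describes.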
Affiliation(s)
- Claus Vogl
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vienna, Austria
- Mariia Karapetiants
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
- Burçin Yıldırım
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vienna, Austria
  - Department of Ecology and Genetics, Plant Ecology and Evolution, Uppsala University, Uppsala, Sweden
- Hrönn Kjartansdóttir
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
- Carolin Kosiol
  - Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Juraj Bergman
  - Department of Biology, Centre for Biodiversity Dynamics in a Changing World (BIOCHANGE) & Section for Ecoinformatics and Biodiversity, Aarhus University, Aarhus, Denmark
- Lynette Caitlin Mikula
  - Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, Scotland, UK
2
Slaying (Yet Again) the Brain-Eating Zombie Called the "Isochore Theory": A Segmentation Algorithm Used to "Confirm" the Existence of Isochores Creates "Isochores" Where None Exist. Int J Mol Sci 2022; 23:ijms23126558. [PMID: 35743002 PMCID: PMC9224211 DOI: 10.3390/ijms23126558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 06/07/2022] [Accepted: 06/09/2022] [Indexed: 01/27/2023]
Abstract
The isochore theory, which was proposed more than 40 years ago, depicts the mammalian genome as a mosaic of long, homogeneous regions that are characterized by their guanine and cytosine (GC) content. The human genome, for instance, was claimed to consist of five compositionally distinct isochore families. The isochore theory, in all its reincarnations, has been repeatedly falsified in the literature, yet isochore proponents have persistently resurrected it by either redefining isochores or by proposing alternative means of testing the theory. Here, I deal with the latest attempt to salvage this seemingly immortal zombie—a sequence segmentation method called isoSegmenter, which was claimed to “identify” isochores while at the same time disregarding the main characteristic attribute of isochores—compositional homogeneity. I used a series of controlled, randomly generated simulated sequences as a benchmark to study the performance of isoSegmenter. The main advantage of using simulated sequences is that, unlike real data, the exact start and stop point of any isochore or homogeneous compositional domain is known. Based on three key performance metrics—sensitivity, precision, and Jaccard similarity index—isoSegmenter was found to be vastly inferior to isoPlotter, a segmentation algorithm with no user input. Moreover, isoSegmenter identified isochores where none exist and failed to identify compositionally homogeneous sequences that were shorter than 100−200 kb. Will this zillionth refutation of “isochores” ensure a final and permanent entombment of the isochore theory? This author is not holding his breath.
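The three performance metrics named in this abstract can be sketched for a benchmark in which the true domain coordinates are known. The abstract does not say how a predicted domain is matched to a true one, so the sketch assumes the strictest criterion (identical start and end coordinates); the function name and the example intervals are hypothetical.

```python
def segmentation_metrics(true_domains, predicted_domains):
    """Sensitivity, precision, and Jaccard index over sets of (start, end) domains.

    Assumption: a predicted domain counts as a true positive only if its
    coordinates match a true domain exactly. A tolerance-based match would
    need a different comparison.
    """
    true_set, pred_set = set(true_domains), set(predicted_domains)
    true_positives = len(true_set & pred_set)
    union = len(true_set | pred_set)
    return {
        "sensitivity": true_positives / len(true_set) if true_set else 0.0,
        "precision": true_positives / len(pred_set) if pred_set else 0.0,
        "jaccard": true_positives / union if union else 0.0,
    }

# Hypothetical simulated sequence: three true domains; the segmenter wrongly
# splits the middle one in two.
true_domains = [(0, 100), (100, 300), (300, 350)]
predicted_domains = [(0, 100), (100, 200), (200, 300), (300, 350)]
metrics = segmentation_metrics(true_domains, predicted_domains)
```

In this example two of three true domains are recovered, two of four predictions are correct, and the union of both sets holds five distinct domains, so over-segmentation lowers precision and the Jaccard index more than it lowers sensitivity.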
3
Arhondakis S, Milanesi M, Castrignanò T, Gioiosa S, Valentini A, Chillemi G. Evidence of distinct gene functional patterns in GC-poor and GC-rich isochores in Bos taurus. Anim Genet 2020; 51:358-368. [PMID: 32069522 DOI: 10.1111/age.12917] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 01/20/2020] [Indexed: 01/10/2023]
Abstract
Vertebrate genomes are mosaics of megabase-size DNA segments with a fairly homogeneous base composition, called isochores. They are divided into five families characterized by different guanine-cytosine (GC) levels and linked to several functional and structural properties. The increased availability of fully sequenced genomes allows the investigation of isochores in several species, assessing their level of conservation across vertebrate genomes. In this work, we characterized the isochores in Bos taurus using the ARS-UCD1.2 genome version. The comparison of our results with the well-studied human isochores and those of other mammals revealed a large conservation in isochore families, in number, average GC levels and gene density. Exceptions to the established increase in gene density with the increase in isochores (GC%) were observed for the following gene biotypes: tRNA, small nuclear RNA, small nucleolar RNA and pseudogenes that have their maximum number in H2 and H1 isochores. Subsequently, we assessed the ontology of all gene biotypes looking for functional classes that are statistically over- or under-represented in each isochore. Receptor activity and sensory perception pathways were significantly over-represented in L1 and L2 (GC-poor) isochores. This was also validated for the horse genome. Our analysis of housekeeping genes confirmed a preferential localization in GC-rich isochores, as reported in other species. Finally, we assessed the SNP distribution of a bovine high-density SNP chip across the isochores, finding a higher density in the GC-rich families, reflecting a potential bias in the chip, widely used for genetic selection and biodiversity studies.
Affiliation(s)
- S Arhondakis
  - Bioinformatics and Computational Science (BioCoS), Boniali 11-19, Chania, 73134, Crete, Greece
- M Milanesi
  - Department of Support, Production and Animal Health, School of Veterinary Medicine, São Paulo State University, 16050-680 R. Clóvis Pestana 793 - Dona Amelia, Araçatuba, SP, Brazil
  - International Atomic Energy Agency Collaborating Centre on Animal Genomics and Bioinformatics, 16050-680 R. Clóvis Pestana 793 - Dona Amelia, Araçatuba, SP, Brazil
- T Castrignanò
  - SCAI - Super Computing Applications and Innovation Department, CINECA, Rome, Italy
- S Gioiosa
  - SCAI - Super Computing Applications and Innovation Department, CINECA, Rome, Italy
- A Valentini
  - Department for Innovation in Biological, Agro-food and Forest Systems, DIBAF, University of Tuscia, via S. Camillo de Lellis s.n.c, 01100, Viterbo, Italy
- G Chillemi
  - Department for Innovation in Biological, Agro-food and Forest Systems, DIBAF, University of Tuscia, via S. Camillo de Lellis s.n.c, 01100, Viterbo, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, IBIOM, CNR, Bari, Italy
4
Cozzi P, Milanesi L, Bernardi G. Segmenting the Human Genome into Isochores. Evol Bioinform Online 2015; 11:253-61. [PMID: 26640363 PMCID: PMC4662427 DOI: 10.4137/ebo.s27693] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/15/2015] [Revised: 08/25/2015] [Accepted: 08/31/2015] [Indexed: 02/06/2023]
Abstract
The human genome is a mosaic of isochores, which are long (>200 kb) DNA sequences that are fairly homogeneous in base composition and can be assigned to five families comprising 33%–59% of GC composition. Although the compartmentalized organization of the mammalian genome has been investigated for more than 40 years, no satisfactory automatic procedure for segmenting the genome into isochores is available so far. We present a critical discussion of the currently available methods and a new approach called isoSegmenter which allows segmenting the genome into isochores in a fast and completely automatic manner. This approach relies on two types of experimentally defined parameters, the compositional boundaries of isochore families and an optimal window size of 100 kb. The approach represents an improvement over the existing methods, is ideally suited for investigating long-range features of sequenced and assembled genomes, and is publicly available at https://github.com/bunop/isoSegmenter.
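The two parameter types mentioned in this abstract (family boundaries and a fixed window size) suggest a simple fixed-window classification, sketched below. It is a simplified illustration and not the isoSegmenter code from the linked repository. The abstract gives no boundary values, so the thresholds used here are an assumption taken from values commonly quoted in the isochore literature, and the helper names are hypothetical.

```python
from bisect import bisect_right

# ASSUMED family boundaries in GC%; they are not stated in the abstract.
FAMILY_UPPER_BOUNDS = [37.0, 41.0, 46.0, 53.0]
FAMILY_NAMES = ["L1", "L2", "H1", "H2", "H3"]

def gc_percent(seq):
    """GC% over unambiguous bases; None if the window has no A/C/G/T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def classify_windows(seq, window=100_000):
    """Assign each full, non-overlapping window to a family (tail is dropped)."""
    labels = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_percent(seq[start:start + window])
        if gc is None:
            labels.append(None)
        else:
            labels.append(FAMILY_NAMES[bisect_right(FAMILY_UPPER_BOUNDS, gc)])
    return labels

def merge_runs(labels, window=100_000):
    """Merge consecutive windows of the same family into (start, end, family)."""
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], (i + 1) * window, label)
        else:
            segments.append((i * window, (i + 1) * window, label))
    return segments

# Toy sequence with a 10 bp window so the result is easy to follow.
toy = "AAAAAAAGGG" + "AAAAAAGGGG" + "AAAAAAGGGG" + "GGGGGGAAAA"
toy_labels = classify_windows(toy, window=10)
toy_segments = merge_runs(toy_labels, window=10)
```

With fixed windows, segment boundaries can only fall on multiples of the window size, which is one reason window length matters so much for this family of methods.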
Affiliation(s)
- Paolo Cozzi
  - National Research Council, Institute for Biomedical Technologies, Segrate, Milan, Italy
  - Parco Tecnologico Padano, Lodi, Italy
- Luciano Milanesi
  - National Research Council, Institute for Biomedical Technologies, Segrate, Milan, Italy
- Giorgio Bernardi
  - National Research Council, Institute for Biomedical Technologies, Segrate, Milan, Italy
  - Science Department, Rome 3 University, Rome, Italy
5
Voelker RB, Cresko WA, Berglund JA. Computational approaches to mine publicly available databases. Methods Mol Biol 2014; 1126:325-340. [PMID: 24549675 DOI: 10.1007/978-1-62703-980-2_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Publicly available sequence annotation data is a vital resource for researchers. Many types of information are available, including structural annotations (i.e., the locations and identities of genomic features) and functional annotations (e.g., gene expression and protein interactions). Annotation data is especially useful for interrogating Next-Gen sequencing data (e.g., identifying genomic features that are associated with mapped reads). Additionally, the vast amount of data that is available offers researchers the opportunity to mine existing data sets and make new discoveries. The ability to efficiently obtain, manipulate, and interrogate this data is a valuable and empowering skill. In this chapter, we introduce several primary data repositories and describe the most commonly encountered file formats. In order to highlight some of the key concepts, operations, and utilities that are involved in working with annotation data, we provide a fully worked example of using annotations to answer some basic questions about a particular ChIP-seq data set.
Affiliation(s)
- Rodger B Voelker
  - Institutes of Molecular Biology and Ecology and Evolution, University of Oregon, Eugene, OR, USA
6
Voelker RB, Erkelenz S, Reynoso V, Schaal H, Berglund JA. Frequent gain and loss of intronic splicing regulatory elements during the evolution of vertebrates. Genome Biol Evol 2012; 4:659-74. [PMID: 22619362 PMCID: PMC3606033 DOI: 10.1093/gbe/evs051] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Splicing regulatory elements (SREs) are sequences bound by proteins that influence splicing of nearby splice sites. Constitutively spliced introns have evolved to utilize many different splicing factors. The evolutionary processes that influenced which splicing factors are used for splicing of individual introns are generally unclear. We demonstrate that in the lineage that gave rise to mammals, many introns lost U-rich sequences and gained G-rich sequences, both of which resemble known SREs. The apparent conversion of U-rich to G-rich SREs suggests that the associated splicing factors are functionally equivalent. In support of this we demonstrated that U-rich and G-rich SREs are both capable of promoting splicing of an SRE-dependent splicing reporter. Furthermore, we demonstrate, using the heterologous MS2 tethering system (bacterial MS2 coat fusion-protein and its RNA stem-loop binding site), that both the U-rich SRE-binding protein (TIA1) and the G-rich SRE-binding protein (HNRNPF) can promote splicing of the same intron. We also observed that gain of G-rich SREs is significantly associated with G/C-rich genomic isochores, suggesting that gain or loss of SREs was driven by the same processes that ultimately resulted in the formation of mammalian genomic isochores. We propose the following model for the gain and loss of mammalian SREs. Ancestral U-rich SREs located in genomic regions that were experiencing high rates of A/T to G/C conversion would have suffered frequent deleterious mutations. However, this same process resulted in increased formation of functionally equivalent G-rich SREs, and acquisition of new G-rich SREs decreased purifying selection on the U-rich SREs, which were then free to decay.
Affiliation(s)
- Rodger B Voelker
  - Institute of Molecular Biology, Department of Chemistry, University of Oregon, OR, USA
7
Carpena P, Oliver JL, Hackenberg M, Coronado AV, Barturen G, Bernaola-Galván P. High-level organization of isochores into gigantic superstructures in the human genome. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 83:031908. [PMID: 21517526 DOI: 10.1103/physreve.83.031908] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/28/2010] [Revised: 01/10/2011] [Indexed: 05/30/2023]
Abstract
Human DNA shows a complex structure with compositional features at many scales; the isochores--long DNA segments (~10⁵ bp) of relatively homogeneous guanine-cytosine (G + C) content--are the largest well-documented and well-analyzed compositional structures. However, we report here on the existence of a high-level compositional organization of isochores in the human genome. By using a segmentation algorithm incorporating the long-range correlations existing in human DNA, we find that every chromosome is composed of a few huge segments (~ 10⁷ bp) of relatively homogeneous G + C content, which become the largest compositional organization of the genome. Finally, we show evidence of the biological relevance of these superstructures, pointing to a large-scale functional organization of the human genome.
Affiliation(s)
- P Carpena
  - Departamento de Física Aplicada II, Universidad de Málaga, ES-29071, Málaga, Spain
8
Zhang W, Wu W, Lin W, Zhou P, Dai L, Zhang Y, Huang J, Zhang D. Deciphering heterogeneity in pig genome assembly Sscrofa9 by isochore and isochore-like region analyses. PLoS One 2010; 5:e13303. [PMID: 20948965 PMCID: PMC2952626 DOI: 10.1371/journal.pone.0013303] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/27/2010] [Accepted: 09/15/2010] [Indexed: 11/18/2022]
Abstract
Background: The isochore, a large DNA sequence with relatively small GC variance, is one of the most important structures in eukaryotic genomes. Although the isochore has been widely studied in humans and other species, little is known about its distribution in pigs. Principal Findings: In this paper, we construct a map of long homogeneous genome regions (LHGRs), i.e., isochores and isochore-like regions, in pigs to provide an intuitive version of GC heterogeneity in each chromosome. The LHGR pattern study not only quantifies heterogeneities, but also reveals some primary characteristics of the chromatin organization, including the following: (1) the majority of LHGRs belong to GC-poor families and are long; (2) a high gene density tends to occur with the appearance of GC-rich LHGRs; and (3) the density of LINE repeats decreases with an increase in the GC content of LHGRs. Furthermore, a portion of LHGRs with particular GC ranges (50%–51% and 54%–55%) tend to have abnormally high gene densities, suggesting that biased gene conversion (BGC), as well as time- and energy-saving principles, could be of importance to the formation of genome organization. Conclusion: This study significantly improves our knowledge of chromatin organization in the pig genome. Correlations between the different biological features (e.g., gene density and repeat density) and GC content of LHGRs provide a unique glimpse of in silico gene and repeat prediction.
Affiliation(s)
- Wenqian Zhang
  - Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Wenwu Wu
  - Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Wenchao Lin
  - Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Pengfang Zhou
  - Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Li Dai
  - Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Yang Zhang
  - Investigation Group of Molecular Virology, Immunology, Oncology and Systems Biology, and Bioinformatics Center, College of Veterinary Medicine, Northwest A&F University, Xianyang, Shaanxi, China
- Jingfei Huang
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - * E-mail: (DZ); (JH)
- Deli Zhang
  - Investigation Group of Molecular Virology, Immunology, Oncology and Systems Biology, and Bioinformatics Center, College of Veterinary Medicine, Northwest A&F University, Xianyang, Shaanxi, China
  - * E-mail: (DZ); (JH)
9
Elhaik E, Graur D, Josić K, Landan G. Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm. Nucleic Acids Res 2010; 38:e158. [PMID: 20571085 PMCID: PMC2926622 DOI: 10.1093/nar/gkq532] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
Abstract
It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remaining is a mixture of many short compositionally homogeneous domains and relatively few long ones.
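The recursive segmentation scheme this abstract refers to can be sketched as follows: for every candidate split point, compute the Jensen–Shannon divergence between the left and right parts, $D_{JS} = H(S) - \frac{n_1}{n}H(S_1) - \frac{n_2}{n}H(S_2)$, split at the maximum, and recurse on both halves. The halting rule is the hard part, as the abstract stresses. The sketch below uses a fixed divergence threshold purely as a placeholder assumption; it is not IsoPlotter's dynamic criterion, which the abstract does not specify. The sequence is encoded as a list of 0/1 flags (1 for G or C), and all names are hypothetical.

```python
import math

def entropy(gc_count, n):
    """Shannon entropy in bits of a two-symbol (GC vs AT) composition."""
    if n == 0:
        return 0.0
    h = 0.0
    for count in (gc_count, n - gc_count):
        if count:
            p = count / n
            h -= p * math.log2(p)
    return h

def best_split(is_gc):
    """Return (position, divergence) of the split maximising D_JS.

    Position is None when no split has positive divergence.
    """
    n, total = len(is_gc), sum(is_gc)
    h_all = entropy(total, n)
    best_pos, best_d, left = None, 0.0, 0
    for i in range(1, n):
        left += is_gc[i - 1]
        d = (h_all
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(total - left, n - i))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d

def segment(is_gc, threshold, offset=0):
    """Recursively split while the best divergence reaches `threshold`.

    The fixed threshold is a stand-in halting rule chosen for illustration.
    """
    pos, d = best_split(is_gc)
    if pos is None or d < threshold:
        return [(offset, offset + len(is_gc))]
    return (segment(is_gc[:pos], threshold, offset)
            + segment(is_gc[pos:], threshold, offset + pos))

# Toy sequence: 50 AT positions followed by 50 GC positions.
toy_flags = [0] * 50 + [1] * 50
split_pos, split_d = best_split(toy_flags)
toy_domains = segment(toy_flags, threshold=0.5)
```

On real sequences almost every split has a small positive divergence, so the result depends heavily on the threshold, which is the bias in halting criteria that the abstract sets out to address.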
Affiliation(s)
- Eran Elhaik
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
10
Elhaik E, Graur D, Josic K. Comparative testing of DNA segmentation algorithms using benchmark simulations. Mol Biol Evol 2009; 27:1015-24. [PMID: 20018981 DOI: 10.1093/molbev/msp307] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
Affiliation(s)
- Eran Elhaik
  - Department of Biology & Biochemistry, University of Houston, TX, USA
11
Abstract
Background: Previous investigations from our laboratory were largely focused on the genome organization of vertebrates. We showed that these genomes are mosaics of isochores, megabase-size DNA sequences that are fairly homogeneous in base composition yet belong to a small number of families that cover a wide compositional spectrum. A question raised by these results concerned how far back in evolution an isochore organization of the eukaryotic genome arose. Results: The present investigation deals with the compositional patterns of the invertebrates for which full genome sequences, or at least scaffolds, are available. We found that (i) a mosaic of isochores is the long-range organization of all the genomes that we investigated; (ii) the isochore families from the invertebrate genomes matched the corresponding families of vertebrates in GC levels; (iii) the relative amounts of isochore families were remarkably different for different genomes, except for those from phylogenetically close species, such as the Drosophilids. Conclusion: This work demonstrates not only that an isochore organization is present in all metazoan genomes analyzed, which included Nematodes and Arthropods among Protostomia and Echinoderms and Chordates among Deuterostomia, but also that the isochore families of invertebrates share GC levels with the corresponding families of vertebrates.
12
Costantini M, Cammarano R, Bernardi G. The evolution of isochore patterns in vertebrate genomes. BMC Genomics 2009; 10:146. [PMID: 19344507 PMCID: PMC2678159 DOI: 10.1186/1471-2164-10-146] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Received: 10/22/2008] [Accepted: 04/03/2009] [Indexed: 01/23/2023]
Abstract
Background: Previous work from our laboratory showed that (i) vertebrate genomes are mosaics of isochores, typically megabase-size DNA segments that are fairly homogeneous in base composition; (ii) isochores belong to a small number of families (five in the human genome) characterized by different GC levels; (iii) isochore family patterns are different in fishes/amphibians and mammals/birds, the latter showing GC-rich isochore families that are absent or very scarce in the former; (iv) there are two modes of genome evolution, a conservative one in which isochore patterns basically do not change (e.g., among mammalian orders), and a transitional one, in which they do change (e.g., between amphibians and mammals); and (v) isochores are tightly linked to a number of basic biological properties, such as gene density, gene expression, replication timing and recombination. Results: The present availability of a number of fully sequenced genomes ranging from fishes to mammals allowed us to carry out investigations that (i) more precisely quantified our previous conclusions; (ii) showed that the different isochore families of vertebrate genomes are largely conserved in GC levels and dinucleotide frequencies, as well as in isochore size; and (iii) showed that isochore family patterns can either be conserved or change within both warm- and cold-blooded vertebrates. Conclusion: On the basis of the results presented, we propose that (i) the large conservation of GC levels and dinucleotide frequencies may reflect the conservation of chromatin structures; (ii) the conservation of isochore size may be linked to the role played by isochores in chromosome structure and replication; (iii) the formation, the maintenance and the changes of isochore patterns are due to natural selection.