1
Vogl C, Karapetiants M, Yıldırım B, Kjartansdóttir H, Kosiol C, Bergman J, Majka M, Mikula LC. Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed). BMC Bioinformatics 2024; 25:151. [PMID: 38627634 PMCID: PMC11021005 DOI: 10.1186/s12859-024-05751-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 03/18/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). These models identify the number of comparably homogeneous regions within autocorrelated observed sequences. The regions are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred.
RESULTS We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. With regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based on isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of the variation of chromatin accessibility (ATAC-seq) and the epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of Poisson-gamma distributions) along human chromosome 1, together with their correlations.
CONCLUSIONS Our algorithms provide a biologically assumption-free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Nevertheless, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.
Affiliation(s)
- Claus Vogl
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vienna, Austria
- Mariia Karapetiants
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
- Burçin Yıldırım
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vienna, Austria
  - Department of Ecology and Genetics, Plant Ecology and Evolution, Uppsala University, Uppsala, Sweden
- Hrönn Kjartansdóttir
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, Vienna, Austria
- Carolin Kosiol
  - Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, Scotland, UK
- Juraj Bergman
  - Department of Biology, Centre for Biodiversity Dynamics in a Changing World (BIOCHANGE) & Section for Ecoinformatics and Biodiversity, Aarhus University, Aarhus, Denmark
- Lynette Caitlin Mikula
  - Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, Scotland, UK
2
Brovkina MV, Chapman MA, Holding ML, Clowney EJ. Emergence and influence of sequence bias in evolutionarily malleable, mammalian tandem arrays. BMC Biol 2023; 21:179. [PMID: 37612705 PMCID: PMC10463633 DOI: 10.1186/s12915-023-01673-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 08/01/2023] [Indexed: 08/25/2023] Open
Abstract
BACKGROUND The radiation of mammals at the extinction of the dinosaurs produced a plethora of new forms, as diverse as bats, dolphins, and elephants, in only 10-20 million years. Behind the scenes, adaptation to new niches is accompanied by extensive innovation in large families of genes that allow animals to contact the environment, including chemosensors, xenobiotic enzymes, and immune and barrier proteins. Genes in these "outward-looking" families are allelically diverse among humans and exhibit tissue-specific and sometimes stochastic expression.
RESULTS Here, we show that these tandem arrays of outward-looking genes occupy AT-biased isochores and comprise the "tissue-specific" gene class whose members lack CpG islands in their promoters. Models of mammalian genome evolution have not incorporated the sharply different functions and transcriptional patterns of genes in AT- versus GC-biased regions. To examine the relationship between gene family expansion, sequence content, and allelic diversity, we use population genetic data and comparative analysis. First, we find that AT bias can emerge during evolutionary expansion of gene families in cis. Second, human genes in AT-biased isochores or with GC-poor promoters experience relatively low rates of de novo point mutation today but are enriched for non-synonymous variants. Finally, we find that isochores containing gene clusters exhibit low rates of recombination.
CONCLUSIONS Our analyses suggest that tolerance of non-synonymous variation and low recombination are two forces that have produced the depletion of GC bases in outward-facing gene arrays. In turn, high AT content exerts a profound effect on their chromatin organization and transcriptional regulation.
Affiliation(s)
- Margarita V Brovkina
  - Graduate Program in Cellular and Molecular Biology, University of Michigan Medical School, Ann Arbor, MI, USA
- Margaret A Chapman
  - Neurosciences Graduate Program, University of Michigan Medical School, Ann Arbor, MI, USA
- E Josephine Clowney
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
  - Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
3
Lamolle G, Iriarte A, Musto H. Codon usage in the flatworm Schistosoma mansoni is shaped by the mutational bias towards A+T and translational selection, which increases GC-ending codons in highly expressed genes. Mol Biochem Parasitol 2021; 247:111445. [PMID: 34942292 DOI: 10.1016/j.molbiopara.2021.111445] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/22/2021] [Revised: 12/14/2021] [Accepted: 12/17/2021] [Indexed: 11/30/2022]
Abstract
Schistosoma mansoni is a trematode flatworm that parasitizes humans and causes a disease called bilharzia. At the genomic level, it is characterized by a low GC content and an "isochore-like" structure, in which the GC-richest regions, mainly located at the ends of the chromosomes, are interspersed with low-GC regions. Furthermore, the GC-richest regions are also the gene-richest and contain the most heavily expressed genes. Taking these features into account, we decided to reanalyze the codon usage of this flatworm. Our results show that a) when all genes are considered together, the strong mutational bias towards A+T leads to a predominance of A/T-ending codons, b) a multivariate analysis discriminates between highly and lowly expressed genes, c) the sequences expressed at the highest levels display a significant increase in G/C-ending codons, d) when molecular distances are compared with a closely related species, the synonymous distance in highly expressed genes is significantly lower than in lowly expressed sequences. Therefore, we conclude that, contrary to previous results, which were obtained with a small sample of genes, codon usage in S. mansoni is the result of two forces that operate in opposite directions: while mutational bias leads to a predominance of A/T-ending codons, translational selection, working at the level of speed, increases G/C-ending triplets.
Affiliation(s)
- Guillermo Lamolle
  - Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
- Andrés Iriarte
  - Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Avenida A. Navarro 3051, 11600 Montevideo, Uruguay
- Héctor Musto
  - Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
4
Federico C, Leotta CG, Bruno F, Longo AM, Owoka T, Tosi S, Saccone S. Nuclear Repositioning of the Non-Translocated HLXB9 Allele in the Leukaemia Cell Line GDM-1 Harbouring a t(6;7)(q23;q36). Cytogenet Genome Res 2017; 153:10-17. [PMID: 28965118 DOI: 10.1159/000480745] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 08/17/2017] [Indexed: 11/19/2022] Open
Abstract
Transcriptionally active and inactive topologically associated domains (TADs) occupy different areas in the cell nucleus, and chromosomal rearrangements relocating TADs could determine ectopic expression of the repositioned genes. In this study, we investigated the HLXB9 gene in a myeloid leukaemia cell line, GDM-1, known to harbour a rearrangement involving chromosome 7 with a breakpoint distal to HLXB9, highly expressed in these cells. We used FISH to target the regions involved in the translocation and to distinguish the translocated chromosome from the non-translocated one in interphase nuclei. Two-dimensional analysis of the interphase FISH data indicated that the 2 HLXB9 alleles had a different localisation in the cell nuclei, with the translocated allele consistently positioned in the nuclear periphery and the normal one in the more internal portion of the nucleus, known as the transcriptionally active compartment. Our data may indicate that HLXB9 transcripts in the GDM-1 cell line do not arise from the allele located in rearranged chromosome 7, suggesting that regulation of gene expression in cancer cells harbouring chromosomal translocations might be more complex than previously thought, paving the path to further investigations on mechanisms of gene expression.
Affiliation(s)
- Concetta Federico
  - Dipartimento di Scienze Biologiche, Geologiche e Ambientali, University of Catania, Catania, Italy
5
Costantini M, Musto H. The Isochores as a Fundamental Level of Genome Structure and Organization: A General Overview. J Mol Evol 2017; 84:93-103. [PMID: 28243687 DOI: 10.1007/s00239-017-9785-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/21/2016] [Accepted: 02/15/2017] [Indexed: 11/30/2022]
Abstract
The recent availability of a number of fully sequenced genomes (including those of marine organisms) has made it possible to map the isochores very precisely on the basis of DNA sequences, confirming the results obtained before genome sequencing by ultracentrifugation in CsCl. In fact, the analytical profile of human DNA showed that the vertebrate genome is a mosaic of isochores, typically megabase-size DNA segments that belong to a small number of families characterized by different GC levels. In this review, we concentrate on some general genome features regarding the compositional organization of different organisms and their evolution, ranging from vertebrates and invertebrates to unicellular organisms. Since isochores are tightly linked to biological properties such as gene density, replication timing, and recombination, the new level of detail provided by the isochore map has aided the understanding of genome structure, function, and evolution. All the findings reported here confirm the idea that the isochores can be considered a "fundamental level of genome structure and organization." We stress that this review does not discuss the origin of isochores, which is still a matter of controversy, but focuses on well-established structural and physiological aspects.
Affiliation(s)
- Maria Costantini
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Napoli, Italy
- Héctor Musto
  - Laboratorio de Organización y Evolución del Genoma, Unidad de Genómica Evolutiva, Facultad de Ciencias, 11400, Montevideo, Uruguay
6
Abstract
Epilepsy is a common complex disorder most frequently associated with psychiatric and neurological diseases. Massively parallel sequencing of individual or cohort genomes and exomes has led to the identification of several disease-associated genes. We review here the candidate genes in epilepsy genetics, with a focus on exome and gene panel data. Together with the examination of brain-expressed genes and the postsynaptic proteome, the results show that: (1) non-metabolic epilepsy and autism candidate genes tend to be AT-rich and (2) large transcript size and local AT-richness are characteristic features of genes involved in developmental brain disorders and synaptic functions. These results point to the preferential location of core epilepsy and autism candidate genes in late-replicating, GC-poor chromosomal regions (isochores). They also indicate that the genomic alterations leading to some brain disorders are confined to responsive chromatin areas harboring brain-critical genes.
Affiliation(s)
- Kamel Jabbari
  - Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Peter Nürnberg
  - Cologne Center for Genomics, University of Cologne, Cologne, Germany
7
Costantini M. An overview on genome organization of marine organisms. Mar Genomics 2015; 24 Pt 1:3-9. [PMID: 25899406 DOI: 10.1016/j.margen.2015.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2015] [Revised: 03/17/2015] [Accepted: 03/17/2015] [Indexed: 11/16/2022]
Abstract
In this review we concentrate on some general genome features of marine organisms and their evolution, ranging from vertebrates and invertebrates to unicellular organisms. Before genome sequencing, ultracentrifugation in CsCl provided a high-resolution view of mammalian DNA (without looking at the sequence). The analytical profile of human DNA showed that the vertebrate genome is a mosaic of isochores, typically megabase-size DNA segments that belong to a small number of families characterized by different GC levels. The recent availability of a number of fully sequenced genomes has allowed the isochores to be mapped very precisely on the basis of DNA sequences. Since isochores are tightly linked to biological properties such as gene density, replication timing and recombination, the new level of detail provided by the isochore map has aided the understanding of genome structure, function and evolution. This led to the current level of knowledge and to further insights.
Affiliation(s)
- Maria Costantini
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
8
Panda A, Podder S, Chakraborty S, Ghosh TC. GC-made protein disorder sheds new light on vertebrate evolution. Genomics 2014; 104:530-7. [PMID: 25240915 DOI: 10.1016/j.ygeno.2014.09.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/05/2014] [Revised: 08/05/2014] [Accepted: 09/10/2014] [Indexed: 10/24/2022]
Abstract
At the emergence of endothermic vertebrates, GC-rich regions of the ectothermic ancestral genomes underwent a significant GC increase. Such an increase was previously postulated to increase the thermodynamic and structural stability of proteins through a selective increase in protein hydrophobicity. Here, we found that an increase in GC content promotes a higher content of disorder-promoting amino acids in the proteins of endothermic vertebrates and that the increase in hydrophobicity is mainly due to a higher content of the small disorder-promoting amino acid alanine. In endothermic vertebrates, the prevalence of disordered residues was found to promote functional diversity of proteins encoded by GC-rich genes. A higher fraction of disordered residues in this group of proteins was also found to minimize their aggregation tendency. Thus, we propose that the GC transition has favored disordered residues to promote functional diversity in GC-rich genes and to protect them against functional loss by protein misfolding.
Affiliation(s)
- Arup Panda
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Soumita Podder
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Sandip Chakraborty
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Tapash Chandra Ghosh
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
9
Prydz R, Straty GC. PVT Measurements, Virial Coefficients, and Joule-Thomson Inversion Curve of Fluorine. J Res Natl Bur Stand A Phys Chem 1970; 74A:747-760. [PMID: 32523225 PMCID: PMC6730985 DOI: 10.6028/jres.074a.062] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.4] [Indexed: 11/04/2022]
Abstract
Experimental PVT measurements on gaseous and liquid fluorine from the triple point (53.5 K) to 300 K at pressures up to about 21 MN/m² are presented. The data are represented by a truncated virial equation in the low-density region. The second virial coefficient from this equation is compared with published data. The PVT relationship along the Joule-Thomson inversion curve was obtained from the isotherm-isochore representation of the high-density region.
Affiliation(s)
- Rolf Prydz
  - Institute for Basic Standards, National Bureau of Standards, Boulder, Colorado 80302
- G C Straty
  - Institute for Basic Standards, National Bureau of Standards, Boulder, Colorado 80302