1
Li ZL, Buck M. A proteome-scale analysis of vertebrate protein amino acid occurrence: Thermoadaptation shows a correlation with protein solvation but less so with dynamics. Proteins 2023; 91:3-15. [PMID: 36053994 PMCID: PMC10087973 DOI: 10.1002/prot.26404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2022] [Revised: 07/06/2022] [Accepted: 07/25/2022] [Indexed: 12/15/2022]
Abstract
Despite differences in behaviors and living conditions, vertebrate organisms share the great majority of proteins, often with subtle differences in amino acid sequence. Here, we present a simple way to analyze the difference in amino acid occurrence by comparing highly homologous proteins on a subproteome level between several vertebrate model organisms. Specifically, we use this method to identify a pattern of amino acid conservation as well as a shift in amino acid occurrence between homeotherms (warm-blooded species) and poikilotherms (cold-blooded species). Importantly, this general analysis and a specific example further establish a broad correlation, if not a likely connection, between the thermal adaptation of protein sequences and two of their physical features: on average a change in their protein dynamics and, even more strongly, in their solvation. For poikilotherms, such as frog and fish, the lower body temperature is expected to increase protein-protein interaction due to a decrease in protein internal dynamics. In order to counteract the tendency for enhanced binding caused by low temperatures, poikilotherms enhance the solvation of their proteins by favoring polar amino acids. This feature appears to dominate over possible changes in dynamics for some proteins. The results suggest that a general trend for amino acid choice is part of the mechanism for thermoadaptation of vertebrate organisms at the molecular level.
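The kind of comparison this abstract describes (amino acid occurrence in one group of homologous sequences versus another, and the share of polar residues) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequences, and in particular the `POLAR` residue grouping are assumptions made for illustration, and classification schemes for polar residues differ between studies.

```python
from collections import Counter

# Assumed grouping of polar (uncharged and charged) residues; this is an
# illustrative choice, not the classification used in the cited paper.
POLAR = set("STNQYCHKRDE")

def occurrence(seqs):
    """Fractional occurrence of each amino acid pooled over all sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def occurrence_shift(group_a, group_b):
    """Per-residue difference in occurrence (group_b minus group_a)."""
    fa, fb = occurrence(group_a), occurrence(group_b)
    residues = sorted(set(fa) | set(fb))
    return {aa: fb.get(aa, 0.0) - fa.get(aa, 0.0) for aa in residues}

def polar_fraction(seqs):
    """Pooled fraction of residues that fall in the assumed POLAR set."""
    freq = occurrence(seqs)
    return sum(f for aa, f in freq.items() if aa in POLAR)

# Toy, made-up sequences standing in for homologous proteins.
homeotherm = ["MKVLA", "AAGLV"]
poikilotherm = ["MKSLA", "ASGNV"]
shift = occurrence_shift(homeotherm, poikilotherm)
```

A positive entry in `shift` means the residue is more frequent in the second group. A real analysis would work on aligned orthologs across whole subproteomes rather than on pooled toy strings.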
Affiliation(s)
- Zhen-Lu Li
- School of Life Science, Tianjin University, Tianjin, China; Department of Physiology and Biophysics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Matthias Buck
- Department of Physiology and Biophysics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA; Departments of Pharmacology and of Neurosciences, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
2
Jiménez‐Mena B, Flávio H, Henriques R, Manuzzi A, Ramos M, Meldrup D, Edson J, Pálsson S, Ásta Ólafsdóttir G, Ovenden JR, Nielsen EE. Fishing for DNA? Designing baits for population genetics in target enrichment experiments: Guidelines, considerations and the new tool supeRbaits. Mol Ecol Resour 2022; 22:2105-2119. [PMID: 35178874 PMCID: PMC9313901 DOI: 10.1111/1755-0998.13598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2021] [Revised: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 11/27/2022]
Abstract
Targeted sequencing is an increasingly popular next-generation sequencing (NGS) approach for studying populations that involves focusing sequencing efforts on specific parts of the genome of a species of interest. Methodologies and tools for designing targeted baits are scarce but in high demand. Here, we present specific guidelines and considerations for designing capture sequencing experiments for population genetics for both neutral genomic regions and regions subject to selection. We describe the bait design process for three diverse fish species: Atlantic salmon, Atlantic cod and tiger shark, which was carried out in our research group, and provide an evaluation of the performance of our approach across both historical and modern samples. The workflow used for designing these three bait sets has been implemented in the R-package supeRbaits, which encompasses our considerations and guidelines for bait design for the benefit of researchers and practitioners. The supeRbaits R-package is user-friendly and versatile. It is written in C++ and implemented in R. supeRbaits and its manual are available from Github: https://github.com/BelenJM/supeRbaits.
Affiliation(s)
- Belén Jiménez‐Mena
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Silkeborg, Denmark
- Hugo Flávio
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Silkeborg, Denmark
- Romina Henriques
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Silkeborg, Denmark
- Alice Manuzzi
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Silkeborg, Denmark
- Dorte Meldrup
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Silkeborg, Denmark
- Janette Edson
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
- Snæbjörn Pálsson
- Faculty of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Jennifer R. Ovenden
- Molecular Fisheries Laboratory, School of Biomedical Sciences, The University of Queensland, Brisbane, Queensland, Australia
- Einar Eg Nielsen
- Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Silkeborg, Denmark
3
Ho AT, Hurst LD. Unusual mammalian usage of TGA stop codons reveals that sequence conservation need not imply purifying selection. PLoS Biol 2022; 20:e3001588. [PMID: 35550630 PMCID: PMC9129041 DOI: 10.1371/journal.pbio.3001588] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Revised: 05/24/2022] [Accepted: 04/20/2022] [Indexed: 11/18/2022] Open
Abstract
The assumption that conservation of sequence implies the action of purifying selection is central to diverse methodologies to infer functional importance. GC-biased gene conversion (gBGC), a meiotic mismatch repair bias strongly favouring GC over AT, can in principle mimic the action of selection, this being thought to be especially important in mammals. As mutation is GC→AT biased, to demonstrate that gBGC does indeed cause false signals requires evidence that an AT-rich residue is selectively optimal compared to its more GC-rich allele, while showing also that the GC-rich alternative is conserved. We propose that mammalian stop codon evolution provides a robust test case. Although in most taxa TAA is the optimal stop codon, TGA is both abundant and conserved in mammalian genomes. We show that this mammalian exceptionalism is well explained by gBGC mimicking purifying selection and that TAA is the selectively optimal codon. Supportive of gBGC, we observe (i) TGA usage trends are consistent at the focal stop codon and elsewhere (in UTR sequences); (ii) that higher TGA usage and higher TAA→TGA substitution rates are predicted by a high recombination rate; and (iii) across species the difference in TAA <-> TGA substitution rates between GC-rich and GC-poor genes is largest in genomes that possess higher between-gene GC variation. TAA optimality is supported both by enrichment in highly expressed genes and trends associated with effective population size. High TGA usage and high TAA→TGA rates in mammals are thus consistent with gBGC’s predicted ability to “drive” deleterious mutations and supports the hypothesis that sequence conservation need not be indicative of purifying selection. A general trend for GC-rich trinucleotides to reside at frequencies far above their mutational equilibrium in high recombining domains supports the generality of these results.
Affiliation(s)
- Alexander Thomas Ho
- Milner Centre for Evolution, University of Bath, Bath, United Kingdom
4
Raza SHA, Hassanin AA, Dhshan AI, Abdelnour SA, Khan R, Mei C, Zan L. In silico genomic and proteomic analyses of three heat shock proteins (HSP70, HSP90-α, and HSP90-β) in even-toed ungulates. ELECTRON J BIOTECHN 2021. [DOI: 10.1016/j.ejbt.2021.07.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022] Open
5
Vinogradov AE, Anatskaya OV. Systemic evolutionary changes in mammalian gene expression. Biosystems 2020; 198:104256. [PMID: 32976926 DOI: 10.1016/j.biosystems.2020.104256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/08/2020] [Revised: 09/18/2020] [Accepted: 09/18/2020] [Indexed: 12/16/2022]
Abstract
Changes in gene expression play an important role in evolution and can be relevant to evolutionary medicine. In this work, a strong relationship was found between the statistical significance of evolutionary changes in the expression of orthologous genes in the five or six homologous mammalian tissues and the across-tissues unidirectionality of changes (i.e., they occur in the same direction in different tissues -- all upward or all downward). In the area of highly significant changes, the fraction of unidirectionally changed genes (UCG) was above 0.9 (random expectation is 0.03). This observation indicates that the most pronounced evolutionary changes in mammalian gene expression are systemic (i.e., they operate at the whole-organism level). The UCG are strongly enriched in the housekeeping genes. More specifically, in the human-chimpanzee comparison, the UCG are enriched in the pathways belonging to gene expression (translation is prominent), cell cycle control, ubiquitin-dependent protein degradation (mostly related to cell cycle control), apoptosis, and Parkinson's disease. In the human-macaque comparison, the two other neurodegenerative diseases (Alzheimer's and Huntington's) are added to the enriched pathways. The consolidation of gene expression changes at the level of pathways indicates that they are not neutral but functional. The systemic expression changes probably maintain the across-tissues balance of basic physiological processes in the course of evolution (e.g., during the movement along the fast-slow life axis). These results can be useful for understanding the variation in longevity and susceptibility to cancer and widespread neurodegenerative diseases. This approach can also guide the choice of prospective genes for studies aiming to decipher cis-regulatory code (the gene list is provided).
Affiliation(s)
- Olga V Anatskaya
- Institute of Cytology, Russian Academy of Sciences, St. Petersburg, 194064, Russia
6
Vinogradov AE, Anatskaya OV. Cell-cycle dependence of transcriptome gene modules: comparison of regression lines. FEBS J 2020; 287:4427-4439. [PMID: 32083797 DOI: 10.1111/febs.15257] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/12/2019] [Revised: 01/24/2020] [Accepted: 02/20/2020] [Indexed: 12/17/2022]
Abstract
The transcriptome consists of various gene modules that can be mutually dependent, and ignoring these dependencies may lead to misinterpretation. The most important problem is module dependence on cell-cycle activity. Using meta-analysis of over 30 000 single-cell transcriptomes, we show gene module dependencies on cell-cycle signature, which can be consistently observed in various normal and cancer cells. Transcript levels of receptors, plasma membrane, and differentiation-related genes are negatively regressed on cell-cycle signature. Pluripotency, stress response, DNA repair, chromatin remodeling, proteasomal protein degradation, protein network connectivity, and unicellular evolutionary origin are regressed positively. These effects cannot be explained by partial overlap of corresponding gene sets because they remain if the overlapped genes were removed. We propose a visual analysis of gene module-specific regression lines as complement to an uncurated enrichment analysis. The different lines for a same gene module indicate different cell conditions. The approach is tested on several problems (polyploidy, pluripotency, cancer, phylostratigraphy). Intriguingly, we found variation in cell-cycle activity, which is independent of cell progression through the cycle. The upregulation of G2/M checkpoint genes with downregulation of G2/M transition and cytokinesis is revealed in polyploid cells. A temporal increase in cell-cycle activity at transition from pluripotent to more differentiated state is found in human embryonic stem cells. The upregulation of unicellular interactome cluster in human cancers is shown in single cells with control for cell-cycle activity. The greater scatter around regression line in cancer cells suggests greater heterogeneity caused by deviation from a line of normal cells.
Affiliation(s)
- Olga V Anatskaya
- Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia
7
Wang X, Zhang Y, Qiao L, Chen B. Comparative analyses of simple sequence repeats (SSRs) in 23 mosquito species genomes: Identification, characterization and distribution (Diptera: Culicidae). INSECT SCIENCE 2019; 26:607-619. [PMID: 29484820 PMCID: PMC7379697 DOI: 10.1111/1744-7917.12577] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/02/2017] [Revised: 01/20/2018] [Accepted: 01/24/2018] [Indexed: 05/28/2023]
Abstract
Simple sequence repeats (SSRs) exist in both eukaryotic and prokaryotic genomes and are the most popular genetic markers, but the SSRs of mosquito genomes are still not well understood. In this study, we identified and analyzed the SSRs in 23 mosquito species using Drosophila melanogaster as reference at the whole-genome level. The results show that SSR numbers (33 076-560 175/genome) and genome sizes (574.57-1342.21 Mb) are significantly positively correlated (R² = 0.8992, P < 0.01), but the correlation in individual species varies in these mosquito species. In six types of SSR, mono- to trinucleotide SSRs are dominant with cumulative percentages of 95.14%-99.00% and densities of 195.65/Mb-787.51/Mb, whereas tetra- to hexanucleotide SSRs are rare with 1.12%-4.22% and 3.76/Mb-40.23/Mb. The (A/T)n, (AC/GT)n and (AGC/GCT)n are the most frequent motifs in mononucleotide, dinucleotide and trinucleotide SSRs, respectively, and the motif frequencies of tetra- to hexanucleotide SSRs appear to be species-specific. SSRs of 10-20 bp in length are dominant, with a number of 110 561 ± 93 482 and a frequency of 87.25% ± 5.73% on average, and the number and frequency decline with the increase of length. Most SSRs (83.34% ± 7.72%) are located in intergenic regions, followed by intron regions (11.59% ± 5.59%), exon regions (3.74% ± 1.95%), and untranslated regions (1.32% ± 1.39%). The mono-, di- and trinucleotide SSRs are the main SSRs in both gene regions (98.55% ± 0.85%) and exon regions (99.27% ± 0.52%). An average of 42.52% of total genes contain SSRs, and the preference for SSR occurrence in different gene subcategories is species-specific. The study provides useful insights into the SSR diversity, characteristics and distribution in the genomes of 23 mosquito species.
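As a rough illustration of what identifying perfect mono- to hexanucleotide SSRs and computing a per-Mb density can look like, the following Python sketch uses regular-expression backreferences. It is not the tool used in the study: the abstract does not state the search criteria, so the `MIN_REPEATS` thresholds, the function names, and the toy sequence are all hypothetical.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp); the study's
# actual search criteria are not given in the abstract.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-bp motif followed by at least n-1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported only under its shortest motif.
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

def ssr_density_per_mb(n_ssrs, genome_bp):
    """Number of SSRs per megabase of sequence."""
    return n_ssrs / (genome_bp / 1e6)

# Toy sequence: a poly-A tract, an (AC)n tract and an (AGC)n tract.
toy = "GG" + "A" * 12 + "TT" + "AC" * 7 + "T" + "AGC" * 5 + "G"
ssrs = find_ssrs(toy)
```

Because the scan takes the leftmost match, the reported motif depends on the flanking bases (an (AC)n tract preceded by a C would be reported as (CA)n). Published tools usually normalise motifs to a canonical rotation and strand, which this sketch omits.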
Affiliation(s)
- Xiao‐Ting Wang
- Chongqing Key Laboratory of Vector Insects; Chongqing Key Laboratory of Animal Biology; Institute of Entomology and Molecular Biology, Chongqing Normal University, Chongqing, China
- Yu‐Juan Zhang
- Chongqing Key Laboratory of Vector Insects; Chongqing Key Laboratory of Animal Biology; Institute of Entomology and Molecular Biology, Chongqing Normal University, Chongqing, China
- Liang Qiao
- Chongqing Key Laboratory of Vector Insects; Chongqing Key Laboratory of Animal Biology; Institute of Entomology and Molecular Biology, Chongqing Normal University, Chongqing, China
- Bin Chen
- Chongqing Key Laboratory of Vector Insects; Chongqing Key Laboratory of Animal Biology; Institute of Entomology and Molecular Biology, Chongqing Normal University, Chongqing, China
8
Shilina MA, Grinchuk TM, Anatskaya OV, Vinogradov AE, Alekseenko LL, Elmuratov AU, Nikolsky NN. Cytogenetic and Transcriptomic Analysis of Human Endometrial MSC Retaining Proliferative Activity after Sublethal Heat Shock. Cells 2018; 7:cells7110184. [PMID: 30366433 PMCID: PMC6262560 DOI: 10.3390/cells7110184] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/12/2018] [Revised: 10/19/2018] [Accepted: 10/23/2018] [Indexed: 12/14/2022] Open
Abstract
Temperature is an important exogenous factor capable of leading to irreversible processes in the vital activity of cells. However, the long-term effects of heat shock (HS) on mesenchymal stromal cells (MSC) remain unstudied. We investigated the karyotype and DNA repair drivers and pathways in the human endometrium MSC (eMSC) survived progeny at passage 6 after sublethal heat stress (sublethal heat stress survived progeny (SHS-SP)). G-banding revealed an outbreak of random karyotype instability caused by chromosome breakages and aneuploidy. Molecular karyotyping confirmed the random nature of this instability. Transcriptome analysis found homologous recombination (HR) deficiency that most likely originated from the low thermostability of the AT-rich HR driving genes. SHS-SP protection from transformation is provided presumably by low oncogene expression maintained by tight co-regulation between thermosensitive HR drivers BRCA, ATM, ATR, and RAD51 (decreasing expression after SHS), and oncogenes mTOR, MDM2, KRAS, and EGFR. The cancer-related transcriptomic features previously identified in hTERT transformed MSC in culture were not found in SHS-SP, suggesting no traits of malignancy in them. The entrance of SHS-SP into replicative senescence after 25 passages confirms their mortality and absence of transformation features. Overall, our data indicate that SHS may trigger non-tumorigenic karyotypic instability due to HR deficiency and decrease of oncogene expression in progeny of SHS-survived MSC. These data can be helpful for the development of new therapeutic approaches in personalized medicine.
Affiliation(s)
- Mariia A Shilina
- Institute of Cytology, Russian Academy of Sciences, Tikhoretskay Ave 4, 194064 St. Petersburg, Russia.
- Tatiana M Grinchuk
- Institute of Cytology, Russian Academy of Sciences, Tikhoretskay Ave 4, 194064 St. Petersburg, Russia.
- Olga V Anatskaya
- Institute of Cytology, Russian Academy of Sciences, Tikhoretskay Ave 4, 194064 St. Petersburg, Russia.
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretskay Ave 4, 194064 St. Petersburg, Russia.
- Larisa L Alekseenko
- Institute of Cytology, Russian Academy of Sciences, Tikhoretskay Ave 4, 194064 St. Petersburg, Russia.
- Artem U Elmuratov
- Institute of Biomedical Chemistry (IBMC) of Russian Academy of Sciences, 10 Building 8, Pogodinskaya Street, 119121 Moscow, Russia.
- Medical Genetics Centre Genotek, Nastavnichesky Alley 17-1-15, 10510 Moscow, Russia.
- Nikolai N Nikolsky
- Institute of Cytology, Russian Academy of Sciences, Tikhoretskay Ave 4, 194064 St. Petersburg, Russia.
9
DNA helix: the importance of being AT-rich. Mamm Genome 2017; 28:455-464. [PMID: 28836096 DOI: 10.1007/s00335-017-9713-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/25/2017] [Accepted: 08/12/2017] [Indexed: 01/02/2023]
Abstract
The AT-rich DNA is mostly associated with condensed chromatin, whereas the GC-rich sequence is preferably located in the dispersed chromatin. The AT-rich genes are prone to be tissue-specific (silenced in most tissues), while the GC-rich genes tend to be housekeeping (expressed in many tissues). This paper reports another important property of DNA base composition, which can affect repertoire of genes with high AT content. The GC-rich sequence is more liable to mutation. We found that Spearman correlation between human gene GC content and mutation probability is above 0.9. The change of base composition even in synonymous sites affects mutation probability of nonsynonymous sites and thus of encoded proteins. There is a unique type of housekeeping genes, which are especially unsafe when prone to mutation. Natural selection which usually removes deleterious mutations, in the case of these genes only increases the hazard because it can descend to suborganismal (cellular) level. These are cell cycle-related genes. In accordance with the proposed concept, they have low GC content of synonymous sites (despite them being housekeeping). The gene-centred protein interaction enrichment analysis (PIEA) showed the core clusters of genes whose interactants are modularly enriched in genes with AT-rich synonymous codons. This interconnected network is involved in double-strand break repair, DNA integrity checkpoints and chromosome pairing at mitosis. The damage of these genes results in genome and chromosome instability leading to cancer and other 'error catastrophes'. Reducing the nonsynonymous mutations, the usage of AT-rich synonymous codons can decrease probability of cancer by above 20-fold.
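The quantities this abstract relies on, overall GC content and the GC content of synonymous sites, can be approximated with a short Python sketch. It uses GC at third codon positions (GC3) as a common proxy for synonymous-site GC. That proxy is an assumption made here, since not every third position is synonymous and the paper's exact definition is not given in the abstract. Function names and the toy coding sequence are illustrative.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def gc3_content(cds):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Used as a proxy for synonymous-site GC; trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    return gc_content(third_positions)

# Toy in-frame sequence: ATG GCA GCT TAA
toy_cds = "ATGGCAGCTTAA"
overall = gc_content(toy_cds)   # 5 of 12 bases are G or C
third = gc3_content(toy_cds)    # third positions are G, A, T, A
```

With per-gene values like these in hand, a rank correlation against a per-gene mutation measure would be the natural next step, though that part depends on data not described here.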
10
Tarallo A, Gambi MC, D'Onofrio G. Lifestyle and DNA base composition in polychaetes. Physiol Genomics 2016; 48:883-888. [PMID: 27764763 DOI: 10.1152/physiolgenomics.00018.2016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/17/2016] [Accepted: 09/27/2016] [Indexed: 11/22/2022] Open
Abstract
A comparative analysis of polychaete species, classified as motile and low-motile forms, highlighted that the former were characterized not only by a higher metabolic rate (MR), but also by a higher genomic GC content. The fluctuation of both variables was not affected by the phylogenetic relationship of the species. Thus, present results further support that a very active lifestyle affects MR and GC at the same time, showing an unexpected similarity between invertebrates and vertebrates. In teleosts, indeed, a similar pattern has also been observed in comparisons of migratory and nonmigratory species. A cause-effect link between MR and GC has not yet been proved, but the fact that the two variables are significantly linked in all the organisms so far analyzed is, most probably, of relevant biological and evolutionary meaning. The present results fit very well within the frame of the metabolic rate hypothesis proposed to explain the DNA base composition variability among organisms. On the contrary, the thermostability hypothesis was not supported. At present, no data about the recombination rate in polychaetes were available to test the biased gene conversion (BGC) hypothesis.
Affiliation(s)
- Andrea Tarallo
- Stazione Zoologica Anton Dohrn, Department of Biology and Evolution of Marine Organisms, Naples, Italy
- Maria Cristina Gambi
- Stazione Zoologica Anton Dohrn, Department of Integrative Marine Ecology (Villa Dohrn-Benthic Ecology Center), Ischia, Naples, Italy
- Giuseppe D'Onofrio
- Stazione Zoologica Anton Dohrn, Department of Biology and Evolution of Marine Organisms, Naples, Italy
11
Torres CM, Biran A, Burney MJ, Patel H, Henser-Brownhill T, Cohen AHS, Li Y, Ben-Hamo R, Nye E, Spencer-Dene B, Chakravarty P, Efroni S, Matthews N, Misteli T, Meshorer E, Scaffidi P. The linker histone H1.0 generates epigenetic and functional intratumor heterogeneity. Science 2016; 353:aaf1644. [PMID: 27708074 PMCID: PMC5131846 DOI: 10.1126/science.aaf1644] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 12/28/2015] [Accepted: 08/30/2016] [Indexed: 12/22/2022]
Abstract
Tumors comprise functionally diverse subpopulations of cells with distinct proliferative potential. Here, we show that dynamic epigenetic states defined by the linker histone H1.0 determine which cells within a tumor can sustain the long-term cancer growth. Numerous cancer types exhibit high inter- and intratumor heterogeneity of H1.0, with H1.0 levels correlating with tumor differentiation status, patient survival, and, at the single-cell level, cancer stem cell markers. Silencing of H1.0 promotes maintenance of self-renewing cells by inducing derepression of megabase-sized gene domains harboring downstream effectors of oncogenic pathways. Self-renewing epigenetic states are not stable, and reexpression of H1.0 in subsets of tumor cells establishes transcriptional programs that restrict cancer cells' long-term proliferative potential and drive their differentiation. Our results uncover epigenetic determinants of tumor-maintaining cells.
Affiliation(s)
- Cristina Morales Torres
- Cancer Epigenetics Laboratory, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Alva Biran
- Department of Genetics, The Institute of Life Sciences, and The Edmond and Lily Safra Center for Brain Sciences (ELSC), The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem, 91904, Israel
- Matthew J. Burney
- Cancer Epigenetics Laboratory, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Harshil Patel
- Bioinformatics, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Tristan Henser-Brownhill
- Cancer Epigenetics Laboratory, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Ayelet-Hashahar Shapira Cohen
- Department of Genetics, The Institute of Life Sciences, and The Edmond and Lily Safra Center for Brain Sciences (ELSC), The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem, 91904, Israel
- Yilong Li
- Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
- Rotem Ben-Hamo
- The Mina and Everard Goodman Faculty of Life Science, Bar Ilan University, Ramat-Gan, 52900, Israel
- Emma Nye
- Experimental Histopathology, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Bradley Spencer-Dene
- Experimental Histopathology, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Probir Chakravarty
- Bioinformatics, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Sol Efroni
- The Mina and Everard Goodman Faculty of Life Science, Bar Ilan University, Ramat-Gan, 52900, Israel
- Nik Matthews
- Advanced sequencing, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- Tom Misteli
- National Cancer Institute, NIH, Bethesda, MD, 20892, USA
- Eran Meshorer
- Department of Genetics, The Institute of Life Sciences, and The Edmond and Lily Safra Center for Brain Sciences (ELSC), The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem, 91904, Israel
- Paola Scaffidi
- Cancer Epigenetics Laboratory, The Francis Crick Institute, Lincoln's Inn Fields Laboratory, London WC2A 3LY, UK
- UCL Cancer Institute, University College London, London WC1E 6DD, UK
12
Tarallo A, Angelini C, Sanges R, Yagi M, Agnisola C, D'Onofrio G. On the genome base composition of teleosts: the effect of environment and lifestyle. BMC Genomics 2016; 17:173. [PMID: 26935583 PMCID: PMC4776435 DOI: 10.1186/s12864-016-2537-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/22/2015] [Accepted: 02/25/2016] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: The DNA base composition is well known to be highly variable among organisms. Biophysical studies on the effect of GC increments on DNA structure have shown that GC-richer DNA sequences are more bendable. This result was the keystone of the hypothesis proposing the metabolic rate as the major force driving GC content variability, since an increased resistance to torsion stress is mainly required during the transcription process to avoid DNA breakage. Hence, the aim of the present work is to test whether salinity and migration, both suggested to affect the metabolic rate of teleostean fishes, affect the average genomic GC content as well. Moreover, since the gill surface has been reported to be a major morphological expression of metabolic rate, this parameter was also analyzed in the light of the above hypothesis.
RESULTS: Teleosts living in different environments (freshwater and seawater) and with different lifestyles (migratory and non-migratory) were analyzed by studying three variables: routine metabolic rate, gill area and genomic GC content, none of them showing a phylogenetic signal among fish species. Routine metabolic rate, specific gill area and average genomic GC were higher in seawater than freshwater species. The same trend was observed comparing migratory versus non-migratory species. Crossing salinity and lifestyle, the active migratory species living in seawater show coincidentally the highest routine metabolic rate, the highest specific gill area and the highest average genomic GC content.
CONCLUSIONS: The results clearly highlight that environmental factors (salinity) and lifestyle (migration) affect not only the physiology (i.e. the routine metabolic rate) and the morphology (i.e. gill area) of teleosts, but also a basic genome feature (i.e. the GC content), thus opening to an interesting liaison among the three variables in the light of the metabolic rate hypothesis.
Affiliation(s)
- Andrea Tarallo
- Genome Evolution and Organization - Department BEOM, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
- Claudia Angelini
- Istituto per le Applicazioni del Calcolo "Mauro Picone" - CNR, Via Pietro Castellino, 111, 80131, Naples, Italy
- Remo Sanges
- Genome Evolution and Organization - Department BEOM, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
- Mitsuharu Yagi
- Faculty of Fisheries, Nagasaki University, 1-14 Bunkyo, Nagasaki, 852-8521, Japan
- Claudio Agnisola
- Department of Biology, Complesso Universitario di Monte Sant'Angelo, University of Naples Federico II, Edificio 7, Via Cinthia, 80126, Naples, Italy
- Giuseppe D'Onofrio
- Genome Evolution and Organization - Department BEOM, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.
13
Qi WH, Jiang XM, Du LM, Xiao GS, Hu TZ, Yue BS, Quan QM. Genome-Wide Survey and Analysis of Microsatellite Sequences in Bovid Species. PLoS One 2015. [PMID: 26196922 PMCID: PMC4510479 DOI: 10.1371/journal.pone.0133667] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022] Open
Abstract
Microsatellites, or simple sequence repeats (SSRs), are ubiquitously distributed in many eukaryotic and prokaryotic genomes and have become the most popular source of genetic markers. This is the first study examining and comparing SSRs in completely sequenced genomes of the Bovidae. We analyzed and compared the number of SSRs, relative abundance, relative density, guanine-cytosine (GC) content and proportion of SSRs in six taxonomically different bovid species: Bos taurus, Bubalus bubalis, Bos mutus, Ovis aries, Capra hircus, and Pantholops hodgsonii. Our analysis revealed that, based on our search criteria, the total number of perfect SSRs found ranged from 663,079 to 806,907 and covered from 0.44% to 0.48% of the bovid genomes. Relative abundance and density of SSRs in these Bovinae genomes were not significantly correlated with genome size (Pearson, r < 0.420, p > 0.05). Perfect mononucleotide SSRs were the most abundant, followed by the pattern: perfect di- > tri- > penta- > tetra- > hexanucleotide SSRs. Generally, the number of SSRs, relative abundance, and relative density of SSRs decreased as the motif repeat length increased in each species of Bovidae. GC content was highest in trinucleotide SSRs and lowest in mononucleotide SSRs in the six bovid genomes. The GC contents of tri- and pentanucleotide SSRs showed a great deal of similarity among different chromosomes of B. taurus, O. aries, and C. hircus. The SSR number of all chromosomes in B. taurus, O. aries, and C. hircus is strongly positively correlated with chromosome sequence size (Pearson, r > 0.980, p < 0.01) and significantly negatively correlated with GC content (Pearson, r < -0.638, p < 0.01). Relative abundance and density of SSRs in all chromosomes of the three species were significantly negatively correlated with GC content (Pearson, r < -0.333, p < 0.05) but not significantly correlated with chromosome sequence size (Pearson, r < -0.185, p > 0.05). Relative abundances of the same nucleotide SSR type showed great similarity among different chromosomes of B. taurus, O. aries, and C. hircus.
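The quantities named in this abstract (number of perfect SSRs, relative abundance, relative density) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the search criteria, so the minimum repeat counts in `MIN_REPEATS` are hypothetical, and the definitions used here (SSRs per Mb for relative abundance, SSR base pairs per Mb for relative density) are the ones commonly used in SSR surveys, assumed rather than taken from the paper.

```python
import re

# Hypothetical minimum repeat counts per motif length (mono- to hexanucleotide).
# The abstract only says "based on our search criteria"; these values are assumed.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A motif of length k followed by at least n - 1 exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ACAC" = "AC" x 2),
            # so each tract is reported under its shortest motif only.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


def ssr_stats(seq, min_repeats=MIN_REPEATS):
    """Count SSRs and normalize by sequence length in megabases."""
    hits = find_perfect_ssrs(seq, min_repeats)
    megabases = len(seq) / 1e6
    ssr_bp = sum(len(motif) * count for _, motif, count in hits)
    return {
        "count": len(hits),
        "relative_abundance": len(hits) / megabases,  # SSRs per Mb
        "relative_density": ssr_bp / megabases,       # SSR bp per Mb
    }
```

Changing the thresholds changes every downstream number, which is why counts from different SSR surveys are only comparable when the search criteria match.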
Affiliation(s)
- Wen-Hua Qi
  - College of Life Science and Engineering, Chongqing Three Gorges University, Chongqing, 404100, China
- Xue-Mei Jiang
  - College of Environmental and Chemistry Engineering, Chongqing Three Gorges University, Chongqing, 404100, China
- Lian-Ming Du
  - Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, China
- Guo-Sheng Xiao
  - College of Life Science and Engineering, Chongqing Three Gorges University, Chongqing, 404100, China
- Ting-Zhang Hu
  - College of Life Science and Engineering, Chongqing Three Gorges University, Chongqing, 404100, China
- Bi-Song Yue
  - Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, China
- Qiu-Mei Quan
  - School of Life Sciences, China West Normal University, Nanchong, 637009, China
14
Chaurasia A, Tarallo A, Bernà L, Yagi M, Agnisola C, D’Onofrio G. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis. PLoS One 2014; 9:e103889. [PMID: 25093416 PMCID: PMC4122358 DOI: 10.1371/journal.pone.0103889] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/21/2013] [Accepted: 07/07/2014] [Indexed: 01/30/2023] Open
Abstract
A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish, was performed with the aim of highlighting the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing both variables independently in pairwise comparisons. An average shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass-specific and temperature-corrected using the Boltzmann factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while the environmental temperature of the fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi were observed for the significant majority of the intronic sequences (from ∼40% to ∼90%, in each pairwise comparison). The opposite event, a concomitant increase of bpi and decrease of GCi, was counter-selected (from <1% to ∼10%, in each pairwise comparison). The results further support the hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes.
Affiliation(s)
- Ankita Chaurasia
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
  - Campus UAB - CRAG Bellaterra - Cerdanyola del Vallès, Barcelona, Spain
- Andrea Tarallo
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Luisa Bernà
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
  - Molecular Biology Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Mitsuharu Yagi
  - Faculty of Fisheries, Nagasaki University, Bunkyo, Nagasaki, Japan
- Claudio Agnisola
  - Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Giuseppe D’Onofrio
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
15
Yokoyama KD, Pollock DD. SP transcription factor paralogs and DNA-binding sites coevolve and adaptively converge in mammals and birds. Genome Biol Evol 2013; 4:1102-17. [PMID: 23019068 PMCID: PMC3514965 DOI: 10.1093/gbe/evs085] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022] Open
Abstract
Functional modification of regulatory proteins can affect hundreds of genes throughout the genome, and is therefore thought to be almost universally deleterious. This belief, however, has recently been challenged. A potential example comes from transcription factor SP1, for which statistical evidence indicates that motif preferences were altered in eutherian mammals. Here, we set out to discover possible structural and theoretical explanations, evaluate the role of selection in SP1 evolution, and discover effects on coregulatory proteins. We show that SP1 motif preferences were convergently altered in birds as well as mammals, inducing coevolutionary changes in over 800 regulatory regions. Structural and phylogenic evidence implicates a single causative amino acid replacement at the same SP1 position along both lineages. Furthermore, paralogs SP3 and SP4, which coregulate SP1 target genes through competitive binding to the same sites, have accumulated convergent replacements at the homologous position multiple times during eutherian and bird evolution, presumably to preserve competitive binding. To determine plausibility, we developed and implemented a simple model of transcription factor and binding site coevolution. This model predicts that, in contrast to prevailing beliefs, even small selective benefits per locus can drive concurrent fixation of transcription factor and binding site mutants under a broad range of conditions. Novel binding sites tend to arise de novo, rather than by mutation from ancestral sites, a prediction substantiated by SP1-binding site alignments. Thus, multiple lines of evidence indicate that selection has driven convergent evolution of transcription factors along with their binding sites and coregulatory proteins.
Affiliation(s)
- Ken Daigoro Yokoyama
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Denver School of Medicine, USA
16
Berná L, Chaurasia A, Angelini C, Federico C, Saccone S, D'Onofrio G. The footprint of metabolism in the organization of mammalian genomes. BMC Genomics 2012; 13:174. [PMID: 22568857 PMCID: PMC3384468 DOI: 10.1186/1471-2164-13-174] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 07/26/2011] [Accepted: 05/08/2012] [Indexed: 01/02/2023] Open
Abstract
Background At present, five evolutionary hypotheses have been proposed to explain the great variability of the genomic GC content among and within genomes: the mutational bias, the biased gene conversion, the DNA breakpoints distribution, the thermal stability and the metabolic rate. Several studies carried out on bacteria and teleostean fish pointed towards the critical role played by the environment on the metabolic rate in shaping the base composition of genomes. In mammals the debate is still open, and evidence has been produced in favor of each evolutionary hypothesis. Human genes were assigned to three large functional categories (as well as to the corresponding functional classes) according to the KOG database: (i) information storage and processing, (ii) cellular processes and signaling, and (iii) metabolism. The classification was extended to the other organisms analyzed by performing a reciprocal Blastp and selecting the best reciprocal hit. The base composition was calculated for each sequence of the whole CDS dataset. Results The GC3 level of the above functional categories increased from (i) to (iii). This specific compositional pattern was found, as a footprint, in all mammalian genomes, but not in the frog and lizard ones. Comparative analysis of human versus both frog and lizard functional categories showed that genes involved in the metabolic processes underwent the highest GC3 increment. Analyzing the KOG functional classes of genes, a well-defined intra-genomic pattern was again found in all mammals. Not only genes of metabolic pathways, but also genes involved in chromatin structure and dynamics, transcription, signal transduction mechanisms and cytoskeleton, showed an average GC3 level higher than that of the whole genome. In the case of the human genome, the genes of the aforementioned functional categories showed a high probability to be associated with the chromosomal bands.
Conclusions Among the different evolutionary hypotheses proposed so far, each contributing with a different potential to the compositional heterogeneity of mammalian genomes, the one based on the metabolic rate seems to play no minor role. Keeping in mind similar results reported in bacteria and in teleosts, the specific compositional patterns observed in mammals highlight metabolic rate as a unifying factor that fits over a wide range of living organisms.
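GC3, the statistic compared across functional categories in this abstract, is the GC fraction at third codon positions of a coding sequence. The Python sketch below shows one way to compute it and to average it over a category. It assumes in-frame CDS strings, and the function names `gc3` and `mean_gc3` are hypothetical. Whether the authors excluded stop codons or ambiguous bases is not stated in the abstract, so this sketch simply ignores non-ACGT bases and keeps the stop codon.

```python
def gc3(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # drop a trailing partial codon
    thirds = [b for b in cds[2:usable:3] if b in "ACGT"]
    if not thirds:
        raise ValueError("no unambiguous third codon positions found")
    return sum(b in "GC" for b in thirds) / len(thirds)


def mean_gc3(sequences):
    """Unweighted average GC3 over a collection of coding sequences."""
    values = [gc3(s) for s in sequences]
    return sum(values) / len(values)
```

For example, `gc3("ATGGCCAAATAA")` looks at the third bases G, C, A and A and returns 0.5. Comparing `mean_gc3` across gene sets grouped by functional category would mirror the kind of comparison the abstract describes, under the assumptions above.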
Affiliation(s)
- Luisa Berná
  - Genome Evolution and Organization - Department Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
17
Ouyang Q, Zhao X, Feng H, Tian Y, Li D, Li M, Tan Z. High GC content of simple sequence repeats in Herpes simplex virus type 1 genome. Gene 2012; 499:37-40. [PMID: 22414335 DOI: 10.1016/j.gene.2012.02.049] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/29/2011] [Revised: 02/22/2012] [Accepted: 02/24/2012] [Indexed: 10/28/2022]
Abstract
The presence, locations and composition of simple sequence repeats (SSRs) in the Herpes simplex virus type 1 (HSV-1) genome were extracted and analyzed using the software Imperfect Microsatellite Extractor (IMEx). There were 663 mono-, 502 di-, 184 tri-, 20 tetra-, 4 penta- and 4 hexanucleotide SSRs, which were distributed differently between coding and noncoding regions of the HSV-1 genome. G/C, GC/CG, and (GGC)(n) were predominant among mononucleotide, dinucleotide and trinucleotide repeats, respectively. Indeed, the results showed that the GC content of simple sequence repeats was notably higher than that of the entire HSV-1 genome. Our data might be helpful for studying the pathogenesis, genome structure and evolution of HSV-1.
Affiliation(s)
- Qingjian Ouyang
  - College of Biology, State Key Laboratory for Chemo/Biosensing and Chemometrics, Hunan University, Changsha 410082, China
18
Uliano E, Chaurasia A, Bernà L, Agnisola C, D'Onofrio G. Metabolic rate and genomic GC: what we can learn from teleost fish. Mar Genomics 2010; 3:29-34. [PMID: 21798194 DOI: 10.1016/j.margen.2010.02.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/30/2009] [Revised: 02/05/2010] [Accepted: 02/11/2010] [Indexed: 11/29/2022]
Abstract
Teleosts are a highly diverse group of animals occupying all kinds of aquatic environments. Data on routine mass-specific metabolic rate were re-examined, correcting them for the Boltzmann factor. Teleostean fish were grouped in five broad groups, corresponding to major environmental classifications: polar, temperate, sub-tropical, tropical and deep-water. The specific routine metabolic rate, temperature-corrected using the Boltzmann factor (MR), and the average base composition of genomes (GC%) were calculated in each group. Fish of the polar habitat showed the highest MR. Temperate fish displayed a significantly higher MR than tropical fish, which had the lowest average value. These results were apparently in agreement with the cold adaptation hypothesis. In contrast with this hypothesis, however, the MR of fish living in the deep-water environment turned out not to be significantly different from that of fish living in tropical habitats. Most probably, the amount of oxygen dissolved in the water directly affects MR adaptation. Regarding the different habitats, the genomic GC levels showed a decreasing trend similar to that of MR. Indeed, both polar and temperate fish showed a GC level significantly higher than that of both sub-tropical and tropical fish. Plotting the genomic GC levels versus the MR, a significant positive correlation was found, supporting the hypothesis that metabolic rate can explain not only the compositional transition mode (e.g. amphibian/mammals), but also the compositional shifting mode (e.g. fish/fish) of evolution observed for vertebrate genomes.
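The temperature correction and the GC-versus-MR correlation mentioned in this abstract can be sketched in Python as follows. The abstract does not give the formula, so this sketch assumes the form used in the metabolic theory of ecology, where a measured rate is multiplied by exp(E / kT) to remove the Boltzmann-factor temperature dependence. The activation energy of 0.65 eV is an assumed, commonly quoted value, not one taken from the paper, and the function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K
ACTIVATION_ENERGY_EV = 0.65           # assumed value; not stated in the abstract


def temperature_corrected_rate(rate, temp_celsius, energy_ev=ACTIVATION_ENERGY_EV):
    """Multiply a measured rate by exp(E / kT) to factor out temperature."""
    temp_kelvin = temp_celsius + 273.15
    return rate * math.exp(energy_ev / (BOLTZMANN_EV_PER_K * temp_kelvin))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Under this assumed model, two species whose measured rates differ only because of habitat temperature end up with the same corrected value, so any remaining difference in MR reflects something other than temperature. `pearson_r` could then be applied to per-group MR and GC% values to reproduce the kind of correlation the abstract reports.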
Affiliation(s)
- Erminia Uliano
  - Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
19
Mutational biases and selective forces shaping the structure of Arabidopsis genes. PLoS One 2009; 4:e6356. [PMID: 19633720 PMCID: PMC2712092 DOI: 10.1371/journal.pone.0006356] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 05/12/2009] [Accepted: 06/01/2009] [Indexed: 01/08/2023] Open
Abstract
Recently, features of gene expression profiles have been associated with structural parameters of gene sequences in organisms representing a diverse set of taxa. The emerging picture indicates that natural selection, mediated by gene expression profiles, has a significant role in determining genic structures. However, the current situation is less clear in plants, as the available data indicate that the effect of natural selection mediated by gene expression is very weak. Moreover, the direction of the patterns in plants appears to contradict those observed in animal genomes. In the present work we analyzed expression data for >18000 Arabidopsis genes retrieved from public datasets obtained with different technologies (MPSS and high density chip arrays) and compared them with gene parameters. Our results show that the impact of natural selection mediated by expression on gene sequences is significant and distinguishable from the effects of regional mutational biases. In addition, we provide evidence that the level and the breadth of gene expression are related in opposite ways to many structural parameters of gene sequences. Higher levels of expression abundance are associated with smaller transcripts, consistent with the need to reduce costs of both transcription and translation. Expression breadth, however, shows a contrasting pattern, i.e. longer genes have higher breadth of expression, possibly to ensure those structural features associated with gene plasticity. Based on these results, we propose that the specific balance between these two selective forces plays a significant role in shaping the structure of Arabidopsis genes.
20
Pozzoli U, Menozzi G, Fumagalli M, Cereda M, Comi GP, Cagliani R, Bresolin N, Sironi M. Both selective and neutral processes drive GC content evolution in the human genome. BMC Evol Biol 2008; 8:99. [PMID: 18371205 PMCID: PMC2292697 DOI: 10.1186/1471-2148-8-99] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 12/18/2007] [Accepted: 03/27/2008] [Indexed: 11/10/2022] Open
Abstract
Background Mammalian genomes consist of regions differing in GC content, referred to as isochores or GC-content domains. The scientific debate is still open as to whether such compositional heterogeneity is a selected or neutral trait. Results Here we analyze SNP allele frequencies, retrotransposon insertion polymorphisms (RIPs), as well as fixed substitutions accumulated in the human lineage since its divergence from chimpanzee to indicate that biased gene conversion (BGC) has been playing a role in within-genome GC content variation. Yet, a distinct contribution to GC content evolution is accounted for by a selective process. Accordingly, we searched for independent evidence that GC content distribution does not conform to neutral expectations. Indeed, after correcting for possible biases, we show that intron GC content and size display isochore-specific correlations. Conclusion We consider that the most parsimonious explanation for our results is that GC content is subjected to the action of both weak selection and BGC in the human genome, with features such as nucleosome positioning or chromatin conformation possibly representing the final target of selective processes. This view might reconcile previous contrasting findings and add some theoretical background to recent evidence suggesting that GC content domains display different behaviors with respect to highly regulated biological processes such as developmental stage-related gene expression and programmed replication timing during neural stem cell differentiation.
Affiliation(s)
- Uberto Pozzoli
  - Scientific Institute IRCCS E. Medea, Bioinformatic Lab, Via don L. Monza 20, 23842 Bosisio Parini (LC), Italy
21
Different functional classes of genes are characterized by different compositional properties. FEBS Lett 2007; 581:5819-24. [DOI: 10.1016/j.febslet.2007.11.052] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 08/13/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 11/19/2022]
22
Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 2006; 7:98-108. [PMID: 16418745 DOI: 10.1038/nrg1770] [Citation(s) in RCA: 590] [Impact Index Per Article: 32.8] [Indexed: 11/08/2022]
Abstract
Although the assumption of the neutral theory of molecular evolution - that some classes of mutation have too small an effect on fitness to be affected by natural selection - seems intuitively reasonable, over the past few decades the theory has been in retreat. At least in species with large populations, even synonymous mutations in exons are not neutral. By contrast, in mammals, neutrality of these mutations is still commonly assumed. However, new evidence indicates that even some synonymous mutations are subject to constraint, often because they affect splicing and/or mRNA stability. This has implications for understanding disease, optimizing transgene design, detecting positive selection and estimating the mutation rate.
Affiliation(s)
- J V Chamary
  - Center for Integrative Genomics, University of Lausanne, Switzerland
23
Vinogradov AE, Anatskaya OV. Genome size and metabolic intensity in tetrapods: a tale of two lines. Proc Biol Sci 2006; 273:27-32. [PMID: 16519230 PMCID: PMC1560010 DOI: 10.1098/rspb.2005.3266] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Indexed: 12/16/2022] Open
Abstract
We show the negative link between genome size and metabolic intensity in tetrapods, using the heart index (relative heart mass) as a unified indicator of metabolic intensity in poikilothermal and homeothermal animals. We found two separate regression lines of heart index on genome size for reptiles-birds and amphibians-mammals (the slope of regression is steeper in reptiles-birds). We also show a negative correlation between GC content and nucleosome formation potential in vertebrate DNA, and, consistent with this relationship, a positive correlation between genome GC content and nuclear size (independent of genome size). It is known that there are two separate regression lines of genome GC content on genome size for reptiles-birds and amphibians-mammals: reptiles-birds have the relatively higher GC content (for their genome sizes) compared to amphibians-mammals. Our results suggest uniting all these data into one concept. The slope of negative regression between GC content and nucleosome formation potential is steeper in exons than in non-coding DNA (where nucleosome formation potential is generally higher), which indicates a special role of non-coding DNA for orderly chromatin organization. The chromatin condensation and nuclear size are supposed to be key parameters that accommodate the effects of both genome size and GC content and connect them with metabolic intensity. Our data suggest that the reptilian-birds clade evolved special relationships among these parameters, whereas mammals preserved the amphibian-like relationships. Surprisingly, mammals, although acquiring a more complex general organization, seem to retain certain genome-related properties that are similar to amphibians. At the same time, the slope of regression between nucleosome formation potential and GC content is steeper in poikilothermal than in homeothermal genomes, which suggests that mammals and birds acquired certain common features of genomic organization.
Affiliation(s)
- Alexander E Vinogradov
  - Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St Petersburg 194064, Russia
24
Paparcone R, Morosetti S, Scipioni A, De Santis P. A statistical approach for analyzing structural and regulative information in prokaryotic genomes. Biophys Chem 2006; 120:71-9. [PMID: 16298036 DOI: 10.1016/j.bpc.2005.09.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/25/2005] [Accepted: 09/11/2005] [Indexed: 12/01/2022]
Abstract
Although DNA is iconized as a straight double helix, it does not exist in this canonical form in biological systems. Instead, it is characterized by sequence-dependent structural and dynamic deviations from the monotonous regularity of canonical B-DNA. Despite the complexity of the system, we showed that the large-scale structural and dynamic properties of DNA can be predicted from the nucleotide sequence alone by adopting a statistical approach. The paper reports the statistical analysis of large pools of different prokaryotic genes in terms of sequence-dependent curvature and flexibility. Conserved features characterize the regions close to the translation start site, which are related to their function in the regulation system. In addition, regular patterns with three-fold periodicity were found in the coding regions. They were reproduced in terms of the nucleotide frequency expected on the basis of the genetic code and the pertinent occurrence of the amino acid residues.
Affiliation(s)
- Raffaella Paparcone
  - Dipartimento di Chimica, Università di Roma 'La Sapienza', P.le A. Moro 5, 00185 Rome, Italy
25
Vinogradov AE. Dualism of gene GC content and CpG pattern in regard to expression in the human genome: magnitude versus breadth. Trends Genet 2005; 21:639-43. [PMID: 16202472 DOI: 10.1016/j.tig.2005.09.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Received: 03/31/2005] [Revised: 08/10/2005] [Accepted: 09/09/2005] [Indexed: 11/26/2022]
Abstract
In this article, I show that, in the human genome, the GC content in genes (but not the CpG island in the promoter) is related to the maximum level of gene expression among tissues, whereas the promoter CpG island and gene CpG level are more strongly related to the breadth of expression among tissues. The relevance of gene GC content to expression cannot be a consequence (i.e. a byproduct) of transcription because it does not correlate with expression in the germline. The variation of GC content and CpG level can determine the characteristics of gene expression in a synergistic interplay with transcription-factor-binding sites (mediated by chromatin condensation).
26
Chamary JV, Hurst LD. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol 2005; 6:R75. [PMID: 16168082 PMCID: PMC1242210 DOI: 10.1186/gb-2005-6-9-r75] [Citation(s) in RCA: 236] [Impact Index Per Article: 12.4] [Received: 04/27/2005] [Revised: 06/08/2005] [Accepted: 07/20/2005] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND In mammals, contrary to what is usually assumed, recent evidence suggests that synonymous mutations may not be selectively neutral. This position has proven contentious, not least because of the absence of a viable mechanism. Here we test whether synonymous mutations might be under selection owing to their effects on the thermodynamic stability of mRNA, mediated by changes in secondary structure. RESULTS We provide numerous lines of evidence that are all consistent with the above hypothesis. Most notably, by simulating evolution and reallocating the substitutions observed in the mouse lineage, we show that the location of synonymous mutations is non-random with respect to stability. Importantly, the preference for cytosine at 4-fold degenerate sites, diagnostic of selection, can be explained by its effect on mRNA stability. Likewise, by interchanging synonymous codons, we find naturally occurring mRNAs to be more stable than simulant transcripts. Housekeeping genes, whose proteins are under strong purifying selection, are also under the greatest pressure to maintain stability. CONCLUSION Taken together, our results provide evidence that, in mammals, synonymous sites do not evolve neutrally, at least in part owing to selection on mRNA stability. This has implications for the application of synonymous divergence in estimating the mutation rate.
Affiliation(s)
- JV Chamary
  - Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
- Laurence D Hurst
  - Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
27
Vinogradov AE. Noncoding DNA, isochores and gene expression: nucleosome formation potential. Nucleic Acids Res 2005; 33:559-63. [PMID: 15673716 PMCID: PMC548339 DOI: 10.1093/nar/gki184] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Received: 10/05/2004] [Revised: 12/21/2004] [Accepted: 12/21/2004] [Indexed: 12/04/2022] Open
Abstract
The nucleosome formation potential of introns, intergenic spacers and exons of human genes is shown here to negatively correlate with among-tissues breadth of gene expression. The nucleosome formation potential is also found to negatively correlate with the GC content of genomic sequences; the slope of regression line is steeper in exons compared with noncoding DNA (introns and intergenic spacers). The correlation with GC content is independent of sequence length; in turn, the nucleosome formation potential of introns and intergenic spacers positively (albeit weakly) correlates with sequence length independently of GC content. These findings help explain the functional significance of the isochores (regions differing in GC content) in the human genome as a result of optimization of genomic structure for epigenetic complexity and support the notion that noncoding DNA is important for orderly chromatin condensation and chromatin-mediated suppression of tissue-specific genes.
28
Zhang L, Kasif S, Cantor CR, Broude NE. GC/AT-content spikes as genomic punctuation marks. Proc Natl Acad Sci U S A 2004; 101:16855-60. [PMID: 15548610 PMCID: PMC534751 DOI: 10.1073/pnas.0407821101] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Indexed: 11/18/2022] Open
Abstract
Large-scale analysis of the GC-content distribution at the gene level reveals both common features and basic differences in genomes of different groups of species. Sharp changes in GC content are detected at the transcription boundaries for all species analyzed, including human, mouse, rat, chicken, fruit fly, and worm. However, two substantially distinct groups of GC-content profiles can be recognized: warm-blooded vertebrates including human, mouse, rat, and chicken, and invertebrates including fruit fly and worm. In vertebrates, sharp positive and negative spikes of GC content are observed at the transcription start and stop sites, respectively, and there is also a progressive decrease in GC content from the 5' untranslated region to the 3' untranslated region along the gene. In invertebrates, the positive and negative GC-content spikes at the transcription start and stop sites are preceded by spikes of opposite value, and the highest GC content is found in the coding regions of the genes. Cross-correlation analysis indicates high frequencies of GC-content spikes at transcription start and stop sites. The strong conservation of this genomic feature seen in comparisons of the human/mouse and human/rat orthologs, and the clustering of genes with GC-content spikes on chromosomes imply a biological function. The GC-content spikes at transcription boundaries may reflect a general principle of genomic punctuation. Our analysis also provides means for identifying these GC-content spikes in individual genomic sequences.
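A sharp change in GC content at a boundary can be located with a simple flanking-window comparison, sketched below in Python. This is a generic illustration and not the method of the paper: the abstract does not describe its detection procedure, and the window size and the function names `gc_step_scores` and `sharpest_boundary` are hypothetical.

```python
def gc_fraction(segment):
    """Fraction of G and C bases in a sequence segment."""
    return sum(b in "GC" for b in segment) / len(segment)


def gc_step_scores(seq, window=50):
    """For each position, GC of the window to its right minus GC of the window to its left."""
    seq = seq.upper()
    if len(seq) < 2 * window:
        raise ValueError("sequence must be at least two windows long")
    return [
        (pos, gc_fraction(seq[pos:pos + window]) - gc_fraction(seq[pos - window:pos]))
        for pos in range(window, len(seq) - window + 1)
    ]


def sharpest_boundary(seq, window=50):
    """Return (position, delta) where the absolute GC change is largest."""
    return max(gc_step_scores(seq, window), key=lambda item: abs(item[1]))
```

A positive delta marks a rise in GC content when reading left to right and a negative delta marks a drop. Aligning such scores to annotated transcription start and stop sites across many genes would be one way to look for the kind of conserved pattern the abstract describes.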
Affiliation(s)
- Lingang Zhang
- Center for Advanced Biotechnology, Boston University, Boston, MA 02215, USA.
|
29
|
Chamary JV, Hurst LD. Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: evidence for selectively driven codon usage. Mol Biol Evol 2004; 21:1014-23. [PMID: 15014158 DOI: 10.1093/molbev/msh087] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022]
Abstract
In mammals divergence at fourfold degenerate sites in codons (K(4)) and intronic sequence (K(i)) are both used to estimate the mutation rate, under the supposition that both evolve neutrally. Does it matter which of these we use? Using either class of sequence can be defended because (1) K(4) is the same as K(i) (at least in rodents) and (2) there is no selectively driven codon usage (hence no systematic selection on third sites). Here we re-examine these findings using 560 introns (for 136 genes) in the mouse-rat comparison, aligned by eye and using a new maximum likelihood protocol. We find that the rate of evolution at fourfold sites and at intronic sites is similar in magnitude, but only after eliminating putatively constrained sites from introns (first introns and sites flanking intron-exon junctions). Any approximate congruence between the two rates is not, however, owing to an underlying similarity in the mode of sequence evolution. Some dinucleotides are hypermutable and differently abundant in exons and introns (e.g., CpGs). More importantly, after controlling for relative abundance, all dinucleotides starting with A or T are more prevalent in mismatches in exons than in introns, whereas C-starting dinucleotides (except CG) are more common in introns. Although C content at intronic sites is lower than at flanking fourfold sites, G content is similar, demonstrating that there exists a strong strand-specific preference for C nucleotides that is unique to exons. Transcription-coupled mutational processes and biased gene conversion cannot explain this, as they should affect introns and flanking exons equally. Therefore, by elimination, we propose this to be strong evidence for selectively driven codon usage in mammals.
Affiliation(s)
- Jean-Vincent Chamary
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
30
|
Vinogradov AE. Isochores and tissue-specificity. Nucleic Acids Res 2003; 31:5212-20. [PMID: 12930973 PMCID: PMC212799 DOI: 10.1093/nar/gkg699] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.6] [Received: 03/21/2003] [Revised: 05/11/2003] [Accepted: 07/03/2003] [Indexed: 11/13/2022]
Abstract
The housekeeping (ubiquitously expressed) genes in the mammal genome were shown here to be on average slightly GC-richer than tissue-specific genes. Both housekeeping and tissue-specific genes occupy similar ranges of GC content, but the former tend to concentrate in the upper part of the range. In the human genome, tissue-specific genes show two maxima, GC-poor and GC-rich. The strictly tissue-specific human genes tend to concentrate in the GC-poor region; their distribution is left-skewed and thus reciprocal to the distribution of housekeeping genes. The intermediately tissue-specific genes show an intermediate GC content and the right-skewed distribution. Both in the human and mouse, genes specific for some tissues (e.g., parts of the central nervous system) have a higher average GC content than housekeeping genes. Since they are not transcribed in the germ line (in contrast to housekeeping genes), and therefore have a lower probability of inheritable gene conversion, this finding contradicts the biased gene conversion (BGC) explanation for elevated GC content in the heavy isochores of mammal genome. Genes specific for germ-line tissues (ovary, testes) show a low average GC content, which is also in contradiction to the BGC explanation. Both for the total data set and for the most part of tissues taken separately, a weak positive correlation was found between gene GC content and expression level. The fraction of ubiquitously expressed genes is nearly 1.5-fold higher in the mouse than in the human. This suggests that mouse tissues are comparatively less differentiated (on the molecular level), which can be related to a less pronounced isochoric structure of the mouse genome. 
In each separate tissue (in both species), tissue-specific genes do not form a clear-cut frequency peak (in contrast to housekeeping genes), but constitute a continuum with a gradually increasing degree of tissue-specificity, which probably reflects the path of cell differentiation and/or an independent use of the same protein in several unrelated tissues.
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St Petersburg 194064, Russia.
|
31
|
Vinogradov AE. DNA helix: the importance of being GC-rich. Nucleic Acids Res 2003; 31:1838-44. [PMID: 12654999 PMCID: PMC152811 DOI: 10.1093/nar/gkg296] [Citation(s) in RCA: 174] [Impact Index Per Article: 8.3] [Received: 01/21/2003] [Revised: 02/12/2003] [Accepted: 02/12/2003] [Indexed: 11/12/2022]
Abstract
A new explanation for the emergence of heavy (GC-rich) isochores is proposed, based on the study of thermostability, bendability, ability to B-Z transition and curvature of the DNA helix. The absolute values of thermostability, bendability and ability to B-Z transition correlated positively with GC content, whereas curvature correlated negatively. The relative values of these parameters were determined as compared to randomized sequences. In genes and intergenic spacers of warm-blooded animals, both the relative bendability and ability to B-Z transition increased with elevation of GC content, whereas the relative thermostability and curvature decreased. The usage of synonymous codons in GC-rich genes was also found to augment bendability and ability to B-Z transition and to reduce thermostability of DNA (as compared to synonymous codons with the same GC content). The analysis of transposable elements (Alu and B2 repeats in the human and mouse) showed that the level of their divergence from the consensus sequence positively correlated with relative bendability and ability to B-Z transition and negatively with relative thermostability. The bendability and ability to B-Z transition are known to relate to open chromatin and active transcription, whereas curvature facilitates chromatin condensation. Because heavy isochores are known to be gene-rich and show a high level of transcription, it is suggested here that isochores arose not as an adaptation to elevated temperature but because of a certain grade of general organization and correspondingly advanced level of genomic organization, reflected in genome structuring, with physical properties of DNA in the gene-rich regions being optimized for active transcription and in the gene-poor regions for chromatin condensation ('transcription/grade' concept).
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St Petersburg 194064, Russia.
|