1
Khandia R, Pandey MK, Garg R, Khan AA, Baklanov I, Alanazi AM, Nepali P, Gurjar P, Choudhary OP. Molecular insights into codon usage analysis of mitochondrial fission and fusion gene: relevance to neurodegenerative diseases. Ann Med Surg (Lond) 2024; 86:1416-1425. [PMID: 38463054 PMCID: PMC10923317 DOI: 10.1097/ms9.0000000000001725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 01/05/2024] [Indexed: 03/12/2024] Open
Abstract
Mitochondrial dysfunction is the leading cause of neurodegenerative disorders like Alzheimer's disease and Parkinson's disease. Mitochondria are highly dynamic organelles that continuously undergo fission and fusion to distribute their components evenly and to maintain proper shape, number, and bioenergetic functionality. A set of genes governs the process of fission and fusion. OPA1, Mfn1, and Mfn2 govern fusion, while Drp1, Fis1, MIEF1, and MIEF2 control fission. Determination of specific molecular patterns of transcripts of these genes revealed the impact of compositional constraints on selecting optimal codons. AGA and CCA codons were over-represented, and CCC, GTC, TTC, GGG, and ACG were under-represented in the fusion gene set. In contrast, CTG was over-represented, and GCG, CCG, and TCG were under-represented in the fission gene set. Hydropathicity analysis revealed non-polar protein products of both fission and fusion gene set transcripts. AGA codon repeats are an integral part of translational regulation machinery and present a distinct pattern of over-representation and under-representation in different transcripts within the gene sets, suggestive of selective translational force precisely controlling the occurrence of the codon. Five of the six synonymous codons encoding leucine were used differently in the two gene sets. Hence, forces regulating the occurrence of AGA and five synonymous leucine-encoding codons suggest translational selection. The correlation of mutational bias with gene expression, codon bias, GRAVY, and AROMA signifies selection pressure in both gene sets, while the correlation of compositional bias with gene expression, codon bias, protein properties, and minimum free energy signifies the presence of compositional constraints. More than 25% of codons of both gene sets showed a significant difference in codon usage. The overall analysis sheds light on molecular features of gene sets involved in fission and fusion.
Affiliation(s)
- Megha Katare Pandey
- Translational Medicine Center, All India Institute of Medical Sciences, Bhopal
- Azmat Ali Khan
- Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Igor Baklanov
- Department of Philosophy, North Caucasus Federal University, Stavropol, Russia
- Amer M. Alanazi
- Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Prakash Nepali
- Government Medical Officer, Bhimad Primary Health Care Center, Government of Nepal, Tanahun, Nepal
- Pankaj Gurjar
- Centre for Global Health Research, Saveetha Medical College and Hospital, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Department of Science and Engineering, Novel Global Community Educational Foundation, Hebersham, NSW, Australia
- Om Prakash Choudhary
- Department of Veterinary Anatomy, College of Veterinary Science, Guru Angad Dev Veterinary and Animal Sciences University (GADVASU), Rampura Phul, Bathinda, Punjab, India
2
Li Y, Khandia R, Papadakis M, Alexiou A, Simonov AN, Khan AA. An investigation of codon usage pattern analysis in pancreatitis associated genes. BMC Genom Data 2022; 23:81. [PMID: 36434531 PMCID: PMC9700901 DOI: 10.1186/s12863-022-01089-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/13/2022] [Accepted: 10/10/2022] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Pancreatitis is an inflammatory disorder resulting from the autoactivation of trypsinogen in the pancreas. The genetic basis of the disease is an old phenomenon, and evidence is accumulating for the involvement of synonymous/non-synonymous codon variants in disease initiation and progression. RESULTS The present study examined a panel of 26 genes involved in pancreatitis for their codon choices, compositional analysis, relative dinucleotide frequency, nucleotide disproportion, protein physical properties, gene expression, codon bias, and the interrelationships among all these factors. In this set of genes, gene length was positively correlated with nucleotide skews and codon usage bias. Codon usage of any gene is dependent upon its AT and GC component; however, AGG, CGT, and CGA encoding for Arg, TCG for Ser, GTC for Val, and CCA for Pro were independent of nucleotide compositions. In addition, the GTC codon showed a correlation with protein properties, isoelectric point, instability index, and frequency of basic amino acids. We also investigated the effect of various evolutionary forces in shaping the codon usage choices of genes. CONCLUSIONS This study will enable us to gain insight into the molecular signatures associated with the disease that might help identify more potential genes contributing to enhanced risk for pancreatitis. All the genes associated with pancreatitis are generally associated with physiological function, and mutations causing loss of function or over- or under-expression lead to an ailment. Therefore, the present study attempts to identify the molecular signature in a group of genes that lead to pancreatitis in case of malfunction.
Affiliation(s)
- Yuanyang Li
- Third-Grade Pharmacological Laboratory On Chinese Medicine Approved By State Administration of Traditional Chinese Medicine, Medical College of China Three Gorges, Yichang, China
- College of Medical Science, China Three Gorges University, Yichang, China
- Rekha Khandia
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal, MP 462026 India
- Marios Papadakis
- Department of Surgery II, University Hospital Witten-Herdecke, University of Witten-Herdecke, Heusnerstrasse 40, 42283 Wuppertal, Germany
- Athanasios Alexiou
- Department of Science and Engineering, Novel Global Community Educational Foundation, Hebersham, Australia
- AFNP Med Austria, Vienna, Austria
- Azmat Ali Khan
- Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, 11451 Saudi Arabia
3
Saayman X, Esashi F. Breaking the paradigm: early insights from mammalian DNA breakomes. FEBS J 2022; 289:2409-2428. [PMID: 33792193 PMCID: PMC9451923 DOI: 10.1111/febs.15849] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/30/2020] [Revised: 03/04/2021] [Accepted: 03/29/2021] [Indexed: 12/13/2022]
Abstract
DNA double-strand breaks (DSBs) can result from both exogenous and endogenous sources and are potentially toxic lesions to the human genome. If improperly repaired, DSBs can threaten genome integrity and contribute to premature ageing, neurodegenerative disorders and carcinogenesis. Through decades of work on genome stability, it has become evident that certain regions of the genome are inherently more prone to breakage than others, known as genome instability hotspots. Recent advancements in sequencing-based technologies now enable the profiling of genome-wide distributions of DSBs, also known as breakomes, to systematically map these instability hotspots. Here, we review the application of these technologies and their implications for our current understanding of the genomic regions most likely to drive genome instability. These breakomes ultimately highlight both new and established breakage hotspots including actively transcribed regions, loop boundaries and early-replicating regions of the genome. Further, these breakomes challenge the paradigm that DNA breakage primarily occurs in hard-to-replicate regions. With these advancements, we begin to gain insights into the biological mechanisms both invoking and protecting against genome instability.
Affiliation(s)
- Xanita Saayman
- Sir William Dunn School of Pathology, University of Oxford, UK
- Fumiko Esashi
- Sir William Dunn School of Pathology, University of Oxford, UK
4
Ueberham U, Arendt T. Genomic Indexing by Somatic Gene Recombination of mRNA/ncRNA - Does It Play a Role in Genomic Mosaicism, Memory Formation, and Alzheimer's Disease? Front Genet 2020; 11:370. [PMID: 32411177 PMCID: PMC7200996 DOI: 10.3389/fgene.2020.00370] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/06/2020] [Accepted: 03/25/2020] [Indexed: 12/26/2022] Open
Abstract
Recent evidence indicates that genomic individuality of neurons, characterized by DNA-content variation, is a common if not universal phenomenon in the human brain that occurs naturally but can also show aberrancies that have been linked to the pathomechanism of Alzheimer’s disease and related neurodegenerative disorders. Etiologically, this genomic mosaic has been suggested to arise from defects of cell cycle regulation that may occur either during brain development or in the mature brain after terminal differentiation of neurons. Here, we aim to draw attention towards another mechanism that can give rise to genomic individuality of neurons, with far-reaching consequences. This mechanism has its origin in the transcriptome rather than in replication defects of the genome, i.e., somatic gene recombination of RNA. We continue to develop the concept that somatic gene recombination of RNA provides a physiological process that, through integration of intronless mRNA/ncRNA into the genome, allows a particular functional state at the level of the individual neuron to be indexed. By insertion of defined RNAs in a somatic recombination process, the presence of specific mRNA transcripts within a definite temporal context can be “frozen” and can serve as an index that can be recalled at any later point in time. This allows information related to a specific neuronal state of differentiation and/or activity relevant to a memory trace to be fixed. We suggest that this process is used throughout the lifetime of each neuron and might have both advantageous and deleterious consequences.
Affiliation(s)
- Uwe Ueberham
- Paul Flechsig Institute for Brain Research, University of Leipzig, Leipzig, Germany
- Thomas Arendt
- Paul Flechsig Institute for Brain Research, University of Leipzig, Leipzig, Germany
5
Georgakopoulos-Soares I, Koh G, Momen SE, Jiricny J, Hemberg M, Nik-Zainal S. Transcription-coupled repair and mismatch repair contribute towards preserving genome integrity at mononucleotide repeat tracts. Nat Commun 2020; 11:1980. [PMID: 32332764 PMCID: PMC7181645 DOI: 10.1038/s41467-020-15901-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/05/2019] [Accepted: 03/27/2020] [Indexed: 01/07/2023] Open
Abstract
The mechanisms that underpin how insertions or deletions (indels) become fixed in DNA have primarily been ascribed to replication-related and/or double-strand break (DSB)-related processes. Here, we introduce a method to evaluate indels, orientating them relative to gene transcription. In so doing, we reveal a number of surprising findings: First, there is a transcriptional strand asymmetry in the distribution of mononucleotide repeat tracts in the reference human genome. Second, there is a strong transcriptional strand asymmetry of indels across 2,575 whole genome sequenced human cancers. We suggest that this is due to the activity of transcription-coupled nucleotide excision repair (TC-NER). Furthermore, TC-NER interacts with mismatch repair (MMR) under physiological conditions to produce strand bias. Finally, we show how insertions and deletions differ in their dependencies on these repair pathways. Our analytical approach reveals insights into the contribution of DNA repair towards indel mutagenesis in human cells.
Affiliation(s)
- Ilias Georgakopoulos-Soares
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, 94158, USA
- Gene Koh
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Academic Department of Medical Genetics, The Clinical School, University of Cambridge, Cambridge, CB2 0QQ, UK
- MRC Cancer Unit, The Clinical School, University of Cambridge, Cambridge, CB2 0XZ, UK
- Sophie E Momen
- Academic Department of Medical Genetics, The Clinical School, University of Cambridge, Cambridge, CB2 0QQ, UK
- MRC Cancer Unit, The Clinical School, University of Cambridge, Cambridge, CB2 0XZ, UK
- Josef Jiricny
- Institute of Molecular Life Sciences, University of Zurich and Institute of Biochemistry, ETH Zurich, CH-8093, Zurich, Switzerland
- Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Serena Nik-Zainal
- Academic Department of Medical Genetics, The Clinical School, University of Cambridge, Cambridge, CB2 0QQ, UK
- MRC Cancer Unit, The Clinical School, University of Cambridge, Cambridge, CB2 0XZ, UK
6
Vitelli V, Galbiati A, Iannelli F, Pessina F, Sharma S, d'Adda di Fagagna F. Recent Advancements in DNA Damage-Transcription Crosstalk and High-Resolution Mapping of DNA Breaks. Annu Rev Genomics Hum Genet 2017; 18:87-113. [PMID: 28859573 DOI: 10.1146/annurev-genom-091416-035314] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Abstract
Until recently, DNA damage arising from physiological DNA metabolism was considered a detrimental by-product for cells. However, an increasing amount of evidence has shown that DNA damage could have a positive role in transcription activation. In particular, DNA damage has been detected in transcriptional elements following different stimuli. These physiological DNA breaks are thought to be instrumental for the correct expression of genomic loci through different mechanisms. In this regard, although a plethora of methods are available to precisely map transcribed regions and transcription start sites, commonly used techniques for mapping DNA breaks lack sufficient resolution and sensitivity to draw a robust correlation between DNA damage generation and transcription. Recently, however, several methods have been developed to map DNA damage at single-nucleotide resolution, thus providing a new set of tools to correlate DNA damage and transcription. Here, we review how DNA damage can positively regulate transcription initiation, the current techniques for mapping DNA breaks at high resolution, and how these techniques can benefit future studies of DNA damage and transcription.
Affiliation(s)
- Valerio Vitelli
- FIRC Institute of Molecular Oncology (IFOM), Milan 20139, Italy
- Fabio Iannelli
- FIRC Institute of Molecular Oncology (IFOM), Milan 20139, Italy
- Fabio Pessina
- FIRC Institute of Molecular Oncology (IFOM), Milan 20139, Italy
- Sheetal Sharma
- FIRC Institute of Molecular Oncology (IFOM), Milan 20139, Italy
- Fabrizio d'Adda di Fagagna
- FIRC Institute of Molecular Oncology (IFOM), Milan 20139, Italy
- Istituto di Genetica Molecolare, Consiglio Nazionale delle Ricerche (CNR), Pavia 27100, Italy
7
Gene expression, nucleotide composition and codon usage bias of genes associated with human Y chromosome. Genetica 2017; 145:295-305. [PMID: 28421323 DOI: 10.1007/s10709-017-9965-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/16/2016] [Accepted: 04/08/2017] [Indexed: 10/19/2022]
Abstract
Analysis of codon usage pattern is important to understand the genetic and evolutionary characteristics of genomes. We have used bioinformatic approaches to analyze the codon usage bias (CUB) of the genes located in human Y chromosome. Codon bias index (CBI) indicated that the overall extent of codon usage bias was low. The relative synonymous codon usage (RSCU) analysis suggested that approximately half of the codons out of 59 synonymous codons were most frequently used, and possessed a T or G at the third codon position. The codon usage pattern was different in different genes as revealed from correspondence analysis (COA). A significant correlation between effective number of codons (ENC) and various GC contents suggests that both mutation pressure and natural selection affect the codon usage pattern of genes located in human Y chromosome. In addition, Y-linked genes have significant difference in GC contents at the second and third codon positions, expression level, and codon usage pattern of some codons like the SPANX genes in X chromosome.
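The RSCU measure named in this abstract can be illustrated with a minimal sketch. It assumes the standard genetic code and the usual definition of RSCU: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The names `rscu` and `FAMILIES` and the toy sequence are hypothetical and are not taken from the cited paper.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Synonymous families: Met, Trp and stop codons are excluded,
# which leaves the 59 synonymous codons.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "MW*":
        FAMILIES[aa].append(codon)


def rscu(cds):
    """Return {codon: RSCU} for amino acids observed in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa, family in FAMILIES.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so skip it
        expected = total / len(family)  # count under equal synonymous usage
        for c in family:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Toy sequence with four leucine codons: CTG x3 and TTA x1.
    for codon, value in sorted(rscu("CTGCTGCTGTTA").items()):
        print(codon, round(value, 2))
```

Under this definition, values above 1 indicate a codon used more often than expected under equal synonymous usage, and values below 1 indicate under-representation.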
8
Price N, Graur D. Are Synonymous Sites in Primates and Rodents Functionally Constrained? J Mol Evol 2015; 82:51-64. [PMID: 26563252 DOI: 10.1007/s00239-015-9719-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/07/2015] [Accepted: 11/04/2015] [Indexed: 11/28/2022]
Abstract
It has been claimed that synonymous sites in mammals are under selective constraint. Furthermore, in many studies the selective constraint at such sites in primates was claimed to be more stringent than that in rodents. Given the larger effective population sizes in rodents than in primates, the theoretical expectation is that selection in rodents would be more effective than that in primates. To resolve this contradiction between expectations and observations, we used processed pseudogenes as a model for strict neutral evolution, and estimated selective constraint on synonymous sites using the rate of substitution at pseudosynonymous and pseudononsynonymous sites in pseudogenes as the neutral expectation. After controlling for the effects of GC content, our results were similar to those from previous studies, i.e., synonymous sites in primates exhibited evidence for higher selective constraint than those in rodents. Specifically, our results indicated that in primates up to 24% of synonymous sites could be under purifying selection, while in rodents synonymous sites evolved neutrally. To further control for shifts in GC content, we estimated selective constraint at fourfold degenerate sites using a maximum parsimony approach. This allowed us to estimate selective constraint using mutational patterns that cause a shift in GC content (GT ↔ TG, CT ↔ TC, GA ↔ AG, and CA ↔ AC) and ones that do not (AT ↔ TA and CG ↔ GC). Using this approach, we found that synonymous sites evolve neutrally in both primates and rodents. Apparent deviations from neutrality were caused by a higher rate of C → A and C → T mutations in pseudogenes. Such differences are most likely caused by the shift in GC content experienced by pseudogenes. We conclude that previous estimates according to which 20-40% of synonymous sites in primates were under selective constraint were most likely artifacts of the biased pattern of mutation.
Affiliation(s)
- Nicholas Price
- Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO, 80523, USA
- Dan Graur
- Department of Biology and Biochemistry, University of Houston, Houston, TX, 77204-5001, USA
9
Sandoval IM, Price BA, Gross AK, Chan F, Sammons JD, Wilson JH, Wensel TG. Abrupt onset of mutations in a developmentally regulated gene during terminal differentiation of post-mitotic photoreceptor neurons in mice. PLoS One 2014; 9:e108135. [PMID: 25264759 PMCID: PMC4180260 DOI: 10.1371/journal.pone.0108135] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/17/2014] [Accepted: 08/18/2014] [Indexed: 11/22/2022] Open
Abstract
For sensitive detection of rare gene repair events in terminally differentiated photoreceptors, we generated a knockin mouse model by replacing one mouse rhodopsin allele with a form of the human rhodopsin gene that causes a severe, early-onset form of retinitis pigmentosa. The human gene contains a premature stop codon at position 344 (Q344X), cDNA encoding the enhanced green fluorescent protein (EGFP) at its 3′ end, and a modified 5′ untranslated region to reduce translation rate so that the mutant protein does not induce retinal degeneration. Mutations that eliminate the stop codon express a human rhodopsin-EGFP fusion protein (hRho-GFP), which can be readily detected by fluorescence microscopy. Spontaneous mutations were observed at a frequency of about one per retina; in every case, they gave rise to single fluorescent rod cells, indicating that each mutation occurred during or after the last mitotic division. Additionally, the number of fluorescent rods did not increase with age, suggesting that the rhodopsin gene in mature rod cells is less sensitive to mutation than it is in developing rods. Thus, there is a brief developmental window, coinciding with the transcriptional activation of the rhodopsin locus, in which somatic mutations of the rhodopsin gene abruptly begin to appear.
Affiliation(s)
- Ivette M. Sandoval
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Houston, Texas, United States of America
- Brandee A. Price
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Alecia K. Gross
- Department of Vision Science, University of Alabama Birmingham, Birmingham, Alabama, United States of America
- Fung Chan
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Houston, Texas, United States of America
- Joshua D. Sammons
- Department of Vision Science, University of Alabama Birmingham, Birmingham, Alabama, United States of America
- John H. Wilson
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Theodore G. Wensel
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Houston, Texas, United States of America
10
Abstract
The mammalian genome is extensively transcribed, a large fraction of which is divergent transcription from promoters and enhancers that is tightly coupled with active gene transcription. Here, we propose that divergent transcription may shape the evolution of the genome by new gene origination.
Affiliation(s)
- Xuebing Wu
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Computational and Systems Biology Graduate Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
11
Olovnikov AM. Why do primordial germ cells migrate through an embryo and what does it mean for biological evolution? Biochemistry (Mosc) 2013; 78:1190-9. [PMID: 24237154 DOI: 10.1134/s0006297913100143] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
Abstract
An explanation of the role of primordial germ cell (PGC) migration during embryogenesis is proposed. According to the hypothesis, various PGCs during their migrations through an early embryo are contacting with anlagen of organs and acquiring nonidentical organ specificities. An individual PGC gets such an organ specificity, which corresponds to specificity of the first anlage with which this PGC has the first contact. As a result, the cellular descendants of PGCs (oocytes or spermatocytes) will express nonidentical organ-specific receptors, hence becoming functionally heterogeneous. Therefore, each clone of germ cells becomes capable of recognizing specifically the molecular signals that correspond only to "its" organ of the body. Such signals are produced by the body's organ when it functions in an extreme mode. Signals from the "exercising" organ of the body are delivered to the gonad only via the brain retransmitter, which is composed of neurons grouped as virtual organs of a homunculus. Homunculi are so-called somatotopic maps of the skeletomotor and other parts of the body represented in the brain. Signals, as complexes of regulatory RNAs and proteins, are transported from the "exercising" organ of the body to the corresponding virtual organ of the homunculus where they are processed and then forwarded to the gonad. The organ-specific signal will be selectively recognized by certain gametocytes according to their organ specificity, and then it will initiate the directed epimutation in the gametocyte genome. The nonrandomness of the gene order in chromosomes, that is the synteny and genetic map, is controlled by the so-called creatron that consolidates the soma and germline into a united system, providing the possibility of evolutionary responses of an organism to environmental influences.
Affiliation(s)
- A M Olovnikov
- Institute of Biochemical Physics, Russian Academy of Sciences, Moscow, 125319, Russia.
12
Zhang Z, Yu J. Does the genetic code have a eukaryotic origin? Genomics Proteomics Bioinformatics 2013; 11:41-55. [PMID: 23402863 PMCID: PMC4357656 DOI: 10.1016/j.gpb.2013.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/15/2012] [Revised: 01/09/2013] [Accepted: 01/11/2013] [Indexed: 11/29/2022]
Abstract
In the RNA world, RNA is assumed to be the dominant macromolecule performing most, if not all, core “house-keeping” functions. The ribo-cell hypothesis suggests that the genetic code and the translation machinery may both be born of the RNA world, and the introduction of DNA to ribo-cells may take over the informational role of RNA gradually, such as a mature set of genetic code and mechanism enabling stable inheritance of sequence and its variation. In this context, we modeled the genetic code in two content variables—GC and purine contents—of protein-coding sequences and measured the purine content sensitivities for each codon when the sensitivity (% usage) is plotted as a function of GC content variation. The analysis leads to a new pattern—the symmetric pattern—where the sensitivity of purine content variation shows diagonally symmetry in the codon table more significantly in the two GC content invariable quarters in addition to the two existing patterns where the table is divided into either four GC content sensitivity quarters or two amino acid diversity halves. The most insensitive codon sets are GUN (valine) and CAN (CAR for asparagine and CAY for aspartic acid) and the most biased amino acid is valine (always over-estimated) followed by alanine (always under-estimated). The unique position of valine and its codons suggests its key roles in the final recruitment of the complete codon set of the canonical table. The distinct choice may only be attributable to sequence signatures or signals of splice sites for spliceosomal introns shared by all extant eukaryotes.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
13
The transcript-centric mutations in human genomes. Genomics Proteomics Bioinformatics 2012; 10:11-22. [PMID: 22449397 PMCID: PMC5054492 DOI: 10.1016/s1672-0229(11)60029-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 01/06/2012] [Accepted: 02/15/2012] [Indexed: 01/30/2023]
Abstract
Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures as we have observed in unicellular organisms. In this study, a set of 646 ubiquitous expression-invariable genes (EIGs) which are present in germline cells were defined and examined based on RNA-sequencing data from multiple high-throughput transcriptomic data. We demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome based on single nucleotide polymorphism (SNP) data. A significant positive correlation was shown between gene expression and mutation, where highly-expressed genes accumulate more mutations than lowly-expressed genes. Furthermore, we found four major types of transcript-centric mutations: C→T, A→G, C→G, and G→T in human genomes and identified a negative gradient of the sequence variations aligning from the 5′ end to the 3′ end of the transcription units (TUs). The periodical occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces for gene and genome evolution along with creation of new genes, gene/genome duplication, and horizontal gene transfer.
14
Distinct contributions of replication and transcription to mutation rate variation of human genomes. Genomics Proteomics Bioinformatics 2012; 10:4-10. [PMID: 22449396 PMCID: PMC5054443 DOI: 10.1016/s1672-0229(11)60028-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 01/06/2012] [Accepted: 02/15/2012] [Indexed: 01/29/2023]
Abstract
Here, we evaluate the contribution of two major biological processes—DNA replication and transcription—to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, high-resolution replicating map of Hela cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to locate in late-replicating genomic regions and genes in such regions have a higher SNP density compared to those in early-replication regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.
|
15
|
Lin Q, Cui P, Ding F, Hu S, Yu J. Replication-Associated Mutational Pressure (RMP) Governs Strand-Biased Compositional Asymmetry (SCA) and Gene Organization in Animal Mitochondrial Genomes. Curr Genomics 2012; 13:28-36. [PMID: 22942673 PMCID: PMC3269014 DOI: 10.2174/138920212799034811] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/11/2011] [Revised: 10/01/2011] [Accepted: 10/04/2011] [Indexed: 11/30/2022] Open
Abstract
The nucleotide composition of the light (L-) and heavy (H-) strands of animal mitochondrial genomes is known to exhibit strand-biased compositional asymmetry (SCA). One possible explanation is the existence of a replication-associated mutational pressure (RMP) that may introduce characteristic nucleotide changes among mitochondrial genomes of different animal lineages. Here, we discuss the influence of RMP on nucleotide and amino acid compositions as well as gene organization. Among animal mitochondrial genomes, RMP may represent the major force that compels the evolution of mitochondrial protein-coding genes, coupled with other process-based selective pressures, such as those on components of the translation machinery (tRNAs and their anticodons). Through comparative analyses of sequenced mitochondrial genomes among diverse animal lineages and literature reviews, we suggest a strong RMP effect, observed among invertebrate mitochondrial genes as compared to those of vertebrates, that is either a result of positive selection on the invertebrate or a relaxed selective pressure on the vertebrate mitochondrial genes.
Affiliation(s)
- Qiang Lin
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
|
16
|
Baker A, Julienne H, Chen CL, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Linking the DNA strand asymmetry to the spatio-temporal replication program. I. About the role of the replication fork polarity in genome evolution. Eur Phys J E Soft Matter 2012; 35:92. [PMID: 23001787 DOI: 10.1140/epje/i2012-12092-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2012] [Revised: 08/08/2012] [Accepted: 08/21/2012] [Indexed: 06/01/2023]
Abstract
Two key cellular processes, namely transcription and replication, require the opening of the DNA double helix and act differently on the two DNA strands, generating different mutational patterns (mutational asymmetry) that may result, after long evolutionary time, in different nucleotide compositions on the two DNA strands (compositional asymmetry). We elaborate on the simplest model of neutral substitution rates that takes into account the strand asymmetries generated by the transcription and replication processes. Using perturbation theory, we then solve the time evolution of the DNA composition under strand-asymmetric substitution rates. In our minimal model, the compositional and substitutional asymmetries are predicted to decompose into transcription-associated and replication-associated components. The transcription-associated asymmetry increases in magnitude with transcription rate and changes sign with gene orientation, while the replication-associated asymmetry is proportional to the replication fork polarity. These results are confirmed experimentally in the human genome, using substitution rates obtained by aligning the human and chimpanzee genomes with macaque and orangutan as outgroups, and replication fork polarity determined in the HeLa cell line as estimated from the derivative of the mean replication timing. When further investigating the dynamics of compositional skew evolution, we show that it is not yet at equilibrium and that its evolution is an extremely slow process with characteristic time scales of several hundred Myrs.
Affiliation(s)
- A Baker
- Université de Lyon, Lyon, France
|
17
|
Zhang Z, Yu J. The pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids. Genomics Proteomics Bioinformatics 2012; 10:175-80. [PMID: 23084772 PMCID: PMC5054704 DOI: 10.1016/j.gpb.2012.08.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/31/2012] [Accepted: 08/02/2012] [Indexed: 12/29/2022]
Abstract
The genetic code serves as one of the natural links for life’s two conceptual frameworks—the informational and operational tracks—bridging the nucleotide sequence of DNA and RNA to the amino acid sequence of protein and thus its structure and function. On the informational track, DNA and its four building blocks have four basic variables: order, length, GC and purine contents; the latter two exhibit unique characteristics in prokaryotic genomes where protein-coding sequences dominate. Bridging the two tracks, tRNAs and their aminoacyl tRNA synthases that interpret each codon—nucleotide triplet, together with ribosomes, form a complex machinery that translates genetic information encoded on the messenger RNAs into proteins. On the operational track, proteins are selected in a context of cellular and organismal functions constantly. The principle of such a functional selection is to minimize the damage caused by sequence alteration in a seemingly random fashion at the nucleotide level and its function-altering consequence at the protein level; the principle also suggests that there must be complex yet sophisticated mechanisms to protect molecular interactions and cellular processes for cells and organisms from the damage in addition to both immediate or short-term eliminations and long-term selections. The two-century study of selection at species and population levels has been leading a way to understand rules of inheritance and evolution at molecular levels along the informational track, while ribogenomics, epigenomics and other operationally-defined omics (such as the metabolite-centric metabolomics) have been ushering biologists into the new millennium along the operational track.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
|
18
|
Abstract
The codon composition of coding sequences plays an important role in the regulation of gene expression. Herein, we report systematic differences in the usage of synonymous codons among Arabidopsis thaliana genes that are expressed specifically in distinct tissues. Although we observed that both regionally and transcriptionally associated mutational biases were associated significantly with codon bias, they could not explain the observed differences fully. Similarly, given that transcript abundances did not account for the differences in codon usage, it is unlikely that selection for translational efficiency can account exclusively for the observed codon bias. Thus, we considered the possible evolution of codon bias as an adaptive response to the different abundances of tRNAs in different tissues. Our analysis demonstrated that in some cases, codon usage in genes that were expressed in a broad range of tissues was influenced primarily by the tissue in which the gene was expressed maximally. On the basis of this finding we propose that genes that are expressed in certain tissues might show a tissue-specific compositional signature in relation to codon usage. These findings might have implications for the design of transgenes in relation to optimizing their expression.
|
19
|
Abstract
Alterations in genome sequence and structure contribute to somatic disease, affect the fitness of subsequent generations and drive evolutionary processes. The crucial roles of highly accurate replication and efficient repair in maintaining overall genome integrity are well-known, but the more localized stability costs that are associated with transcribing DNA into RNA molecules are less appreciated. Here we review the diverse ways in which the essential process of transcription alters the underlying DNA template and thereby modifies the genetic landscape.
|
20
|
McLean MA, Tirosh I. Opposite GC skews at the 5' and 3' ends of genes in unicellular fungi. BMC Genomics 2011; 12:638. [PMID: 22208287 PMCID: PMC3315797 DOI: 10.1186/1471-2164-12-638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/15/2011] [Accepted: 12/30/2011] [Indexed: 11/24/2022] Open
Abstract
Background: GC-skews have previously been linked to transcription in some eukaryotes. They have been associated with transcription start sites, with the coding strand G-biased in mammals and C-biased in fungi and invertebrates. Results: We show a consistent and highly significant pattern of GC-skew within genes of almost all unicellular fungi. The pattern of GC-skew is asymmetrical: the coding strand of genes is typically C-biased at the 5' ends but G-biased at the 3' ends, with intermediate skews at the middle of genes. Thus, the initiation, elongation, and termination phases of transcription are associated with different skews. This pattern influences the encoded proteins by generating differential usage of amino acids at the 5' and 3' ends of genes. These biases also affect fourfold-degenerate positions and extend into promoters and 3' UTRs, indicating that skews cannot be accounted for by selection for protein function or translation. Conclusions: We propose two explanations: the mutational pressure hypothesis and the adaptive hypothesis. The mutational pressure hypothesis is that different co-factors bind to RNA pol II at different phases of transcription, producing different mutational regimes. The adaptive hypothesis is that cytidine triphosphate deficiency may lead to C-avoidance at the 3' ends of transcripts to control the flow of RNA pol II molecules and reduce their frequency of collisions.
Affiliation(s)
- Malcolm A McLean
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
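The GC-skew pattern described in the McLean and Tirosh abstract above can be illustrated with a minimal sketch. It assumes the common definition of GC skew, (G − C) / (G + C), computed on the coding strand; the function names, the three-way split into 5', middle, and 3' segments, and the toy sequence are illustrative assumptions, not the authors' pipeline or data.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a DNA sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def skew_by_segment(coding_strand: str, n_segments: int = 3) -> list[float]:
    """Split the coding strand (5' to 3') into near-equal segments and
    return the GC skew of each segment, in order."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    n = len(coding_strand)
    # Integer boundaries so every base falls in exactly one segment.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [gc_skew(coding_strand[a:b]) for a, b in zip(bounds, bounds[1:])]


# Toy sequence (hypothetical, not real data): C-rich at the 5' end,
# balanced in the middle, G-rich at the 3' end.
toy = "CCCG" + "GCGC" + "GGGC"
print(skew_by_segment(toy))  # [-0.5, 0.0, 0.5]
```

A negative value marks a C-biased segment and a positive value a G-biased one, so a profile that rises from the 5' segment to the 3' segment matches the direction of the pattern the abstract reports for unicellular fungi.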
|
21
|
Voets AM, van den Bosch BJC, Stassen AP, Hendrickx AT, Hellebrekers DM, Van Laer L, Van Eyken E, Van Camp G, Pyle A, Baudouin SV, Chinnery PF, Smeets HJM. Large scale mtDNA sequencing reveals sequence and functional conservation as major determinants of homoplasmic mtDNA variant distribution. Mitochondrion 2011; 11:964-72. [PMID: 21946566 DOI: 10.1016/j.mito.2011.09.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/03/2010] [Revised: 04/19/2011] [Accepted: 09/09/2011] [Indexed: 02/07/2023]
Abstract
The mitochondrial DNA (mtDNA) is highly variable, containing large numbers of pathogenic mutations and neutral polymorphisms. The spectrum of homoplasmic mtDNA variation was characterized in 730 subjects and compared with known pathogenic sites. The frequency and distribution of variants in protein coding genes were inversely correlated with conservation at the amino acid level. Analysis of tRNA secondary structures indicated a preference of variants for the loops and some acceptor stem positions. This comprehensive overview of mtDNA variants distinguishes between regions and positions which are likely not critical, mainly conserved regions with pathogenic mutations and essential regions containing no mutations at all.
Affiliation(s)
- A M Voets
- Department of Genetics and Cell Biology, Maastricht University, Maastricht, The Netherlands
|
22
|
Unexpected functional similarities between gatekeeper tumour suppressor genes and proto-oncogenes revealed by systems biology. J Hum Genet 2011; 56:369-76. [PMID: 21368766 DOI: 10.1038/jhg.2011.21] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Familial tumor suppressor genes comprise two subgroups: caretaker genes (CTs) that repair DNA, and gatekeeper genes (GKs) that trigger cell death. Since GKs may also induce cell cycle delay and thus enhance cell survival by facilitating DNA repair, we hypothesized that the prosurvival phenotype of GKs could be selected during cancer progression, and we used a multivariable systems biology approach to test this. We performed multidimensional data analysis, non-negative matrix factorization and logistic regression to compare the features of GKs with those of their putative antagonists, the proto-oncogenes (POs), as well as with control groups of CTs and functionally unrelated congenital heart disease genes (HDs). GKs and POs closely resemble each other, but not CTs or HDs, in terms of gene structure (P<0.001), expression level and breadth (P<0.01), DNA methylation signature (P<0.001) and evolutionary rate (P<0.001). The similar selection pressures and epigenetic trajectories of GKs and POs so implied suggest a common functional attribute that is strongly negatively selected, that is, a shared phenotype that enhances cell survival. The counterintuitive finding of similar evolutionary pressures affecting GKs and POs raises an intriguing possibility: namely, that cancer microevolution is accelerated by an epistatic cascade in which upstream suppressor gene defects subvert the normal bifunctionality of wild-type GKs by constitutively shifting the phenotype away from apoptosis towards survival. If correct, this interpretation would explain the hitherto unexplained phenomenon of frequent wild-type GK (for example, p53) overexpression in tumors.
|
23
|
Chen CL, Duquenne L, Audit B, Guilbaud G, Rappailles A, Baker A, Huvet M, d'Aubenton-Carafa Y, Hyrien O, Arneodo A, Thermes C. Replication-associated mutational asymmetry in the human genome. Mol Biol Evol 2011; 28:2327-37. [PMID: 21368316 DOI: 10.1093/molbev/msr056] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Indexed: 01/12/2023] Open
Abstract
During evolution, mutations occur at rates that can differ between the two DNA strands. In the human genome, nucleotide substitutions occur at different rates on the transcribed and non-transcribed strands that may result from transcription-coupled repair. These mutational asymmetries generate transcription-associated compositional skews. To date, the existence of such asymmetries associated with replication has not yet been established. Here, we compute the nucleotide substitution matrices around replication initiation zones identified as sharp peaks in replication timing profiles and associated with abrupt jumps in the compositional skew profile. We show that the substitution matrices computed in these regions fully explain the jumps in the compositional skew profile when crossing initiation zones. In intergenic regions, we observe mutational asymmetries measured as differences between complementary substitution rates; their sign changes when crossing initiation zones. These mutational asymmetries are unlikely to result from cryptic transcription but can be explained by a model based on replication errors and strand-biased repair. In transcribed regions, mutational asymmetries associated with replication superimpose on the previously described mutational asymmetries associated with transcription. We separate the substitution asymmetries associated with both mechanisms, which allows us to determine for the first time in eukaryotes, the mutational asymmetries associated with replication and to reevaluate those associated with transcription. Replication-associated mutational asymmetry may result from unequal rates of complementary base misincorporation by the DNA polymerases coupled with DNA mismatch repair (MMR) acting with different efficiencies on the leading and lagging strands. Replication, acting in germ line cells during long evolutionary times, contributed equally with transcription to produce the present abrupt jumps in the compositional skew. 
These results demonstrate that DNA replication is one of the major processes that shape human genome composition.
Affiliation(s)
- Chun-Long Chen
- Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique (CNRS), Gif-sur-Yvette, France
|
24
|
Abstract
Despite their name, synonymous mutations have significant consequences for cellular processes in all taxa. As a result, an understanding of codon bias is central to fields as diverse as molecular evolution and biotechnology. Although recent advances in sequencing and synthetic biology have helped to resolve longstanding questions about codon bias, they have also uncovered striking patterns that suggest new hypotheses about protein synthesis. Ongoing work to quantify the dynamics of initiation and elongation is as important for understanding natural synonymous variation as it is for designing transgenes in applied contexts.
Affiliation(s)
- Joshua B Plotkin
- Department of Biology and Program in Applied Mathematics and Computational Science, University of Pennsylvania, 433 South University Avenue, Philadelphia, Pennsylvania 19104, USA.
|
25
|
Weber CC, Hurst LD. Intronic AT skew is a defendable proxy for germline transcription but does not predict crossing-over or protein evolution rates in Drosophila melanogaster. J Mol Evol 2010; 71:415-26. [PMID: 20938653 DOI: 10.1007/s00239-010-9395-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/24/2010] [Accepted: 09/17/2010] [Indexed: 01/28/2023]
Abstract
Recent evidence suggests that germline transcription may affect both protein evolutionary rates, possibly mediated by repair processes, and recombination rates, possibly mediated by chromatin and epigenetic modification. Here, we test these propositions in Drosophila melanogaster. The challenge for such analyses is to provide defendable measures of germline gene expression. Intronic AT skew is a good candidate measure as it is thought to be a consequence, at least in part, of transcription-coupled repair. Prior evidence suggests that intronic AT skew in D. melanogaster is not affected by proximity to intron extremities and differs between transcribed DNA and flanking sequence. We now also establish that intronic AT skew is a defendable proxy for germline expression as (a) it is more similar than expected by chance between introns of the same gene (which is not accounted for by physical proximity), (b) is correlated with male germline expression, and (c) is more pronounced in broadly expressed genes. Furthermore, (d) a trend for intronic skew to differ between 3' and 5' ends of genes is particular to broadly expressed genes. Finally, (e) controlling for physical distance, introns of proximate genes are most different in skew if they have different tissue specificity. We find that intronic AT skew, employed as a proxy for germline transcription, correlates neither with recombination rates nor with the rate of protein evolution. We conclude that there is no prima facie evidence that germline expression modulates recombination rates or monotonically affects protein evolution rates in D. melanogaster.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, UK
|
26
|
Abstract
Transcribed regions in the human genome differ from adjacent intergenic regions in transposable element density, crossover rates, and asymmetric substitution and sequence composition patterns. We tested whether these differences reflect selection or are instead a byproduct of germline transcription, using publicly available gene expression data from a variety of germline and somatic tissues. Crossover rate shows a strong negative correlation with gene expression in meiotic tissues, suggesting that crossover is inhibited by transcription. Strand-biased composition (G+T content) and A → G versus T → C substitution asymmetry are both positively correlated with germline gene expression. We find no evidence for a strand bias in allele frequency data, implying that the substitution asymmetry reflects a mutation rather than a fixation bias. The density of transposable elements is positively correlated with germline expression, suggesting that such elements preferentially insert into regions that are actively transcribed. For each of the features examined, our analyses favor a nonselective explanation for the observed trends and point to the role of germline gene expression in shaping the mammalian genome.
Affiliation(s)
- Graham McVicker
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
27
|
Abasic sites in the transcribed strand of yeast DNA are removed by transcription-coupled nucleotide excision repair. Mol Cell Biol 2010; 30:3206-15. [PMID: 20421413 DOI: 10.1128/mcb.00308-10] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/20/2022] Open
Abstract
Abasic (AP) sites are potent blocks to DNA and RNA polymerases, and their repair is essential for maintaining genome integrity. Although AP sites are efficiently dealt with through the base excision repair (BER) pathway, genetic studies suggest that repair also can occur via nucleotide excision repair (NER). The involvement of NER in AP-site removal has been puzzling, however, as this pathway is thought to target only bulky lesions. Here, we examine the repair of AP sites generated when uracil is removed from a highly transcribed gene in yeast. Because uracil is incorporated instead of thymine under these conditions, the position of the resulting AP site is known. Results demonstrate that only AP sites on the transcribed strand are efficient substrates for NER, suggesting the recruitment of the NER machinery by an AP-blocked RNA polymerase. Such transcription-coupled NER of AP sites may explain previously suggested links between the BER pathway and transcription.
|
28
|
Eory L, Halligan DL, Keightley PD. Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes. Mol Biol Evol 2010; 27:177-92. [PMID: 19759235 DOI: 10.1093/molbev/msp219] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 12/30/2022] Open
Abstract
Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance. Here, we show that in hominids, a group with historically low effective population sizes, all classes of noncoding DNA evolve more slowly than ancestral transposable elements and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5'UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably 4-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at 4-fold sites are effectively selected against in hominids. The higher constraint at 4-fold sites in hominids suggests a more complex protein-coding gene structure than murids and indicates that methods for detecting selection on protein-coding sequences (e.g., using the d(N)/d(S) ratio), with 4-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection and that there are three times as many constrained sites within noncoding sequences as within protein-coding sequences. Including coding and noncoding sites, we estimate that the genomic deleterious mutation rate U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
Affiliation(s)
- Lél Eory
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom.
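The step from U = 4.2 to a load of "about 99%" in the Eory et al. abstract above can be checked with a short worked equation. It assumes the standard multiplicative-fitness result, in which deleterious mutations act independently and the number of new mutations per genome is Poisson-distributed with mean U; the abstract names the multiplicative model but does not spell out this formula, so the form below is an assumption about what was used.

```latex
L = 1 - e^{-U} = 1 - e^{-4.2} \approx 1 - 0.015 = 0.985
```

Under that assumption the load is roughly 98.5%, which rounds to the figure quoted in the abstract.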
|
29
|
Geraci G, D'Elia I, del Gaudio R, Di Giaimo R. Evidence of genetic instability in tumors and normal nearby tissues. PLoS One 2010; 5:e9343. [PMID: 20186333 PMCID: PMC2826410 DOI: 10.1371/journal.pone.0009343] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/29/2009] [Accepted: 01/18/2010] [Indexed: 11/18/2022] Open
Abstract
Background: Comprehensive analyses have recently been performed on many human cancer tissues, leading to the identification of a number of mutated genes but providing no information on the variety of mutations present in each of them. This information is of interest to understand the possible origin of gene mutations that cause tumors. Methodology/Principal Findings: We have analyzed the sequence heterogeneity of the transcripts of the human HPRT and G6PD single copy genes that are not considered tumor markers. Analyses have been performed on different colon cancers and on the nearby histologically normal tissues of two male patients. Several copies of each cDNA, which were produced by cloning the RT-PCR-amplified fragments of the specific mRNA, have been sequenced. Similar analyses have been performed on blood samples of two ostensibly healthy males as reference controls. The sequence heterogeneity of the HPRT and G6PD genes was also determined on DNA from tumor tissues. The employed analytical approach revealed the presence of low-frequency mutations not detectable by other procedures. The results show that genetic heterogeneity is detectable in HPRT and G6PD transcripts in both tumors and nearby healthy tissues of the two studied colon tumors. Similar frequencies of mutations are observed in patient genomic DNA, indicating that mutations have a somatic origin. HPRT transcripts show genetic heterogeneity also in healthy individuals, in agreement with previous results on human T-cells, while G6PD transcript heterogeneity is a characteristic of the patient tissues. Interestingly, data on TP53 show little, if any, heterogeneity in the same tissues. Conclusions/Significance: These findings show that genetic heterogeneity is a peculiarity not only of cancer cells but also of the normal tissue where a tumor arises.
Affiliation(s)
- Giuseppe Geraci
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Ceinge Biotecnologie Avanzate s.c. a r.l., Napoli, Italy
- Ida D'Elia
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Rosanna del Gaudio
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Rossella Di Giaimo
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
|
30
|
Mugal CF, Wolf JBW, von Grünberg HH, Ellegren H. Conservation of neutral substitution rate and substitutional asymmetries in mammalian genes. Genome Biol Evol 2010; 2:19-28. [PMID: 20333222 PMCID: PMC2839347 DOI: 10.1093/gbe/evp056] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Accepted: 12/22/2009] [Indexed: 12/21/2022] Open
Abstract
Local variation in neutral substitution rate across mammalian genomes is governed by several factors, including sequence context variables and structural variables. In addition, the interplay of replication and transcription, known to induce a strand bias in mutation rate, gives rise to variation in substitutional strand asymmetries. Here, we address the conservation of variation in mutation rate and substitutional strand asymmetries using primate- and rodent-specific repeat elements located within the introns of protein-coding genes. We find significant but weak conservation of local mutation rates between human and mouse orthologs. Likewise, substitutional strand asymmetries are conserved between human and mouse, where substitution rate asymmetries show a higher degree of conservation than mutation rate. Moreover, we provide evidence that replication and transcription are correlated to the strength of substitutional asymmetries. The effect of transcription is particularly visible for genes with highly conserved gene expression. In comparison with replication and transcription, mutation rate influences the strength of substitutional asymmetries only marginally.
Affiliation(s)
- C F Mugal
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
|
31
|
Pink CJ, Hurst LD. Timing of replication is a determinant of neutral substitution rates but does not explain slow Y chromosome evolution in rodents. Mol Biol Evol 2009; 27:1077-86. [PMID: 20026481 DOI: 10.1093/molbev/msp314] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open
Abstract
Mutation rates, assayed as substitution rates of putatively neutral sites, are highly variable around mammalian genomes: There is heterogeneity between genes, between autosomes, and between X, Y, and autosomes. The differences between X, Y, and autosomes are typically assumed to reflect the greater number of cell divisions in the male germ-line. Such an effect can neither account for within-autosome differences nor does it predict the differences between X, Y, and autosome observed in rodents. It has recently been proposed that in primates, the time during S-phase when a gene is replicated is an important determinant of neutral rates of evolution. Here we ask 1) whether we can replicate this result in rodents, 2) whether different autosomes replicate on average at different times, and 3) whether this might explain differences in their substitution rates. Finally we ask 4) whether X, Y, and autosome replicate at different times and 5) whether any difference might explain why the number of replication events alone cannot explain their substitution rates. We find that, as in primates, autosomal intronic rates of evolution increase significantly during S-phase. Different autosomes do have different average replication times, and together with rearrangement, this is a significant predictor of between-autosome differences in substitution rate. Although we find that autosomal, X-, and Y-linked genes replicate at different times, it is paradoxical that the Y-linked genes replicate latest, and replicate more often, but are not especially fast evolving. These results support the hypothesis that replication timing is an important source of substitution rate heterogeneity.
Affiliation(s)
- Catherine J Pink
- Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
32
Abstract
Recent large-scale cancer sequencing studies have focused primarily on identifying cancer-associated genes, but as an important byproduct provide "passenger mutation" data that can potentially illuminate the mutational mechanisms at work in cancer cells. Here, we explore patterns of nucleotide substitution in several cancer types using published data. We first show that selection (negative or positive) has affected only a small fraction of mutations, allowing us to attribute observed trends to underlying mutational processes rather than selection. We then show that the increased CpG mutation frequency observed in some cancers primarily occurs outside of CpG islands and CpG island shores, thus rejecting the hypothesis that the increase is a byproduct of island or shore methylation followed by deamination. We observe an A-->G vs. T-->C mutational asymmetry in some cancers similar to one that has been observed in germline mutations in transcribed regions, suggesting that the mutation process may be influenced by gene expression. We also demonstrate that the relative frequency of mutations at dinucleotide "hotspots" can be used as a tool to detect likely technical artifacts in large-scale studies.
33
Understanding what determines the frequency and pattern of human germline mutations. Nat Rev Genet 2009; 10:478-88. [PMID: 19488047 DOI: 10.1038/nrg2529] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Indexed: 11/08/2022]
Abstract
Surprising findings about human germline mutation have come from applying new technologies to detect rare mutations in germline DNA, from analysing DNA sequence divergence between humans and closely related species, and from investigating human polymorphic variation. In this Review we discuss how these approaches affect our current understanding of the roles of sex, age, mutation hot spots, germline selection and genomic factors in determining human nucleotide substitution mutation patterns and frequencies. To enhance our understanding of mutation and disease, more extensive molecular data on the human germ line with regard to mutation origin, DNA repair, epigenetic status and the effect of newly arisen mutations on gamete development are needed.
34
Pink CJ, Swaminathan SK, Dunham I, Rogers J, Ward A, Hurst LD. Evidence that replication-associated mutation alone does not explain between-chromosome differences in substitution rates. Genome Biol Evol 2009; 1:13-22. [PMID: 20333173 PMCID: PMC2817397 DOI: 10.1093/gbe/evp001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Accepted: 03/05/2009] [Indexed: 12/12/2022]
Abstract
Since Haldane first noticed an excess of paternally derived mutations, it has been considered that most mutations derive from errors during germ line replication. Miyata et al. (1987) proposed that differences in the rate of neutral evolution on X, Y, and autosome can be employed to measure the extent of this male bias. This commonly applied method assumes replication to be the sole source of between-chromosome variation in substitution rates. We propose a simple test of this assumption: If true, estimates of the male bias should be independent of which two chromosomal classes are compared. Prior evidence from rodents suggested that this might not be true, but conclusions were limited by a lack of rat Y-linked sequence. We therefore sequenced two rat Y-linked bacterial artificial chromosomes and determined evolutionary rate by comparison with mouse. For estimation of rates we consider both introns and synonymous rates. Surprisingly, for both data sets the prediction of congruent estimates of alpha is strongly rejected. Indeed, some comparisons suggest a female bias with autosomes evolving faster than Y-linked sequence. We conclude that the method of Miyata et al. (1987) has the potential to provide incorrect estimates. Correcting the method requires understanding of the other causes of substitution that might differ between chromosomal classes. One possible cause is recombination-associated substitution bias for which we find some evidence. We note that if, as some suggest, this association is dominantly owing to male recombination, the high estimates of alpha seen in birds are to be expected as Z chromosomes recombine in males.
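The congruence test described in this abstract can be sketched numerically. Under the replication-only model of Miyata et al. (1987), the expected rate ratios are Y/A = 2α/(1+α), X/A = (2/3)(2+α)/(1+α), and Y/X = 3α/(2+α), so each pairwise comparison can be inverted to give its own estimate of α. The function names and the example rates below are illustrative assumptions, not values or code from the paper.

```python
def alpha_from_y_a(ratio):
    """Invert Y/A = 2a / (1 + a) for the male-to-female mutation ratio a."""
    return ratio / (2.0 - ratio)


def alpha_from_x_a(ratio):
    """Invert X/A = (2/3) * (2 + a) / (1 + a)."""
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)


def alpha_from_y_x(ratio):
    """Invert Y/X = 3a / (2 + a)."""
    return 2.0 * ratio / (3.0 - ratio)


def alpha_estimates(k_x, k_y, k_a):
    """Return the three pairwise alpha estimates from X, Y and autosomal rates.

    If replication were the only source of between-chromosome rate
    variation, the three values should agree within sampling error.
    """
    return {
        "Y/A": alpha_from_y_a(k_y / k_a),
        "X/A": alpha_from_x_a(k_x / k_a),
        "Y/X": alpha_from_y_x(k_y / k_x),
    }


if __name__ == "__main__":
    # Hypothetical rates generated under the model with alpha = 2:
    # Y is proportional to a, X to (2 + a) / 3, A to (1 + a) / 2.
    print(alpha_estimates(k_x=4.0 / 3.0, k_y=2.0, k_a=1.5))
```

Note that a Y/A ratio below 1 yields α below 1 in this inversion, which corresponds to the apparent female bias the abstract mentions when autosomes evolve faster than Y-linked sequence.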
Affiliation(s)
- Catherine J Pink
- Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
35
Mugal CF, von Grünberg HH, Peifer M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol 2008; 26:131-42. [PMID: 18974087 DOI: 10.1093/molbev/msn245] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022]
Abstract
If substitution rates are not the same on the two complementary DNA strands, a substitution is considered strand asymmetric. Such substitutional strand asymmetries are determined here for the three most frequent types of substitution on the human genome (C --> T, A --> G, and G --> T). Substitution rate differences between both strands are estimated for 4,590 human genes by aligning all repeats occurring within the introns with their ancestral consensus sequences. For 1,630 of these genes, both coding strand and noncoding strand rates could be compared with rates in gene-flanking regions. All three rates considered are found to be on average higher on the coding strand and lower on the transcribed strand in comparison to their values in the gene-flanking regions. This finding points to the simultaneous action of rate-increasing effects on the coding strand--such as increased adenine and cytosine deamination--and transcription-coupled repair as a rate-reducing effect on the transcribed strand. The common behavior of the three rates leads to strong correlations of the rate asymmetries: Whenever one rate is strand biased, the other two rates are likely to show the same bias. Furthermore, we determine all three rate asymmetries as a function of time: the A --> G and G --> T rate asymmetries are both found to be constant in time, whereas the C --> T rate asymmetry shows a pronounced time dependence, an observation that explains the difference between our results and those of an earlier work by Green et al. (2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33:514-517.). Finally, we show that in addition to transcription also the replication process biases the substitution rates in genes.
Affiliation(s)
- Carina F Mugal
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria
36
Zhao Y, Epstein RJ. Programmed genetic instability: a tumor-permissive mechanism for maintaining the evolvability of higher species through methylation-dependent mutation of DNA repair genes in the male germ line. Mol Biol Evol 2008; 25:1737-49. [PMID: 18535014 PMCID: PMC2464741 DOI: 10.1093/molbev/msn126] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Tumor suppressor genes are classified by their somatic behavior either as caretakers (CTs) that maintain DNA integrity or as gatekeepers (GKs) that regulate cell survival, but the germ line role of these disease-related gene subgroups may differ. To test this hypothesis, we have used genomic data mining to compare the features of human CTs (n = 38), GKs (n = 36), DNA repair genes (n = 165), apoptosis genes (n = 622), and their orthologs. This analysis reveals that repair genes are numerically less common than apoptosis genes in the genomes of multicellular organisms (P < 0.01), whereas CT orthologs are commoner than GK orthologs in unicellular organisms (P < 0.05). Gene targeting data show that CTs are less essential than GKs for survival of multicellular organisms (P < 0.0005) and that CT knockouts often permit offspring viability at the cost of male sterility. Patterns of human familial oncogenic mutations confirm that isolated CT loss is commoner than is isolated GK loss (P < 0.00001). In sexually reproducing species, CTs appear subject to less efficient purifying selection (i.e., higher Ka/Ks) than GKs (P = 0.000003); the faster evolution of CTs seems likely to be mediated by gene methylation and reduced transcription-coupled repair, based on differences in dinucleotide patterns (P = 0.001). These data suggest that germ line CT/repair gene function is relatively dispensable for survival, and imply that milder (e.g., epimutational) male prezygotic repair defects could enhance sperm variation—and hence environmental adaptation and speciation—while sparing fertility. We submit that CTs and repair genes are general targets for epigenetically initiated adaptive evolution, and propose a model in which human cancers arise in part as an evolutionarily programmed side effect of age- and damage-inducible genetic instability affecting both somatic and germ line lineages.
Affiliation(s)
- Yongzhong Zhao
- Laboratory of Computational Oncology, Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
37
Abstract
A regional analysis of nucleotide substitution rates along human genes and their flanking regions allows us to quantify the effect of mutational mechanisms associated with transcription in germ line cells. Our analysis reveals three distinct patterns of substitution rates. First, a sharp decline in the deamination rate of methylated CpG dinucleotides, which is observed in the vicinity of the 5' end of genes. Second, a strand asymmetry in complementary substitution rates, which extends from the 5' end to 1 kbp downstream from the 3' end, associated with transcription-coupled repair. Finally, a localized strand asymmetry, an excess of C-->T over G-->A substitution in the nontemplate strand confined to the first 1-2 kbp downstream of the 5' end of genes. We hypothesize that higher exposure of the nontemplate strand near the 5' end of genes leads to a higher cytosine deamination rate. Up to now, only the somatic hypermutation (SHM) pathway has been known to mediate localized and strand-specific mutagenic processes associated with transcription in mammalia. The mutational patterns in SHM are induced by cytosine deaminase, which just targets single-stranded DNA. This DNA conformation is induced by R-loops, which preferentially occur at the 5' ends of genes. We predict that R-loops are extensively formed in the beginning of transcribed regions in germ line cells.
38
Evans KJ. Genomic DNA from animals shows contrasting strand bias in large and small subsequences. BMC Genomics 2008; 9:43. [PMID: 18221531 PMCID: PMC2267173 DOI: 10.1186/1471-2164-9-43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/07/2007] [Accepted: 01/25/2008] [Indexed: 01/09/2023]
Abstract
Background: For eukaryotes, there is almost no strand bias with regard to base composition, with exceptions for origins of replication and transcription start sites and transcribed regions. This paper revisits the question for subsequences of DNA taken at random from the genome. Results: For a typical mammal, for example mouse or human, there is a small strand bias throughout the genomic DNA: there is a correlation between (G - C) and (A - T) on the same strand (that is, between the difference in the number of guanine and cytosine bases and the difference in the number of adenine and thymine bases). For small subsequences – up to 1 kb – this correlation is weak but positive; but for large windows – around 50 kb to 2 Mb – the correlation is strong and negative. This effect is largely independent of GC%. Transcribed and untranscribed regions give similar correlations both for small and large subsequences, but there is a difference in these regions for intermediate sized subsequences. An analysis of the human genome showed that position within the isochore structure did not affect these correlations. An analysis of available genomes of different species shows that this contrast between large and small windows is a general feature of mammals and birds. Further down the evolutionary tree, other organisms show a similar but smaller effect. Except for the nematode, all the animals analysed showed at least a small effect. Conclusion: The correlations on the large scale may be explained by DNA replication. Transcription may be a modifier of these effects but is not the fundamental cause. These results cast light on how DNA mutations affect the genome over evolutionary time. At least for vertebrates, there is a broad relationship between body temperature and the size of the correlation. The genome of mammals and birds has a structure marked by strand bias segments.
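The statistic at the centre of this abstract, the correlation between (G - C) and (A - T) across subsequences of a given size, can be sketched as follows. This is a minimal illustration assuming non-overlapping windows and a plain Pearson correlation; the paper's actual sampling scheme is not given here, and the function names are hypothetical.

```python
from math import sqrt


def skew_counts(seq, window):
    """Return (G - C, A - T) for each non-overlapping window of `window` bases."""
    seq = seq.upper()
    pairs = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        pairs.append((chunk.count("G") - chunk.count("C"),
                      chunk.count("A") - chunk.count("T")))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def strand_bias_correlation(seq, window):
    """Correlate (G - C) with (A - T) across windows of one size."""
    pairs = skew_counts(seq, window)
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


if __name__ == "__main__":
    # Toy sequence in which G excess and A excess rise and fall together.
    print(strand_bias_correlation("GGAACCTT" * 3, window=4))
```

Running the same function over a range of window sizes on a real chromosome would be the way to look for the sign change between small and large windows that the abstract reports.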
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
39
Evans KJ. Strand bias structure in mouse DNA gives a glimpse of how chromatin structure affects gene expression. BMC Genomics 2008; 9:16. [PMID: 18194530 PMCID: PMC2266913 DOI: 10.1186/1471-2164-9-16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/16/2007] [Accepted: 01/14/2008] [Indexed: 12/20/2022]
Abstract
Background: On a single strand of genomic DNA the number of As is usually about equal to the number of Ts (and similarly for Gs and Cs), but deviations have been noted for transcribed regions and origins of replication. Results: The mouse genome is shown to have a segmented structure defined by strand bias. Transcription is known to cause a strand bias and numerous analyses are presented to show that the strand bias in question is not caused by transcription. However, these strand bias segments influence the position of genes and their unspliced length. The position of genes within the strand bias structure affects the probability that a gene is switched on and its expression level. Transcription has a highly directional flow within this structure and the peak volume of transcription is around 20 kb from the A-rich/T-rich segment boundary on the T-rich side, directed away from the boundary. The A-rich/T-rich boundaries are SATB1 binding regions, whereas the T-rich/A-rich boundary regions are not. Conclusion: The direct cause of the strand bias structure may be DNA replication. The strand bias segments represent a further biological feature, the chromatin structure, which in turn influences the ease of transcription.
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
40
Mutational pattern and frequency of induced nucleotide changes in mouse ENU mutagenesis. BMC Mol Biol 2007; 8:52. [PMID: 17584492 PMCID: PMC1914352 DOI: 10.1186/1471-2199-8-52] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Received: 11/01/2006] [Accepted: 06/20/2007] [Indexed: 11/16/2022]
Abstract
Background: With the advent of sequence-based approaches in mutagenesis studies, it is now possible to directly evaluate the genome-wide pattern of experimentally induced DNA sequence changes for a diverse array of organisms. To gain a more comprehensive understanding of the mutational bias inherent in mouse ENU mutagenesis, this study describes a detailed evaluation of the induced mutational pattern obtained from a sequence-based screen of ENU-mutagenized mice. Results: Based on large-scale screening data, we derive the sequence-based estimates of the nucleotide-specific pattern and frequency of ENU-induced base replacement mutation in the mouse germline, which are then combined with the pattern of codon usage in the mouse coding sequences to infer the spectrum of amino acid changes obtained by ENU mutagenesis. We detect a statistically significant difference between the mutational patterns in phenotype- versus sequence-based screens, which presumably reflects differential phenotypic effects caused by different amino acid replacements. We also demonstrate that the mutations exhibit strong strand asymmetry, and that this imbalance is generated by transcription, most likely as a by-product of transcription-coupled DNA repair in the germline. Conclusion: The results clearly illustrate the biased nature of ENU-induced mutations. We expect that a precise understanding of the mutational pattern and frequency of induced nucleotide changes would be of practical importance when designing sequence-based screening strategies to generate mutant mouse strains harboring amino acid variants at specific loci. More generally, by enhancing the collection of experimentally induced mutations in unambiguously defined genomic regions, sequence-based mutagenesis studies will further illuminate the molecular basis of mutagenic and repair mechanisms that preferentially produce a certain class of mutational changes over others.
41
Wang HF, Hou WR, Niu DK. Strand compositional asymmetries in vertebrate large genes. Mol Biol Rep 2007; 35:163-9. [PMID: 17420956 DOI: 10.1007/s11033-007-9066-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/08/2006] [Accepted: 02/26/2007] [Indexed: 10/23/2022]
Abstract
Both transcription-associated and replication-associated strand compositional asymmetries have recently been shown in vertebrate genomes. In this paper, we illustrate that transcription-associated strand compositional asymmetries and replication-associated ones coexist in most vertebrate large genes, although in most cases the former conceals the latter. Furthermore, we found that the transcription-associated strand compositional asymmetries of housekeeping genes are stronger than those of somatic cell expressed genes. Together with other evidence, we suggest that germline transcription-associated strand asymmetric mutations may be the main cause of the transcription-associated strand compositional asymmetries.
Affiliation(s)
- Hai-Fang Wang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
42
Goodstadt L, Ponting CP. Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput Biol 2006; 2:e133. [PMID: 17009864 PMCID: PMC1584324 DOI: 10.1371/journal.pcbi.0020133] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.9] [Received: 03/09/2006] [Accepted: 08/21/2006] [Indexed: 01/22/2023]
Abstract
Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or “in-paralogues,” are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
Biologists often exploit the evolutionary relationships between proteins in order to explain how their findings are relevant to the biology of other species, including Homo sapiens. The most natural way to define these relationships is to draw family trees showing, for example, which human protein is the counterpart (“orthologue”) of a protein in dog, and which human proteins have arisen by recent duplication of existing genes (“paralogues”). On a small scale this is relatively straightforward, but it is difficult to do this automatically on a genome-wide scale. In this paper the authors describe a new approach to drawing a giant family tree of all proteins from humans and dogs. They show how this tree allows them to refine some protein predictions and discard others that are likely to be nonfunctional dead sequences. Family relationships can show how the dog and human genomes have been rearranged since their last common ancestor. In addition, they help to identify the proteins that are specific to either dog or human, and which contribute to these species' biological differences. Giant trees, drawn from this method, will help to associate the differences, duplications, and evolution of proteins in different mammals with their distinctive physiologies and behaviours.
Affiliation(s)
- Leo Goodstadt
- Medical Research Council Functional Genetics Unit, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, United Kingdom.
43
Ponting CP, Lunter G. Signatures of adaptive evolution within human non-coding sequence. Hum Mol Genet 2006; 15 Spec No 2:R170-5. [PMID: 16987880 DOI: 10.1093/hmg/ddl182] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Indexed: 01/13/2023]
Abstract
The human genome is often portrayed as consisting of three sequence types, each distinguished by their mode of evolution. Purifying selection is estimated to act on 2.5-5.0% of the genome, whereas virtually all remaining sequence is considered to have evolved neutrally and to be devoid of functionality. The third mode of evolution, positive selection of advantageous changes, is considered rare. Such instances have been inferred only for a handful of sites, and these lie almost exclusively within protein-coding genes. Nevertheless, the majority of positively selected sequence is expected to lie within the wealth of functional 'dark matter' present outside of the coding sequence. Here, we review the evolutionary evidence for the majority of human-conserved DNA lying outside of the protein-coding sequence. We argue that within this non-coding fraction lies at least 1 Mb of functional sequence that has accumulated many beneficial nucleotide replacements. Illuminating the functions of this adaptive dark matter will lead to a better understanding of the sequence changes that have shaped the innovative biology of our species.
Affiliation(s)
- Chris P Ponting
- MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK.
44
Abstract
The exon/intron structure of eukaryotic genes differs extensively across species, but the mechanisms and relative rates of intron loss and gain are still poorly understood. Here, we used whole-genome sequence alignments of human, mouse, rat, and dog to perform a genome-wide analysis of intron loss and gain events in >17,000 mammalian genes. We found no evidence for intron gain and 122 cases of intron loss, most of which occurred within the rodent lineage. The majority (68%) of the deleted introns were extremely small (<150 bp), significantly smaller than average. The intron losses occurred almost exclusively within highly expressed, housekeeping genes, supporting the hypothesis that intron loss is mediated via germline recombination of genomic DNA with intronless cDNA. This study constitutes the largest scale analysis for intron dynamics in vertebrates to date and allows us to confirm and extend several hypotheses previously based on much smaller samples. Our results in mammals show that intron gain has not been a factor in the evolution of gene structure during the past 95 Myr and has likely been restricted to more ancient history.
Affiliation(s)
- Jacek Majewski
- Department of Human Genetics, McGill University, Montreal, Quebec H3A 1A4, Canada
- Corresponding author. Fax: (514) 398-1790
45
Tang CS, Zhao YZ, Smith DK, Epstein RJ. Intron length and accelerated 3' gene evolution. Genomics 2006; 88:682-689. [PMID: 16928427 DOI: 10.1016/j.ygeno.2006.06.017] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/04/2006] [Revised: 06/27/2006] [Accepted: 06/28/2006] [Indexed: 11/24/2022]
Abstract
Genetic evolution depends in part upon a balance between negative selection and environmentally driven mutation. To explore whether this balance is affected by gene structure, we have used phylogenetic data mining to compare gene compositions across a range of species. Here we show that genomes of higher species exhibit a greater frequency of 5' CpG islands and of CpG-->TpG/CpA transitions. This latter mutational pattern exhibits a 5'-to-3' trend in higher species, consistent with a length-dependent effect on methylation-dependent CpG suppression. Associated strand asymmetry (TpG>CpA) declines with gene length, implying attenuation of transcription-coupled repair 3' to introns. A sharp 3' rise in coding region single-nucleotide polymorphism frequency further supports a mechanistic role for intron length in promoting genetic variation by reducing repair and/or weakening negative selection. Consistent with this, the Ka/Ks ratio of 3' exons exceeds that of centrally located exons in intron-containing, but not in intronless, genes (p<0.0003). We conclude that the efficiency of transcription-coupled repair decreases with gene length, suggesting in turn that 3' gene evolution is accelerated both by introns and by gene methylation.
Affiliation(s)
- Clara S Tang
- Laboratory of Computational Oncology, Department of Medicine, Pokfulam, Hong Kong
- Yong Z Zhao
- Laboratory of Computational Oncology, Department of Medicine, Pokfulam, Hong Kong
- David K Smith
- Department of Biochemistry, The University of Hong Kong, Pokfulam, Hong Kong
- Richard J Epstein
- Laboratory of Computational Oncology, Department of Medicine, Pokfulam, Hong Kong.
46
Qu HQ, Lawrence SG, Guo F, Majewski J, Polychronakos C. Strand bias in complementary single-nucleotide polymorphisms of transcribed human sequences: evidence for functional effects of synonymous polymorphisms. BMC Genomics 2006; 7:213. [PMID: 16916449 PMCID: PMC1559705 DOI: 10.1186/1471-2164-7-213] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 03/09/2006] [Accepted: 08/17/2006] [Indexed: 11/25/2022]
Abstract
Background: Complementary single-nucleotide polymorphisms (SNPs) may not be distributed equally between two DNA strands if the strands are functionally distinct, such as in transcribed genes. In introns, an excess of A↔G over the complementary C↔T substitutions had previously been found and attributed to transcription-coupled repair (TCR), demonstrating the valuable functional clues that can be obtained by studying such asymmetry. Here we studied asymmetry of human synonymous SNPs (sSNPs) in the fourfold degenerate (FFD) sites as compared to intronic SNPs (iSNPs). Results: The identities of the ancestral bases and the direction of mutations were inferred from human-chimpanzee genomic alignment. After correction for background nucleotide composition, excess of A→G over the complementary T→C polymorphisms, which was observed previously and can be explained by TCR, was confirmed in FFD SNPs and iSNPs. However, when SNPs were separately examined according to whether they mapped to a CpG dinucleotide or not, an excess of C→T over G→A polymorphisms was found in non-CpG site FFD SNPs but was absent from iSNPs and CpG site FFD SNPs. Conclusion: The genome-wide discrepancy of human FFD SNPs provides novel evidence for widespread selective pressure due to functional effects of sSNPs. The similar asymmetry pattern of FFD SNPs and iSNPs that map to a CpG can be explained by transcription-coupled mechanisms, including TCR and transcription-coupled mutation. Because of the hypermutability of CpG sites, more CpG site FFD SNPs are relatively younger and have confronted less selection effect than non-CpG FFD SNPs, which can explain the asymmetric discrepancy of CpG site FFD SNPs vs. non-CpG site FFD SNPs.
Affiliation(s)
- Hui-Qi Qu
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Steve G Lawrence
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Fan Guo
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Jacek Majewski
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Constantin Polychronakos
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Department of Pediatrics, The McGill University Health Center (Montreal Children's Hospital), 2300 Tupper, Montréal, Québec H3H 1P3, Canada
| |
|
47
|
Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 2006; 7:98-108. [PMID: 16418745 DOI: 10.1038/nrg1770] [Citation(s) in RCA: 590] [Impact Index Per Article: 32.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Although the assumption of the neutral theory of molecular evolution - that some classes of mutation have too small an effect on fitness to be affected by natural selection - seems intuitively reasonable, over the past few decades the theory has been in retreat. At least in species with large populations, even synonymous mutations in exons are not neutral. By contrast, in mammals, neutrality of these mutations is still commonly assumed. However, new evidence indicates that even some synonymous mutations are subject to constraint, often because they affect splicing and/or mRNA stability. This has implications for understanding disease, optimizing transgene design, detecting positive selection and estimating the mutation rate.
Affiliation(s)
- J V Chamary
- Center for Integrative Genomics, University of Lausanne, Switzerland.
|
48
|
Hou WR, Wang HF, Niu DK. Replication-associated strand asymmetries in vertebrate genomes and implications for replicon size, DNA replication origin, and termination. Biochem Biophys Res Commun 2006; 344:1258-62. [PMID: 16650814 DOI: 10.1016/j.bbrc.2006.04.039] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2006] [Accepted: 04/17/2006] [Indexed: 11/16/2022]
Abstract
Strand compositional asymmetry has been observed in prokaryotes and used in predicting prokaryotic DNA replication origins and termini. However, it was not found in eukaryotic genomes by the same methods. We propose that transcription-associated strand asymmetries mask the replication-associated ones. By analyzing the nucleotide composition of intergenic sequences larger than 50 kb by cumulative skew diagrams (CSD), we found replication-associated strand asymmetry in vertebrate genomes. Furthermore, we found that the most common replicon sizes in vertebrates are 50-100 kb, and show evidence that the replication origin and termination regions of vertebrate genomes range from a discrete site to a broad zone.
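A cumulative skew diagram plots a running sum of a per-window strand skew along a sequence, so a change in slope marks a switch in strand bias. The sketch below assumes GC skew, (G − C)/(G + C), over non-overlapping windows; the abstract does not state which skew or window size the authors used, and the function name `cumulative_skew` is hypothetical.

```python
def cumulative_skew(seq, window=1000):
    """Running sum of GC skew, (G - C) / (G + C), over non-overlapping windows.

    Windows with no G or C contribute a skew of 0. A trailing fragment
    shorter than `window` is ignored. Returns one cumulative value per window.
    """
    seq = seq.upper()
    total = 0.0
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        values.append(total)
    return values

# Toy sequence: a G-rich window followed by a C-rich window.
toy_curve = cumulative_skew("GGGG" + "CCCC", window=4)
```

In the toy sequence the curve rises over the G-rich window and falls back over the C-rich one; in a real diagram, such turning points are the features read as candidate origin or termination regions.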
Affiliation(s)
- Wen-Ru Hou
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
|
49
|
Glusman G, Qin S, El-Gewely MR, Siegel AF, Roach JC, Hood L, Smit AFA. A third approach to gene prediction suggests thousands of additional human transcribed regions. PLoS Comput Biol 2006; 2:e18. [PMID: 16543943 PMCID: PMC1391917 DOI: 10.1371/journal.pcbi.0020018] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2005] [Accepted: 01/25/2006] [Indexed: 12/26/2022] Open
Abstract
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent "genomic deserts."
|
50
|
Kondrashov FA, Ogurtsov AY, Kondrashov AS. Selection in favor of nucleotides G and C diversifies evolution rates and levels of polymorphism at mammalian synonymous sites. J Theor Biol 2005; 240:616-26. [PMID: 16343547 DOI: 10.1016/j.jtbi.2005.10.020] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2005] [Revised: 10/26/2005] [Accepted: 10/27/2005] [Indexed: 11/24/2022]
Abstract
The impact of synonymous nucleotide substitutions on fitness in mammals remains controversial. Despite some indications of selective constraint, synonymous sites are often assumed to be neutral, and the rate of their evolution is used as a proxy for mutation rate. We subdivide all sites into four classes in terms of the mutable CpG context, nonCpG, postC, preG, and postCpreG, and compare four-fold synonymous sites and intron sites residing outside transposable elements. The distribution of the rate of evolution across all synonymous sites is trimodal. Rate of evolution at nonCpG synonymous sites, not preceded by C and not followed by G, is approximately 10% below that at such intron sites. In contrast, rate of evolution at postCpreG synonymous sites is approximately 30% above that at such intron sites. Finally, synonymous and intron postC and preG sites evolve at similar rates. The relationship between the levels of polymorphism at the corresponding synonymous and intron sites is very similar to that between their rates of evolution. Within every class, synonymous sites are occupied by G or C much more often than intron sites, whose nucleotide composition is consistent with neutral mutation-drift equilibrium. These patterns suggest that synonymous sites are under weak selection in favor of G and C, with the average coefficient s ≈ 0.25/Ne ≈ 10^-5, where Ne is the effective population size. Such selection decelerates evolution and reduces variability at sites with symmetric mutation, but has the opposite effects at sites where the favored nucleotides are more mutable. The amino-acid composition of proteins dictates that many synonymous sites are CpG-prone, which causes them, on average, to evolve faster and to be more polymorphic than intron sites. An average genotype carries approximately 10^7 suboptimal nucleotides at synonymous sites, implying synergistic epistasis in selection against them.
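The four context classes can be illustrated by labeling a site according to its immediate neighbors. The abstract defines nonCpG as "not preceded by C and not followed by G"; the definitions of the other three classes used here are inferred from their names and are an assumption, as is the function name `cpg_context`.

```python
def cpg_context(seq, i):
    """Classify position i of seq by its flanking bases.

    postCpreG : preceded by C and followed by G
    postC     : preceded by C only
    preG      : followed by G only
    nonCpG    : neither
    Positions at a sequence end are treated as lacking the missing neighbor.
    """
    seq = seq.upper()
    prev_is_c = i > 0 and seq[i - 1] == "C"
    next_is_g = i + 1 < len(seq) and seq[i + 1] == "G"
    if prev_is_c and next_is_g:
        return "postCpreG"
    if prev_is_c:
        return "postC"
    if next_is_g:
        return "preG"
    return "nonCpG"

# Label the middle base of four toy trinucleotides.
labels = [cpg_context(tri, 1) for tri in ("CAG", "CAT", "TAG", "TAT")]
```

A site flanked by C and G is one point mutation away from forming a CpG on either side, which is the sense in which such sites are the most CpG-prone.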
Affiliation(s)
- Fyodor A Kondrashov
- Section of Ecology, Behavior and Evolution, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0346, USA.
|