1
Waneka G, Broz AK, Wold-McGimsey F, Zou Y, Wu Z, Sloan DB. Disruption of recombination machinery alters the mutational landscape in plant organellar genomes. bioRxiv 2024:2024.06.03.597120. [PMID: 38895361 PMCID: PMC11185577 DOI: 10.1101/2024.06.03.597120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Land plant organellar genomes have extremely low rates of point mutation yet also experience high rates of recombination and genome instability. Characterizing the molecular machinery responsible for these patterns is critical for understanding the evolution of these genomes. While much progress has been made towards understanding recombination activity in land plant organellar genomes, the relationship between recombination pathways and point mutation rates remains uncertain. The organellar targeted mutS homolog MSH1 has previously been shown to suppress point mutations as well as non-allelic recombination between short repeats in Arabidopsis thaliana. We therefore implemented high-fidelity Duplex Sequencing to test if other genes that function in recombination and maintenance of genome stability also affect point mutation rates. We found small to moderate increases in the frequency of single nucleotide variants (SNVs) and indels in mitochondrial and/or plastid genomes of A. thaliana mutant lines lacking radA, recA1, or recA3. In contrast, osb2 and why2 mutants did not exhibit an increase in point mutations compared to wild type (WT) controls. In addition, we analyzed the distribution of SNVs in previously generated Duplex Sequencing data from A. thaliana organellar genomes and found unexpected strand asymmetries and large effects of flanking nucleotides on mutation rates in WT plants and msh1 mutants. Finally, using long-read Oxford Nanopore sequencing, we characterized structural variants in organellar genomes of the mutant lines and show that different short repeat sequences become recombinationally active in different mutant backgrounds. Together, these complementary sequencing approaches shed light on how recombination may impact the extraordinarily low point mutation rates in plant organellar genomes.
Affiliation(s)
- Gus Waneka
- Department of Biology, Colorado State University, Fort Collins, Colorado, USA
- Amanda K Broz
- Department of Biology, Colorado State University, Fort Collins, Colorado, USA
- Yi Zou
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Zhiqiang Wu
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, Colorado, USA
2
Beura PK, Sen P, Aziz R, Satapathy SS, Ray SK. Transcribed intergenic regions exhibit a lower frequency of nucleotide polymorphism than the untranscribed intergenic regions in the genomes of Escherichia coli and Salmonella enterica. J Genet 2023. [DOI: 10.1007/s12041-023-01418-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
3
Do Noncoding and Coding Sites in Angiosperm Chloroplast DNA Have Different Mutation Processes? Genes (Basel) 2023; 14:genes14010148. [PMID: 36672890 PMCID: PMC9858945 DOI: 10.3390/genes14010148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 12/30/2022] [Accepted: 01/03/2023] [Indexed: 01/09/2023] Open
Abstract
Fourfold degenerate sites within coding regions and intergenic sites have both been used as estimates of neutral evolution. In chloroplast DNA, the pattern of substitution at intergenic sites is strongly dependent on the composition of the surrounding hexanucleotide composed of the three base pairs on each side, which suggests that the mutation process is highly context-dependent in this genome. This study examines the context-dependency of substitutions at fourfold degenerate sites in protein-coding regions and compares the pattern to what has been observed at intergenic sites. Overall, there is strong similarity between the two types of sites, but there are some intriguing differences. One of these is that substitutions of G and C are significantly higher at fourfold degenerate sites across a range of contexts. In fact, A → T and T → A substitutions are the only substitution types that occur at a lower rate at fourfold degenerate sites. The data are not consistent with selective constraints being responsible for the difference in substitution patterns between intergenic and fourfold degenerate sites. Rather, it is suggested that the difference may be a result of different epigenetic modifications that result in slightly different mutation patterns in coding and intergenic DNA.
4
Morton BR. Substitution rate heterogeneity across hexanucleotide contexts in noncoding chloroplast DNA. G3 Genes|Genomes|Genetics 2022; 12:6608088. [PMID: 35699494 PMCID: PMC9339276 DOI: 10.1093/g3journal/jkac150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022]
Abstract
Substitutions between closely related noncoding chloroplast DNA sequences are studied with respect to the composition of the 3 bases on each side of the substitution, that is, the hexanucleotide context. There is about 100-fold variation in rate among the contexts, particularly for substitutions of A and T. Rate heterogeneity of transitions differs from that of transversions, resulting in a more than 200-fold variation in the transition:transversion bias. The data are consistent with a CpG effect, and it is shown that both the A + T content and the arrangement of purines/pyrimidines along the same DNA strand are correlated with rate variation. Expected equilibrium A + T content ranges from 36.4% to 82.8% across contexts, while G–C skew ranges from −77.4 to 72.2 and A–T skew ranges from −63.9 to 68.2. The predicted equilibria are associated with specific features of the content of the hexanucleotide context, and also show close agreement with the observed context-dependent compositions. Finally, by controlling for the content of nucleotides closer to the substitution site, it is shown that both the third and fourth nucleotide removed on each side of the substitution directly influence substitution dynamics at that site. Overall, the results demonstrate that noncoding sites in different contexts are evolving along very different evolutionary trajectories and that substitution dynamics are far more complex than typically assumed. This has important implications for a number of types of sequence analysis, particularly analyses of natural selection, and the context-dependent substitution matrices developed here can be applied in future analyses.
Affiliation(s)
- Brian R Morton
- Department of Biology, Barnard College, Columbia University, New York, NY 10027, USA
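The hexanucleotide context and skew statistics named in the Morton (2022) abstract above can be illustrated with a minimal Python sketch. This is not the author's code: the function names are hypothetical, and the skew formula, 100 × (X − Y) / (X + Y), is an assumed conventional definition that the abstract does not spell out.

```python
from collections import Counter
from typing import Optional


def hexanucleotide_context(seq: str, pos: int) -> Optional[str]:
    """Return the 3 bases on each side of `pos` (0-based), excluding the site itself.

    Returns None when fewer than 3 flanking bases exist on either side.
    """
    if pos < 3 or pos + 3 >= len(seq):
        return None
    return seq[pos - 3:pos] + seq[pos + 1:pos + 4]


def skew_percent(seq: str, x: str, y: str) -> float:
    """Assumed skew definition: 100 * (X - Y) / (X + Y), e.g. x='G', y='C'."""
    counts = Counter(seq.upper())
    total = counts[x] + counts[y]
    if total == 0:
        return 0.0  # neither base present; skew is undefined, report 0
    return 100.0 * (counts[x] - counts[y]) / total


if __name__ == "__main__":
    example = "AAACGGGT"
    print(hexanucleotide_context(example, 3))  # flanks of the C at index 3
    print(skew_percent("GGGC", "G", "C"))      # G-C skew of a toy sequence
```

Grouping substitutions by the string returned from `hexanucleotide_context` would be one way to tabulate context-specific rates; how the paper actually tabulates them is not stated in the abstract.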
5
Cui G, Wang C, Wei X, Wang H, Wang X, Zhu X, Li J, Yang H, Duan H. Complete chloroplast genome of Hordeum brevisubulatum: Genome organization, synonymous codon usage, phylogenetic relationships, and comparative structure analysis. PLoS One 2021; 16:e0261196. [PMID: 34898618 PMCID: PMC8668134 DOI: 10.1371/journal.pone.0261196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2021] [Accepted: 11/28/2021] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Hordeum brevisubulatum, known as fine perennial forage, is used for soil salinity improvement in northern China. The chloroplast (cp) genome is an ideal model for assessing its genome evolution and the phylogenetic relationships. We de novo sequenced and analyzed the cp genome of H. brevisubulatum, providing a fundamental reference for further studies in genetics and molecular breeding. RESULTS The cp genome of H. brevisubulatum was 137,155 bp in length with a typical quadripartite structure. A total of 130 functional genes were annotated and the accD gene was lost in the process of evolution. Among all the annotated genes, 16 different genes harbored introns and the ycf3 and rps12 genes contained two introns. Parity rule 2 (PR2) plot analysis showed that the majority of genes had a bias toward T over A in the coding strand in all five Hordeum species, and a slight G over C in the other four Hordeum species except for H. bogdanii. Additionally, 52 dispersed repeat sequences and 182 simple sequence repeats were identified. Moreover, some unique SSRs of each species could be used as molecular markers for further study. Compared to the other four Hordeum species, H. brevisubulatum was most closely related to H. bogdanii and its cp genome was relatively conserved. Moreover, inverted repeat regions (IRa and IRb) were less divergent than other parts and coding regions were relatively conserved compared to non-coding regions. The main divergence was at the SSC/IR border. CONCLUSIONS This research comprehensively describes the architecture of the H. brevisubulatum cp genome and improves our understanding of its cp biology and genetic diversity, which will facilitate biological discoveries and cp genome engineering.
Affiliation(s)
- Guangxin Cui
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Chunmei Wang
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Xiaoxing Wei
- Academy of Animal and Veterinary Sciences, Qinghai University, Xining, Qinghai, China
- Hongbo Wang
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Laboratory of Quality & Safety Risk Assessment for Livestock Products, Ministry of Agriculture and Rural Affairs, Lanzhou, Gansu, China
- Xiaoli Wang
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Xinqiang Zhu
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- JinHua Li
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Hongshan Yang
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Huirong Duan
- Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
6
Quintero-Ruiz N, Corradi C, Moreno NC, de Souza TA, Pereira Castro L, Rocha CRR, Menck CFM. Mutagenicity Profile Induced by UVB Light in Human Xeroderma Pigmentosum Group C Cells. Photochem Photobiol 2021; 98:713-731. [PMID: 34516658 DOI: 10.1111/php.13516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2021] [Accepted: 09/07/2021] [Indexed: 11/29/2022]
Abstract
Nucleotide excision repair (NER) is one of the main pathways for genome protection against structural DNA damage caused by sunlight, which in turn is extensively related to skin cancer development. The mutation spectra induced by UVB were investigated by whole-exome sequencing of randomly selected clones of NER-proficient and XP-C-deficient human skin fibroblasts. As a model, a cell line unable to recognize and remove lesions (XP-C) was used and compared to the complemented isogenic control (COMP). As expected, a significant increase of mutagenesis was observed in irradiated XP-C cells, mainly C>T transitions, but also CC>TT and C>A base substitutions. Remarkably, the C>T mutations occur mainly at the second base of dipyrimidine sites in pyrimidine-rich sequence contexts, with 5'TC sequence the most mutated. Although T>N mutations were also significantly increased, they were not directly related to pyrimidine dimers. Moreover, the large-scale study of a single UVB irradiation on XP-C cells allowed recovering the typical mutation spectrum found in human skin cancer tumors. Eventually, the data may be used for comparison with the mutational profiles of skin tumors obtained from XP-C patients and may help to understand the mutational process in nonaffected individuals.
Affiliation(s)
- Nathalia Quintero-Ruiz
- Laboratorio de reparo de DNA, Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil
- Camila Corradi
- Laboratorio de reparo de DNA, Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil
- Natália Cestari Moreno
- Laboratorio de reparo de DNA, Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil; Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
- Tiago Antonio de Souza
- Laboratorio de reparo de DNA, Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil; Tau GC Bioinformatics, São Paulo, Brazil
- Ligia Pereira Castro
- Laboratorio de reparo de DNA, Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil
- Clarissa Ribeiro Reily Rocha
- Laboratorio de reparo de DNA, Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil; Drug resistance and mutagenesis Laboratory, Departmento de Oncologia Clínica e Experimental, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Carlos Frederico Martins Menck
- Laboratorio de reparo de DNA, Departamento de Microbiologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil
7
Whittle CA, Kulkarni A, Chung N, Extavour CG. Adaptation of codon and amino acid use for translational functions in highly expressed cricket genes. BMC Genomics 2021; 22:234. [PMID: 33823803 PMCID: PMC8022432 DOI: 10.1186/s12864-021-07411-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/31/2020] [Accepted: 01/27/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND For multicellular organisms, much remains unknown about the dynamics of synonymous codon and amino acid use in highly expressed genes, including whether their use varies with expression in different tissue types and sexes. Moreover, specific codons and amino acids may have translational functions in highly transcribed genes, that largely depend on their relationships to tRNA gene copies in the genome. However, these relationships and putative functions are poorly understood, particularly in multicellular systems. RESULTS Here, we studied codon and amino acid use in highly expressed genes from reproductive and nervous system tissues (male and female gonad, somatic reproductive system, brain and ventral nerve cord, and male accessory glands) in the cricket Gryllus bimaculatus. We report an optimal codon, defined as the codon preferentially used in highly expressed genes, for each of the 18 amino acids with synonymous codons in this organism. The optimal codons were mostly shared among tissue types and both sexes. However, the frequency of optimal codons was highest in gonadal genes. Concordant with translational selection, a majority of the optimal codons had abundant matching tRNA gene copies in the genome, but sometimes obligately required wobble tRNAs. We suggest the latter may comprise a mechanism for slowing translation of abundant transcripts, particularly for cell-cycle genes. Non-optimal codons, defined as those least commonly used in highly transcribed genes, intriguingly often had abundant tRNAs, and had elevated use in a subset of genes with specialized functions (gametic and apoptosis genes), suggesting their use promotes the translational upregulation of particular mRNAs. In terms of amino acids, we found evidence suggesting that amino acid frequency, tRNA gene copy number, and amino acid biosynthetic costs (size/complexity) had all interdependently evolved in this insect model, potentially for translational optimization. 
CONCLUSIONS Collectively, the results suggest a model whereby codon use in highly expressed genes, including optimal, wobble, and non-optimal codons, and their tRNA abundances, as well as amino acid use, have been influenced by adaptation for various functional roles in translation within this cricket. The effects of expression in different tissue types and the two sexes are discussed.
Affiliation(s)
- Carrie A Whittle
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA, 02138, USA
- Arpita Kulkarni
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA, 02138, USA
- Nina Chung
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA, 02138, USA
- Cassandra G Extavour
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA, 02138, USA
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA, 02138, USA
8
Heilbrun EE, Merav M, Adar S. Exons and introns exhibit transcriptional strand asymmetry of dinucleotide distribution, damage formation and DNA repair. NAR Genom Bioinform 2021; 3:lqab020. [PMID: 33817640 PMCID: PMC8002178 DOI: 10.1093/nargab/lqab020] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/16/2020] [Revised: 02/24/2021] [Accepted: 03/22/2021] [Indexed: 12/29/2022] Open
Abstract
Recent cancer sequencing efforts have uncovered asymmetry in DNA damage induced mutagenesis between the transcribed and non-transcribed strands of genes. Here, we investigate the major type of damage induced by ultraviolet (UV) radiation, the cyclobutane pyrimidine dimers (CPDs), which are formed primarily in TT dinucleotides. We reveal that a transcriptional asymmetry already exists at the level of TT dinucleotide frequency and therefore also in CPD damage formation. This asymmetry is conserved in vertebrates and invertebrates and is completely reversed between introns and exons. We show the asymmetry in introns is linked to the transcription process itself, and is also found in enhancer elements. In contrast, the asymmetry in exons is not correlated to transcription, and is associated with codon usage preferences. Reanalysis of nucleotide excision repair, normalizing repair to the underlying TT frequencies, we show repair of CPDs is more efficient in exons compared to introns, contributing to the maintenance and integrity of coding regions. Our results highlight the importance of considering the primary sequence of the DNA in determining DNA damage sensitivity and mutagenic potential.
Affiliation(s)
- Elisheva E Heilbrun
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem 91120, Israel
- May Merav
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem 91120, Israel
- Sheera Adar
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem 91120, Israel
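The TT dinucleotide strand asymmetry described in the Heilbrun et al. (2021) abstract above can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not the authors' pipeline: the function names and the normalized index (difference divided by sum) are hypothetical choices made here for illustration. It relies only on the fact that a TT on the transcribed (template) strand appears as AA on the coding strand.

```python
def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count overlapping occurrences of a dinucleotide in `seq`."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def tt_strand_asymmetry(coding_strand: str) -> float:
    """Hypothetical asymmetry index in [-1, 1] for TT dinucleotides.

    Positive values mean more TT on the coding (non-transcribed) strand;
    negative values mean more TT on the transcribed strand, counted as AA
    on the coding strand because the two strands are reverse complements.
    """
    tt_coding = count_dinucleotide(coding_strand, "TT")
    tt_transcribed = count_dinucleotide(coding_strand, "AA")
    total = tt_coding + tt_transcribed
    if total == 0:
        return 0.0  # no TT on either strand; report no asymmetry
    return (tt_coding - tt_transcribed) / total


if __name__ == "__main__":
    print(tt_strand_asymmetry("TTTAA"))
```

Overlapping counts are used so that a run such as TTT contributes two potential dimer sites; whether the original analysis counts runs this way is not stated in the abstract.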
9
Global Genome Demethylation Causes Transcription-Associated DNA Double Strand Breaks in HPV-Associated Head and Neck Cancer Cells. Cancers (Basel) 2020; 13:cancers13010021. [PMID: 33374558 PMCID: PMC7793113 DOI: 10.3390/cancers13010021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/17/2020] [Revised: 12/18/2020] [Accepted: 12/21/2020] [Indexed: 02/07/2023] Open
Abstract
High levels of DNA methylation at CpG loci are associated with transcriptional repression of tumor suppressor genes and dysregulation of DNA repair genes. Human papilloma virus (HPV)-associated head and neck squamous cell carcinomas (HNSCC) have high levels of DNA methylation, and methylation has been associated with dampening of an innate immune response in virally infected cells. We have been exploring demethylation as a potential treatment in HPV+ HNSCC and recently reported results of a window clinical trial showing that HNSCCs are particularly sensitive to the demethylating agent 5-azacytidine (5-azaC). Mechanistically, sensitivity is partially due to downregulation of HPV gene expression and restoration of tumor suppressors p53 and Rb. Here, for the first time, we show that 5-azaC treatment of HPV+ HNSCC induces replication and transcription-associated DNA double strand breaks (DSBs) that occur preferentially at demethylated genomic DNA. Blocking replication or transcription prevented formation of DNA DSBs and reduced sensitivity of HPV-positive head and neck cancer cells to 5-azaC, demonstrating that both replication and active transcription are required for formation of DSBs associated with 5-azaC.
10
Mas-Ponte D, Supek F. DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers. Nat Genet 2020; 52:958-968. [PMID: 32747826 PMCID: PMC7610516 DOI: 10.1038/s41588-020-0674-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 08/22/2019] [Accepted: 06/30/2020] [Indexed: 01/12/2023]
Abstract
Certain mutagens, including the APOBEC3 (A3) cytosine deaminase enzymes, can create multiple genetic changes in a single event. Activity of A3s results in striking 'mutation showers' occurring near DNA breakpoints; however, less is known about the mechanisms underlying the majority of A3 mutations. We classified the diverse patterns of clustered mutagenesis in tumor genomes, which identified a new A3 pattern: nonrecurrent, diffuse hypermutation (omikli). This mechanism occurs independently of the known focal hypermutation (kataegis), and is associated with activity of the DNA mismatch-repair pathway, which can provide the single-stranded DNA substrate needed by A3, and contributes to a substantial proportion of A3 mutations genome wide. Because mismatch repair is directed towards early-replicating, gene-rich chromosomal domains, A3 mutagenesis has a high propensity to generate impactful mutations, which exceeds that of other common carcinogens such as tobacco smoke and ultraviolet exposure. Cells direct their DNA repair capacity towards more important genomic regions; thus, carcinogens that subvert DNA repair can be remarkably potent.
Affiliation(s)
- David Mas-Ponte
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Fran Supek
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
11
Boot A, Ng AWT, Chong FT, Ho SC, Yu W, Tan DSW, Iyer NG, Rozen SG. Characterization of colibactin-associated mutational signature in an Asian oral squamous cell carcinoma and in other mucosal tumor types. Genome Res 2020; 30:803-813. [PMID: 32661091 PMCID: PMC7370881 DOI: 10.1101/gr.255620.119] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 08/06/2019] [Accepted: 06/04/2020] [Indexed: 12/24/2022]
Abstract
Mutational signatures can reveal the history of mutagenic processes that cells were exposed to before and during tumorigenesis. We expect that as-yet-undiscovered mutational processes will shed further light on mutagenesis leading to carcinogenesis. With this in mind, we analyzed the mutational spectra of 36 Asian oral squamous cell carcinomas. The mutational spectra of two samples from patients who presented with oral bacterial infections showed novel mutational signatures. One of these novel signatures, SBS_AnT, is characterized by a preponderance of thymine mutations, strong transcriptional strand bias, and enrichment for adenines in the 4 bp 5′ of mutation sites. The mutational signature described in this manuscript was shown to be caused by colibactin, a bacterial mutagen produced by E. coli carrying the pks-island. Examination of publicly available sequencing data revealed SBS_AnT in 25 tumors from several mucosal tissue types, expanding the list of tissues in which this mutational signature is observed.
Affiliation(s)
- Arnoud Boot
- Cancer and Stem Cell Biology, Duke-NUS Medical School, 169857, Singapore; Center for Computational Biology, Duke-NUS Medical School, 169857, Singapore
- Alvin W T Ng
- Center for Computational Biology, Duke-NUS Medical School, 169857, Singapore; NUS Graduate School for Integrative Sciences and Engineering, 117456, Singapore
- Fui Teen Chong
- Cancer Therapeutics Research Laboratory, Division of Medical Science, National Cancer Centre Singapore, 169610, Singapore
- Szu-Chi Ho
- Cancer and Stem Cell Biology, Duke-NUS Medical School, 169857, Singapore
- Willie Yu
- Cancer and Stem Cell Biology, Duke-NUS Medical School, 169857, Singapore; Center for Computational Biology, Duke-NUS Medical School, 169857, Singapore
- Daniel S W Tan
- Cancer Therapeutics Research Laboratory, Division of Medical Science, National Cancer Centre Singapore, 169610, Singapore
- N Gopalakrishna Iyer
- Cancer and Stem Cell Biology, Duke-NUS Medical School, 169857, Singapore; Cancer Therapeutics Research Laboratory, Division of Medical Science, National Cancer Centre Singapore, 169610, Singapore
- Steven G Rozen
- Cancer and Stem Cell Biology, Duke-NUS Medical School, 169857, Singapore; Center for Computational Biology, Duke-NUS Medical School, 169857, Singapore; NUS Graduate School for Integrative Sciences and Engineering, 117456, Singapore
12
Xia B, Yan Y, Baron M, Wagner F, Barkley D, Chiodin M, Kim SY, Keefe DL, Alukal JP, Boeke JD, Yanai I. Widespread Transcriptional Scanning in the Testis Modulates Gene Evolution Rates. Cell 2020; 180:248-262.e21. [PMID: 31978344 DOI: 10.1016/j.cell.2019.12.015] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Received: 02/08/2019] [Revised: 09/04/2019] [Accepted: 12/12/2019] [Indexed: 02/07/2023]
Abstract
The testis expresses the largest number of genes of any mammalian organ, a finding that has long puzzled molecular biologists. Our single-cell transcriptomic data of human and mouse spermatogenesis provide evidence that this widespread transcription maintains DNA sequence integrity in the male germline by correcting DNA damage through a mechanism we term transcriptional scanning. We find that genes expressed during spermatogenesis display lower mutation rates on the transcribed strand and have low diversity in the population. Moreover, this effect is fine-tuned by the level of gene expression during spermatogenesis. The unexpressed genes, which in our model do not benefit from transcriptional scanning, diverge faster over evolutionary timescales and are enriched for sensory and immune-defense functions. Collectively, we propose that transcriptional scanning shapes germline mutation signatures and modulates mutation rates in a gene-specific manner, maintaining DNA sequence integrity for the bulk of genes but allowing for faster evolution in a specific subset.
Affiliation(s)
- Bo Xia
- Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA; Institute for Systems Genetics, NYU Langone Health, New York, NY 10016, USA
- Yun Yan
- Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA
- Maayan Baron
- Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA
- Florian Wagner
- Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA
- Dalia Barkley
- Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA
- Marta Chiodin
- Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA
- Sang Y Kim
- Department of Pathology, NYU Langone Health, New York, NY 10016, USA
- David L Keefe
- Department of Obstetrics and Gynecology, NYU Langone Health, New York, NY 10016, USA
- Joseph P Alukal
- Department of Obstetrics and Gynecology, NYU Langone Health, New York, NY 10016, USA
- Jef D Boeke
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA; Institute for Systems Genetics, NYU Langone Health, New York, NY 10016, USA
- Itai Yanai
- Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
13
Whittle CA, Kulkarni A, Extavour CG. Evidence of multifaceted functions of codon usage in translation within the model beetle Tribolium castaneum. DNA Res 2020; 26:473-484. [PMID: 31922535 PMCID: PMC6993815 DOI: 10.1093/dnares/dsz025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/23/2019] [Accepted: 01/07/2020] [Indexed: 01/06/2023] Open
Abstract
Synonymous codon use is non-random. Codons most used in highly transcribed genes, often called optimal codons, typically have high gene counts of matching tRNA genes (tRNA abundance) and promote accurate and/or efficient translation. Non-optimal codons, those least used in highly expressed genes, may also affect translation. In multicellular organisms, codon optimality may vary among tissues. At present, however, tissue specificity of codon use remains poorly understood. Here, we studied codon usage of genes highly transcribed in germ line (testis and ovary) and somatic tissues (gonadectomized males and females) of the beetle Tribolium castaneum. The results demonstrate that: (i) the majority of optimal codons were organism-wide, the same in all tissues, and had numerous matching tRNA gene copies (Opt-codon↑tRNAs), consistent with translational selection; (ii) some optimal codons varied among tissues, suggesting tissue-specific tRNA populations; (iii) wobble tRNA were required for translation of certain optimal codons (Opt-codonwobble), possibly allowing precise translation and/or protein folding; and (iv) remarkably, some non-optimal codons had abundant tRNA genes (Nonopt-codon↑tRNAs), and genes using those codons were tightly linked to ribosomal and stress-response functions. Thus, Nonopt-codon↑tRNAs codons may regulate translation of specific genes. Together, the evidence suggests that codon use and tRNA genes regulate multiple translational processes in T. castaneum.
Affiliation(s)
- Cassandra G Extavour
- Department of Organismic and Evolutionary Biology; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
|
14
|
Pan S, Bruford MW, Wang Y, Lin Z, Gu Z, Hou X, Deng X, Dixon A, Graves JAM, Zhan X. Transcription-Associated Mutation Promotes RNA Complexity in Highly Expressed Genes-A Major New Source of Selectable Variation. Mol Biol Evol 2018; 35:1104-1119. [PMID: 29420738 PMCID: PMC5913671 DOI: 10.1093/molbev/msy017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022] Open
Abstract
Alternatively spliced transcript isoforms are thought to play a critical role in functional diversity. However, the mechanism generating the enormous diversity of spliced transcript isoforms remains unknown, and its biological significance remains unclear. We analyzed transcriptomes in saker falcons, chickens, and mice to show that alternative splicing occurs more frequently, yielding more isoforms, in highly expressed genes. We focused on hemoglobin in the falcon, the most abundantly expressed genes in blood, finding that alternative splicing produces 10-fold more isoforms than expected from the number of splice junctions in the genome. These isoforms were produced mainly by alternative use of de novo splice sites generated by transcription-associated mutation (TAM), not by the RNA editing mechanism normally invoked. We found that high expression of globin genes increases mutation frequencies during transcription, especially on nontranscribed DNA strands. After DNA replication, transcribed strands inherit these somatic mutations, creating de novo splice sites, and generating multiple distinct isoforms in the cell clone. Bisulfite sequencing revealed that DNA methylation may counteract this process by suppressing TAM, suggesting DNA methylation can spatially regulate RNA complexity. RNA profiling showed that falcons living on the high Qinghai-Tibetan Plateau possess greater global gene expression levels and higher diversity of mean to high abundance isoforms (reads per kilobases per million mapped reads ≥18) than their low-altitude counterparts, and we speculate that this may enhance their oxygen transport capacity under low-oxygen environments. Thus, TAM-induced RNA diversity may be physiologically significant, providing an alternative strategy in lifestyle evolution.
Affiliation(s)
- Shengkai Pan
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Michael W Bruford
- Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Beijing, China; Organisms and Environment Division, School of Biosciences and Sustainable Place Institute, Cardiff University, Cardiff, United Kingdom
- Yusong Wang
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Zhenzhen Lin
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Beijing, China
- Zhongru Gu
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Xian Hou
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Xuemei Deng
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding, and Reproduction of the Ministry of Agriculture, China Agricultural University, Beijing, China
- Andrew Dixon
- Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Beijing, China; Emirates Falconers' Club, Abu Dhabi, UAE
- Xiangjiang Zhan
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Cardiff University-Institute of Zoology Joint Laboratory for Biocomplexity Research, Beijing, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
|
15
|
Bergman J, Betancourt AJ, Vogl C. Transcription-Associated Compositional Skews in Drosophila Genes. Genome Biol Evol 2018; 10:269-275. [PMID: 29036491 PMCID: PMC5786239 DOI: 10.1093/gbe/evx200] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 09/25/2017] [Indexed: 12/23/2022] Open
Abstract
In many organisms, local deviations from Chargaff's second parity rule are observed around replication and transcription start sites and within intron sequences. Here, we use expression data as well as a whole-genome data set of nearly 200 haplotypes to investigate such compositional skews in Drosophila melanogaster genes. We find a positive correlation between compositional skew and gene expression, comparable in strength to similar correlations between expression levels and genome-wide sequence features. This correlation is relatively stronger for germline, compared with somatic expression, consistent with the process of transcription-associated mutation bias. We also inferred mutation rates from alleles segregating at low frequencies in short introns, and show that, whereas the overall GC content of short introns does not conform to the equilibrium expectation, the level of the observed deviation from the second parity rule is generally consistent with the inferred rates.
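For readers unfamiliar with the term, a compositional skew measures how far a single DNA strand departs from the parity expected under Chargaff's second rule (A ≈ T and G ≈ C within one strand). The sketch below uses the common (G − C)/(G + C) and (A − T)/(A + T) definitions; the function name and the example sequence are hypothetical, and the study itself may normalize or window the data differently.

```python
from collections import Counter


def compositional_skews(seq: str) -> dict:
    """Return GC and AT skew for one strand of a DNA sequence.

    Assumed definitions (a common convention, not necessarily the paper's):
      gc_skew = (G - C) / (G + C)
      at_skew = (A - T) / (A + T)
    A value of 0 means the strand obeys the second parity rule for that pair.
    """
    counts = Counter(seq.upper())
    g, c, a, t = (counts[base] for base in "GCAT")
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    return {"gc_skew": gc_skew, "at_skew": at_skew}


# Hypothetical 15-nt sequence: 7 G, 1 C, 4 A, 3 T.
skews = compositional_skews("ATGGGTGCAGGTAAG")
balanced = compositional_skews("ACGT")
```

Computing the skew separately on the coding strand of each gene, and then relating it to expression level, is the kind of comparison the abstract describes.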
Affiliation(s)
- Juraj Bergman
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Wien, Austria
- Andrea J Betancourt
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Present address: Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Claus Vogl
- Institut für Tierzucht und Genetik, Vetmeduni Vienna, Wien, Austria
|
16
|
Schaeffer CE, Figueroa ND, Liu X, Karro JE. phRAIDER: Pattern-Hunter based Rapid Ab Initio Detection of Elementary Repeats. Bioinformatics 2017; 32:i209-i215. [PMID: 27307619 PMCID: PMC4908342 DOI: 10.1093/bioinformatics/btw258] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/22/2023] Open
Abstract
Motivation: Transposable elements (TEs) and repetitive DNA make up a sizable fraction of eukaryotic genomes, and their annotation is crucial to the study of the structure, organization, and evolution of any newly sequenced genome. Although RepeatMasker and nHMMER are useful for identifying these repeats, they require a pre-compiled repeat library, which is not always available. De novo identification tools such as Recon, RepeatScout or RepeatGluer serve to identify TEs purely from sequence content, but are either limited by runtimes that prohibit whole-genome use or degrade in quality in the presence of substitutions that disrupt the sequence patterns. Results: phRAIDER is a de novo TE identification tool that addresses the issue of excessive runtime without sacrificing sensitivity as compared to competing tools. The underlying model is a new definition of elementary repeats that incorporates the PatternHunter spaced seed model, allowing for greater sensitivity in the presence of genomic substitutions. As compared with the premier tool in the literature, RepeatScout, phRAIDER shows an average 10× speedup on any single human chromosome and has the ability to process the whole human genome in just over three hours. Here we discuss the tool, the theoretical model underlying the tool, and the results demonstrating its effectiveness. Availability and implementation: phRAIDER is an open source tool available from https://github.com/karroje/phRAIDER. Contact: karroje@miamiOH.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
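The spaced seed idea mentioned in this abstract can be illustrated with a short sketch. A spaced seed is a 0/1 mask: windows are compared only at the positions marked `1`, so a substitution at a `0` position does not break the match. The code below is a toy illustration only, not phRAIDER's algorithm; the function name, the seed `1101`, and the example sequence are all hypothetical.

```python
from collections import defaultdict


def spaced_seed_hits(seq: str, seed: str) -> dict:
    """Group window start positions that agree at every '1' position of the seed.

    seed is a string of '1' (must match) and '0' (don't care) characters.
    Only groups with at least two positions (candidate repeats) are returned.
    """
    care = [i for i, ch in enumerate(seed) if ch == "1"]
    groups = defaultdict(list)
    for start in range(len(seq) - len(seed) + 1):
        # Build the key from the "care" positions only.
        key = "".join(seq[start + i] for i in care)
        groups[key].append(start)
    return {key: starts for key, starts in groups.items() if len(starts) > 1}


# Windows ACGT (start 0) and ACTT (start 6) differ only at the don't-care
# position, so the spaced seed still pairs them; an exact 4-mer index would not.
hits = spaced_seed_hits("ACGTTTACTT", "1101")
```

A contiguous seed of the same weight (`111`) would require three adjacent matching bases, which is why spaced seeds tend to be more tolerant of scattered substitutions.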
Affiliation(s)
- Xiaolin Liu
- Department of Cell, Molecular, and Structural Biology
- John E Karro
- Department of Computer Science and Software Engineering; Department of Cell, Molecular, and Structural Biology; Department of Microbiology; Department of Statistics, Miami University, Oxford, OH, USA
|
17
|
Shewaramani S, Finn TJ, Leahy SC, Kassen R, Rainey PB, Moon CD. Anaerobically Grown Escherichia coli Has an Enhanced Mutation Rate and Distinct Mutational Spectra. PLoS Genet 2017; 13:e1006570. [PMID: 28103245 PMCID: PMC5289635 DOI: 10.1371/journal.pgen.1006570] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 06/02/2016] [Revised: 02/02/2017] [Accepted: 01/04/2017] [Indexed: 12/21/2022] Open
Abstract
Oxidative stress is a major cause of mutation but little is known about how growth in the absence of oxygen impacts the rate and spectrum of mutations. We employed long-term mutation accumulation experiments to directly measure the rates and spectra of spontaneous mutation events in Escherichia coli populations propagated under aerobic and anaerobic conditions. To detect mutations, whole genome sequencing was coupled with methods of analysis sufficient to identify a broad range of mutational classes, including structural variants (SVs) generated by movement of repetitive elements. The anaerobically grown populations displayed a mutation rate nearly twice that of the aerobic populations, showed distinct asymmetric mutational strand biases, and greater insertion element activity. Consistent with mutation rate and spectra observations, genes for transposition and recombination repair associated with SVs were up-regulated during anaerobic growth. Together, these results define differences in mutational spectra affecting the evolution of facultative anaerobes.
Affiliation(s)
- Sonal Shewaramani
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
- New Zealand Institute for Advanced Study, Massey University, Auckland, New Zealand
- Thomas J. Finn
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
- New Zealand Institute for Advanced Study, Massey University, Auckland, New Zealand
- Sinead C. Leahy
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
- Rees Kassen
- Department of Biology, University of Ottawa, Ottawa, Ontario, Canada
- Paul B. Rainey
- New Zealand Institute for Advanced Study, Massey University, Auckland, New Zealand
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Ecole Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI ParisTech), CNRS UMR 8231, PSL Research University, Paris, France
- Christina D. Moon
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
|
18
|
Seplyarskiy VB, Andrianova MA, Bazykin GA. APOBEC3A/B-induced mutagenesis is responsible for 20% of heritable mutations in the TpCpW context. Genome Res 2016; 27:175-184. [PMID: 27940951 PMCID: PMC5287224 DOI: 10.1101/gr.210336.116] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/24/2016] [Accepted: 12/01/2016] [Indexed: 12/18/2022]
Abstract
APOBEC3A/B cytidine deaminase is responsible for the majority of cancerous mutations in a large fraction of cancer samples. However, its role in heritable mutagenesis remains very poorly understood. Recent studies have demonstrated that both in yeast and in human cancerous cells, most APOBEC3A/B-induced mutations occur on the lagging strand during replication and on the nontemplate strand of transcribed regions. Here, we use data on rare human polymorphisms, interspecies divergence, and de novo mutations to study germline mutagenesis and to analyze mutations at nucleotide contexts prone to attack by APOBEC3A/B. We show that such mutations occur preferentially on the lagging strand and on nontemplate strands of transcribed regions. Moreover, we demonstrate that APOBEC3A/B-like mutations tend to produce strand-coordinated clusters, which are also biased toward the lagging strand. Finally, we show that the mutation rate is increased 3' of C→G mutations to a greater extent than 3' of C→T mutations, suggesting pervasive trans-lesion bypass of the APOBEC3A/B-induced damage. Our study demonstrates that 20% of C→T and C→G mutations in the TpCpW context (where W denotes A or T) segregating as polymorphisms in the human population, or 1.4% of all heritable mutations, are attributable to APOBEC3A/B activity.
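To make the TpCpW notation concrete, the sketch below counts candidate target sites in a sequence given on one strand. A TpCpW site on the given strand matches `TC[AT]`; a site on the complementary strand appears on the given strand as its reverse complement, `[AT]GA`. The function name and example sequence are hypothetical, and this only locates sequence contexts; it says nothing about which sites were actually mutated in the study.

```python
import re


def count_tcw_sites(seq: str) -> tuple:
    """Count TpCpW contexts (W = A or T) on the given and the complementary strand.

    Lookahead patterns are used so that overlapping sites (e.g. TCTCA) are
    each counted.
    """
    seq = seq.upper()
    given_strand = len(re.findall(r"(?=TC[AT])", seq))
    # Reverse complement of T-C-W is W-G-A, read on the given strand.
    complementary_strand = len(re.findall(r"(?=[AT]GA)", seq))
    return given_strand, complementary_strand


# Hypothetical example: TCA at 0 and TCT at 6 on the given strand,
# TGA at 3 marking one site on the complementary strand.
sites = count_tcw_sites("TCATGATCT")
overlapping = count_tcw_sites("TCTCA")
```

Comparing such per-strand counts between leading and lagging strands, or template and nontemplate strands, is the general form of the strand-asymmetry analysis the abstract reports.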
Affiliation(s)
- Vladimir B Seplyarskiy
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Maria A Andrianova
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia; Lomonosov Moscow State University, Moscow 119234, Russia
- Georgii A Bazykin
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia; Lomonosov Moscow State University, Moscow 119234, Russia; Skolkovo Institute of Science and Technology, Skolkovo 143026, Russia
|
19
|
Harpak A, Bhaskar A, Pritchard JK. Mutation Rate Variation is a Primary Determinant of the Distribution of Allele Frequencies in Humans. PLoS Genet 2016; 12:e1006489. [PMID: 27977673 PMCID: PMC5157949 DOI: 10.1371/journal.pgen.1006489] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 04/12/2016] [Accepted: 11/16/2016] [Indexed: 01/06/2023] Open
Abstract
The site frequency spectrum (SFS) has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the "phylogenetically-conditioned SFS" or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely-used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. In summary, we show that variable mutation rates are key determinants of the SFS in humans.
Affiliation(s)
- Arbel Harpak
- Department of Biology, Stanford University, Stanford, California, United States of America
- Anand Bhaskar
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Howard Hughes Medical Institute, Stanford University, Stanford, California, United States of America
- Jonathan K. Pritchard
- Department of Biology, Stanford University, Stanford, California, United States of America
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Howard Hughes Medical Institute, Stanford University, Stanford, California, United States of America
|
20
|
Callegari AJ. Does transcription-associated DNA damage limit lifespan? DNA Repair (Amst) 2016; 41:1-7. [PMID: 27010736 DOI: 10.1016/j.dnarep.2016.03.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/14/2015] [Revised: 03/09/2016] [Accepted: 03/10/2016] [Indexed: 12/31/2022]
Abstract
Small mammals undergo an aging process similar to that of larger mammals, but aging occurs at a dramatically faster rate. This phenomenon is often assumed to be the result of damage caused by reactive oxygen species generated in mitochondria. An alternative explanation for the phenomenon is suggested here. The rate of RNA synthesis is dramatically elevated in small mammals and correlates quantitatively with the rate of aging among different mammalian species. The rate of RNA synthesis is reduced by caloric restriction and inhibition of TOR pathway signaling, two perturbations that increase lifespan in multiple metazoan species. From bacteria to man, the transcription of a gene has been found to increase the rate at which it is damaged, and a number of lines of evidence suggest that DNA damage is sufficient to induce multiple symptoms associated with normal aging. Thus, the correlations frequently found between the rate of RNA synthesis and the rate of aging could potentially reflect an important role for transcription-associated DNA damage in the aging process.
Affiliation(s)
- A John Callegari
- Molecular Biology Program, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
|
21
|
Abstract
Species survival depends on the faithful replication of genetic information, which is continually monitored and maintained by DNA repair pathways that correct replication errors and the thousands of lesions that arise daily from the inherent chemical lability of DNA and the effects of genotoxic agents. Nonetheless, neutrally evolving DNA (not under purifying selection) accumulates base substitutions with time (the neutral mutation rate). Thus, repair processes are not 100% efficient. The neutral mutation rate varies both between and within chromosomes. For example, it is 10-50 fold higher at CpGs than at non-CpG positions. Interestingly, the neutral mutation rate at non-CpG sites is positively correlated with CpG content. Although the basis of this correlation was not immediately apparent, some bioinformatic results were consistent with the induction of non-CpG mutations by DNA repair at flanking CpG sites. Recent studies with a model system showed that in vivo repair of preformed lesions (mismatches, abasic sites, single stranded nicks) can in fact induce mutations in flanking DNA. Mismatch repair (MMR) is an essential component for repair-induced mutations, which can occur as distant as 5 kb from the introduced lesions. Most, but not all, mutations involved the C of TpCpN (G of NpGpA), which is the target sequence of the C-preferring single-stranded DNA specific APOBEC deaminases. APOBEC-mediated mutations are not limited to our model system: recent studies by others showed that some tumors harbor mutations with the same signature, as can intermediates in RNA-guided endonuclease-mediated genome editing. APOBEC deaminases participate in normal physiological functions such as generating mutations that inactivate viruses or endogenous retrotransposons, or that enhance immunoglobulin diversity in B cells. The recruitment of normally physiological error-prone processes during DNA repair would have important implications for disease, aging and evolution. This perspective briefly reviews both the bioinformatic and biochemical literature relevant to repair-induced mutagenesis and discusses future directions required to understand the mechanistic basis of this process.
Affiliation(s)
- Jia Chen
- School of Life Science and Technology, ShanghaiTech University, Building 8, 319 Yueyang Road, Shanghai 200031, China
- Anthony V Furano
- Section on Genomic Structure and Function, Laboratory of Cell and Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 8, Room 203, 8 Center Drive, MSC 0830, Bethesda, MD 20892-0830, USA
|
22
|
Makova KD, Hardison RC. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 2015; 16:213-23. [PMID: 25732611 PMCID: PMC4500049 DOI: 10.1038/nrg3890] [Citation(s) in RCA: 145] [Impact Index Per Article: 16.1] [Indexed: 12/12/2022]
Abstract
The variation in local rates of mutations can affect both the evolution of genes and their function in normal and cancer cells. Deciphering the molecular determinants of this variation will be aided by the elucidation of distinct types of mutations, as they differ in regional preferences and in associations with genomic features. Chromatin organization contributes to regional variation in mutation rates, but its contribution differs among mutation types. In both germline and somatic mutations, base substitutions are more abundant in regions of closed chromatin, perhaps reflecting error accumulation late in replication. By contrast, a distinctive mutational state with very high levels of insertions and deletions (indels) and substitutions is enriched in regions of open chromatin. These associations indicate an intricate interplay between the nucleotide sequence of DNA and its dynamic packaging into chromatin, and have important implications for current biomedical research. This Review focuses on recent studies showing associations between chromatin state and mutation rates, including pairwise and multivariate investigations of germline and somatic (particularly cancer) mutations.
Affiliation(s)
- Kateryna D Makova
- Department of Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
|
23
|
Fares M. Modeling Evolution of Molecular Sequences. NATURAL SELECTION 2014:28-47. [DOI: 10.1201/b17795-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
|
24
|
Abstract
Mutational heterogeneity must be taken into account when reconstructing evolutionary histories, calibrating molecular clocks, and predicting links between genes and disease. Selective pressures and various DNA transactions have been invoked to explain the heterogeneous distribution of genetic variation between species, within populations, and in tissue-specific tumors. To examine relationships between such heterogeneity and variations in leading- and lagging-strand replication fidelity and mismatch repair, we accumulated 40,000 spontaneous mutations in eight diploid yeast strains in the absence of selective pressure. We found that replicase error rates vary by fork direction, coding state, nucleosome proximity, and sequence context. Further, error rates and DNA mismatch repair efficiency both vary by mismatch type, responsible polymerase, replication time, and replication origin proximity. Mutation patterns implicate replication infidelity as one driver of variation in somatic and germline evolution, suggest mechanisms of mutual modulation of genome stability and composition, and predict future observations in specific cancers.
|
25
|
Deaconescu AM. RNA polymerase between lesion bypass and DNA repair. Cell Mol Life Sci 2013; 70:4495-509. [PMID: 23807206 PMCID: PMC11113250 DOI: 10.1007/s00018-013-1384-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/02/2013] [Revised: 05/13/2013] [Accepted: 05/23/2013] [Indexed: 11/29/2022]
Abstract
DNA damage leads to heritable changes in the genome via DNA replication. However, as the DNA helix is the site of numerous other transactions, notably transcription, DNA damage can have diverse repercussions on cellular physiology. In particular, DNA lesions have distinct effects on the passage of transcribing RNA polymerases, from easy bypass to almost complete block of transcription elongation. The fate of the RNA polymerase positioned at a lesion is largely determined by whether the lesion is structurally subtle and can be accommodated and eventually bypassed, or bulky, structurally distorting and requiring remodeling/complete dissociation of the transcription elongation complex, excision, and repair. Here we review cellular responses to DNA damage that involve RNA polymerases with a focus on bacterial transcription-coupled nucleotide excision repair and lesion bypass via transcriptional mutagenesis. Emphasis is placed on the explosion of new structural information on RNA polymerases and relevant DNA repair factors and the mechanistic models derived from it.
Affiliation(s)
- Alexandra M Deaconescu
- Rosenstiel Basic Medical Sciences Research Center, Brandeis University, 415 South St., MS029, Waltham, MA, 02454, USA
|
26
|
Abstract
The mammalian genome is extensively transcribed, a large fraction of which is divergent transcription from promoters and enhancers that is tightly coupled with active gene transcription. Here, we propose that divergent transcription may shape the evolution of the genome by new gene origination.
Affiliation(s)
- Xuebing Wu
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Computational and Systems Biology Graduate Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
27
|
Jóźwiak J, Sontowska I, Płoski R. Frequency of TSC1 and TSC2 mutations in American, British, Polish and Taiwanese populations. Mol Med Rep 2013; 8:909-13. [PMID: 23846400 DOI: 10.3892/mmr.2013.1583] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/30/2013] [Accepted: 06/10/2013] [Indexed: 11/05/2022] Open
Abstract
Tuberous sclerosis (TS) is caused by mutation of the tumor suppressor genes, tuberous sclerosis complex 1 (TSC1) or 2 (TSC2). The aim of the present study was to compare the frequency and types of TSC1 and TSC2 mutations in American, British, Polish and Taiwanese populations. A meta-analysis of 380 TS patients was performed. Significant differences were analyzed using the Chi-square test and one-way ANOVA. Results showed a difference in frequency for the four populations analyzed. The frequency of TSC1 mutations was twice as high in the American and British populations. However, there were no significant differences in the types of mutations, with insertions of >1 nucleotide being the least frequent. Additionally, in an analysis of the complexity of nucleotide sequences it was demonstrated that the level of sequence complexity in the Polish population was significantly higher than in the remaining populations. Concerning strand bias, in the case of two types of substitutions, C>G/G>C and C>T/G>A, the ratio of corresponding mutations on the two DNA strands was approximately 3:1 and 2:1. In the present study, an increased frequency of C>G/G>C and C>T/G>A mutations in the coding strand was found in the analyzed populations. However, additional studies and larger patient cohorts are required to verify these results.
Affiliation(s)
- Jarosław Jóźwiak
- Department of Histology and Embryology, Center for Biostructure Research, Medical University of Warsaw, PL-02004 Warsaw, Poland
|
28
|
Gaillard H, Herrera-Moyano E, Aguilera A. Transcription-associated genome instability. Chem Rev 2013; 113:8638-61. [PMID: 23597121 DOI: 10.1021/cr400017y] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 12/26/2022]
Affiliation(s)
- Hélène Gaillard
- Centro Andaluz de Biología Molecular y Medicina Regenerativa CABIMER, Universidad de Sevilla, Av. Américo Vespucio s/n, 41092 Seville, Spain
|
29
|
Agier N, Romano OM, Touzain F, Cosentino Lagomarsino M, Fischer G. The spatiotemporal program of replication in the genome of Lachancea kluyveri. Genome Biol Evol 2013; 5:370-88. [PMID: 23355306 PMCID: PMC3590768 DOI: 10.1093/gbe/evt014] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Accepted: 01/18/2013] [Indexed: 12/11/2022] Open
Abstract
We generated a genome-wide replication profile in the genome of Lachancea kluyveri and assessed the relationship between replication and base composition. This species diverged from Saccharomyces cerevisiae before the ancestral whole genome duplication. The genome comprises eight chromosomes among which a chromosomal arm of 1 Mb has a G + C-content much higher than the rest of the genome. We identified 252 active replication origins in L. kluyveri and found considerable divergence in origin location with S. cerevisiae and with Lachancea waltii. Although some global features of S. cerevisiae replication are conserved (centromeres replicate early, whereas telomeres replicate late), we found that replication origins both in L. kluyveri and L. waltii do not behave as evolutionary fragile sites. In L. kluyveri, replication timing along chromosomes alternates between regions of early and late activating origins, except for the 1 Mb GC-rich chromosomal arm. This chromosomal arm contains an origin consensus motif different from other chromosomes and is replicated early during S-phase. We showed that precocious replication results from the specific absence of late firing origins in this chromosomal arm. In addition, we found a correlation between GC-content and distance from replication origins as well as a lack of replication-associated compositional skew between leading and lagging strands specifically in this GC-rich chromosomal arm. These findings suggest that the unusual base composition in the genome of L. kluyveri could be linked to replication.
Collapse
Affiliation(s)
- Nicolas Agier
- UPMC, UMR7238, Génomique des Microorganismes, Paris, France
- CNRS, UMR7238, Génomique des Microorganismes, Paris, France
| | | | - Fabrice Touzain
- UPMC, UMR7238, Génomique des Microorganismes, Paris, France
- CNRS, UMR7238, Génomique des Microorganismes, Paris, France
- Present address: ANSES, Ploufragan/Plouzané Laboratory Viral Genomics and Biosecurity Unit (GVB), Ploufragan, France
| | - Marco Cosentino Lagomarsino
- UPMC, UMR7238, Génomique des Microorganismes, Paris, France
- CNRS, UMR7238, Génomique des Microorganismes, Paris, France
- Gilles Fischer
- UPMC, UMR7238, Génomique des Microorganismes, Paris, France
- CNRS, UMR7238, Génomique des Microorganismes, Paris, France
|
30
|
Park C, Qian W, Zhang J. Genomic evidence for elevated mutation rates in highly expressed genes. EMBO Rep 2012; 13:1123-9. [PMID: 23146897 DOI: 10.1038/embor.2012.165] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Received: 07/05/2012] [Revised: 09/10/2012] [Accepted: 10/05/2012] [Indexed: 11/09/2022] Open
Abstract
Reporter gene assays have demonstrated both transcription-associated mutagenesis (TAM) and transcription-coupled repair, but the net impact of transcription on mutation rate remains unclear, especially at the genomic scale. Using comparative genomics of related species as well as mutation accumulation lines, we show in yeast that the rate of point mutation in a gene increases with the expression level of the gene. Transcription induces mutagenesis on both DNA strands, indicating simultaneous actions of several TAM mechanisms. A significant positive correlation is also detected between the human germline mutation rate and expression level. These results indicate that transcription is overall mutagenic.
Affiliation(s)
- Chungoo Park
- Department of Ecology and Evolutionary Biology, University of Michigan, 1075 Natural Science Building, 830 North University Avenue, Ann Arbor, Michigan 48109, USA
|
31
|
The transcript-centric mutations in human genomes. GENOMICS PROTEOMICS & BIOINFORMATICS 2012; 10:11-22. [PMID: 22449397 PMCID: PMC5054492 DOI: 10.1016/s1672-0229(11)60029-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 01/06/2012] [Accepted: 02/15/2012] [Indexed: 01/30/2023]
Abstract
Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures as we have observed in unicellular organisms. In this study, a set of 646 ubiquitous expression-invariable genes (EIGs) which are present in germline cells were defined and examined based on RNA-sequencing data from multiple high-throughput transcriptomic data. We demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome based on single nucleotide polymorphism (SNP) data. A significant positive correlation was shown between gene expression and mutation, where highly-expressed genes accumulate more mutations than lowly-expressed genes. Furthermore, we found four major types of transcript-centric mutations: C→T, A→G, C→G, and G→T in human genomes and identified a negative gradient of the sequence variations aligning from the 5′ end to the 3′ end of the transcription units (TUs). The periodical occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces for gene and genome evolution along with creation of new genes, gene/genome duplication, and horizontal gene transfer.
|
32
|
Xia X. DNA replication and strand asymmetry in prokaryotic and mitochondrial genomes. Curr Genomics 2012; 13:16-27. [PMID: 22942672 PMCID: PMC3269012 DOI: 10.2174/138920212799034776] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 07/07/2011] [Revised: 09/26/2011] [Accepted: 10/02/2011] [Indexed: 11/22/2022] Open
Abstract
Different patterns of strand asymmetry have been documented in a variety of prokaryotic genomes as well as mitochondrial genomes. Because different replication mechanisms often lead to different patterns of strand asymmetry, much can be learned of replication mechanisms by examining strand asymmetry. Here I summarize the diverse patterns of strand asymmetry among different taxonomic groups to suggest that (1) the single-origin replication may not be universal among bacterial species as the endosymbionts Wigglesworthia glossinidia, Wolbachia species, cyanobacterium Synechocystis 6803 and Mycoplasma pulmonis genomes all exhibit strand asymmetry patterns consistent with the multiple origins of replication, (2) different replication origins in some archaeal genomes leave quite different patterns of strand asymmetry, suggesting that different replication origins in the same genome may be differentially used, (3) mitochondrial genomes from representative vertebrate species share one strand asymmetry pattern consistent with the strand-displacement replication documented in mammalian mtDNA, suggesting that the mtDNA replication mechanism in mammals may be shared among all vertebrate species, and (4) mitochondrial genomes from primitive forms of metazoans such as the sponge and hydra (representing Porifera and Cnidaria, respectively), as well as those from plants, have strand asymmetry patterns similar to single-origin or multi-origin replications observed in prokaryotes and are drastically different from mitochondrial genomes from other metazoans. This may explain why sponge and hydra mitochondrial genomes, as well as plant mitochondrial genomes, evolve much more slowly than those from other metazoans.
Affiliation(s)
- Xuhua Xia
- Department of Biology and Center for Advanced Research in Environmental Genomics, University of Ottawa, 30 Marie Curie, P.O. Box 450, Station A, Ottawa, Ontario, Canada
|
33
|
Baker A, Julienne H, Chen CL, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Linking the DNA strand asymmetry to the spatio-temporal replication program. I. About the role of the replication fork polarity in genome evolution. THE EUROPEAN PHYSICAL JOURNAL. E, SOFT MATTER 2012; 35:92. [PMID: 23001787 DOI: 10.1140/epje/i2012-12092-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2012] [Revised: 08/08/2012] [Accepted: 08/21/2012] [Indexed: 06/01/2023]
Abstract
Two key cellular processes, namely transcription and replication, require the opening of the DNA double helix and act differently on the two DNA strands, generating different mutational patterns (mutational asymmetry) that may result, after long evolutionary time, in different nucleotide compositions on the two DNA strands (compositional asymmetry). We elaborate on the simplest model of neutral substitution rates that takes into account the strand asymmetries generated by the transcription and replication processes. Using perturbation theory, we then solve the time evolution of the DNA composition under strand-asymmetric substitution rates. In our minimal model, the compositional and substitutional asymmetries are predicted to decompose into a transcription- and a replication-associated components. The transcription-associated asymmetry increases in magnitude with transcription rate and changes sign with gene orientation while the replication-associated asymmetry is proportional to the replication fork polarity. These results are confirmed experimentally in the human genome, using substitution rates obtained by aligning the human and chimpanzee genomes using macaca and orangutan as outgroups, and replication fork polarity determined in the HeLa cell line as estimated from the derivative of the mean replication timing. When further investigating the dynamics of compositional skew evolution, we show that it is not at equilibrium yet and that its evolution is an extremely slow process with characteristic time scales of several hundred Myrs.
Affiliation(s)
- A Baker
- Université de Lyon, Lyon, France
|
34
|
Khrustalev VV, Barkovsky EV. A blueprint for a mutationist theory of replicative strand asymmetries formation. Curr Genomics 2012; 13:55-64. [PMID: 22942675 PMCID: PMC3269017 DOI: 10.2174/138920212799034730] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/01/2011] [Revised: 09/15/2011] [Accepted: 09/29/2011] [Indexed: 11/26/2022] Open
Abstract
In the present review, we summarized current knowledge on replicative strand asymmetries in prokaryotic genomes. A cornerstone for the creation of a theory of their formation has been overviewed. According to our recent works, the probability of nonsense mutation caused by replication-associated mutational pressure is higher for genes from lagging strands than for genes from leading strands of both bacterial and archaeal genomes. Lower density of open reading frames in lagging strands can be explained by faster rates of nonsense mutations in genes situated on them. According to the asymmetries in nucleotide usage in fourfold and twofold degenerate sites, the direction of replication-associated mutational pressure for genes from lagging strands is usually the same as the direction of transcription-associated mutational pressure. It means that lagging strands should accumulate more 8-oxo-G, uracil and 5-formyl-uracil, respectively. In our opinion, consequences of cytosine deamination (C to T transitions) do not lead to the decrease of cytosine usage in genes from lagging strands because of the consequences of thymine oxidation (T to C transitions), while guanine oxidation (causing G to T transversions) makes the main contribution into the decrease of guanine usage in fourfold degenerate sites of genes from lagging strands. Nucleotide usage asymmetries and bias in density of coding regions can be found in archaeal genomes, although, the percent of "inversed" asymmetries is much higher for them than for bacterial genomes. "Homogenized" and "inversed" replicative strand asymmetries in archaeal genomes can be used as retrospective indexes for detection of OriC translocations and large inversions.
Affiliation(s)
- Vladislav V Khrustalev
- Department of General Chemistry, Belarusian State Medical University, Dzerzinskogo 83, Minsk, Belarus
|
35
|
Wu H, Zhang Z, Hu S, Yu J. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct 2012; 7:2. [PMID: 22230424 PMCID: PMC3274465 DOI: 10.1186/1745-6150-7-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 08/12/2011] [Accepted: 01/10/2012] [Indexed: 12/02/2022] Open
Abstract
Background As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Results Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Conclusion Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. 
Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years. Reviewers This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin.
Affiliation(s)
- Hao Wu
- James D Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310007, China
|
36
|
McLean MA, Tirosh I. Opposite GC skews at the 5' and 3' ends of genes in unicellular fungi. BMC Genomics 2011; 12:638. [PMID: 22208287 PMCID: PMC3315797 DOI: 10.1186/1471-2164-12-638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/15/2011] [Accepted: 12/30/2011] [Indexed: 11/24/2022] Open
Abstract
Background GC-skews have previously been linked to transcription in some eukaryotes. They have been associated with transcription start sites, with the coding strand G-biased in mammals and C-biased in fungi and invertebrates. Results We show a consistent and highly significant pattern of GC-skew within genes of almost all unicellular fungi. The pattern of GC-skew is asymmetrical: the coding strand of genes is typically C-biased at the 5' ends but G-biased at the 3' ends, with intermediate skews at the middle of genes. Thus, the initiation, elongation, and termination phases of transcription are associated with different skews. This pattern influences the encoded proteins by generating differential usage of amino acids at the 5' and 3' ends of genes. These biases also affect fourfold-degenerate positions and extend into promoters and 3' UTRs, indicating that skews cannot be accounted by selection for protein function or translation. Conclusions We propose two explanations, the mutational pressure hypothesis, and the adaptive hypothesis. The mutational pressure hypothesis is that different co-factors bind to RNA pol II at different phases of transcription, producing different mutational regimes. The adaptive hypothesis is that cytidine triphosphate deficiency may lead to C-avoidance at the 3' ends of transcripts to control the flow of RNA pol II molecules and reduce their frequency of collisions.
Affiliation(s)
- Malcolm A McLean
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
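The GC-skew pattern described in the abstract above (C-biased at the 5' end, G-biased at the 3' end of the coding strand) can be illustrated with a minimal sketch. It assumes the conventional definition of GC skew, (G - C) / (G + C), which the abstract itself does not spell out. The function names, the three-segment split, and the toy sequence are hypothetical and are not taken from the paper.

```python
def gc_skew(seq: str) -> float:
    """Conventional GC skew, (G - C) / (G + C); returns 0.0 if no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def skew_by_segment(seq: str, n_segments: int = 3) -> list[float]:
    """Split a coding-strand sequence into n_segments consecutive pieces
    (5' to 3') and return the GC skew of each piece.
    The last piece absorbs any remainder."""
    size = len(seq) // n_segments
    bounds = [i * size for i in range(n_segments)] + [len(seq)]
    return [gc_skew(seq[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


# Toy sequence (hypothetical): C-rich start, neutral middle, G-rich end.
print(skew_by_segment("CCCATAGGG"))
```

A negative value in the first segment and a positive value in the last would correspond to the 5' C-bias and 3' G-bias the authors report. Real analyses would average over many genes rather than inspect a single sequence.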
|
37
|
Agier N, Fischer G. The Mutational Profile of the Yeast Genome Is Shaped by Replication. Mol Biol Evol 2011; 29:905-13. [DOI: 10.1093/molbev/msr280] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 12/13/2022] Open
|
38
|
Guo FB. [Strong strand specific composition bias-a genomic character of some obligate parasites or symbionts]. YI CHUAN = HEREDITAS 2011; 33:1039-1047. [PMID: 21993278 DOI: 10.3724/sp.j.1005.2011.01039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
DNA replication includes a set of asymmetric mechanisms, one of which is the division into leading and lagging strands. The former is synthesized continuously, whereas synthesis of the latter is discontinuous. Such an asymmetric mechanism leads to distinct nucleotide compositions on these two strands. Strand-specific nucleotide composition bias was originally found in the genomes of echinoderm and vertebrate mitochondria and then in several bacterial genomes. With the rapid growth in the number of sequenced genomes, many bacteria and even eukaryotes have been found to show consistent strand composition bias. In some bacteria, the strand-specific composition bias is so strong that genes on the two replicating strands can be separated according to their codon usage. To date, 11 obligate intracellular bacteria have been found to have separate codon usages according to whether genes are located on the leading or lagging strand. However, there is still no well-accepted theory that explains why separate codon usages occur in some bacterial genomes and not in others. This paper reviews the related work and points out open problems.
Affiliation(s)
- Feng-Biao Guo
- University of Electronic Science and Technology of China, Chengdu, China.
|
39
|
Cooper DN, Bacolla A, Férec C, Vasquez KM, Kehrer-Sawatzki H, Chen JM. On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum Mutat 2011; 32:1075-99. [PMID: 21853507 PMCID: PMC3177966 DOI: 10.1002/humu.21557] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.9] [Received: 03/27/2011] [Accepted: 06/17/2011] [Indexed: 12/21/2022]
Abstract
Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher order features of the genomic architecture. The human genome is now recognized to contain "pervasive architectural flaws" in that certain DNA sequences are inherently mutation prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here, we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of noncanonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair and may serve to increase mutation frequencies in generalized fashion (i.e., both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom.
|
40
|
Khobta A, Epe B. Interactions between DNA damage, repair, and transcription. Mutat Res 2011; 736:5-14. [PMID: 21907218 DOI: 10.1016/j.mrfmmm.2011.07.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 11/09/2010] [Revised: 06/22/2011] [Accepted: 07/25/2011] [Indexed: 01/16/2023]
Abstract
This review addresses a variety of mechanisms by which DNA repair interacts with transcription and vice versa. Blocking of transcriptional elongation is the best studied of these mechanisms. Transcription recovery after damage therefore has often been used as a surrogate marker of DNA repair in cells. However, it has become evident that relationships between DNA damage, repair, and transcription are more complex due to various indirect effects of DNA damage on gene transcription. These include inhibition of transcription by DNA repair intermediates as well as regulation of transcription and of the epigenetic status of the genes by DNA repair-related mechanisms. In addition, since transcription is emerging as an important endogenous source of DNA damage in cells, we briefly summarise recent advances in understanding the nature of co-transcriptionally induced DNA damage and the DNA repair pathways involved.
Affiliation(s)
- Andriy Khobta
- Institute of Pharmacy and Biochemistry, University of Mainz, Mainz, Germany
|
41
|
Arbiza L, Patricio M, Dopazo H, Posada D. Genome-wide heterogeneity of nucleotide substitution model fit. Genome Biol Evol 2011; 3:896-908. [PMID: 21824869 PMCID: PMC3175760 DOI: 10.1093/gbe/evr080] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022] Open
Abstract
At a genomic scale, the patterns that have shaped molecular evolution are believed to be largely heterogeneous. Consequently, comparative analyses should use appropriate probabilistic substitution models that capture the main features under which different genomic regions have evolved. While efforts have concentrated on the development and understanding of model selection techniques, no descriptions of overall relative substitution model fit at the genome level have been reported. Here, we provide a characterization of best-fit substitution models across three genomic data sets including coding regions from mammals, vertebrates, and Drosophila (24,000 alignments). According to the Akaike Information Criterion (AIC), 82 of 88 models considered were selected as best-fit models on at least one occasion, although with very different frequencies. Most parameter estimates also varied broadly among genes. Patterns found for vertebrates and Drosophila were quite similar and often more complex than those found in mammals. Phylogenetic trees derived from models in the 95% confidence interval set showed much less variance and were significantly closer to the tree estimated under the best-fit model than trees derived from models outside this interval. Although alternative criteria selected simpler models than the AIC, they suggested similar patterns. Altogether, our results show that at a genomic scale, different gene alignments for the same set of taxa are best explained by a large variety of different substitution models and that model choice has implications for different parameter estimates including the inferred phylogenetic trees. After taking into account the differences related to sample size, our results suggest a noticeable diversity in the underlying evolutionary process. Overall, we conclude that the use of model selection techniques is important to obtain consistent phylogenetic estimates from real data at a genomic scale.
Affiliation(s)
- Leonardo Arbiza
- Department of Biochemistry, Genetics, and Immunology, University of Vigo, Vigo, Spain
|
42
|
Mugal CF, Ellegren H. Substitution rate variation at human CpG sites correlates with non-CpG divergence, methylation level and GC content. Genome Biol 2011; 12:R58. [PMID: 21696599 PMCID: PMC3218846 DOI: 10.1186/gb-2011-12-6-r58] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 02/06/2011] [Revised: 05/04/2011] [Accepted: 06/22/2011] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND A major goal in the study of molecular evolution is to unravel the mechanisms that induce variation in the germ line mutation rate and in the genome-wide mutation profile. The rate of germ line mutation is considerably higher for cytosines at CpG sites than for any other nucleotide in the human genome, an increase commonly attributed to cytosine methylation at CpG sites. The CpG mutation rate, however, is not uniform across the genome and, as methylation levels have recently been shown to vary throughout the genome, it has been hypothesized that methylation status may govern variation in the rate of CpG mutation. RESULTS Here, we use genome-wide methylation data from human sperm cells to investigate the impact of DNA methylation on the CpG substitution rate in introns of human genes. We find that there is a significant correlation between the extent of methylation and the substitution rate at CpG sites. Further, we show that the CpG substitution rate is positively correlated with non-CpG divergence, suggesting susceptibility to factors responsible for the general mutation rate in the genome, and negatively correlated with GC content. We only observe a minor contribution of gene expression level, while recombination rate appears to have no significant effect. CONCLUSIONS Our study provides the first direct empirical support for the hypothesis that variation in the level of germ line methylation contributes to substitution rate variation at CpG sites. Moreover, we show that other genomic features also impact on CpG substitution rate variation.
Affiliation(s)
- Carina F Mugal
- Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, Uppsala, Sweden
|
43
|
Nakken S, Rødland EA, Hovig E. Impact of DNA physical properties on local sequence bias of human mutation. Hum Mutat 2010; 31:1316-25. [PMID: 20886615 DOI: 10.1002/humu.21371] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/26/2010] [Accepted: 08/31/2010] [Indexed: 01/07/2023]
Abstract
In selectively neutral regions of the human genome, nucleotide substitutions do not occur at random with respect to the local DNA sequence neighborhood. However, apart from the hypermutability of methylated CpG dinucleotides, which can explain the overrepresentation of nucleotide transitions in this context, the sequence-specific factors underlying point mutation bias remain largely to be determined, both in nature and in quantitative impact. One hypothesis suggests that the physical characteristics of a DNA context could have a modulating effect on its mutability, adjusting the impact of damage or the efficiency of repair. Here, we report a genome-wide computational test of this hypothesis, in which we utilize a constrained set of human non-CpG SNPs as the source of selectively neutral germline mutations. Interestingly, we observe that the quantitative context-dependencies of some substitution types display significant associations to measures of local structural topography and helix stability in DNA. Most prominently, we find that the local sequence bias of transition mutations is significantly associated with the sequence-dependent level of helix instability imposed by the potentially underlying DNA mismatches. The results of our work indicate the extent to which DNA physical properties could have shaped the recent point mutational spectrum in the human genome.
Affiliation(s)
- Sigve Nakken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Norwegian Radium Hospital, Norway
|
44
|
Abstract
The accumulation of base substitutions (mutations) not subject to natural selection is the neutral mutation rate. Because this rate reflects the in vivo processes involved in maintaining the integrity of genetic information, the factors that affect the neutral mutation rate are of considerable interest. Mammals exhibit two dramatically different neutral mutation rates: the CpG mutation rate, wherein the C of most CpGs (i.e., methyl-CpG) mutate at 10-50 times that of C in any other context or of any other base. The latter mutations constitute the non-CpG rate. The high CpG rate results from the spontaneous deamination of methyl-C to T and incomplete restoration of the ensuing T:G mismatches to C:Gs. Here, we determined the neutral non-CpG mutation rate as a function of CpG content by comparing sequence divergence of thousands of pairs of neutrally evolving chimpanzee and human orthologs that differ primarily in CpG content. Both the mutation rate and the mutational spectrum (transition/transversion ratio) of non-CpG residues change in parallel as sigmoidal (logistic) functions of CpG content. As different mechanisms generate transitions and transversions, these results indicate that both mutation rate and mutational processes are contingent on the local CpG content. We consider several possible mechanisms that might explain how CpG exerts these effects.
Affiliation(s)
- Jean-Claude Walser
- Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes and Digestive and Kidney diseases, National Institutes of Health, Bethesda, Maryland 20892-0830, USA
|
45
|
Kim H, Lee BS, Tomita M, Kanai A. Transcription-associated mutagenesis increases protein sequence diversity more effectively than does random mutagenesis in Escherichia coli. PLoS One 2010; 5:e10567. [PMID: 20479947 PMCID: PMC2866735 DOI: 10.1371/journal.pone.0010567] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/19/2009] [Accepted: 04/19/2010] [Indexed: 01/15/2023] Open
Abstract
Background: During transcription, the nontranscribed DNA strand becomes single-stranded DNA (ssDNA), which can form secondary structures. Unpaired bases in the ssDNA are less protected from mutagens and hence experience more mutations than do paired bases. These mutations are called transcription-associated mutations. Transcription-associated mutagenesis is increased under stress and depends on the DNA sequence. Therefore, selection might significantly influence protein-coding sequences in terms of the transcription-associated mutability per transcription event under stress to improve the survival of Escherichia coli.
Methodology/Principal Findings: The mutability index (MI) was developed by Wright et al. to estimate the relative transcription-associated mutability of bases per transcription event. Using the most stable fold of each ssDNA that have an average length n, MI was defined as (number of folds in which the base is unpaired / n) × (highest −ΔG of all n folds in which the base is unpaired), where ΔG is the free energy. The MI values show a significant correlation with mutation data under stress but not with spontaneous mutations in E. coli. Protein sequence diversity is preferred under stress but not under favorable conditions. Therefore, we evaluated the selection pressure on MI in terms of the protein sequence diversity for all the protein-coding sequences in E. coli. The distributions of the MI values were lower at bases that could be substituted with each of the other three bases without affecting the amino acid sequence than at bases that could not be so substituted. Start codons had lower distributions of MI values than did nonstart codons.
Conclusions/Significance: Our results suggest that the majority of protein-coding sequences have evolved to promote protein sequence diversity and to reduce gene knockout under stress. Consequently, transcription-associated mutagenesis increases protein sequence diversity more effectively than does random mutagenesis under stress. Nonrandom transcription-associated mutagenesis under stress should improve the survival of E. coli.
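The MI definition quoted in this abstract can be illustrated with a minimal sketch. It assumes the n folds covering a base have already been predicted by some folding tool, which is outside the sketch. The `Fold` container, the `mutability_index` function, and the choice to return 0 when the base is paired in every fold are hypothetical illustrations, not taken from the cited paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Fold:
    """One predicted ssDNA fold covering the base of interest (hypothetical container)."""
    delta_g: float        # free energy of the fold; more negative means more stable
    base_unpaired: bool   # True if the base of interest is unpaired in this fold


def mutability_index(folds: Sequence[Fold]) -> float:
    """MI = (folds with the base unpaired / n) * (highest -dG among those folds)."""
    n = len(folds)
    if n == 0:
        raise ValueError("at least one fold is required")
    unpaired = [f for f in folds if f.base_unpaired]
    if not unpaired:
        # Assumption: a base that is paired in every fold gets MI = 0.
        return 0.0
    fraction_unpaired = len(unpaired) / n
    highest_neg_dg = max(-f.delta_g for f in unpaired)
    return fraction_unpaired * highest_neg_dg


# Made-up example values: the base is unpaired in 3 of 4 folds,
# and the most stable of those three folds has dG = -8.0.
example_folds = [
    Fold(delta_g=-5.0, base_unpaired=True),
    Fold(delta_g=-8.0, base_unpaired=True),
    Fold(delta_g=-10.0, base_unpaired=False),
    Fold(delta_g=-3.0, base_unpaired=True),
]
print(mutability_index(example_folds))  # (3 / 4) * 8.0
```

In this sketch the most stable fold overall (ΔG = −10.0) does not contribute, because the base is paired in it; only folds in which the base is unpaired enter the second factor.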
Affiliation(s)
- Hyunchul Kim: Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Baek-Seok Lee: Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Masaru Tomita: Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Akio Kanai: Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
46
Mugal CF, Wolf JBW, von Grünberg HH, Ellegren H. Conservation of neutral substitution rate and substitutional asymmetries in mammalian genes. Genome Biol Evol 2010; 2:19-28. [PMID: 20333222 PMCID: PMC2839347 DOI: 10.1093/gbe/evp056] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Accepted: 12/22/2009] [Indexed: 12/21/2022] Open
Abstract
Local variation in neutral substitution rate across mammalian genomes is governed by several factors, including sequence context variables and structural variables. In addition, the interplay of replication and transcription, known to induce a strand bias in mutation rate, gives rise to variation in substitutional strand asymmetries. Here, we address the conservation of variation in mutation rate and substitutional strand asymmetries using primate- and rodent-specific repeat elements located within the introns of protein-coding genes. We find significant but weak conservation of local mutation rates between human and mouse orthologs. Likewise, substitutional strand asymmetries are conserved between human and mouse, where substitution rate asymmetries show a higher degree of conservation than mutation rate. Moreover, we provide evidence that replication and transcription are correlated to the strength of substitutional asymmetries. The effect of transcription is particularly visible for genes with highly conserved gene expression. In comparison with replication and transcription, mutation rate influences the strength of substitutional asymmetries only marginally.
Affiliation(s)
- C F Mugal: Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
47
Pink CJ, Hurst LD. Timing of replication is a determinant of neutral substitution rates but does not explain slow Y chromosome evolution in rodents. Mol Biol Evol 2009; 27:1077-86. [PMID: 20026481 DOI: 10.1093/molbev/msp314] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open
Abstract
Mutation rates, assayed as substitution rates of putatively neutral sites, are highly variable around mammalian genomes: There is heterogeneity between genes, between autosomes, and between X, Y, and autosomes. The differences between X, Y, and autosomes are typically assumed to reflect the greater number of cell divisions in the male germ-line. Such an effect can neither account for within-autosome differences nor does it predict the differences between X, Y, and autosome observed in rodents. It has recently been proposed that in primates, the time during S-phase when a gene is replicated is an important determinant of neutral rates of evolution. Here we ask 1) whether we can replicate this result in rodents, 2) whether different autosomes replicate on average at different times, and 3) whether this might explain differences in their substitution rates. Finally we ask 4) whether X, Y, and autosome replicate at different times and 5) whether any difference might explain why the number of replication events alone cannot explain their substitution rates. We find that, as in primates, autosomal intronic rates of evolution increase significantly during S-phase. Different autosomes do have different average replication times, and together with rearrangement, this is a significant predictor of between-autosome differences in substitution rate. Although we find that autosomal, X-, and Y-linked genes replicate at different times, it is paradoxical that the Y-linked genes replicate latest, and replicate more often, but are not especially fast evolving. These results support the hypothesis that replication timing is an important source of substitution rate heterogeneity.
Affiliation(s)
- Catherine J Pink: Department of Biology and Biochemistry, University of Bath, Somerset, United Kingdom
48
Increased rate of human mutations where DNA and RNA polymerases collide. Trends Genet 2009; 25:523-7. [PMID: 19853958 DOI: 10.1016/j.tig.2009.10.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/25/2009] [Revised: 10/05/2009] [Accepted: 10/05/2009] [Indexed: 12/27/2022]
Abstract
Gene density and orientation of genes in eukaryotes seem to be correlated with the replication origin and the mutation rate is greater in late replicating regions; however, the reason for these patterns is unknown. Here, we investigate predicted replication origins in the human genome and find that levels of polymorphism as well as divergence from the chimpanzee genome are greater in genes transcribed on the lagging strand than those on the leading strand. This might be caused by interference between RNA and DNA polymerases, and avoidance of collisions between these enzymes might be an evolutionary force shaping gene orientation and density surrounding replication start sites. Physical constraints might have a larger influence on genome evolution than previously thought.
49
Poptsova MS, Larionov SA, Ryadchenko EV, Rybalko SD, Zakharov IA, Loskutov A. Hidden chromosome symmetry: in silico transformation reveals symmetry in 2D DNA walk trajectories of 671 chromosomes. PLoS One 2009; 4:e6396. [PMID: 19636424 PMCID: PMC2712679 DOI: 10.1371/journal.pone.0006396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/07/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022] Open
Abstract
Maps of 2D DNA walk of 671 examined chromosomes show composition complexity change from symmetrical half-turn in bacteria to pseudo-random trajectories in archaea, fungi and humans. In silico transformation of gene order and strand position returns most of the analyzed chromosomes to a symmetrical bacterial-like state with one transition point. The transformed chromosomal sequences also reveal remarkable segmental compositional symmetry between regions from different strands located equidistantly from the transition point. Despite extensive chromosome rearrangement the relation of gene numbers on opposite strands for chromosomes of different taxa varies in narrow limits around unity with Pearson coefficient r = 0.98. Similar relation is observed for total genes' length (r = 0.86) and cumulative GC (r = 0.95) and AT (r = 0.97) skews. This is also true for human coding sequences (CDS), which comprise only several percent of the entire chromosome length. We found that frequency distributions of the length of gene clusters, continuously located on the same strand, have close values for both strands. Eukaryotic gene distribution is believed to be non-random. Contribution of different subsystems to the noted symmetries and distributions, and evolutionary aspects of symmetry are discussed.
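For readers unfamiliar with the quantities named in this abstract, a 2D DNA walk and cumulative GC and AT skews can be sketched as follows. The abstract does not state the step convention, so the mapping in `STEPS` (G/C along the y axis, A/T along the x axis) is an assumption for illustration only, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed step convention (not taken from the cited paper):
# G and C move along y, T and A move along x.
STEPS = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk_2d(seq: str) -> List[Tuple[int, int]]:
    """Return the walk trajectory, starting at the origin, one point per base."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases such as N do not move
        x += dx
        y += dy
        path.append((x, y))
    return path


def cumulative_skew(seq: str, plus: str, minus: str) -> List[int]:
    """Running count of `plus` bases minus `minus` bases along the sequence."""
    total = 0
    skew = []
    for base in seq.upper():
        if base == plus:
            total += 1
        elif base == minus:
            total -= 1
        skew.append(total)
    return skew


example = "GGCAT"  # made-up sequence
print(dna_walk_2d(example))
print(cumulative_skew(example, "G", "C"))  # cumulative GC skew (G minus C)
print(cumulative_skew(example, "A", "T"))  # cumulative AT skew (A minus T)
```

Under this assumed mapping, the y coordinate of the walk equals the cumulative GC skew and the x coordinate equals the negative of the cumulative AT skew, which is why the two descriptions are often used together.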
Affiliation(s)
- Maria S Poptsova: University of Connecticut, Storrs, Connecticut, United States of America.
|