1
Zani A, Messali S, Bugatti A, Uggeri M, Rondina A, Sclavi L, Caccuri F, Caruso A. Molecular mechanisms behind the generation of pro-oncogenic HIV-1 matrix protein p17 variants. J Gen Virol 2024; 105. [PMID: 38687324 DOI: 10.1099/jgv.0.001982] [Indexed: 05/02/2024]
Abstract
HIV-1 matrix protein p17 variants (vp17s), characterized by amino acid insertions at the COOH-terminal region of the viral protein, have recently been identified and studied for their biological activity. Unlike their wild-type counterpart (refp17), vp17s display potent B cell growth and clonogenic activity. Recent data have highlighted the higher prevalence of vp17s in people living with HIV-1 (PLWH) with lymphoma compared with those without lymphoma, suggesting that vp17s may play a key role in lymphomagenesis. The molecular mechanisms involved in vp17 development are still unknown. Here we assessed the efficiency of HIV-1 Reverse Transcriptase (RT) in processing this genomic region and highlighted the existence of hot spots of mutation in Gag, at the end of the matrix protein and close to the matrix-capsid junction. This is possibly due to the presence of inverted repeats and palindromic sequences together with a high content of adenine in the 322-342 nucleotide portion, which force HIV-1 RT to pause on the template. To define the recombinogenic properties of hot spots of mutation in the matrix gene, we developed plasmid vectors expressing Gag and a minimally modified Gag variant, and measured homologous recombination following cell co-nucleofection by next-generation sequencing. The data show that a wide range of recombination events coincide with the identified hot spots of mutation and that imperfect events may account for the generation of vp17s.
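The sequence features named in this abstract (palindromic sequences and high adenine content in a window) can be illustrated with a minimal sketch. This is not the authors' analysis: the function names, the fixed window length, and the toy sequence are all assumptions made for illustration, and the toy sequence is not HIV-1 Gag.

```python
# Hypothetical helper names; a sketch of scanning a nucleotide sequence for
# reverse-complement palindromes and measuring its adenine content.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_palindromes(seq: str, length: int = 6) -> list[tuple[int, str]]:
    """Return (start, window) for each window of `length` bases that
    equals its own reverse complement."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if window == reverse_complement(window):
            hits.append((start, window))
    return hits


def adenine_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are adenine (0.0 for empty input)."""
    seq = seq.upper()
    return seq.count("A") / len(seq) if seq else 0.0


# Toy sequence (invented for illustration only).
toy = "AAAGAATTCAAAAGGATCCAAA"
palindromes = find_palindromes(toy, length=6)
a_content = adenine_fraction(toy)
```

On real data one would likely scan several window lengths and allow imperfect inverted repeats; the exact-match scan above is only the simplest case.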
Affiliation(s)
- Alberto Zani
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Serena Messali
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Antonella Bugatti
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Matteo Uggeri
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Alessandro Rondina
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Leonardo Sclavi
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Francesca Caccuri
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Center for Advanced Medical and Pharmaceutical Research (CCAMF), George Emil Palade University of Medicine, Pharmacy, Science and Technology, Târgu Mures, Romania
- Arnaldo Caruso
- Section of Microbiology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Center for Advanced Medical and Pharmaceutical Research (CCAMF), George Emil Palade University of Medicine, Pharmacy, Science and Technology, Târgu Mures, Romania
2
Yu Y, Wang X, Fox J, Li Q, Yu Y, Hastings PJ, Chen K, Ira G. RPA and Rad27 limit templated and inverted insertions at DNA breaks. bioRxiv 2024:2024.03.07.583931. [PMID: 38496432 PMCID: PMC10942419 DOI: 10.1101/2024.03.07.583931] [Indexed: 03/19/2024]
Abstract
Formation of templated insertions at DNA double-strand breaks (DSBs) is very common in cancer cells. The mechanisms and enzymes regulating these events are largely unknown. Here, we investigated templated insertions in yeast at DSBs using amplicon sequencing across a repaired locus. We document very short (mostly ∼5-34 bp), templated inverted duplications at DSBs. They are generated through a foldback mechanism that utilizes microhomologies adjacent to the DSB. Enzymatic requirements suggest a hybrid mechanism wherein one end requires Polδ-mediated synthesis while the other end is captured by nonhomologous end joining (NHEJ). This process is exacerbated in mutants with low levels of RPA or mutated RPA (rtt105Δ; rfa1-t33) or in an extensive resection mutant (sgs1Δ exo1Δ). Templated insertions from various distant genomic locations also increase in these mutants as well as in rad27Δ and originate from fragile regions of the genome. Among complex insertions, common events are insertions of two sequences, originating from the same locus and with inverted orientation. We propose that these inversions are also formed by microhomology-mediated template switching. Taken together, we propose that a shortage of RPA typical in cancer cells is one possible factor stimulating the formation of templated insertions.
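The foldback idea in this abstract, where a single-stranded end anneals to a nearby microhomology on the same strand, can be sketched as a simple search. The function name `foldback_sites`, the microhomology length `k`, and the toy strand are assumptions for illustration only; the abstract does not describe the authors' detection pipeline.

```python
# Hypothetical sketch: find upstream positions where the 3' terminal k bases
# of a single strand could pair with the same strand (a foldback), i.e. where
# the upstream segment equals the reverse complement of the 3' tail.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def foldback_sites(strand: str, k: int = 4) -> list[int]:
    """Return start indices of upstream k-mers complementary to the 3' tail.

    Only segments that do not overlap the tail itself are considered.
    """
    strand = strand.upper()
    if len(strand) < 2 * k:
        return []
    target = reverse_complement(strand[-k:])
    last_start = len(strand) - 2 * k  # last start that avoids the tail
    return [i for i in range(last_start + 1) if strand[i:i + k] == target]


# Toy strand (invented): the tail GAAC can pair with GTTC at index 1.
sites = foldback_sites("AGTTCATTTTGAAC", k=4)
```

Synthesis primed from such an annealed end would copy the intervening sequence in inverted orientation, which is the kind of short inverted duplication the abstract reports.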
3
Murata MM, Igari F, Urbanowicz R, Mouakkad L, Kim S, Chen Z, DiVizio D, Posadas EM, Giuliano AE, Tanaka H. A Practical Approach for Targeting Structural Variants Genome-wide in Plasma Cell-free DNA. bioRxiv 2024:2023.10.25.564058. [PMID: 37961589 PMCID: PMC10634834 DOI: 10.1101/2023.10.25.564058] [Indexed: 11/15/2023]
Abstract
Plasma cell-free DNA (cfDNA) is a promising source of gene mutations for cancer detection by liquid biopsy. However, no current tests interrogate chromosomal structural variants (SVs) genome-wide. Here, we report a simple molecular and sequencing workflow called Genome-wide Analysis of Palindrome Formation (GAPF-seq) to probe DNA palindromes, a type of SV that often demarcates gene amplification. With low-throughput next-generation sequencing and automated machine learning, tumor DNA showed skewed chromosomal distributions of high-coverage 1-kb bins (HCBs), which differentiated 39 breast tumors from matched normal DNA with an average Area Under the Curve (AUC) of 0.9819. A proof-of-concept liquid biopsy study using cfDNA from prostate cancer patients and healthy individuals yielded an average AUC of 0.965. HCBs on the X chromosome emerged as a determinant feature and were associated with androgen receptor gene amplification. As a novel agnostic liquid biopsy approach, GAPF-seq could fill the technological gap offering unique cancer-specific SV profiles.
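The notion of high-coverage 1-kb bins and their chromosomal distribution can be sketched as follows. This is a toy illustration, not GAPF-seq itself: the abstract does not state how HCBs are selected, so taking the top `n` bins by read count is an assumption, as are the function names and the invented read positions.

```python
from collections import Counter

BIN_SIZE = 1000  # 1-kb bins, as named in the abstract


def bin_counts(read_starts):
    """Count reads per (chromosome, bin index) from (chrom, 0-based pos) pairs."""
    counts = Counter()
    for chrom, pos in read_starts:
        counts[(chrom, pos // BIN_SIZE)] += 1
    return counts


def top_bins(counts, n):
    """Assumed HCB definition: the n bins with the most reads.

    Ties are broken by bin key so the result is deterministic.
    """
    return sorted(counts, key=lambda key: (-counts[key], key))[:n]


def chromosome_distribution(bins):
    """Fraction of the selected bins that fall on each chromosome."""
    per_chrom = Counter(chrom for chrom, _ in bins)
    total = sum(per_chrom.values())
    return {chrom: n / total for chrom, n in per_chrom.items()}


# Invented read start positions for illustration.
reads = [("chrX", 100), ("chrX", 900), ("chrX", 1500),
         ("chr1", 50), ("chrX", 1700), ("chrX", 1999)]
counts = bin_counts(reads)
hcbs = top_bins(counts, n=2)
distribution = chromosome_distribution(hcbs)
```

A per-chromosome distribution like this could then serve as the feature vector for a classifier; the abstract mentions automated machine learning but gives no further detail, so that step is left out here.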
4
Tanaka H, Murata M, Igari F, Urbanowicz R, Mouakkad L, Kim S, Chen Z, Di Vizio D, Posadas E, Giuliano A. A Practical Approach for Targeting Structural Variants Genome-wide in Plasma Cell-free DNA. Research Square 2024:rs.3.rs-3492157. [PMID: 38260372 PMCID: PMC10802711 DOI: 10.21203/rs.3.rs-3492157/v1] [Indexed: 01/24/2024]
Abstract
Interrogating plasma cell-free DNA (cfDNA) to detect cancer offers promise; however, no current tests scan structural variants (SVs) throughout the genome. Here, we report a simple molecular workflow to enrich a tumorigenic SV (DNA palindromes/fold-back inversions) that often demarcates genomic amplification and its feasibility for cancer detection by combining low-throughput next-generation sequencing with automated machine learning (Genome-wide Analysis of Palindrome Formation, GAPF-seq). Tumor DNA signal manifested as skewed chromosomal distributions of high-coverage 1-kb bins (HCBs), differentiating 39 matched breast tumor DNA from normal DNA with an average AUC of 0.9819. In a proof-of-concept liquid biopsy study, cfDNA from 0.5 mL plasma from prostate cancer patients was sufficient for binary classification against matched buffy coat DNA with an average AUC of 0.965. HCBs on the X chromosome emerged as a determinant feature and were associated with AR amplification. GAPF-seq could generate unique cancer-specific SV profiles in an agnostic liquid biopsy setting.
5
Arnedo-Pac C, Muiños F, Gonzalez-Perez A, Lopez-Bigas N. Hotspot propensity across mutational processes. Mol Syst Biol 2024; 20:6-27. [PMID: 38177930 PMCID: PMC10883281 DOI: 10.1038/s44320-023-00001-w] [Received: 10/05/2023] [Revised: 10/30/2023] [Accepted: 11/09/2023] [Indexed: 01/06/2024]
Abstract
The sparsity of mutations observed across tumours hinders our ability to study mutation rate variability at nucleotide resolution. To circumvent this, here we investigated the propensity of mutational processes to form mutational hotspots as a readout of their mutation rate variability at single base resolution. Mutational signatures 1 and 17 have the highest hotspot propensity (5-78 times higher than other processes). After accounting for trinucleotide mutational probabilities, sequence composition and mutational heterogeneity at 10 Kbp, most (94-95%) signature 17 hotspots remain unexplained, suggesting a significant role of local genomic features. For signature 1, the inclusion of genome-wide distribution of methylated CpG sites into models can explain most (80-100%) of the hotspot propensity. There is an increased hotspot propensity of signature 1 in normal tissues and de novo germline mutations. We demonstrate that hotspot propensity is a useful readout to assess the accuracy of mutation rate models at nucleotide resolution. This new approach and the findings derived from it open up new avenues for a range of somatic and germline studies investigating and modelling mutagenesis.
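As a rough illustration of what a mutational hotspot is, the sketch below flags genomic positions mutated in at least a given number of distinct samples. The threshold `min_samples`, the record layout, and the function name are assumptions; the paper's hotspot-propensity measure is more involved and is not reproduced here.

```python
from collections import defaultdict


def find_hotspots(mutations, min_samples=2):
    """Return {(chrom, pos): n_samples} for positions mutated in at least
    `min_samples` distinct samples.

    `mutations` is an iterable of (sample_id, chrom, pos, alt) records;
    this layout is hypothetical.
    """
    samples_at = defaultdict(set)
    for sample_id, chrom, pos, _alt in mutations:
        samples_at[(chrom, pos)].add(sample_id)
    return {site: len(samples)
            for site, samples in samples_at.items()
            if len(samples) >= min_samples}


# Invented records for illustration.
records = [
    ("s1", "chr1", 1000, "T"),
    ("s2", "chr1", 1000, "T"),
    ("s3", "chr1", 1000, "A"),
    ("s1", "chr2", 500, "G"),
    ("s1", "chr2", 500, "G"),  # same sample twice counts once
]
hotspots = find_hotspots(records, min_samples=2)
```

Counting distinct samples rather than raw records keeps a duplicated call from one tumour from being mistaken for recurrence.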
Affiliation(s)
- Claudia Arnedo-Pac
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
- Ferran Muiños
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
- Abel Gonzalez-Perez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain.
- Nuria Lopez-Bigas
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
- Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra (UPF), Barcelona, Spain.
6
Lim SH, Kim DH, Lee JY. Molecular mechanism controlling anthocyanin composition and content in radish plants with different root colors. Plant Physiol Biochem 2023; 204:108091. [PMID: 37864927 DOI: 10.1016/j.plaphy.2023.108091] [Received: 06/01/2023] [Revised: 10/03/2023] [Accepted: 10/11/2023] [Indexed: 10/23/2023]
Abstract
Radish (Raphanus sativus) roots exhibit various colors that reflect their anthocyanin compositions and contents. However, the details of the mechanism linking the expression of anthocyanin biosynthesis genes and their transcriptional regulators to anthocyanin composition in radish roots remained unknown. Here, we characterized the role of the anthocyanin biosynthetic enzyme flavonoid 3'-hydroxylase (RsF3'H), together with the R2R3 MYB transcription factor (TF) RsMYB1 and the basic helix-loop-helix (bHLH) TF TRANSPARENT TESTA 8 (RsTT8), in four radish plants with different root colors: white (W), deep red (DR), dark purple (DP), and dark greyish purple (DGP). The DR plant was heterozygous for RsF3'H, expressed it at a low level, and accumulated a large amount of pelargonidin, resulting in a deep red color. By contrast, the DP and DGP plants accumulated cyanidin owing to the higher expression level of functional RsF3'H. Notably, RsMYB1 and RsTT8 transcripts were abundant in all pigmented roots, but not in white roots. To investigate the differential expression of RsMYB1 and RsTT8, we compared the sequences of their promoter regions among the four radish plants, revealing variations in the numbers of cis-elements and in promoter architecture. Promoter activation assays demonstrated that variation in the RsMYB1 and RsTT8 promoters may contribute to the expression level of these genes, and that RsMYB1 can activate its own expression as well as promote RsTT8 expression. These results suggest that RsF3'H plays a vital role in anthocyanin composition and that the expression levels of both RsMYB1 and RsTT8 are crucial determinants of anthocyanin content in radish roots. Overall, these findings provide insight into the molecular basis of anthocyanin composition and level in radish roots.
Affiliation(s)
- Sun-Hyung Lim
- Division of Horticultural Biotechnology, School of Biotechnology, Hankyong National University, Anseong, 17579, Republic of Korea; Research Institute of International Technology and Information, Hankyong National University, Anseong, 17579, Republic of Korea.
- Da-Hye Kim
- Division of Horticultural Biotechnology, School of Biotechnology, Hankyong National University, Anseong, 17579, Republic of Korea; Research Institute of International Technology and Information, Hankyong National University, Anseong, 17579, Republic of Korea
- Jong-Yeol Lee
- National Academy of Agricultural Science, Rural Development Administration, Jeonju, 54874, Republic of Korea
7
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, Phillippy AM. The complete sequence of a human Y chromosome. Nature 2023; 621:344-354. [PMID: 37612512 PMCID: PMC10752217 DOI: 10.1038/s41586-023-06457-y] [Received: 12/02/2022] [Accepted: 07/19/2023] [Indexed: 08/25/2023]
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies Inc., Oxford, UK
- Monika Cechova
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ivan A Alexandrov
- Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
- Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Chen-Shan Chin
- GeneDX Holdings Corp, Stamford, CT, USA
- Foundation of Biological Data Science, Belmont, CA, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ariel Gershman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, MO, USA
- Patrick G S Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
- Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
- Nancy F Hansen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Robert Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Gabrielle A Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Jakob Heinz
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
- Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
- Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Julian K Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
- Christopher Markovic
- Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Brandy M McNulty
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- UCL Queen Square Institute of Neurology, UCL, London, UK
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hugh E Olsen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Angela M Taravella Oill
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Allison C Watwood
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Melissa A Wilson
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
- Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
8
Poulsgaard GA, Sørensen SG, Juul RI, Nielsen MM, Pedersen JS. Sequence dependencies and mutation rates of localized mutational processes in cancer. Genome Med 2023; 15:63. [PMID: 37592287 PMCID: PMC10436389 DOI: 10.1186/s13073-023-01217-z] [Received: 02/21/2023] [Accepted: 08/02/2023] [Indexed: 08/19/2023]
Abstract
BACKGROUND: Cancer mutations accumulate through replication errors and DNA damage coupled with incomplete repair. Individual mutational processes often show nucleotide sequence and functional region preferences. As a result, some sequence contexts mutate at much higher rates than others, with additional variation found between functional regions. Mutational hotspots, with recurrent mutations across cancer samples, represent genomic positions with elevated mutation rates, often caused by highly localized mutational processes. METHODS: We count the 11-mer genomic sequences across the genome, and using the PCAWG set of 2583 pan-cancer whole genomes, we associate 11-mers with mutational signatures, hotspots of single nucleotide variants, and specific genomic regions. We evaluate the mutation rates of individual and combined sets of 11-mers and derive mutational sequence motifs. RESULTS: We show that hotspots generally identify highly mutable sequence contexts. Using these, we show that some mutational signatures are enriched in hotspot sequence contexts, corresponding to well-defined sequence preferences for the underlying localized mutational processes. This includes signature 17b (of unknown etiology) and signatures 62 (POLE deficiency), 7a (UV), and 72 (linked to lymphomas). In some cases, the mutation rate and sequence preference increase further when focusing on certain genomic regions, such as signature 62 in transcribed regions, where the mutation rate is increased up to 9-fold over the cancer type and mutational signature average. CONCLUSIONS: We summarize our findings in a catalog of localized mutational processes, their sequence preferences, and their estimated mutation rates.
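The METHODS step of counting k-mers and estimating a per-context mutation rate can be sketched as below. It is a simplified illustration under stated assumptions: the rate is taken as mutated occurrences divided by total occurrences of the k-mer centred on each position, `k` must be odd, strands are not collapsed, and the function names and toy data are hypothetical. The toy example uses k=3 for readability, whereas the paper uses 11-mers.

```python
from collections import Counter


def count_kmers(seq: str, k: int = 11) -> Counter:
    """Count every overlapping k-mer in `seq`."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_mutation_rates(seq: str, mutated_positions: set, k: int = 11) -> dict:
    """Estimate a mutation rate per k-mer context.

    Each position is described by the k-mer centred on it (k must be odd).
    Assumed definition: rate = mutated occurrences / total occurrences.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a k-mer has a centre base")
    seq = seq.upper()
    half = k // 2
    totals, mutated = Counter(), Counter()
    for centre in range(half, len(seq) - half):
        kmer = seq[centre - half:centre + half + 1]
        totals[kmer] += 1
        if centre in mutated_positions:
            mutated[kmer] += 1
    return {kmer: mutated[kmer] / totals[kmer] for kmer in totals}


# Toy data (invented): positions 1 and 4 are mutated.
rates = kmer_mutation_rates("ACGACGACG", {1, 4}, k=3)
```

At k=11 there are 4^11 (about 4.2 million) possible contexts, so a genome-scale run would need streaming input and compact storage rather than an in-memory string.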
Affiliation(s)
- Gustav Alexander Poulsgaard
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
- Simon Grund Sørensen
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
- Randi Istrup Juul
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
- Morten Muhlig Nielsen
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
| | - Jakob Skou Pedersen
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark.
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark.
- Bioinformatics Research Centre (BiRC), Aarhus University, University City 81, Building 1872, 3Rd Floor, 8000, Aarhus C, Denmark.
| |
Collapse
|
9
|
Liu Z, Samee M. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023] Open
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success implicates a connection between DNA shape and mutation rates. DNA shape, i.e. structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
Affiliation(s)
- Zian Liu
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
|
10
|
Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 2023; 30:417-424. [PMID: 36914796 DOI: 10.1038/s41594-023-00936-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/23/2021] [Accepted: 02/03/2023] [Indexed: 03/16/2023]
Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
|
11
|
Amgalan B, Wojtowicz D, Kim YA, Przytycka TM. Influence network model uncovers relations between biological processes and mutational signatures. Genome Med 2023; 15:15. [PMID: 36879282 PMCID: PMC9987115 DOI: 10.1186/s13073-023-01162-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 02/08/2023] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND There has been a growing appreciation recently that mutagenic processes can be studied through the lens of mutational signatures, which represent characteristic mutation patterns attributed to individual mutagens. However, the causal links between mutagens and observed mutation patterns, as well as other types of interactions between mutagenic processes and molecular pathways, are not fully understood, limiting the utility of mutational signatures. METHODS To gain insights into these relationships, we developed a network-based method, named GENESIGNET, that constructs an influence network among genes and mutational signatures. The approach leverages sparse partial correlation, among other statistical techniques, to uncover dominant influence relations between the activities of network nodes. RESULTS Applying GENESIGNET to cancer data sets, we uncovered important relations between mutational signatures and several cellular processes that can shed light on cancer-related processes. Our results are consistent with previous findings, such as the impact of homologous recombination deficiency on clustered APOBEC mutations in breast cancer. The network identified by GENESIGNET also suggests an interaction between APOBEC hypermutation and activation of regulatory T cells (Tregs), as well as a relation between APOBEC mutations and changes in DNA conformation. GENESIGNET also exposed a possible link between the SBS8 signature of unknown etiology and the Nucleotide Excision Repair (NER) pathway. CONCLUSIONS GENESIGNET provides a new and powerful method to reveal the relation between mutational signatures and gene expression. The GENESIGNET method was implemented in Python; an installable package, the source code, and the data sets used for and generated during this study are available on GitHub at https://github.com/ncbi/GeneSigNet.
Affiliation(s)
- Bayarbaatar Amgalan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, 20894, Bethesda, USA
- Damian Wojtowicz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, 20894, Bethesda, USA; Current address: Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, ul. Banacha 2, 02-097, Warszawa, Poland
- Yoo-Ah Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, 20894, Bethesda, USA
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, 20894, Bethesda, USA
|
12
|
Makova KD, Weissensteiner MH. Noncanonical DNA structures are drivers of genome evolution. Trends Genet 2023; 39:109-124. [PMID: 36604282 PMCID: PMC9877202 DOI: 10.1016/j.tig.2022.11.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 06/27/2022] [Revised: 11/04/2022] [Accepted: 11/28/2022] [Indexed: 01/05/2023]
Abstract
In addition to the canonical right-handed double helix, other DNA structures, termed 'non-B DNA', can form in the genomes across the tree of life. Non-B DNA regulates multiple cellular processes, including replication and transcription, yet its presence is associated with elevated mutagenicity and genome instability. These discordant cellular roles fuel the enormous potential of non-B DNA to drive genomic and phenotypic evolution. Here we discuss recent studies establishing non-B DNA structures as novel functional elements subject to natural selection, affecting evolution of transposable elements (TEs), and specifying centromeres. By highlighting the contributions of non-B DNA to repeated evolution and adaptation to changing environments, we conclude that evolutionary analyses should include a perspective of not only DNA sequence, but also its structure.
Affiliation(s)
- Kateryna D Makova
- Department of Biology, Penn State University, 310 Wartik Laboratory, University Park, PA 16802, USA.
|
13
|
Inverted repeats in coronavirus SARS-CoV-2 genome manifest the evolution events. J Theor Biol 2021; 530:110885. [PMID: 34478743 PMCID: PMC8406619 DOI: 10.1016/j.jtbi.2021.110885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Revised: 08/13/2021] [Accepted: 08/25/2021] [Indexed: 11/24/2022]
Abstract
The world faces a great unforeseen challenge through the COVID-19 pandemic caused by coronavirus SARS-CoV-2. The structure and evolution of the virus genome are central to vaccine development, monitoring of transmission trajectories, and prevention of zoonotic infections by new coronaviruses. Of particular interest are the genomic elements known as inverted repeats (IRs), which maintain genome stability, regulate gene expression, and are targets of mutation. However, little research attention has been given to analysis of the IR content of the SARS-CoV-2 genome. In this study, we propose a geometric analysis method and use it to investigate the distributions of IRs in SARS-CoV-2 and related coronavirus genomes. The method represents each genomic IR sequence pair as a single point and constructs the geometric shape of the genome using the IRs. Thus, the IR shape can be considered the signature of the genome. The genomes of different coronaviruses are then compared using the constructed IR shapes. The results demonstrate that the SARS-CoV-2 genome, specifically, has an abundance of IRs, and that the IRs in coronavirus genomes show an increase during evolution events.
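The abstract does not give the details of the geometric construction, so the following is only one plausible reading of "each IR sequence pair as a single point": scan for a k-mer whose reverse complement occurs shortly downstream and record the pair as the point (left start, right start). The function names and the parameters `k` and `max_gap` are hypothetical, and the published method may differ substantially.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inverted_repeat_points(seq, k=4, max_gap=20):
    """List (left_start, right_start) for each k-mer whose reverse complement
    begins within max_gap bases after the end of the k-mer."""
    seq = seq.upper()
    points = []
    for i in range(len(seq) - k + 1):
        arm = seq[i:i + k]
        if set(arm) - set("ACGT"):
            continue  # ignore arms containing ambiguous bases such as N
        target = reverse_complement(arm)
        last_start = min(len(seq) - k, i + k + max_gap)
        for j in range(i + k, last_start + 1):
            if seq[j:j + k] == target:
                points.append((i, j))
    return points
```

Plotting the returned points for two genomes on the same axes would give a simple visual comparison of their inverted-repeat content, in the spirit of the comparison the abstract describes.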
|
14
|
Seplyarskiy VB, Sunyaev S. The origin of human mutation in light of genomic data. Nat Rev Genet 2021; 22:672-686. [PMID: 34163020 DOI: 10.1038/s41576-021-00376-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Accepted: 05/06/2021] [Indexed: 02/05/2023]
Abstract
Despite years of active research into the role of DNA repair and replication in mutagenesis, surprisingly little is known about the origin of spontaneous human mutation in the germ line. With the advent of high-throughput sequencing, genome-scale data have revealed statistical properties of mutagenesis in humans. These properties include variation of the mutation rate and spectrum along the genome at different scales in relation to epigenomic features and dependency on parental age. Moreover, mutations originating in mothers are less frequent than mutations originating in fathers and have a distinct genomic distribution. Statistical analyses that interpret these patterns in the context of known biochemistry can provide mechanistic models of mutagenesis in humans.
Affiliation(s)
- Vladimir B Seplyarskiy
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
15
|
Kim YA, Leiserson MDM, Moorjani P, Sharan R, Wojtowicz D, Przytycka TM. Mutational Signatures: From Methods to Mechanisms. Annu Rev Biomed Data Sci 2021; 4:189-206. [PMID: 34465178 DOI: 10.1146/annurev-biodatasci-122320-120920] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
Mutations are the driving force of evolution, yet they underlie many diseases, in particular, cancer. They are thought to arise from a combination of stochastic errors in DNA processing, naturally occurring DNA damage (e.g., the spontaneous deamination of methylated CpG sites), replication errors, and dysregulation of DNA repair mechanisms. High-throughput sequencing has made it possible to generate large datasets to study mutational processes in health and disease. Since the emergence of the first mutational process studies in 2012, this field has gained increasing attention and has already accumulated a host of computational approaches and biomedical applications.
Affiliation(s)
- Yoo-Ah Kim
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Mark D M Leiserson
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Priya Moorjani
- Department of Molecular and Cell Biology and Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Damian Wojtowicz
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
16
|
Riddiford N, Siudeja K, van den Beek M, Boumard B, Bardin AJ. Evolution and genomic signatures of spontaneous somatic mutation in Drosophila intestinal stem cells. Genome Res 2021; 31:1419-1432. [PMID: 34168010 PMCID: PMC8327918 DOI: 10.1101/gr.268441.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/09/2020] [Accepted: 06/15/2021] [Indexed: 12/15/2022]
Abstract
Spontaneous mutations can alter tissue dynamics and lead to cancer initiation. Although large-scale sequencing projects have illuminated processes that influence somatic mutation and subsequent tumor evolution, the mutational dynamics operating in the very early stages of cancer development are currently not well understood. To explore mutational processes in the early stages of cancer evolution, we exploited neoplasia arising spontaneously in the Drosophila intestine. Analysing whole-genome sequencing data with a dedicated bioinformatic pipeline, we found neoplasia formation to be driven largely through the inactivation of Notch by structural variants, many of which involve highly complex genomic rearrangements. The genome-wide mutational burden in neoplasia was found to be similar to that of several human cancers. Finally, we identified genomic features associated with spontaneous mutation, and defined the evolutionary dynamics and mutational landscape operating within intestinal neoplasia over the short lifespan of the adult fly. Our findings provide unique insight into mutational dynamics operating over a short timescale in the genetic model system, Drosophila melanogaster.
Affiliation(s)
- Nick Riddiford
- Institut Curie, PSL Research University, CNRS UMR 3215, INSERM U934, Stem Cells and Tissue Homeostasis Group, 75005 Paris, France
- Katarzyna Siudeja
- Institut Curie, PSL Research University, CNRS UMR 3215, INSERM U934, Stem Cells and Tissue Homeostasis Group, 75005 Paris, France
- Marius van den Beek
- Institut Curie, PSL Research University, CNRS UMR 3215, INSERM U934, Stem Cells and Tissue Homeostasis Group, 75005 Paris, France
- Benjamin Boumard
- Institut Curie, PSL Research University, CNRS UMR 3215, INSERM U934, Stem Cells and Tissue Homeostasis Group, 75005 Paris, France
- Allison J Bardin
- Institut Curie, PSL Research University, CNRS UMR 3215, INSERM U934, Stem Cells and Tissue Homeostasis Group, 75005 Paris, France
|
17
|
Arnedo-Pac C, Mularoni L, Muiños F, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers. Bioinformatics 2020; 35:4788-4790. [PMID: 31228182 PMCID: PMC6853674 DOI: 10.1093/bioinformatics/btz501] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/11/2018] [Revised: 04/25/2019] [Accepted: 06/18/2019] [Indexed: 12/12/2022] Open
Abstract
Motivation Identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes. Results We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method can identify known clusters and bona-fide cancer drivers across cohorts of tumor whole-exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. Our results indicate that OncodriveCLUSTL can be applied to the analysis of non-coding genomic elements and non-human mutations data. Availability and implementation OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under GNU Affero General Public License. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claudia Arnedo-Pac
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Loris Mularoni
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Ferran Muiños
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Abel Gonzalez-Perez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain
- Nuria Lopez-Bigas
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona 08010, Spain
|
18
|
Janel-Bintz R, Kuhn L, Frit P, Chicher J, Wagner J, Haracska L, Hammann P, Cordonnier AM. Proteomic Analysis of DNA Synthesis on a Structured DNA Template in Human Cellular Extracts: Interplay Between NHEJ and Replication-Associated Proteins. Proteomics 2020; 20:e1900184. [PMID: 31999075 DOI: 10.1002/pmic.201900184] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/29/2019] [Revised: 12/19/2019] [Indexed: 01/01/2023]
Abstract
It is established that short inverted repeats trigger base substitution mutagenesis in human cells. However, how the replication machinery deals with structured DNA is unknown. It has been previously reported that in human cell-free extracts, DNA primer extension using a structured single-stranded template is transiently blocked at DNA hairpins. Here, the proteomic analysis of proteins bound to the DNA template is reported and evidence that the DNA-PK complex (DNA-PKcs and the Ku heterodimer) recognizes, and is activated by, structured single-stranded DNA is provided. Hijacking the DNA-PK complex by double-stranded oligonucleotides results in a large removal of the pausing sites and an elevated DNA extension efficiency. Conversely, DNA-PKcs inhibition results in its stabilization on the template, along with other proteins acting downstream in the Non-Homologous End-Joining (NHEJ) pathway, especially the XRCC4-DNA ligase 4 complex and the cofactor PAXX. Retention of NHEJ factors on the DNA in the absence of DNA-PKcs activity correlates with additional halts of primer extension, suggesting that these proteins hinder the progression of DNA synthesis at these sites. Overall, these results raise the possibility that, upon binding to hairpins formed on ssDNA during fork progression, the DNA-PK complex interferes with replication fork dynamics in vivo.
Affiliation(s)
- Régine Janel-Bintz
- Biotechnologie et Signalisation Cellulaire, Université de Strasbourg, UMR7242, CNRS, Illkirch, 67412, France
- Lauriane Kuhn
- Institut de Biologie Moléculaire et Cellulaire du CNRS, Plateforme Protéomique Strasbourg - Esplanade, FR1589, 67084, Strasbourg, France
- Philippe Frit
- Institut de Pharmacologie et Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France; Equipe Labellisée Ligue Contre le Cancer 2018, Toulouse, France
- Johana Chicher
- Institut de Biologie Moléculaire et Cellulaire du CNRS, Plateforme Protéomique Strasbourg - Esplanade, FR1589, 67084, Strasbourg, France
- Jérôme Wagner
- Biotechnologie et Signalisation Cellulaire, Université de Strasbourg, UMR7242, CNRS, Illkirch, 67412, France
- Lajos Haracska
- Institute of Genetics, Biological Research Center, HU-6726, Szeged, Hungary
- Philippe Hammann
- Institut de Biologie Moléculaire et Cellulaire du CNRS, Plateforme Protéomique Strasbourg - Esplanade, FR1589, 67084, Strasbourg, France
- Agnès M Cordonnier
- Biotechnologie et Signalisation Cellulaire, Université de Strasbourg, UMR7242, CNRS, Illkirch, 67412, France
|
19
|
Baez-Ortega A, Gori K, Strakova A, Allen JL, Allum KM, Bansse-Issa L, Bhutia TN, Bisson JL, Briceño C, Castillo Domracheva A, Corrigan AM, Cran HR, Crawford JT, Davis E, de Castro KF, B de Nardi A, de Vos AP, Delgadillo Keenan L, Donelan EM, Espinoza Huerta AR, Faramade IA, Fazil M, Fotopoulou E, Fruean SN, Gallardo-Arrieta F, Glebova O, Gouletsou PG, Häfelin Manrique RF, Henriques JJGP, Horta RS, Ignatenko N, Kane Y, King C, Koenig D, Krupa A, Kruzeniski SJ, Kwon YM, Lanza-Perea M, Lazyan M, Lopez Quintana AM, Losfelt T, Marino G, Martínez Castañeda S, Martínez-López MF, Meyer M, Migneco EJ, Nakanwagi B, Neal KB, Neunzig W, Ní Leathlobhair M, Nixon SJ, Ortega-Pacheco A, Pedraza-Ordoñez F, Peleteiro MC, Polak K, Pye RJ, Reece JF, Rojas Gutierrez J, Sadia H, Schmeling SK, Shamanova O, Sherlock AG, Stammnitz M, Steenland-Smit AE, Svitich A, Tapia Martínez LJ, Thoya Ngoka I, Torres CG, Tudor EM, van der Wel MG, Viţălaru BA, Vural SA, Walkinton O, Wang J, Wehrle-Martinez AS, Widdowson SAE, Stratton MR, Alexandrov LB, Martincorena I, Murchison EP. Somatic evolution and global expansion of an ancient transmissible cancer lineage. Science 2019; 365:eaau9923. [PMID: 31371581 PMCID: PMC7116271 DOI: 10.1126/science.aau9923] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 08/22/2018] [Accepted: 06/20/2019] [Indexed: 12/29/2022]
Abstract
The canine transmissible venereal tumor (CTVT) is a cancer lineage that arose several millennia ago and survives by "metastasizing" between hosts through cell transfer. The somatic mutations in this cancer record its phylogeography and evolutionary history. We constructed a time-resolved phylogeny from 546 CTVT exomes and describe the lineage's worldwide expansion. Examining variation in mutational exposure, we identify a highly context-specific mutational process that operated early in the cancer's evolution but subsequently vanished, correlate ultraviolet-light mutagenesis with tumor latitude, and describe tumors with heritable hyperactivity of an endogenous mutational process. CTVT displays little evidence of ongoing positive selection, and negative selection is detectable only in essential genes. We illustrate how long-lived clonal organisms capture changing mutagenic environments, and reveal that neutral genetic drift is the dominant feature of long-term cancer evolution.
Affiliation(s)
- Adrian Baez-Ortega
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Kevin Gori
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Andrea Strakova
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Janice L Allen
- Animal Management in Rural and Remote Indigenous Communities (AMRRIC), Darwin, Australia
- Thinlay N Bhutia
- Sikkim Anti-Rabies and Animal Health Programme, Department of Animal Husbandry, Livestock, Fisheries and Veterinary Services, Government of Sikkim, India
- Jocelyn L Bisson
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Royal (Dick) School of Veterinary Studies and the Roslin Institute, University of Edinburgh, Easter Bush Campus, Roslin EH25 9RG, UK
- Cristóbal Briceño
- ConserLab, Animal Preventive Medicine Department, Faculty of Animal and Veterinary Sciences, University of Chile, Santiago, Chile
- Hugh R Cran
- The Nakuru District Veterinary Scheme Ltd, Nakuru, Kenya
- Eric Davis
- International Animal Welfare Training Institute, UC Davis School of Veterinary Medicine, Davis, CA, USA
- Karina F de Castro
- Centro Universitário de Rio Preto (UNIRP), São José do Rio Preto, São Paulo, Brazil
- Andrigo B de Nardi
- Department of Clinical and Veterinary Surgery, São Paulo State University (UNESP), São Paulo, Brazil
- Edward M Donelan
- Animal Management in Rural and Remote Indigenous Communities (AMRRIC), Darwin, Australia
- Eleni Fotopoulou
- Intermunicipal Stray Animals Care Centre (DIKEPAZ), Perama, Greece
- Pagona G Gouletsou
- Faculty of Veterinary Medicine, School of Health Sciences, University of Thessaly, Karditsa, Greece
- Rodrigo F Häfelin Manrique
- Veterinary Clinic El Roble, Animal Healthcare Network, Faculty of Animal and Veterinary Sciences, University of Chile, Santiago de Chile, Chile
- Yaghouba Kane
- École Inter-états des Sciences et Médecine Vétérinaires de Dakar, Dakar, Senegal
- Ada Krupa
- Department of Small Animal Medicine, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
- Young-Mi Kwon
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Thibault Losfelt
- Clinique Veterinaire de Grand Fond, Saint Gilles les Bains, Reunion, France
- Gabriele Marino
- Department of Veterinary Sciences, University of Messina, Messina, Italy
- Simón Martínez Castañeda
- Facultad de Medicina Veterinaria y Zootecnia, Universidad Autónoma del Estado de México, Toluca, Mexico
- Mayra F Martínez-López
- School of Veterinary Medicine, Universidad de las Américas, Quito, Ecuador
- Cancer Development and Innate Immune Evasion Lab, Champalimaud Center for the Unknown, Lisbon, Portugal
- Máire Ní Leathlobhair
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Maria C Peleteiro
- Interdisciplinary Centre of Research in Animal Health (CIISA), Faculty of Veterinary Medicine, University of Lisbon, Lisboa, Portugal
- Ruth J Pye
- Vets Beyond Borders, The Rocks, Australia
- Haleema Sadia
- Department of Biotechnology, Balochistan University of Information Technology, Engineering and Management Sciences, Quetta, Pakistan
- Maximilian Stammnitz
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Alla Svitich
- State Hospital of Veterinary Medicine, Dniprodzerzhynsk, Ukraine
- Cristian G Torres
- Laboratory of Biomedicine and Regenerative Medicine, Department of Clinical Sciences, Faculty of Animal and Veterinary Sciences, University of Chile, Santiago, Chile
- Elizabeth M Tudor
- Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Australia
- Bogdan A Viţălaru
- Clinical Sciences Department, Faculty of Veterinary Medicine Bucharest, Bucharest, Romania
- Sevil A Vural
- Department of Pathology, Faculty of Veterinary Medicine, Ankara University, Ankara, Turkey
- Jinhong Wang
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Ludmil B Alexandrov
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Elizabeth P Murchison
- Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
|
20
|
Buisson R, Langenbucher A, Bowen D, Kwan EE, Benes CH, Zou L, Lawrence MS. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 2019; 364:eaaw2872. [PMID: 31249028 PMCID: PMC6731024 DOI: 10.1126/science.aaw2872] [Citation(s) in RCA: 181] [Impact Index Per Article: 36.2] [Received: 12/05/2018] [Accepted: 05/23/2019] [Indexed: 12/12/2022]
Abstract
Cancer drivers require statistical modeling to distinguish them from passenger events, which accumulate during tumorigenesis but provide no fitness advantage to cancer cells. The discovery of driver genes and mutations relies on the assumption that exact positional recurrence is unlikely by chance; thus, the precise sharing of mutations across patients identifies drivers. Examining the mutation landscape in cancer genomes, we found that many recurrent cancer mutations previously designated as drivers are likely passengers. Our integrated bioinformatic and biochemical analyses revealed that these passenger hotspot mutations arise from the preference of APOBEC3A, a cytidine deaminase, for DNA stem-loops. Conversely, recurrent APOBEC-signature mutations not in stem-loops are enriched in well-characterized driver genes and may predict new drivers. This demonstrates that mesoscale genomic features need to be integrated into computational models aimed at identifying mutations linked to diseases.
Affiliation(s)
- Rémi Buisson
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA, USA
  - Department of Biological Chemistry, Center for Epigenetics and Metabolism, Chao Family Comprehensive Cancer Center, University of California, Irvine, CA, USA
- Adam Langenbucher
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA, USA
- Danae Bowen
  - Department of Biological Chemistry, Center for Epigenetics and Metabolism, Chao Family Comprehensive Cancer Center, University of California, Irvine, CA, USA
- Eugene E Kwan
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA, USA
- Cyril H Benes
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA, USA
- Lee Zou
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Michael S Lawrence
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA

21
Georgakopoulos-Soares I, Morganella S, Jain N, Hemberg M, Nik-Zainal S. Noncanonical secondary structures arising from non-B DNA motifs are determinants of mutagenesis. Genome Res 2018; 28:1264-1271. [PMID: 30104284 PMCID: PMC6120622 DOI: 10.1101/gr.231688.117] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Received: 10/27/2017] [Accepted: 07/12/2018] [Indexed: 12/15/2022]
Abstract
Somatic mutations show variation in density across cancer genomes. Previous studies have shown that chromatin organization and replication time domains are correlated with, and thus predictive of, this variation. Here, we analyze 1809 whole-genome sequences from 10 cancer types to show that a subset of repetitive DNA sequences, called non-B motifs, that predict noncanonical secondary structure formation can independently account for variation in mutation density. Combined with epigenetic factors and replication timing, the variance explained can be improved to 43%-76%. Approximately twofold mutation enrichment is observed directly within non-B motifs, is focused on exposed structural components, and is dependent on physical properties that are optimal for secondary structure formation. Therefore, there is mounting evidence that secondary structures arising from non-B motifs are not simply associated with increased mutation density; they are possibly causally implicated. Our results suggest that they are determinants of mutagenesis and increase the likelihood of recurrent mutations in the genome. This analysis calls for caution in the interpretation of recurrent mutations and highlights the importance of taking non-B motifs that can simply be inferred from the reference sequence into consideration in background models of mutability henceforth.
Affiliation(s)
- Sandro Morganella
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Naman Jain
  - Department of Life Sciences, Imperial College London, London SW7 2AZ, United Kingdom
- Martin Hemberg
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Serena Nik-Zainal
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
  - East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge CB2 2QQ, United Kingdom