1
Wang G, Vasquez KM. Dynamic alternative DNA structures in biology and disease. Nat Rev Genet 2023; 24:211-234. [PMID: 36316397 DOI: 10.1038/s41576-022-00539-9] [Accepted: 09/27/2022] [Indexed: 11/06/2022]
Abstract
Repetitive elements in the human genome, once considered 'junk DNA', are now known to adopt more than a dozen alternative (that is, non-B) DNA structures, such as self-annealed hairpins, left-handed Z-DNA, three-stranded triplexes (H-DNA) or four-stranded guanine quadruplex structures (G4 DNA). These dynamic conformations can act as functional genomic elements involved in DNA replication and transcription, chromatin organization and genome stability. In addition, recent studies have revealed a role for these alternative structures in triggering error-generating DNA repair processes, thereby actively enabling genome plasticity. As a driving force for genetic variation, non-B DNA structures thus contribute to both disease aetiology and evolution.
Affiliation(s)
- Guliang Wang
  - Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA
- Karen M Vasquez
  - Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA
2
Cheloshkina K, Bzhikhatlov I, Poptsova M. Randomness in Cancer Breakpoint Prediction. J Comput Biol 2021; 28:716-731. [PMID: 34129386 DOI: 10.1089/cmb.2020.0551] [Indexed: 11/12/2022]
Abstract
Cancer genomes are susceptible to multiple rearrangements through deletion, insertion, and translocation of genomic regions. Recently, the problem of finding determinants of breakpoint formation was approached with machine learning methods; however, unlike the prediction of cancer point mutations, breakpoint prediction appeared to be a more difficult task, and various machine learning models did not achieve high predictive power, often only slightly exceeding the threshold of random guessing. This raised the question of whether breakpoints are random noise in cancer mutagenesis or whether structural mutagenesis has determinants. In the present study, we investigated randomness in cancer breakpoint genome distributions through the power of machine learning models to predict breakpoint hot spots. We divided all cancer types into three groups by degree of randomness in their breakpoint formation. We tested different density thresholds and explored the bias in hot spot definition. We also compared prediction of hot spots versus individual breakpoints. We found that hot spots are considerably better predicted than individual breakpoints; however, some individual breakpoints can also be predicted with satisfactory power, and thus it is not appropriate to filter them from analyses. We demonstrated that positive-unlabeled learning can provide insights into the insufficiency of cancer data sets, which is not always reflected by data set size. Overall, the present results support the view that the cancer breakpoint landscape can be represented by predictable dense breakpoint regions and scattered individual breakpoints, not all of which are random noise; some are generated by detectable mechanisms.
Affiliation(s)
- Kseniia Cheloshkina
  - Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
- Islam Bzhikhatlov
  - Faculty of Control Systems and Robotics, ITMO University, St. Petersburg, Russia
- Maria Poptsova
  - Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
3
Single-molecule imaging reveals replication fork coupled formation of G-quadruplex structures hinders local replication stress signaling. Nat Commun 2021; 12:2525. [PMID: 33953191 PMCID: PMC8099879 DOI: 10.1038/s41467-021-22830-9] [Received: 07/15/2020] [Accepted: 03/30/2021] [Indexed: 12/19/2022]
Abstract
Guanine-rich DNA sequences occur throughout the human genome and can transiently form G-quadruplex (G4) structures that may obstruct DNA replication, leading to genomic instability. Here, we apply multi-color single-molecule localization microscopy (SMLM) coupled with robust data-mining algorithms to quantitatively visualize replication fork (RF)-coupled formation and spatial-association of endogenous G4s. Using this data, we investigate the effects of G4s on replisome dynamics and organization. We show that a small fraction of active replication forks spontaneously form G4s at newly unwound DNA immediately behind the MCM helicase and before nascent DNA synthesis. These G4s locally perturb replisome dynamics and organization by reducing DNA synthesis and limiting the binding of the single-strand DNA-binding protein RPA. We find that the resolution of RF-coupled G4s is mediated by an interplay between RPA and the FANCJ helicase. FANCJ deficiency leads to G4 accumulation, DNA damage at G4-associated replication forks, and silencing of the RPA-mediated replication stress response. Our study provides first-hand evidence of the intrinsic, RF-coupled formation of G4 structures, offering unique mechanistic insights into the interference and regulation of stable G4s at replication forks and their effect on RPA-associated fork signaling and genomic instability.
4
Cheloshkina K, Poptsova M. Comprehensive analysis of cancer breakpoints reveals signatures of genetic and epigenetic contribution to cancer genome rearrangements. PLoS Comput Biol 2021; 17:e1008749. [PMID: 33647036 PMCID: PMC7951985 DOI: 10.1371/journal.pcbi.1008749] [Received: 06/09/2020] [Revised: 03/11/2021] [Accepted: 01/28/2021] [Indexed: 11/19/2022]
Abstract
Understanding mechanisms of cancer breakpoint mutagenesis is a difficult task, and predictive models of cancer breakpoint formation have so far failed to achieve even moderate predictive power. Here we take advantage of a machine learning approach that can gather important features from big data and quantify the contribution of different factors. We performed a comprehensive analysis of almost 630,000 cancer breakpoints and quantified the contribution of genomic and epigenomic features: non-B DNA structures, chromatin organization, transcription factor binding sites and epigenetic markers. The results showed that transcription and formation of non-B DNA structures are two major processes responsible for cancer genome fragility. Epigenetic factors, such as chromatin organization in TADs, open/closed regions, DNA methylation and histone marks, are less informative but do make a contribution. As a general trend, individual features inside the groups show a relatively high contribution of G-quadruplexes, repeats, and the transcription factors CTCF, GABPA, RXRA, SP1, MAX and NR2F2. Overall, the cancer breakpoint landscape can be represented by well-predicted hotspots and poorly predicted individual breakpoints scattered across genomes. We demonstrated that hotspot mutagenesis has genomic and epigenomic factors, and that not all individual cancer breakpoints are just random noise: some carry a definite mutation signature. In addition, we found a long-range action of some features on breakpoint mutagenesis. By combining omics data and cancer-specific individual feature importance, and by adding distant features to local ones, predictive models for cancer breakpoint formation achieved 70-90% ROC AUC for different cancer types; however, precision remained low at 2% and recall did not exceed 50%. On the one hand, the power of the models strongly correlates with the size of available cancer breakpoint and epigenomic data; on the other hand, finding strong determinants of cancer breakpoint formation remains a challenge. The strength of the predictive signals of each group and of each feature inside a group can be converted into cancer-specific breakpoint mutation signatures. Overall, our results add to the understanding of cancer genome rearrangement processes.
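The combination of a 70-90% ROC AUC with roughly 2% precision is consistent with strong class imbalance, where positives (breakpoint-containing regions) are rare. The following is a minimal arithmetic sketch, not the authors' method: the prevalence and false-positive rate are hypothetical values chosen purely for illustration, and only the 50% recall ceiling and the roughly 2% precision come from the abstract.

```python
def precision(prevalence: float, recall: float, false_positive_rate: float) -> float:
    """Precision implied by Bayes' rule for a binary classifier.

    prevalence          -- fraction of examples that are truly positive
    recall              -- true-positive rate at the chosen threshold
    false_positive_rate -- fraction of negatives flagged as positive
    """
    true_pos = prevalence * recall
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Recall ceiling reported in the abstract.
RECALL_FROM_ABSTRACT = 0.5
# ASSUMPTION: 1 in 1,000 genomic windows is a positive (illustrative only).
ASSUMED_PREVALENCE = 0.001
# ASSUMPTION: about 2.5% of negative windows are flagged (illustrative only).
ASSUMED_FPR = 0.0245

if __name__ == "__main__":
    p = precision(ASSUMED_PREVALENCE, RECALL_FROM_ABSTRACT, ASSUMED_FPR)
    print(f"precision at the assumed operating point: {p:.3f}")
```

Under these assumed values, a classifier that flags only about 2.5% of negatives while recovering half of the positives still has a precision near 2%, because negatives outnumber positives by about a thousand to one. The same recall and false-positive rate on a balanced data set would give a precision above 90%.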
Affiliation(s)
- Kseniia Cheloshkina
  - Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
  - Faculty of Digital Transformation, ITMO University, St. Petersburg, Russia
- Maria Poptsova
  - Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
5
Merrick BA, Phadke DP, Bostrom MA, Shah RR, Wright GM, Wang X, Gordon O, Pelch KE, Auerbach SS, Paules RS, DeVito MJ, Waalkes MP, Tokar EJ. KRAS-retroviral fusion transcripts and gene amplification in arsenic-transformed, human prostate CAsE-PE cancer cells. Toxicol Appl Pharmacol 2020; 397:115017. [PMID: 32344290 PMCID: PMC7606314 DOI: 10.1016/j.taap.2020.115017] [Received: 03/04/2020] [Revised: 04/16/2020] [Accepted: 04/19/2020] [Indexed: 01/03/2023]
Abstract
CAsE-PE cells are an arsenic-transformed, human prostate epithelial line containing oncogenic mutations in KRAS compared to immortalized, normal KRAS parent cells, RWPE-1. We previously reported increased copy number of mutated KRAS in CAsE-PE cells, suggesting gene amplification. Here, KRAS flanking genomic and transcriptomic regions were sequenced in CAsE-PE cells for insight into KRAS amplification. Comparison of DNA-Seq and RNA-Seq showed increased reads from background aligning to all KRAS exons in CAsE-PE cells, while a uniform DNA-Seq read distribution occurred in RWPE-1 cells with normal transcript expression. We searched for KRAS fusions in DNA and RNA sequencing data finding a portion of reads aligning to KRAS and viral sequence. After generation of cDNA from total RNA, short and long KRAS probes were generated to hybridize cDNA and KRAS enriched fragments were PacBio sequenced. More KRAS reads were captured from CAsE-PE cDNA versus RWPE-1 by each probe set. Only CAsE-PE cDNA showed KRAS viral fusion transcripts, primarily mapping to LTR and endogenous retrovirus sequences on either 5'- or 3'-ends of KRAS. Most KRAS viral fusion transcripts contained 4 to 6 exons but some PacBio sequences were in unusual orientations, suggesting viral insertions within the gene body. Additionally, conditioned media was extracted for potential retroviral particles. RNA-Seq of culture media isolates identified KRAS retroviral fusion transcripts in CAsE-PE media only. Truncated KRAS transcripts suggested multiple retroviral integration sites occurred within the KRAS gene producing KRAS retroviral fusions of various lengths. Findings suggest activation of endogenous retroviruses in arsenic carcinogenesis should be explored.
Affiliation(s)
- B Alex Merrick
  - Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States
- Dhiral P Phadke
  - Sciome, LLC, Research Triangle Park, North Carolina, United States
- Meredith A Bostrom
  - David H. Murdock Research Institute, Kannapolis, North Carolina, United States
- Ruchir R Shah
  - Sciome, LLC, Research Triangle Park, North Carolina, United States
- Garron M Wright
  - David H. Murdock Research Institute, Kannapolis, North Carolina, United States
- Xinguo Wang
  - David H. Murdock Research Institute, Kannapolis, North Carolina, United States
- Oksana Gordon
  - David H. Murdock Research Institute, Kannapolis, North Carolina, United States
- Katherine E Pelch
  - Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States
- Scott S Auerbach
  - Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States
- Richard S Paules
  - Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States
- Michael J DeVito
  - Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States
- Michael P Waalkes
  - Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States
- Erik J Tokar
  - Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States
6
Del Mundo IMA, Cho EJ, Dalby KN, Vasquez KM. A 'light-up' intercalator displacement assay for detection of triplex DNA stabilizers. Chem Commun (Camb) 2020; 56:1996-1999. [PMID: 31960843 PMCID: PMC7323859 DOI: 10.1039/c9cc08817b] [Indexed: 11/21/2022]
Abstract
Here, we developed a coralyne-based, 'light-up' intercalator displacement assay to identify molecular stabilizers of triplex DNA using a sequence from a chromosomal breakpoint hotspot in the human c-MYC oncogene. Its potential to identify triplex DNA ligands was demonstrated using BePI and doxorubicin. Identification of triplex-interacting ligands may allow the regulation of genetic instability in human genomes.
Affiliation(s)
- Imee M A Del Mundo
  - Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Pediatric Research Institute, 1400 Barbara Jordan Blvd., Austin, TX, USA