1
|
Castro-Mondragon JA, Aure M, Lingjærde O, Langerød A, Martens JWM, Børresen-Dale AL, Kristensen V, Mathelier A. Cis-regulatory mutations associate with transcriptional and post-transcriptional deregulation of gene regulatory programs in cancers. Nucleic Acids Res 2022; 50:12131-12148. [PMID: 36477895 PMCID: PMC9757053 DOI: 10.1093/nar/gkac1143] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 11/03/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022] Open
Abstract
Most cancer alterations occur in the noncoding portion of the human genome, where regulatory regions control gene expression. The discovery of noncoding mutations altering the cells' regulatory programs has been limited to few examples with high recurrence or high functional impact. Here, we show that transcription factor binding sites (TFBSs) have similar mutation loads to those in protein-coding exons. By combining cancer somatic mutations in TFBSs and expression data for protein-coding and miRNA genes, we evaluate the combined effects of transcriptional and post-transcriptional alterations on the regulatory programs in cancers. The analysis of seven TCGA cohorts culminates with the identification of protein-coding and miRNA genes linked to mutations at TFBSs that are associated with a cascading trans-effect deregulation on the cells' regulatory programs. Our analyses of cis-regulatory mutations associated with miRNAs recurrently predict 12 mature miRNAs (derived from 7 precursors) associated with the deregulation of their target gene networks. The predictions are enriched for cancer-associated protein-coding and miRNA genes and highlight cis-regulatory mutations associated with the dysregulation of key pathways associated with carcinogenesis. By combining transcriptional and post-transcriptional regulation of gene expression, our method predicts cis-regulatory mutations related to the dysregulation of key gene regulatory networks in cancer patients.
Collapse
Affiliation(s)
- Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Miriam Ragle Aure
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Ole Christian Lingjærde
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- KG Jebsen Centre for B-cell malignancies, Institute for Clinical Medicine, University of Oslo, Ullernchausseen 70, N-0372 Oslo, Norway
| | - Anita Langerød
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
| | - John W M Martens
- Erasmus MC Cancer Institute and Cancer Genomics Netherlands, University Medical Center Rotterdam, Department of Medical Oncology, 3015GD Rotterdam, The Netherlands
| | - Anne-Lise Børresen-Dale
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
| | - Vessela N Kristensen
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
| |
Collapse
|
2
|
Transcription Factor-Centric Approach to Identify Non-Recurring Putative Regulatory Drivers in Cancer. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2022; 13278:36-51. [PMID: 36507923 PMCID: PMC9740185 DOI: 10.1007/978-3-031-04749-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Recent efforts to sequence the genomes of thousands of matched normal-tumor samples have led to the identification of millions of somatic mutations, the majority of which are non-coding. Most of these mutations are believed to be passengers, but a small number of non-coding mutations could contribute to tumor initiation or progression, e.g. by leading to dysregulation of gene expression. Efforts to identify putative regulatory drivers rely primarily on information about the recurrence of mutations across tumor samples. However, in regulatory regions of the genome, individual mutations are rarely seen in more than one donor. Instead of using recurrence information, here we present a method to identify putative regulatory driver mutations based on the magnitude of their effects on transcription factor-DNA binding. For each gene, we integrate the effects of mutations across all its regulatory regions, and we ask whether these effects are larger than expected by chance, given the mutation spectra observed in regulatory DNA in the cohort of interest. We applied our approach to analyze mutations in a liver cancer data set with ample somatic mutation and gene expression data available. By combining the effects of mutations across all regulatory regions of each gene, we identified dozens of genes whose regulation in tumor cells is likely to be significantly perturbed by non-coding mutations. Overall, our results show that focusing on the functional effects of non-coding mutations, rather than their recurrence, has the potential to identify putative regulatory drivers and the genes they dysregulate in tumor cells.
Collapse
|
3
|
Abstract
Tumour formation involves random mutagenic events and positive evolutionary selection acting on a subset of such events, referred to as driver mutations. A decade of careful surveying of tumour DNA using exome-based analyses has revealed a multitude of protein-coding somatic driver mutations, some of which are clinically actionable. Today, a transition towards whole-genome analysis is well under way, technically enabling the discovery of potential driver mutations occurring outside protein-coding sequences. Mutations are abundant in this vast non-coding space, which is more than 50 times larger than the coding exome, but reliable identification of selection signals in non-coding DNA remains a challenge. In this Review, we discuss recent findings in the field, where the emerging landscape is one in which non-coding driver mutations appear to be relatively infrequent. Nevertheless, we highlight several notable discoveries. We consider possible reasons for the relative absence of non-coding driver events, as well as the difficulties associated with detecting signals of positive selection in non-coding DNA.
Collapse
Affiliation(s)
- Kerryn Elliott
- Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| | - Erik Larsson
- Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden.
| |
Collapse
|
4
|
Soltis AR, Dalgard CL, Pollard HB, Wilkerson MD. MutEnricher: a flexible toolset for somatic mutation enrichment analysis of tumor whole genomes. BMC Bioinformatics 2020; 21:338. [PMID: 32736515 PMCID: PMC7393734 DOI: 10.1186/s12859-020-03695-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2020] [Accepted: 07/21/2020] [Indexed: 01/23/2023] Open
Abstract
Background Analysis of somatic mutations from tumor whole exomes has fueled discovery of novel cancer driver genes. However, ~ 98% of the genome is non-coding and includes regulatory elements whose normal cellular functions can be disrupted by mutation. Whole genome sequencing (WGS), on the other hand, allows for identification of non-coding somatic variation and expanded estimation of background mutation rates, yet fewer computational tools exist for specific interrogation of this space. Results We present MutEnricher, a flexible toolset for investigating somatic mutation enrichment in both coding and non-coding genomic regions from WGS data. MutEnricher contains two distinct modules for these purposes that provide customizable options for calculating sample- and feature-specific background mutation rates. Additionally, both MutEnricher modules calculate feature-level and local, or “hotspot,” somatic mutation enrichment statistics. Conclusions MutEnricher is a flexible software package for investigating somatic mutation enrichment that is implemented in Python, is freely available, can be efficiently parallelized, and is highly configurable to researcher's specific needs. MutEnricher is available online at https://github.com/asoltis/MutEnricher.
Collapse
Affiliation(s)
- Anthony R Soltis
- The American Genome Center, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD, USA. .,Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD, USA.
| | - Clifton L Dalgard
- The American Genome Center, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD, USA.,Department of Anatomy, Physiology, and Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Harvey B Pollard
- The American Genome Center, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD, USA.,Department of Anatomy, Physiology, and Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Matthew D Wilkerson
- The American Genome Center, Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD, USA. .,Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD, USA. .,Department of Anatomy, Physiology, and Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA.
| |
Collapse
|
5
|
Zhang J, Liu J, Lee D, Feng JJ, Lochovsky L, Lou S, Rutenberg-Schoenberg M, Gerstein M. RADAR: annotation and prioritization of variants in the post-transcriptional regulome of RNA-binding proteins. Genome Biol 2020; 21:151. [PMID: 32727537 PMCID: PMC7391703 DOI: 10.1186/s13059-020-01979-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2018] [Accepted: 02/28/2020] [Indexed: 12/20/2022] Open
Abstract
RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation and disease. Their binding sites cover more of the genome than coding exons; nevertheless, most noncoding variant prioritization methods only focus on transcriptional regulation. Here, we integrate the portfolio of ENCODE-RBP experiments to develop RADAR, a variant-scoring framework. RADAR uses conservation, RNA structure, network centrality, and motifs to provide an overall impact score. Then, it further incorporates tissue-specific inputs to highlight disease-specific variants. Our results demonstrate RADAR can successfully pinpoint variants, both somatic and germline, associated with RBP-function dysregulation, which cannot be found by most current prioritization methods, for example, variants affecting splicing.
Collapse
Affiliation(s)
- Jing Zhang
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Jason Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Donghoon Lee
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
| | - Jo-Jo Feng
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Lucas Lochovsky
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Shaoke Lou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Michael Rutenberg-Schoenberg
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.,Chemical Biology Institute, Yale University, West Haven, CT, 06516, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA. .,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA. .,Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
| |
Collapse
|
6
|
Gan KA, Carrasco Pro S, Sewell JA, Fuxman Bass JI. Identification of Single Nucleotide Non-coding Driver Mutations in Cancer. Front Genet 2018; 9:16. [PMID: 29456552 PMCID: PMC5801294 DOI: 10.3389/fgene.2018.00016] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2017] [Accepted: 01/12/2018] [Indexed: 12/14/2022] Open
Abstract
Recent whole-genome sequencing studies have identified millions of somatic variants present in tumor samples. Most of these variants reside in non-coding regions of the genome potentially affecting transcriptional and post-transcriptional gene regulation. Although a few hallmark examples of driver mutations in non-coding regions have been reported, the functional role of the vast majority of somatic non-coding variants remains to be determined. This is because the few driver variants in each sample must be distinguished from the thousands of passenger variants and because the logic of regulatory element function has not yet been fully elucidated. Thus, variants prioritized based on mutational burden and location within regulatory elements need to be validated experimentally. This is generally achieved by combining assays that measure physical binding, such as chromatin immunoprecipitation, with those that determine regulatory activity, such as luciferase reporter assays. Here, we present an overview of in silico approaches used to prioritize somatic non-coding variants and the experimental methods used for functional validation and characterization.
Collapse
Affiliation(s)
- Kok A Gan
- Department of Biology, Boston University, Boston, MA, United States
| | | | - Jared A Sewell
- Department of Biology, Boston University, Boston, MA, United States
| | | |
Collapse
|