1. Mucaki EJ, Shirley BC, Rogan PK. Expression Changes Confirm Genomic Variants Predicted to Result in Allele-Specific, Alternative mRNA Splicing. Front Genet 2020; 11:109. [PMID: 32211018; PMCID: PMC7066660; DOI: 10.3389/fgene.2020.00109] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/13/2019] [Accepted: 01/30/2020] [Indexed: 12/11/2022] Open
Abstract
Splice isoform structure and abundance can be affected by either noncoding or masquerading coding variants that alter the structure or abundance of transcripts. When these variants are common in the population, these nonconstitutive transcripts are sufficiently frequent so as to resemble naturally occurring, alternative mRNA splicing. Prediction of the effects of such variants has been shown to be accurate using information theory-based methods. Single nucleotide polymorphisms (SNPs) predicted to significantly alter natural and/or cryptic splice site strength were shown to affect gene expression. Splicing changes for known SNP genotypes were confirmed in HapMap lymphoblastoid cell lines with gene expression microarrays and custom designed q-RT-PCR or TaqMan assays. The majority of these SNPs (15 of 22) as well as an independent set of 24 variants were then subjected to RNAseq analysis using the ValidSpliceMut web beacon (http://validsplicemut.cytognomix.com), which is based on data from the Cancer Genome Atlas and International Cancer Genome Consortium. SNPs from different genes analyzed with gene expression microarray and q-RT-PCR exhibited significant changes in affected splice site use. Thirteen SNPs directly affected exon inclusion and 10 altered cryptic site use. Homozygous SNP genotypes resulting in stronger splice sites exhibited higher levels of processed mRNA than alleles associated with weaker sites. Four SNPs exhibited variable expression among individuals with the same genotypes, masking statistically significant expression differences between alleles. Genome-wide information theory and expression analyses (RNAseq) in tumor exomes and genomes confirmed splicing effects for 7 of the HapMap SNPs and 14 SNPs identified from tumor genomes. q-RT-PCR resolved rare splice isoforms with read abundance too low for statistical significance in ValidSpliceMut. Nevertheless, the web beacon provides evidence of unanticipated splicing outcomes, for example, intron retention due to compromised recognition of constitutive splice sites. Thus, ValidSpliceMut and q-RT-PCR represent complementary resources for identification of allele-specific, alternative splicing.
Affiliation(s)
- Eliseos J Mucaki
- Department of Biochemistry, University of Western Ontario, London, ON, Canada
- Peter K Rogan
- Department of Biochemistry, University of Western Ontario, London, ON, Canada; CytoGnomix, London, ON, Canada; Department of Oncology, University of Western Ontario, London, ON, Canada; Department of Computer Science, University of Western Ontario, London, ON, Canada
2. Shirley BC, Mucaki EJ, Rogan PK. Pan-cancer repository of validated natural and cryptic mRNA splicing mutations. F1000Res 2019; 7:1908. [PMID: 31275557; DOI: 10.12688/f1000research.17204.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 11/30/2018] [Indexed: 12/26/2022] Open
Abstract
We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by respectively comparing counts of splice junction spanning and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 341,486 of these validated mutations, the majority of which (69.9%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 131,347 unique mutations which weaken or abolish natural splice sites, and 222,071 mutations which strengthen cryptic splice sites (11,932 affect both simultaneously). 28,812 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. An algorithm was developed to classify variants into splicing molecular phenotypes that integrates germline heterozygosity, degree of information change and impact on expression. The classification thresholds were calibrated against the ClinVar clinical database phenotypic assignments. Variants are partitioned into allele-specific alternative splicing, likely aberrant and aberrant splicing phenotypes. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon "Validated Splicing Mutations" either separately or in aggregate alongside other Beacons through the public Beacon Network, as well as through our website. The website provides additional information, such as a visual representation of supporting RNAseq results, gene expression in the corresponding normal tissues, and splicing molecular phenotypes.
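The counts quoted in this abstract can be cross-checked with a short calculation. The sketch below is only an arithmetic illustration and is not taken from the authors' pipeline; the function and variable names are hypothetical, and the dbSNP figure is approximate because the 69.9% in the abstract is already rounded.

```python
# Counts quoted in the abstract above.
NATURAL_WEAKENED = 131_347      # mutations weakening or abolishing natural sites
CRYPTIC_STRENGTHENED = 222_071  # mutations strengthening cryptic sites
AFFECT_BOTH = 11_932            # mutations counted in both groups
REPORTED_TOTAL = 341_486        # validated mutations in the resource
FRACTION_NOT_IN_DBSNP = 0.699   # rounded percentage from the abstract


def union_size(a: int, b: int, overlap: int) -> int:
    """Inclusion-exclusion: size of the union of two overlapping sets."""
    return a + b - overlap


# The two categories, minus their overlap, should reproduce the reported total.
total = union_size(NATURAL_WEAKENED, CRYPTIC_STRENGTHENED, AFFECT_BOTH)

# Approximate number of validated mutations absent from dbSNP 150.
not_in_dbsnp = round(REPORTED_TOTAL * FRACTION_NOT_IN_DBSNP)

if __name__ == "__main__":
    print(total == REPORTED_TOTAL)
    print(not_in_dbsnp)
```

The union of the natural-site and cryptic-site categories matches the 341,486 validated mutations, which suggests the 11,932 variants affecting both site types are counted once in the total.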
Affiliation(s)
- Eliseos J Mucaki
- Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada
- Peter K Rogan
- CytoGnomix Inc., London, Ontario, N5X 3X5, Canada; Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada; Computer Science, University of Western Ontario, London, Ontario, N6A 2C1, Canada; Oncology, University of Western Ontario, London, Ontario, N6A 2C1, Canada
3. Shirley BC, Mucaki EJ, Rogan PK. Pan-cancer repository of validated natural and cryptic mRNA splicing mutations. F1000Res 2018; 7:1908. [PMID: 31275557; PMCID: PMC6544075; DOI: 10.12688/f1000research.17204.3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 08/27/2019] [Indexed: 11/20/2022] Open
Affiliation(s)
- Eliseos J Mucaki
- Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada
- Peter K Rogan
- CytoGnomix Inc., London, Ontario, N5X 3X5, Canada; Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada; Computer Science, University of Western Ontario, London, Ontario, N6A 2C1, Canada; Oncology, University of Western Ontario, London, Ontario, N6A 2C1, Canada
4. Entropy, or Information, Unifies Ecology and Evolution and Beyond. ENTROPY 2018; 20:e20100727. [PMID: 33265816; PMCID: PMC7512290; DOI: 10.3390/e20100727] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/06/2018] [Revised: 08/18/2018] [Accepted: 09/11/2018] [Indexed: 02/07/2023]
Abstract
This article discusses how entropy/information methods are well-suited to analyzing and forecasting the four processes of innovation, transmission, movement, and adaptation, which are the common basis to ecology and evolution. Macroecologists study assemblages of differing species, whereas micro-evolutionary biologists study variants of heritable information within species, such as DNA and epigenetic modifications. These two different modes of variation are both driven by the same four basic processes, but approaches to these processes sometimes differ considerably. For example, macroecology often documents patterns without modeling underlying processes, with some notable exceptions. On the other hand, evolutionary biologists have a long history of deriving and testing mathematical genetic forecasts, previously focusing on entropies such as heterozygosity. Macroecology calls this Gini-Simpson, and has borrowed the genetic predictions, but sometimes this measure has shortcomings. Therefore it is important to note that predictive equations have now been derived for molecular diversity based on Shannon entropy and mutual information. As a result, we can now forecast all major types of entropy/information, creating a general predictive approach for the four basic processes in ecology and evolution. Additionally, the use of these methods will allow seamless integration with other studies such as the physical environment, and may even extend to assisting with evolutionary algorithms.
5. Sherwin WB, Chao A, Jost L, Smouse PE. Information Theory Broadens the Spectrum of Molecular Ecology and Evolution. Trends Ecol Evol 2017; 32:948-963. [PMID: 29126564; DOI: 10.1016/j.tree.2017.09.012] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 03/06/2017] [Revised: 09/22/2017] [Accepted: 09/26/2017] [Indexed: 01/18/2023]
Abstract
Information or entropy analysis of diversity is used extensively in community ecology, and has recently been exploited for prediction and analysis in molecular ecology and evolution. Information measures belong to a spectrum (or q profile) of measures whose contrasting properties provide a rich summary of diversity, including allelic richness (q=0), Shannon information (q=1), and heterozygosity (q=2). We present the merits of information measures for describing and forecasting molecular variation within and among groups, comparing forecasts with data, and evaluating underlying processes such as dispersal. Importantly, information measures directly link causal processes and divergence outcomes, have straightforward relationship to allele frequency differences (including monotonicity that q=2 lacks), and show additivity across hierarchical layers such as ecology, behaviour, cellular processes, and nongenetic inheritance.
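The three orders named in this abstract (q=0, q=1, q=2) can be illustrated with a small calculation on allele frequencies. The sketch below is one common way to express such a profile, using Hill numbers (effective numbers of alleles); this choice, the function names, and the example frequencies are assumptions made for illustration and are not drawn from the paper itself.

```python
import math


def hill_number(freqs, q):
    """Effective number of alleles of order q (Hill number).

    q=0 gives allelic richness, q=1 is the exponential of Shannon
    entropy, and q=2 is the inverse of the sum of squared frequencies.
    """
    p = [x for x in freqs if x > 0]
    if abs(q - 1.0) < 1e-12:
        # The general formula is undefined at q=1; use its limit.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


def shannon_entropy(freqs):
    """Shannon entropy in nats (the q=1 measure before exponentiation)."""
    return -sum(x * math.log(x) for x in freqs if x > 0)


def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2) (the q=2 measure)."""
    return 1.0 - sum(x * x for x in freqs)


if __name__ == "__main__":
    alleles = [0.5, 0.3, 0.2]  # hypothetical allele frequencies
    for q in (0, 1, 2):
        print(q, hill_number(alleles, q))
    print(shannon_entropy(alleles), heterozygosity(alleles))
```

For equally frequent alleles all three orders return the same effective number, and the orders diverge as frequencies become uneven, which is why the profile as a whole is more informative than any single measure.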
Affiliation(s)
- W B Sherwin
- Evolution and Ecology Research Centre, School of Biological Earth and Environmental Science, University of New South Wales, Sydney, NSW 2052, Australia; Murdoch University Cetacean Research Unit, Murdoch University, South Road, Murdoch, WA 6150, Australia.
- A Chao
- Institute of Statistics, National Tsing Hua University, Hsin-Chu 30043, Taiwan
- L Jost
- EcoMinga Foundation, Via a Runtun, Baños, Tungurahua, Ecuador
- P E Smouse
- Department of Ecology, Evolution and Natural Resources, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ 08901-8551, USA
6. Mucaki EJ, Caminsky NG, Perri AM, Lu R, Laederach A, Halvorsen M, Knoll JHM, Rogan PK. A unified analytic framework for prioritization of non-coding variants of uncertain significance in heritable breast and ovarian cancer. BMC Med Genomics 2016; 9:19. [PMID: 27067391; PMCID: PMC4828881; DOI: 10.1186/s12920-016-0178-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 08/11/2015] [Accepted: 03/15/2016] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Sequencing of both healthy and disease singletons yields many novel and low frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been proven to significantly contribute to high penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT) and prioritizing patients with large intragenic deletions.
METHODS: We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk, anonymized patients without identified mutations in BRCA1/2. Aside from protein coding and copy number changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure.
RESULTS: 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT-framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. An intragenic 32.1 kb interval in BRCA2 that was likely hemizygous was detected in one patient. We also identified 4 stop-gain variants and 3 reading-frame altering exonic insertions/deletions (indels).
CONCLUSIONS: We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes.
Affiliation(s)
- Eliseos J Mucaki
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Natasha G Caminsky
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Ami M Perri
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Ruipeng Lu
- Department of Computer Science, Faculty of Science, Western University, London, N6A 2C1, Canada
- Alain Laederach
- Department of Biology, University of North Carolina, Chapel Hill, NC, 27599-3290, USA
- Matthew Halvorsen
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY, 10032, USA
- Joan H M Knoll
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, N6A 2C1, Canada
- Cytognomix Inc., London, Canada
- Peter K Rogan
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Department of Computer Science, Faculty of Science, Western University, London, N6A 2C1, Canada
- Cytognomix Inc., London, Canada
- Department of Oncology, Schulich School of Medicine and Dentistry, Western University, London, N6A 2C1, Canada
7. Caminsky NG, Mucaki EJ, Perri AM, Lu R, Knoll JHM, Rogan PK. Prioritizing Variants in Complete Hereditary Breast and Ovarian Cancer Genes in Patients Lacking Known BRCA Mutations. Hum Mutat 2016; 37:640-52. [PMID: 26898890; DOI: 10.1002/humu.22972] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/08/2015] [Revised: 01/22/2016] [Accepted: 02/16/2016] [Indexed: 12/11/2022]
Abstract
BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer (HBOC) does not identify all pathogenic variants. Sequencing of 20 complete genes in HBOC patients with uninformative test results (N = 287), including noncoding and flanking sequences of ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, EPCAM, MLH1, MRE11A, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51B, STK11, TP53, and XRCC2, identified 38,372 unique variants. We apply information theory (IT) to predict and prioritize noncoding variants of uncertain significance in regulatory, coding, and intronic regions based on changes in binding sites in these genes. Besides mRNA splicing, IT provides a common framework to evaluate potential affinity changes in transcription factor (TFBSs), splicing regulatory (SRBSs), and RNA-binding protein (RBBSs) binding sites following mutation. We prioritized variants affecting the strengths of 10 splice sites (four natural, six cryptic), 148 SRBS, 36 TFBS, and 31 RBBS. Three variants were also prioritized based on their predicted effects on mRNA secondary (2°) structure and 17 for pseudoexon activation. Additionally, four frameshift, two in-frame deletions, and five stop-gain mutations were identified. When combined with pedigree information, complete gene sequence analysis can focus attention on a limited set of variants in a wide spectrum of functional mutation types for downstream functional and co-segregation analysis.
Affiliation(s)
- Natasha G Caminsky
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Eliseos J Mucaki
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Ami M Perri
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Ruipeng Lu
- Department of Computer Science, Faculty of Science, Western University, London, Ontario, Canada
- Joan H M Knoll
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; Cytognomix Inc, London, Ontario, Canada
- Peter K Rogan
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; Department of Computer Science, Faculty of Science, Western University, London, Ontario, Canada; Cytognomix Inc, London, Ontario, Canada; Department of Oncology, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
8. Caminsky NG, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2015. [DOI: 10.12688/f1000research.5654.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022] Open
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
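The idea of computing a change in information content for a splice site substitution can be sketched in a few lines. The example below is a simplified illustration and not the Splicing Mutation Calculator: the three-position frequency table is invented, the small-sample correction used in published weight matrices is omitted, and the function names are hypothetical. It assumes the standard individual-information form, in which each base contributes 2 + log2 f(b, l) bits, and the relationship commonly stated in this literature that a change of ΔRi bits corresponds to at least a 2^|ΔRi|-fold change in binding affinity.

```python
import math

# Hypothetical base frequencies at three aligned positions (toy data only).
TOY_FREQS = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.25, "T": 0.5},
]


def weight_matrix(freqs):
    """Per-base weights in bits: 2 + log2 f(b, l), without sample-size correction."""
    return [{b: 2.0 + math.log2(f) for b, f in col.items()} for col in freqs]


def individual_information(seq, weights):
    """Sum the weights of the bases observed in seq (Ri, in bits)."""
    if len(seq) != len(weights):
        raise ValueError("sequence length must match the weight matrix")
    return sum(col[base] for base, col in zip(seq, weights))


def min_fold_change(delta_ri):
    """Minimum fold change in binding affinity implied by a change of delta_ri bits."""
    return 2.0 ** abs(delta_ri)


if __name__ == "__main__":
    weights = weight_matrix(TOY_FREQS)
    ri_ref = individual_information("AGT", weights)  # reference site
    ri_var = individual_information("AAT", weights)  # G>A at position 2
    delta = ri_var - ri_ref
    print(ri_ref, ri_var, delta, min_fold_change(delta))
```

In this toy table the G>A substitution lowers the site from 2 bits to 1 bit, so the variant site keeps some strength, which is the kind of reduced (leaky) rather than abolished splicing the abstract distinguishes.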
9. Caminsky N, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2014; 3:282. [PMID: 25717368; PMCID: PMC4329672; DOI: 10.12688/f1000research.5654.1] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Accepted: 11/10/2014] [Indexed: 12/14/2022] Open
Affiliation(s)
- Natasha Caminsky
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Eliseos J Mucaki
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Peter K Rogan
- Departments of Biochemistry and Computer Science, Western University, London, ON, N6A 2C1, Canada
10. Entropy and Information Approaches to Genetic Diversity and its Expression: Genomic Geography. ENTROPY 2010. [DOI: 10.3390/e12071765] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 01/03/2023]