1
Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016; 37:579-97. [DOI: 10.1002/humu.22987] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 11/16/2015] [Accepted: 03/07/2016] [Indexed: 12/18/2022]
Affiliation(s)
- Abhishek Niroula
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
- Mauno Vihinen
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
2
Caminsky NG, Mucaki EJ, Perri AM, Lu R, Knoll JHM, Rogan PK. Prioritizing Variants in Complete Hereditary Breast and Ovarian Cancer Genes in Patients Lacking Known BRCA Mutations. Hum Mutat 2016; 37:640-52. [PMID: 26898890 DOI: 10.1002/humu.22972] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/08/2015] [Revised: 01/22/2016] [Accepted: 02/16/2016] [Indexed: 12/11/2022]
Abstract
BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer (HBOC) does not identify all pathogenic variants. Sequencing of 20 complete genes in HBOC patients with uninformative test results (N = 287), including noncoding and flanking sequences of ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, EPCAM, MLH1, MRE11A, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51B, STK11, TP53, and XRCC2, identified 38,372 unique variants. We apply information theory (IT) to predict and prioritize noncoding variants of uncertain significance in regulatory, coding, and intronic regions based on changes in binding sites in these genes. Besides mRNA splicing, IT provides a common framework to evaluate potential affinity changes in transcription factor (TFBSs), splicing regulatory (SRBSs), and RNA-binding protein (RBBSs) binding sites following mutation. We prioritized variants affecting the strengths of 10 splice sites (four natural, six cryptic), 148 SRBSs, 36 TFBSs, and 31 RBBSs. Three variants were also prioritized based on their predicted effects on mRNA secondary (2°) structure and 17 for pseudoexon activation. Additionally, four frameshift mutations, two in-frame deletions, and five stop-gain mutations were identified. When combined with pedigree information, complete gene sequence analysis can focus attention on a limited set of variants in a wide spectrum of functional mutation types for downstream functional and co-segregation analysis.
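The core IT operation described in this abstract, scoring how a variant changes the information content of a binding site, can be sketched as follows. This is a minimal illustration assuming a uniform base background (2 bits per position) and omitting the small-sample correction used by published IT tools; the toy frequency matrix, the sequences, and the function names `individual_information` and `delta_ri` are hypothetical and are not the authors' software.

```python
import math

# Hypothetical per-position base frequencies for a 3-nt toy binding site.
# Real analyses build these matrices from many aligned, known sites.
TOY_MATRIX = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def individual_information(seq: str, matrix: list[dict[str, float]]) -> float:
    """Return Ri in bits as the sum over positions of 2 + log2 f(b, l).

    Assumes an equiprobable genomic background and no small-sample correction.
    """
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match matrix length")
    total = 0.0
    for base, column in zip(seq.upper(), matrix):
        freq = column[base]
        if freq <= 0:
            raise ValueError("frequencies must be positive; add pseudocounts")
        total += 2.0 + math.log2(freq)
    return total


def delta_ri(ref_seq: str, alt_seq: str, matrix: list[dict[str, float]]) -> float:
    """Information change (bits) caused by a variant; negative means a weaker site."""
    return individual_information(alt_seq, matrix) - individual_information(ref_seq, matrix)


if __name__ == "__main__":
    ref, alt = "AGC", "CGC"  # hypothetical reference and variant sites
    print(f"Ri(ref) = {individual_information(ref, TOY_MATRIX):.2f} bits")
    print(f"Ri(alt) = {individual_information(alt, TOY_MATRIX):.2f} bits")
    print(f"delta Ri = {delta_ri(ref, alt, TOY_MATRIX):.2f} bits")
```

Because only the first position differs here, the change reduces to log2 of the ratio of the two base frequencies at that position, which is why substitutions to rarely observed bases produce the largest information losses.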
Affiliation(s)
- Natasha G Caminsky
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Eliseos J Mucaki
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Ami M Perri
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Ruipeng Lu
- Department of Computer Science, Faculty of Science, Western University, London, Ontario, Canada
- Joan H M Knoll
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; Cytognomix Inc, London, Ontario, Canada
- Peter K Rogan
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; Department of Computer Science, Faculty of Science, Western University, London, Ontario, Canada; Cytognomix Inc, London, Ontario, Canada; Department of Oncology, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
3
Caminsky NG, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2015. [DOI: 10.12688/f1000research.5654.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022]
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
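The abstract's link between splice site information and binding affinity is usually stated in IT-based analyses as: a change of ΔRi bits implies at least a 2^|ΔRi|-fold change in affinity, and a site at or below 0 bits is not expected to be recognized. The sketch below turns that into a crude reduced (leaky) versus abolished label. The 0-bit default cutoff, the labels, the example Ri values, and the function names are assumptions for illustration, not the criteria used in the review.

```python
def min_affinity_fold_change(delta_ri_bits: float) -> float:
    """Lower bound on the fold change in binding affinity implied by delta Ri."""
    return 2.0 ** abs(delta_ri_bits)


def classify_splice_site_variant(ri_ref: float, ri_alt: float, cutoff: float = 0.0) -> str:
    """Crude label for a natural splice site variant (hypothetical cutoff).

    "abolished": the variant site falls to or below the cutoff (default 0 bits).
    "leaky": the site is weakened but stays above the cutoff.
    "not weakened": information is unchanged or increased.
    """
    if ri_alt <= cutoff:
        return "abolished"
    if ri_alt < ri_ref:
        return "leaky"
    return "not weakened"


if __name__ == "__main__":
    # Hypothetical Ri values (bits): one natural site and three variants of it.
    ref = 8.0
    for alt in (-1.5, 5.0, 9.0):
        delta = alt - ref
        print(alt, classify_splice_site_variant(ref, alt),
              f">= {min_affinity_fold_change(delta):.1f}-fold change")
```

Keeping the cutoff as a parameter makes it explicit that any boundary between leaky and abolished splicing is a modelling choice that should be checked against experimentally validated mutations.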
4
Caminsky N, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2014; 3:282. [PMID: 25717368 PMCID: PMC4329672 DOI: 10.12688/f1000research.5654.1] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Accepted: 11/10/2014] [Indexed: 12/14/2022]
Affiliation(s)
- Natasha Caminsky
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Eliseos J Mucaki
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Peter K Rogan
- Departments of Biochemistry and Computer Science, Western University, London, ON, N6A 2C1, Canada
5
Vihinen M. Majority vote and other problems when using computational tools. Hum Mutat 2014; 35:912-4. [PMID: 24915749 DOI: 10.1002/humu.22600] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 03/05/2014] [Accepted: 05/28/2014] [Indexed: 11/06/2022]
Abstract
Computational tools are essential for most of our research. To use these tools, one needs to know how they work. Problems in the application of computational methods to variation analysis can appear at several stages and affect, for example, the interpretation of results. Such cases are discussed along with suggestions on how to avoid them. The problems include incomplete reporting of methods, especially regarding the use of prediction tools; method selection on unscientific grounds and without consulting independent method performance assessments; extending the application area of methods beyond their intended purpose; using the same data several times to obtain a majority vote; and filtering datasets so that variants of interest are excluded. All these issues can be avoided by discontinuing the use of software tools as black boxes.
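Why reusing the same data undermines a majority vote can be shown with a toy example: a vote only corrects errors that the predictors do not share, and tools built on overlapping data tend to make the same mistakes. The labels and predictions below are hypothetical and are not taken from any published tool assessment; `majority_vote` and `accuracy` are illustrative helpers.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Per-variant majority over an odd number of binary predictors (1 = pathogenic)."""
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of predictors to avoid ties")
    return [1 if 2 * sum(column) > len(predictions) else 0 for column in zip(*predictions)]


def accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of variants whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical true labels for six variants.
truth = [1, 0, 1, 0, 1, 0]

# Three tools that share training data and therefore make identical mistakes.
correlated = [[1, 0, 0, 0, 1, 1]] * 3

# Three equally accurate tools whose mistakes fall on different variants.
independent = [
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

for name, tools in (("correlated", correlated), ("independent", independent)):
    per_tool = [round(accuracy(t, truth), 2) for t in tools]
    print(name, per_tool, "vote:", round(accuracy(majority_vote(tools), truth), 2))
```

Every tool here is correct on four of six variants, yet the vote gains nothing when the errors coincide and becomes perfect when they do not; in practice, predictor errors fall somewhere between these two extremes.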
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, BMC D10, Lund University, Lund, Sweden
6
Viner C, Dorman SN, Shirley BC, Rogan PK. Validation of predicted mRNA splicing mutations using high-throughput transcriptome data. F1000Res 2014; 3:8. [PMID: 24741438 PMCID: PMC3983938 DOI: 10.12688/f1000research.3-8.v2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Accepted: 04/03/2014] [Indexed: 01/20/2023]
Abstract
Interpretation of variants present in complete genomes or exomes reveals numerous sequence changes, only a fraction of which are likely to be pathogenic. Mutations have traditionally been inferred from allele frequencies and inheritance patterns in such data. Variants predicted to alter mRNA splicing can be validated by manual inspection of transcriptome sequencing data; however, this approach is intractable for large datasets. Abnormal mRNA splicing patterns are characterized by reads demonstrating exon skipping, cryptic splice site use, high levels of intron inclusion, or combinations of these properties. We present Veridical, an in silico method for the automatic validation of DNA sequencing variants that alter mRNA splicing. Veridical performs statistically valid comparisons of the normalized read counts of abnormal RNA species in mutant versus non-mutant tissues. This approach leverages large numbers of control samples to corroborate the consequences of predicted splicing variants in complete genomes and exomes.
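The comparison described in this abstract, normalized counts of abnormal splicing reads in a variant carrier versus many non-carrier controls, can be approximated with a simple empirical test. This sketch is a simplification under stated assumptions (reads-per-million normalization, a one-sided empirical p-value, hypothetical read counts and function names); it is not Veridical's actual statistical procedure.

```python
from typing import Sequence


def reads_per_million(abnormal_reads: int, total_reads: int) -> float:
    """Normalize abnormal splicing reads (e.g. exon-skipping reads) by sequencing depth."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return abnormal_reads * 1_000_000 / total_reads


def empirical_p_value(mutant_value: float, control_values: Sequence[float]) -> float:
    """One-sided empirical p-value: share of controls at least as high as the mutant.

    The +1 terms keep the estimate above zero, so the smallest attainable value
    is 1 / (n_controls + 1); more controls give finer resolution.
    """
    exceed = sum(1 for value in control_values if value >= mutant_value)
    return (exceed + 1) / (len(control_values) + 1)


if __name__ == "__main__":
    # Hypothetical (abnormal reads, total reads) for one carrier and four controls.
    mutant = reads_per_million(30, 10_000_000)
    controls = [reads_per_million(a, t) for a, t in
                [(2, 10_000_000), (0, 8_000_000), (5, 12_000_000), (1, 9_000_000)]]
    print(f"mutant: {mutant:.2f} RPM, p = {empirical_p_value(mutant, controls):.3f}")
```

With only four controls the smallest attainable p-value is 0.2, even though the carrier far exceeds every control, which illustrates why large numbers of control samples matter for corroborating predicted splicing variants.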
Affiliation(s)
- Coby Viner
- Department of Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Stephanie N Dorman
- Department of Biochemistry, University of Western Ontario, London, Ontario, N6A 5C1, Canada
- Peter K Rogan
- Department of Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada; Department of Biochemistry, University of Western Ontario, London, Ontario, N6A 5C1, Canada; Cytognomix, Inc., London, Ontario, N6G 4X8, Canada