1
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024] Open
Abstract
BACKGROUND: Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb).
RESULTS: The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods.
CONCLUSIONS: VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
  - Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
  - Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
  - College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
  - Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
2
Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
Affiliation(s)
- Sofia Radrizzani
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Grzegorz Kudla
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Zsuzsanna Izsvák
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
- Laurence D Hurst
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK.
3
Ding M, Chen K, Yang Y, Zhao H. Prioritizing genomic variants pathogenicity via DNA, RNA, and protein-level features based on extreme gradient boosting. Hum Genet 2024:10.1007/s00439-024-02667-0. [PMID: 38575818 DOI: 10.1007/s00439-024-02667-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 03/05/2024] [Indexed: 04/06/2024]
Abstract
Genetic diseases are mostly associated with genetic variants, including missense, synonymous, nonsense, and copy number variants. Previous studies indicate that these different kinds of variants affect phenotypes in various ways. It remains essential but challenging to understand the functional consequences of these genetic variants, especially the noncoding ones, due to the lack of corresponding annotations. Many computational methods have been proposed to identify risk variants, but most of them have curated only DNA-level and protein-level annotations to predict the pathogenicity of variants, and others have been restricted to missense variants. In this study, we have curated DNA-, RNA-, and protein-level features to discriminate disease-causing variants in both coding and noncoding regions. Features of protein sequences and protein structures have been shown to be essential for analyzing missense variants in coding regions, while features related to RNA splicing and RBP binding are significant for variants in noncoding regions and synonymous variants in coding regions. Through the integration of these features, we have formulated the Multi-level feature Genomic Variants Predictor (ML-GVP) using the gradient boosting tree. The method has been trained on more than 400,000 variants in the Sherloc training set from the 6th Critical Assessment of Genome Interpretation and achieved superior performance. The method is one of the two best-performing predictors on the blind test in the Sherloc assessment, and is further confirmed by another independent test dataset of de novo variants.
Affiliation(s)
- Maolin Ding
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
- Ken Chen
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China.
  - Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Huiying Zhao
  - Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, 510000, China.
4
Cheng N, Bi C, Shi Y, Liu M, Cao A, Ren M, Xia J, Liang Z. Effect Predictor of Driver Synonymous Mutations Based on Multi-Feature Fusion and Iterative Feature Representation Learning. IEEE J Biomed Health Inform 2024; 28:1144-1151. [PMID: 38096097 DOI: 10.1109/jbhi.2023.3343075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Accurate identification of driver mutations is crucial in genetic studies of human cancers. While numerous cancer driver missense mutations have been identified, research into potential cancer drivers for synonymous mutations has shown limited success to date. Here, we developed a novel machine learning framework, epSMic, for predicting cancer driver synonymous mutations. epSMic employs an iterative feature representation scheme that facilitates the learning of discriminative features from various sequential models in a supervised iterative mode. We constructed the benchmark datasets and encoded the embedding sequence, physicochemical property, and basic information such as conservation and splicing feature. The evaluation results on benchmark test datasets demonstrate that epSMic outperforms existing methods, making it a valuable tool for researchers in identifying functional synonymous mutations in cancer. We hope epSMic can enable researchers to concentrate on synonymous mutations that have a functional impact on cancer.
5
Lewin LE, Daniels KG, Hurst LD. Genes for highly abundant proteins in Escherichia coli avoid 5' codons that promote ribosomal initiation. PLoS Comput Biol 2023; 19:e1011581. [PMID: 37878567 PMCID: PMC10599525 DOI: 10.1371/journal.pcbi.1011581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/09/2023] [Indexed: 10/27/2023] Open
Abstract
In many species highly expressed genes (HEGs) over-employ the synonymous codons that match the more abundant iso-acceptor tRNAs. Bacterial transgene codon randomization experiments report, however, that enrichment with such "translationally optimal" codons has little to no effect on the resultant protein level. By contrast, consistent with the view that ribosomal initiation is rate limiting, synonymous codon usage following the 5' ATG greatly influences protein levels, at least in part by modifying RNA stability. For the design of bacterial transgenes, for simple codon based in silico inference of protein levels and for understanding selection on synonymous mutations, it would be valuable to computationally determine initiation optimality (IO) scores for codons for any given species. One attractive approach is to characterize the 5' codon enrichment of HEGs compared with the most lowly expressed genes, just as translational optimality scores of codons have been similarly defined employing the full gene body. Here we determine the viability of this approach employing a unique opportunity: for Escherichia coli there is both the most extensive protein abundance data for native genes and a unique large-scale transgene codon randomization experiment enabling objective definition of the 5' codons that cause, rather than just correlate with, high protein abundance (that we equate with initiation optimality, broadly defined). Surprisingly, the 5' ends of native genes that specify highly abundant proteins avoid such initiation optimal codons. We find that this is probably owing to conflicting selection pressures particular to native HEGs, including selection favouring low initiation rates, this potentially enabling high efficiency of ribosomal usage and low noise. 
While the classical HEG enrichment approach does not work, rendering simple prediction of native protein abundance from 5' codon content futile, we report evidence that initiation optimality scores derived from the transgene experiment may hold relevance for in silico transgene design for a broad spectrum of bacteria.
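As a rough illustration of the classical HEG enrichment approach described in this abstract, the sketch below scores each codon by the log2 ratio of its frequency in the 5' region of highly expressed genes versus lowly expressed genes. The function names, the 10-codon window, the pseudocount, and the toy sequences are all assumptions made for illustration; they are not the authors' procedure or data. Note that the abstract itself reports that scores derived this way do not recover initiation-optimal codons for native E. coli genes.

```python
import math
from collections import Counter


def five_prime_codons(cds, window=10):
    """Return up to `window` complete codons that follow the start codon."""
    body = cds[3:3 + 3 * window]  # skip the initial ATG
    usable = len(body) - len(body) % 3  # drop any trailing partial codon
    return [body[i:i + 3] for i in range(0, usable, 3)]


def enrichment_scores(high_genes, low_genes, window=10, pseudocount=1.0):
    """log2 ratio of 5' codon frequencies, highly vs lowly expressed genes."""
    high = Counter(c for g in high_genes for c in five_prime_codons(g, window))
    low = Counter(c for g in low_genes for c in five_prime_codons(g, window))
    codons = set(high) | set(low)
    # The pseudocount avoids division by zero for codons seen in one set only.
    high_total = sum(high.values()) + pseudocount * len(codons)
    low_total = sum(low.values()) + pseudocount * len(codons)
    return {
        c: math.log2(((high[c] + pseudocount) / high_total)
                     / ((low[c] + pseudocount) / low_total))
        for c in codons
    }


# Toy, made-up sequences (hypothetical; not real E. coli genes).
high_expr = ["ATGAAAAAAGCT", "ATGAAAGCTGCT"]
low_expr = ["ATGCGGCGGGCT", "ATGCGGGCTCGG"]
scores = enrichment_scores(high_expr, low_expr)
```

A positive score marks a codon that is over-represented near the 5' end of the highly expressed set; whether that reflects a causal effect on protein level is exactly what the transgene experiment in the abstract was used to test.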
Affiliation(s)
- Loveday E. Lewin
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Kate G. Daniels
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Laurence D. Hurst
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
6
Vihinen M. Nonsynonymous Synonymous Variants Demand for a Paradigm Shift in Genetics. Curr Genomics 2023; 24:18-23. [PMID: 37920730 PMCID: PMC10334700 DOI: 10.2174/1389202924666230417101020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/09/2022] [Revised: 02/20/2023] [Accepted: 03/01/2023] [Indexed: 11/04/2023] Open
Abstract
Synonymous (also known as silent) variations are by definition not considered to change the coded protein. Still, many variations in this category affect either protein abundance or properties. As this situation is confusing, we have recently introduced systematics for synonymous variations and for those that may on the surface look synonymous but may affect the coded protein in various ways. A new category, unsense variation, was introduced to describe variants that do not introduce a stop codon into the variation site, but which lead to different types of changes in the coded protein. Many of these variations lead to mRNA degradation and missing protein. Here, consequences of the systematics are discussed from the perspectives of variation annotation and interpretation, evolutionary calculations, nonsynonymous-to-synonymous substitution rates, phylogenetics and other evolutionary inferences that are based on the principle of (nearly) neutral synonymous variations. It may be necessary to reassess published results. Further, databases for synonymous variations and prediction methods for such variations should consider unsense variations. Thus, there is a need to evaluate and reflect on the principles of numerous aspects of genetics, ranging from variation naming and classification to evolutionary calculations.
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, Lund, BMC B13, Sweden
7
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023] Open
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs were previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
Affiliation(s)
- Brian C Lin
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Upendra Katneni
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Katarzyna I Jankowska
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Douglas Meyer
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Chava Kimchi-Sarfaty
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA.
8
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Kevin Wilhelm
  - Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
  - Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
  - Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
9
When a Synonymous Variant Is Nonsynonymous. Genes (Basel) 2022; 13:genes13081485. [PMID: 36011397 PMCID: PMC9408308 DOI: 10.3390/genes13081485] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/08/2022] [Revised: 08/17/2022] [Accepted: 08/17/2022] [Indexed: 12/27/2022] Open
Abstract
The term synonymous variation is widely used, but frequently with a wrong or misleading meaning and context. Of the possible nucleotide substitution types in the universal genetic code, 23.8% are for synonymous amino acid changes, but when these variants have a phenotype and functional effect, they are very seldom synonymous. Such variants may manifest changes at DNA, RNA and/or protein levels. Large numbers of variations are erroneously annotated as synonymous, which causes problems, e.g., in clinical genetics and diagnosis of diseases. To facilitate precise communication, novel systematics and nomenclature are introduced for variants that seem synonymous when looking only at the genetic code, but which have phenotypes. A new term, unsense variant, is defined as a substitution in the mRNA coding region that affects gene expression and protein production without introducing a stop codon in the variation site. Such variants are common and need to be correctly annotated. Proper naming and annotation are also important to increase awareness of these variants and their consequences.
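The share of synonymous single-nucleotide substitutions in the standard genetic code can be counted directly, as in the minimal sketch below. The abstract does not state the counting convention behind its figure (for example, whether stop codons are included or how substitution "types" are grouped), so the function name and the two conventions shown here are assumptions, and the resulting percentages may differ slightly from the quoted value.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (NCBI translation table 1); "*" is stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(include_stop_codons):
    """Count (synonymous, total) single-nucleotide substitutions over codons."""
    synonymous = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*" and not include_stop_codons:
            continue  # optionally start only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] == aa:
                    synonymous += 1
    return synonymous, total


for include_stops in (False, True):
    syn, tot = count_substitutions(include_stops)
    print(f"include stop codons={include_stops}: {syn}/{tot} = {100 * syn / tot:.1f}%")
```

Both conventions put the synonymous share at roughly a quarter of all possible substitutions, which is the order of magnitude the abstract refers to.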
10
Kaissarian NM, Meyer D, Kimchi-Sarfaty C. Synonymous Variants: Necessary Nuance in our Understanding of Cancer Drivers and Treatment Outcomes. J Natl Cancer Inst 2022; 114:1072-1094. [PMID: 35477782 PMCID: PMC9360466 DOI: 10.1093/jnci/djac090] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/18/2022] [Revised: 03/24/2022] [Accepted: 04/18/2022] [Indexed: 11/13/2022] Open
Abstract
Once called "silent mutations" and assumed to have no effect on protein structure and function, synonymous variants are now recognized to be drivers for some cancers. There have been significant advances in our understanding of the numerous mechanisms by which synonymous single nucleotide variants (sSNVs) can affect protein structure and function by affecting pre-mRNA splicing, mRNA expression, stability, folding, miRNA binding, translation kinetics, and co-translational folding. This review highlights the need for considering sSNVs in cancer biology to gain a better understanding of the genetic determinants of human cancers and to improve their diagnosis and treatment. We surveyed the literature for reports of sSNVs in cancer and found numerous studies on the consequences of sSNVs on gene function with supporting in vitro evidence. We also found reports of sSNVs that have statistically significant associations with specific cancer types but for which in vitro studies are lacking to support the reported associations. Additionally, we found reports of germline and somatic sSNVs that were observed in numerous clinical studies and for which in silico analysis predicts possible effects on gene function. We provide a review of these investigations and discuss necessary future studies to elucidate the mechanisms by which sSNVs disrupt protein function and play a role in tumorigenesis, cancer progression, and treatment efficacy. As splicing dysregulation is one of the most well recognized mechanisms by which sSNVs impact protein function, we also include our own in silico analysis for predicting which sSNVs may disrupt pre-mRNA splicing.
Affiliation(s)
- Nayiri M Kaissarian
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Douglas Meyer
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Chava Kimchi-Sarfaty
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
11
Zeng Z, Aptekmann AA, Bromberg Y. Decoding the effects of synonymous variants. Nucleic Acids Res 2021; 49:12673-12691. [PMID: 34850938 PMCID: PMC8682775 DOI: 10.1093/nar/gkab1159] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/23/2021] [Revised: 11/02/2021] [Accepted: 11/08/2021] [Indexed: 12/12/2022] Open
Abstract
Synonymous single nucleotide variants (sSNVs) are common in the human genome but are often overlooked. However, sSNVs can have significant biological impact and may lead to disease. Existing computational methods for evaluating the effect of sSNVs suffer from the lack of gold-standard training/evaluation data and exhibit over-reliance on sequence conservation signals. We developed synVep (synonymous Variant effect predictor), a machine learning-based method that overcomes both of these limitations. Our training data was a combination of variants reported by gnomAD (observed) and those unreported, but possible in the human genome (generated). We used positive-unlabeled learning to purify the generated variant set of any likely unobservable variants. We then trained two sequential extreme gradient boosting models to identify subsets of the remaining variants putatively enriched and depleted in effect. Our method attained 90% precision/recall on a previously unseen set of variants. Furthermore, although synVep does not explicitly use conservation, its scores correlated with evolutionary distances between orthologs in cross-species variation analysis. synVep was also able to differentiate pathogenic vs. benign variants, as well as splice-site disrupting variants (SDV) vs. non-SDVs. Thus, synVep provides an important improvement in annotation of sSNVs, allowing users to focus on variants that most likely harbor effects.
Affiliation(s)
- Zishuo Zeng
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
- Ariel A Aptekmann
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
12
Lin X. Genomic Variation Prediction: A Summary From Different Views. Front Cell Dev Biol 2021; 9:795883. [PMID: 34901036 PMCID: PMC8656232 DOI: 10.3389/fcell.2021.795883] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/15/2021] [Accepted: 11/11/2021] [Indexed: 12/02/2022] Open
Abstract
Structural variations in the genome are closely related to human health and the occurrence and development of various diseases. To understand the mechanisms of diseases, find pathogenic targets, and carry out personalized precision medicine, it is critical to detect such variations. The rapid development of high-throughput sequencing technologies has accelerated the accumulation of large amounts of genomic mutation data, including synonymous mutations. Identifying pathogenic synonymous mutations that play important roles in the occurrence and development of diseases from all the available mutation data is of great importance. In this paper, machine learning theories and methods are reviewed, efficient and accurate pathogenic synonymous mutation prediction methods are developed, and a standardized three-level variant analysis framework is constructed. In addition, multiple variation tolerance prediction models are studied and integrated, and new ideas for structural variation detection based on deep information mining are explored.
Affiliation(s)
- Xiuchun Lin
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, China
13
Whole exome sequencing identifies the potential role of genes involved in p53 pathway in Nasopharyngeal Carcinoma from Northeast India. Gene 2021; 812:146099. [PMID: 34906645 DOI: 10.1016/j.gene.2021.146099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2021] [Revised: 10/06/2021] [Accepted: 11/16/2021] [Indexed: 11/21/2022]
Abstract
Nasopharyngeal Carcinoma (NPC) is found to be dependent on geographical and racial variation and is more prevalent in Northeast (NE) India. A WES-based study was conducted in three states (tribes): Nagaland (Naga), Mizoram (Mizo) and Manipur (Manipuri), which provided an overview of germline variants involved in the major signaling pathways. Validation and recurrence assessment of WES data confirmed the risk effect of STEAP3_rs138941861 and JAG1_rs2273059, and the protective role of the PARP4_rs17080653 and TGFBR1_rs11568778 variants. STEAP3_rs138941861, conferring an Arg290His substitution, was the only exonic non-synonymous variant; it was located in proximity to the linking region between the transmembrane and oxidoreductase domains of the STEAP3 protein and affected its structural and functional dynamics by altering the electrostatic potential around this connecting region. Moreover, these significantly associated variants having deleterious effect were observed to have interactions in the p53 signaling pathway, which emphasizes the importance of this pathway in the causation of NPC.
14
Li G, Panday SK, Peng Y, Alexov E. SAMPDI-3D: predicting the effects of protein and DNA mutations on protein-DNA interactions. Bioinformatics 2021; 37:3760-3765. [PMID: 34343273 DOI: 10.1093/bioinformatics/btab567] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/12/2021] [Revised: 06/28/2021] [Accepted: 07/31/2021] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION: Mutations that alter protein-DNA interactions may be pathogenic and cause diseases. Therefore, it is extremely important to quantify the effect of mutations on protein-DNA binding free energy to reveal the molecular origin of diseases and to assist the development of treatments. Although several methods that predict the change of protein-DNA binding affinity upon mutations in the binding protein have been developed, the effect of DNA mutations has not yet been considered.
RESULTS: Here, we report a new version of SAMPDI, SAMPDI-3D, which is a gradient boosting decision tree machine learning method to predict the change of the protein-DNA binding free energy caused by mutations in both the binding protein and the bases of the corresponding DNA. The method is shown to achieve Pearson correlation coefficients of 0.76 and 0.80 in a benchmarking test against experimentally determined changes of the binding free energy caused by mutations in the binding protein or DNA, respectively. Furthermore, three datasets collected from the literature were used for a blind benchmark of SAMPDI-3D, and it is shown to outperform all existing state-of-the-art methods. The method is very fast, allowing for genome-scale investigations.
AVAILABILITY: It is available as a web server and as stand-alone code at http://compbio.clemson.edu/SAMPDI-3D/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
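The benchmark metric named in this abstract, the Pearson correlation coefficient between predicted and experimentally determined changes in binding free energy, can be computed as in the minimal sketch below. The function name and the example values are hypothetical and chosen only for illustration; they are not SAMPDI-3D outputs or benchmark data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical free-energy changes in kcal/mol (made-up numbers).
experimental = [0.5, 1.2, -0.3, 2.0]
predicted = [0.4, 1.0, 0.1, 1.7]
r = pearson_r(experimental, predicted)
```

A value near 1 means predictions rise and fall with the measurements; it does not by itself show that the predicted magnitudes are accurate.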
Affiliation(s)
- Gen Li
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Yunhui Peng
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
|
15
|
Shen Y, Zhang Y, Xue W, Yue Z. dbMCS: A Database for Exploring the Mutation Markers of Anti-Cancer Drug Sensitivity. IEEE J Biomed Health Inform 2021; 25:4229-4237. [PMID: 34314366 DOI: 10.1109/jbhi.2021.3100424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
The identification of mutation markers and the selection of appropriate treatment for patients with specific genome mutations are important steps in the development of targeted therapies and the realization of precision medicine for human cancers. To investigate the baseline characteristics of drug sensitivity markers and develop computational methods of mutation effect prediction, we presented a manually curated online database of mutation Markers for anti-Cancer drug Sensitivity (dbMCS). Currently, dbMCS contains 1271 mutations and 4427 mutation-disease-drug associations (3151 and 1276 for sensitivity and resistance, respectively) with their PubMed indexed articles. By comparing the mutations in dbMCS with the putative neutral polymorphisms, we investigated the characteristics of drug sensitivity markers. We found that the mutation markers tend to significantly impact highly conserved regions both in DNA sequences and protein domains. Some of them presented pleiotropic effects depending on the tumor context, appearing concurrently in the sensitivity and resistance categories. In addition, we preliminarily explored machine learning-based methods for identifying mutation markers of anti-cancer drug sensitivity and produced optimistic results, which suggests that a reliable dataset may provide new insights and essential clues for future cancer pharmacogenomics studies. dbMCS is available at http://bioinfo.aielab.cc/dbMCS/.
|
16
|
Zhou Y, Lauschke VM. Computational Tools to Assess the Functional Consequences of Rare and Noncoding Pharmacogenetic Variability. Clin Pharmacol Ther 2021; 110:626-636. [PMID: 33998671 DOI: 10.1002/cpt.2289] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/01/2021] [Accepted: 05/07/2021] [Indexed: 12/19/2022]
Abstract
Interindividual differences in drug response are a common concern in both drug development and across layers of care. While genetics clearly influences drug response and toxicity of many drugs, a substantial fraction of the heritable pharmacological and toxicological variability remains unexplained by known genetic polymorphisms. In recent years, population-scale sequencing projects have unveiled tens of thousands of coding and noncoding pharmacogenetic variants with unclear functional effects that might explain at least part of this missing heritability. However, translating these personalized variant signatures into drug response predictions and actionable advice remains challenging and constitutes one of the most important frontiers of contemporary pharmacogenomics. Conventional prediction methods are primarily based on evolutionary conservation, which drastically reduces their predictive accuracy when applied to poorly conserved pharmacogenes. Here, we review the current state-of-the-art of computational variant effect predictors across variant classes and critically discuss their utility for pharmacogenomics. Besides missense variants, we discuss recent progress in the evaluation of synonymous, splice, and noncoding variations. Furthermore, we discuss emerging possibilities to assess haplotypes and structural variations. We advocate for the development of algorithms trained on pharmacogenomic instead of pathogenic data sets to improve the predictive accuracy in order to facilitate the utilization of next-generation sequencing data for personalized clinical decision support and precision pharmacogenomics.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
|
17
|
Gaither JBS, Lammi GE, Li JL, Gordon DM, Kuck HC, Kelly BJ, Fitch JR, White P. Synonymous variants that disrupt messenger RNA structure are significantly constrained in the human population. Gigascience 2021; 10:6211353. [PMID: 33822938 PMCID: PMC8023685 DOI: 10.1093/gigascience/giab023] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/20/2020] [Revised: 02/10/2021] [Accepted: 03/10/2021] [Indexed: 12/16/2022] Open
Abstract
Background The role of synonymous single-nucleotide variants in human health and disease is poorly understood, yet evidence suggests that this class of “silent” genetic variation plays multiple regulatory roles in both transcription and translation. One mechanism by which synonymous codons direct and modulate the translational process is through alteration of the elaborate structure formed by single-stranded mRNA molecules. While tools to computationally predict the effect of non-synonymous variants on protein structure are plentiful, analogous tools to systematically assess how synonymous variants might disrupt mRNA structure are lacking. Results We developed novel software using a parallel processing framework for large-scale generation of secondary RNA structures and folding statistics for the transcriptome of any species. Focusing our analysis on the human transcriptome, we calculated 5 billion RNA-folding statistics for 469 million single-nucleotide variants in 45,800 transcripts. By considering the impact of all possible synonymous variants globally, we discover that synonymous variants predicted to disrupt mRNA structure have significantly lower rates of incidence in the human population. Conclusions These findings support the hypothesis that synonymous variants may play a role in genetic disorders due to their effects on mRNA structure. To evaluate the potential pathogenic impact of synonymous variants, we provide RNA stability, edge distance, and diversity metrics for every nucleotide in the human transcriptome and introduce a “Structural Predictivity Index” (SPI) to quantify structural constraint operating on any synonymous variant. Because no single RNA-folding metric can capture the diversity of mechanisms by which a variant could alter secondary mRNA structure, we generated a SUmmarized RNA Folding (SURF) metric to provide a single measurement to predict the impact of secondary structure altering variants in human genetic studies.
Affiliation(s)
- Jeffrey B S Gaither
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Grant E Lammi
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- James L Li
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- David M Gordon
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Harkness C Kuck
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Benjamin J Kelly
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- James R Fitch
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Peter White
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA; Department of Pediatrics, College of Medicine, The Ohio State University, 370 W. 9th Avenue, Columbus, OH 43210, USA
|
18
|
An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants. Genes (Basel) 2020; 11:genes11091102. [PMID: 32967157 PMCID: PMC7565489 DOI: 10.3390/genes11091102] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/14/2020] [Revised: 09/08/2020] [Accepted: 09/17/2020] [Indexed: 12/18/2022] Open
Abstract
Single-nucleotide variants (SNVs) are a major form of genetic variation in the human genome that contribute to various disorders. There are two types of SNVs, namely non-synonymous (missense) variants (nsSNVs) and synonymous variants (sSNVs), predominantly involved in RNA processing or gene regulation. sSNVs, unlike missense or nsSNVs, do not alter the amino acid sequences, which makes them challenging candidates for downstream functional studies. Numerous computational methods have been developed to evaluate the clinical impact of nsSNVs, but very few methods are available for understanding the effects of sSNVs. For this analysis, we downloaded sSNVs from the ClinVar database with various features such as conservation, DNA-RNA, and splicing properties. We performed feature selection and implemented an ensemble random forest (RF) classification algorithm to build a classifier to predict the pathogenicity of the sSNVs. We demonstrate that the ensemble predictor with selected features (20 features) enhances the classification of sSNVs into two categories, pathogenic and benign, with high accuracy (87%), precision (79%), and recall (91%). Furthermore, we used this prediction model to reclassify sSNVs with unknown clinical significance. Finally, the method is very robust and can be used to predict the effect of other unknown sSNVs.
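The three performance figures quoted in this abstract are standard confusion-matrix statistics. The short Python sketch below shows how accuracy, precision, and recall follow from counts of true/false positives and negatives, treating "pathogenic" as the positive class. The function name `classification_metrics` and the counts are hypothetical and are not the confusion matrix of the cited study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fp) == 0 or (tp + fn) == 0:
        raise ValueError("counts do not allow all three metrics to be computed")
    return {
        "accuracy": (tp + tn) / total,   # fraction of all calls that are correct
        "precision": tp / (tp + fp),     # fraction of predicted pathogenic that are pathogenic
        "recall": tp / (tp + fn),        # fraction of true pathogenic that are recovered
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all three together matters for variant classifiers: a method can reach high recall by over-calling variants as pathogenic, which shows up as lower precision.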
|
19
|
Zhu C, Miller M, Zeng Z, Wang Y, Mahlich Y, Aptekmann A, Bromberg Y. Computational Approaches for Unraveling the Effects of Variation in the Human Genome and Microbiome. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-030320-041014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 02/01/2023]
Abstract
The past two decades of analytical efforts have highlighted how much more remains to be learned about the human genome and, particularly, its complex involvement in promoting disease development and progression. While numerous computational tools exist for the assessment of the functional and pathogenic effects of genome variants, their precision is far from satisfactory, particularly for clinical use. Accumulating evidence also suggests that the human microbiome's interaction with the human genome plays a critical role in determining health and disease states. While numerous microbial taxonomic groups and molecular functions of the human microbiome have been associated with disease, the reproducibility of these findings is lacking. The human microbiome–genome interaction in healthy individuals is even less well understood. This review summarizes the available computational methods built to analyze the effect of variation in the human genome and microbiome. We address the applicability and precision of these methods across their possible uses. We also briefly discuss the exciting, necessary, and now possible integration of the two types of data to improve the understanding of pathogenicity mechanisms.
Affiliation(s)
- Chengsheng Zhu
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Zishuo Zeng
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Yanran Wang
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Yannick Mahlich
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Ariel Aptekmann
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854, USA
|
20
|
Yue Z, Chu X, Xia J. PredCID: prediction of driver frameshift indels in human cancer. Brief Bioinform 2020; 22:5860690. [PMID: 32591774 DOI: 10.1093/bib/bbaa119] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/21/2020] [Revised: 05/14/2020] [Accepted: 05/16/2020] [Indexed: 11/12/2022] Open
Abstract
The discrimination of driver from passenger mutations has been a hot topic in the field of cancer biology. Although recent advances have improved the identification of driver mutations in cancer genomic research, there is not yet a computational method specific to cancer frameshift indels (insertions and/or deletions). In addition, existing pathogenic frameshift indel predictors may suffer from many missing values because of different choices of transcripts during the variant annotation process. In this study, we proposed a computational model, called PredCID (Predictor for Cancer driver frameshift InDels), for accurately predicting cancer driver frameshift indels. Gene, DNA, transcript and protein level features are combined and selected for classification with an eXtreme Gradient Boosting classifier. Benchmarking results on the cross-validation dataset and an independent dataset showed that PredCID achieves better and more robust performance than existing noncancer-specific methods in distinguishing cancer driver frameshift indels from passengers and is therefore a valuable method for deeper understanding of frameshift indels in human cancer. PredCID is freely available for academic research at http://bioinfo.ahu.edu.cn:8080/PredCID.
Affiliation(s)
- Xinlu Chu
- Institutes of Physical Science and Information Technology, Anhui University
- Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University
|
21
|
Kowalski TW, Gomes JDA, Garcia GBC, Fraga LR, Paixao-Cortes VR, Recamonde-Mendoza M, Sanseverino MTV, Schuler-Faccini L, Vianna FSL. CRL4-Cereblon complex in Thalidomide Embryopathy: a translational investigation. Sci Rep 2020; 10:851. [PMID: 31964914 PMCID: PMC6972723 DOI: 10.1038/s41598-020-57512-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/20/2019] [Accepted: 12/28/2019] [Indexed: 01/13/2023] Open
Abstract
The Cereblon-CRL4 complex has been studied predominantly with regards to thalidomide treatment of multiple myeloma. Nevertheless, the role of Cereblon-CRL4 in Thalidomide Embryopathy (TE) is still not understood. Not all embryos exposed to thalidomide develop TE, hence here we evaluate the role of the CRL4-Cereblon complex in TE variability and susceptibility. We sequenced CRBN, DDB1, CUL4A, IKZF1, and IKZF3 in individuals with TE. To better interpret the variants, we suggested a score and a heatmap comprising their regulatory effect. Differential gene expression after thalidomide exposure and conservation of the CRL4-Cereblon protein complex were accessed from public repositories. Results suggest a summation effect of Cereblon variants on pre-axial longitudinal limb anomalies, and heatmap scores identify the CUL4A variant rs138961957 as potentially having an effect on TE susceptibility. CRL4-Cereblon gene expression after thalidomide exposure and CRL4-Cereblon protein conservation do not explain the difference in thalidomide sensitivity between species. In conclusion, we suggest that CRL4-Cereblon variants act through several regulatory mechanisms, which may influence CRL4-Cereblon complex assembly and its ability to bind thalidomide. Human genetic variability must be addressed not only to further understand the susceptibility to TE, but as a crucial element in therapeutics, including in the development of pharmacogenomics strategies.
Affiliation(s)
- Thayne Woycinck Kowalski
- Postgraduate Program in Genetics and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Laboratory of Medical and Population Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; National Institute of Population Medical Genetics (INAGEMP), Porto Alegre, Brazil; Genomic Medicine Laboratory, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; National System of Information on Teratogenic Agents (SIAT), Medical Genetics Service, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; Complexo de Ensino Superior de Cachoeirinha (CESUCA), Cachoeirinha, Brazil
- Julia do Amaral Gomes
- Postgraduate Program in Genetics and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Laboratory of Medical and Population Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; National Institute of Population Medical Genetics (INAGEMP), Porto Alegre, Brazil; Genomic Medicine Laboratory, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; National System of Information on Teratogenic Agents (SIAT), Medical Genetics Service, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Gabriela Barreto Caldas Garcia
- Postgraduate Program in Genetics and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Lucas Rosa Fraga
- Laboratory of Medical and Population Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; National Institute of Population Medical Genetics (INAGEMP), Porto Alegre, Brazil; Genomic Medicine Laboratory, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; National System of Information on Teratogenic Agents (SIAT), Medical Genetics Service, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; Department of Morphological Sciences, Institute of Health Sciences, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Mariana Recamonde-Mendoza
- Institute of Informatics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Bioinformatics Core, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Maria Teresa Vieira Sanseverino
- Postgraduate Program in Genetics and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Laboratory of Medical and Population Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; National Institute of Population Medical Genetics (INAGEMP), Porto Alegre, Brazil; National System of Information on Teratogenic Agents (SIAT), Medical Genetics Service, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; School of Medicine - Pontificia Universidade Catolica do Rio Grande do Sul, Porto Alegre, Brazil
- Lavinia Schuler-Faccini
- Postgraduate Program in Genetics and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Laboratory of Medical and Population Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; National Institute of Population Medical Genetics (INAGEMP), Porto Alegre, Brazil; National System of Information on Teratogenic Agents (SIAT), Medical Genetics Service, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Fernanda Sales Luiz Vianna
- Postgraduate Program in Genetics and Molecular Biology, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Laboratory of Medical and Population Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; National Institute of Population Medical Genetics (INAGEMP), Porto Alegre, Brazil; Genomic Medicine Laboratory, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; National System of Information on Teratogenic Agents (SIAT), Medical Genetics Service, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil; Immunobiology and Immunogenetics Laboratory, Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
|
22
|
Lin H, Hargreaves KA, Li R, Reiter JL, Wang Y, Mort M, Cooper DN, Zhou Y, Zhang C, Eadon MT, Dolan ME, Ipe J, Skaar TC, Liu Y. RegSNPs-intron: a computational framework for predicting pathogenic impact of intronic single nucleotide variants. Genome Biol 2019; 20:254. [PMID: 31779641 PMCID: PMC6883696 DOI: 10.1186/s13059-019-1847-4] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 01/21/2019] [Accepted: 10/03/2019] [Indexed: 12/27/2022] Open
Abstract
Single nucleotide variants (SNVs) in intronic regions have yet to be systematically investigated for their disease-causing potential. Using known pathogenic and neutral intronic SNVs (iSNVs) as training data, we develop the RegSNPs-intron algorithm based on a random forest classifier that integrates RNA splicing, protein structure, and evolutionary conservation features. RegSNPs-intron showed excellent performance in evaluating the pathogenic impacts of iSNVs. Using a high-throughput functional reporter assay called ASSET-seq (ASsay for Splicing using ExonTrap and sequencing), we evaluate the impact of RegSNPs-intron predictions on splicing outcome. Together, RegSNPs-intron and ASSET-seq enable effective prioritization of iSNVs for disease pathogenesis.
Affiliation(s)
- Hai Lin
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, IN, 46202, USA
- Katherine A Hargreaves
- Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, 950 W Walnut St, Suite 419, Indianapolis, IN, 46202, USA
- Rudong Li
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, IN, 46202, USA
- Jill L Reiter
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, IN, 46202, USA
- Yue Wang
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, IN, 46202, USA
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Yaoqi Zhou
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Parklands Dr., Southport, QLD, 4215, Australia
- Chi Zhang
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, IN, 46202, USA
- Michael T Eadon
- Division of Nephrology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- M Eileen Dolan
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, 60637, USA
- Joseph Ipe
- Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, 950 W Walnut St, Suite 419, Indianapolis, IN, 46202, USA
- Todd C Skaar
- Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, 950 W Walnut St, Suite 419, Indianapolis, IN, 46202, USA
- Yunlong Liu
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, IN, 46202, USA
|
23
|
Zeng Z, Bromberg Y. Predicting Functional Effects of Synonymous Variants: A Systematic Review and Perspectives. Front Genet 2019; 10:914. [PMID: 31649718 PMCID: PMC6791167 DOI: 10.3389/fgene.2019.00914] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 06/28/2019] [Accepted: 08/29/2019] [Indexed: 12/13/2022] Open
Abstract
Recent advances in high-throughput experimentation have put the exploration of genome sequences at the forefront of precision medicine. In an effort to interpret the sequencing data, numerous computational methods have been developed for evaluating the effects of genome variants. Interestingly, despite the fact that every person has as many synonymous (sSNV) as non-synonymous single nucleotide variants, our ability to predict their effects is limited. The paucity of experimentally tested sSNV effects appears to be the limiting factor in the development of such methods. Here, we summarize the details and evaluate the performance of nine existing computational methods capable of predicting sSNV effects. We used a set of observed and artificially generated variants to approximate large scale performance expectations of these tools. We note that the distribution of these variants across amino acid and codon types suggests purifying evolutionary selection retaining generated variants out of the observed set; i.e., we expect the generated set to be enriched for deleterious variants. Closer inspection of the relationship between the observed variant frequencies and the associated prediction scores identifies predictor-specific scoring thresholds of reliable effect predictions. Notably, across all predictors, the variants scoring above these thresholds were significantly more often generated than observed, which confirms our assumption that the generated set is enriched for deleterious variants. Finally, we find that while the methods differ in their ability to identify severe sSNV effects, no predictor appears capable of definitively recognizing subtle effects of such variants on a large scale.
Affiliation(s)
- Zishuo Zeng
- Institute for Quantitative Biomedicine, Rutgers University, Piscataway, NJ, United States
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Department of Genetics, Rutgers University, Human Genetics Institute, Piscataway, NJ, United States
|
24
|
Hu Z, Yu C, Furutsuki M, Andreoletti G, Ly M, Hoskins R, Adhikari AN, Brenner SE. VIPdb, a genetic Variant Impact Predictor Database. Hum Mutat 2019; 40:1202-1214. [PMID: 31283070 PMCID: PMC7288905 DOI: 10.1002/humu.23858] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/17/2019] [Accepted: 06/27/2019] [Indexed: 12/30/2022]
Abstract
Genome sequencing identifies a vast number of genetic variants. Predicting these variants' molecular and clinical effects is one of the preeminent challenges in human genetics. Accurate prediction of the impact of genetic variants improves our understanding of how genetic information is conveyed to molecular and cellular functions, and is an essential step towards precision medicine. Over one hundred tools/resources have been developed specifically for this purpose. We summarize these tools, as well as their characteristics, in the genetic Variant Impact Predictor Database (VIPdb). This database will help researchers and clinicians explore appropriate tools, and inform the development of improved methods. VIPdb can be browsed and downloaded at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Changhua Yu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Bioengineering, University of California, Berkeley, California 94720, USA
- Mabel Furutsuki
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Melissa Ly
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Division of Data Sciences, University of California, Berkeley, California 94720, USA
- Roger Hoskins
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Aashish N. Adhikari
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
|
25
|
Mount SM, Avsec Ž, Carmel L, Casadio R, Çelik MH, Chen K, Cheng J, Cohen NE, Fairbrother WG, Fenesh T, Gagneur J, Gotea V, Holzer T, Lin CF, Martelli PL, Naito T, Nguyen TYD, Savojardo C, Unger R, Wang R, Yang Y, Zhao H. Assessing predictions of the impact of variants on splicing in CAGI5. Hum Mutat 2019; 40:1215-1224. [PMID: 31301154 DOI: 10.1002/humu.23869] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/28/2019] [Revised: 06/20/2019] [Accepted: 07/10/2019] [Indexed: 12/28/2022]
Abstract
Precision medicine and sequence-based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex-seq and MaPSY) involved prediction of the effect of variants, primarily single-nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high-throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes.
Affiliation(s)
- Stephen M Mount
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
- Žiga Avsec
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Liran Carmel
  - Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Rita Casadio
  - Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
- Ken Chen
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Jun Cheng
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Noa E Cohen
  - Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
  - The integrated program for Computer Science and Computational Biology, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- William G Fairbrother
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Center For Computational Biology, Brown University, Providence, Rhode Island
- Tzila Fenesh
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Julien Gagneur
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Valer Gotea
  - National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), Bethesda, Maryland
- Tamar Holzer
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Chiao-Feng Lin
  - Translational Informatics, DNAnexus, Mountain View, California
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
- Tatsuhiko Naito
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
- Ron Unger
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Robert Wang
  - Department of Bioengineering, University of California, Berkeley, California
  - Department of Plant and Molecular Biology, University of California, Berkeley, California
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Huiying Zhao
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
|
26
|
Chen K, Lu Y, Zhao H, Yang Y. Predicting the change of exon splicing caused by genetic variant using support vector regression. Hum Mutat 2019; 40:1235-1242. [PMID: 31070294 DOI: 10.1002/humu.23785] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/07/2019] [Revised: 04/01/2019] [Accepted: 05/03/2019] [Indexed: 12/21/2022]
Abstract
Alternative splicing can be disrupted by genetic variants that are related to diseases like cancers. Discovering the influence of genetic variations on the alternative splicing will improve the understanding of the pathogenesis of variants. Here, we developed a new approach, PredPSI-SVR to predict the impact of variants on exon skipping events by using the support vector regression. From the sequence of a particular exon and its flanking regions, 42 comprehensive features related to splicing events were extracted. By using a greedy feature selection algorithm, we found eight features contributing most to the prediction. The trained model achieved a Pearson correlation coefficient (PCC) of 0.570 in the 10-fold cross-validation based on the training data set provided by the "vex-seq" challenge of the 5th Critical Assessment of Genome Interpretation. In the blind test also held by the challenge, our prediction ranked the 2nd with a PCC of 0.566 that demonstrates the robustness of our method. A further test indicated that the PredPSI-SVR is helpful in prioritizing deleterious synonymous mutations. The method is available on https://github.com/chenkenbio/PredPSI-SVR.
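The abstract above reports performance as a Pearson correlation coefficient (PCC) between predicted and measured values. A minimal sketch of how that metric is computed follows; the function name and the toy inputs in the usage note are illustrative assumptions and are not taken from the PredPSI-SVR code.

```python
import math


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences.

    Illustrative helper only; not part of PredPSI-SVR.
    """
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return cov / math.sqrt(var_p * var_o)
```

For example, `pearson_correlation([1, 2, 3], [2, 4, 6])` describes a perfectly linear relationship, whereas a value near 0.57, as reported in the abstract, indicates a moderate linear agreement between predictions and measurements.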
Affiliation(s)
- Ken Chen
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Yutong Lu
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, Guangzhou, China
|
27
|
Cheng N, Li M, Zhao L, Zhang B, Yang Y, Zheng CH, Xia J. Comparison and integration of computational methods for deleterious synonymous mutation prediction. Brief Bioinform 2019; 21:970-981. [DOI: 10.1093/bib/bbz047] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 01/30/2019] [Revised: 03/28/2019] [Accepted: 03/29/2019] [Indexed: 01/03/2023] [Open Access]
Abstract
Synonymous mutations do not change the encoded amino acids but may alter the structure or function of an mRNA in ways that impact gene function. Advances in next generation sequencing technologies have detected numerous synonymous mutations in the human genome. Several computational models have been proposed to predict deleterious synonymous mutations, which have greatly facilitated the development of this important field. Consequently, there is an urgent need to assess the state-of-the-art computational methods for deleterious synonymous mutation prediction to further advance the existing methodologies and to improve performance. In this regard, we systematically compared a total of 10 computational methods (including specific method for deleterious synonymous mutation and general method for single nucleotide mutation) in terms of the algorithms used, calculated features, performance evaluation and software usability. In addition, we constructed two carefully curated independent test datasets and accordingly assessed the robustness and scalability of these different computational methods for the identification of deleterious synonymous mutations. In an effort to improve predictive performance, we established an ensemble model, named Prediction of Deleterious Synonymous Mutation (PrDSM), which averages the ratings generated by the three most accurate predictors. Our benchmark tests demonstrated that the ensemble model PrDSM outperformed the reviewed tools for the prediction of deleterious synonymous mutations. Using the ensemble model, we developed an accessible online predictor, PrDSM, available at http://bioinfo.ahu.edu.cn:8080/PrDSM/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for deleterious synonymous mutation prediction.
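The abstract above describes PrDSM as an ensemble that averages the ratings of the three most accurate predictors. A minimal sketch of that kind of score averaging follows. It assumes every predictor reports scores on a comparable scale and averages only over variants scored by all selected predictors; the function name, predictor names, and data layout are hypothetical and are not taken from PrDSM.

```python
def ensemble_average(scores_by_predictor, selected):
    """Average per-variant scores across the selected predictors.

    scores_by_predictor: dict mapping predictor name -> {variant_id: score}
    selected: names of the predictors to combine (hypothetical names)
    Only variants scored by every selected predictor are returned.
    """
    if not selected:
        raise ValueError("at least one predictor must be selected")
    # Restrict to variants that all selected predictors have scored.
    shared = set(scores_by_predictor[selected[0]])
    for name in selected[1:]:
        shared &= set(scores_by_predictor[name])
    return {
        variant: sum(scores_by_predictor[name][variant] for name in selected)
        / len(selected)
        for variant in sorted(shared)
    }
```

With three hypothetical predictors scoring a variant at 0.9, 0.6 and 0.3, the ensemble score for that variant is their mean, 0.6. An unweighted mean is the simplest choice; whether PrDSM rescales scores before averaging is not stated in the abstract.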
Affiliation(s)
- Na Cheng
  - Institutes of Physical Science and Information Technology, Anhui University
- Menglu Li
  - School of Computer Science and Technology, Anhui University
- Le Zhao
  - School of Computer Science and Technology, Anhui University
- Bo Zhang
  - School of Computer Science and Technology, Anhui University
- Yuhua Yang
  - School of Computer Science and Technology, Anhui University
- Chun-Hou Zheng
  - School of Computer Science and Technology, Anhui University
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, Anhui University
|
28
|
Carli D, Giorgio E, Pantaleoni F, Bruselles A, Barresi S, Riberi E, Licciardi F, Gazzin A, Baldassarre G, Pizzi S, Niceta M, Radio FC, Molinatto C, Montin D, Calvo PL, Ciolfi A, Fleischer N, Ferrero GB, Brusco A, Tartaglia M. NBAS pathogenic variants: Defining the associated clinical and facial phenotype and genotype–phenotype correlations. Hum Mutat 2019; 40:721-728. [DOI: 10.1002/humu.23734] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/22/2018] [Revised: 02/05/2019] [Accepted: 02/28/2019] [Indexed: 12/22/2022]
Affiliation(s)
- Diana Carli
  - Department of Public Health and Pediatrics, University of Torino, Torino, Italy
- Elisa Giorgio
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Francesca Pantaleoni
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù IRCSS, Rome, Italy
- Alessandro Bruselles
  - Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
- Sabina Barresi
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù IRCSS, Rome, Italy
- Evelise Riberi
  - Department of Public Health and Pediatrics, University of Torino, Torino, Italy
- Andrea Gazzin
  - Department of Public Health and Pediatrics, University of Torino, Torino, Italy
- Simone Pizzi
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù IRCSS, Rome, Italy
- Marcello Niceta
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù IRCSS, Rome, Italy
- Francesca C. Radio
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù IRCSS, Rome, Italy
- Cristina Molinatto
  - Department of Public Health and Pediatrics, University of Torino, Torino, Italy
- Davide Montin
  - Department of Public Health and Pediatrics, University of Torino, Torino, Italy
- Pier L. Calvo
  - Pediatric Gastroenterology Unit, Città della Salute e della Scienza University Hospital, Torino, Italy
- Andrea Ciolfi
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù IRCSS, Rome, Italy
- Alfredo Brusco
  - Department of Medical Sciences, University of Torino, Torino, Italy
  - Medical Genetics Unit, Città della Salute e della Scienza University Hospital, Torino, Italy
- Marco Tartaglia
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù IRCSS, Rome, Italy
|
29
|
Shi F, Yao Y, Bin Y, Zheng CH, Xia J. Computational identification of deleterious synonymous variants in human genomes using a feature-based approach. BMC Med Genomics 2019; 12:12. [PMID: 30704475 PMCID: PMC6357349 DOI: 10.1186/s12920-018-0455-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Although synonymous single nucleotide variants (sSNVs) do not alter the protein sequences, they have been shown to play an important role in human disease. Distinguishing pathogenic sSNVs from neutral ones is challenging because pathogenic sSNVs tend to have low prevalence. Although many methods have been developed for predicting the functional impact of single nucleotide variants, only a few have been specifically designed for identifying pathogenic sSNVs. RESULTS In this work, we describe a computational model, IDSV (Identification of Deleterious Synonymous Variants), which uses random forest (RF) to detect deleterious sSNVs in human genomes. We systematically investigate a total of 74 multifaceted features across seven categories: splicing, conservation, codon usage, sequence, pre-mRNA folding energy, translation efficiency, and function regions annotation features. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the sequential backward selection method. Based on the optimized 10 features, a RF classifier is developed to identify deleterious sSNVs. The results on benchmark datasets show that IDSV outperforms other state-of-the-art methods in identifying sSNVs that are pathogenic. CONCLUSIONS We have developed an efficient feature-based prediction approach (IDSV) for deleterious sSNVs by using a wide variety of features. Among all the features, a compact and useful feature subset that has an important implication for identifying deleterious sSNVs is identified. Our results indicate that besides splicing and conservation features, a new translation efficiency feature is also an informative feature for identifying deleterious sSNVs. While the function regions annotation and sequence features are weakly informative, they may have the ability to discriminate deleterious sSNVs from benign ones when combined with other features. 
The data and source code are available on website http://bioinfo.ahu.edu.cn:8080/IDSV.
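The abstract above mentions sequential backward selection for reducing 74 candidate features to an optimized subset of 10. A minimal sketch of the generic procedure follows: start from all features and repeatedly drop the one whose removal leaves the best-scoring subset. The `score` callable is a stand-in assumption for whatever evaluation is used (for IDSV this would presumably be cross-validated random forest performance); the names here are hypothetical and not taken from the IDSV code.

```python
def sequential_backward_selection(features, score, target_size):
    """Greedy backward elimination down to target_size features.

    features: iterable of feature names
    score: callable taking a list of feature names and returning a number
           (higher is better); a stand-in for a model-based evaluation
    target_size: number of features to keep
    """
    selected = list(features)
    if not 1 <= target_size <= len(selected):
        raise ValueError("target_size must be between 1 and len(features)")
    while len(selected) > target_size:
        # Evaluate every subset obtained by dropping exactly one feature
        # and keep the one that scores best.
        candidates = [
            [f for f in selected if f != dropped] for dropped in selected
        ]
        selected = max(candidates, key=score)
    return selected
```

Because each round re-evaluates the model once per remaining feature, the cost grows roughly quadratically with the number of starting features, which is manageable for a few dozen features but expensive for very large feature sets.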
Affiliation(s)
- Fang Shi
  - College of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
- Yao Yao
  - Institute of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, 111 Jiulong Avenue, Hefei, 230601, China
- Yannan Bin
  - Institute of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, 111 Jiulong Avenue, Hefei, 230601, China
- Chun-Hou Zheng
  - College of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
- Junfeng Xia
  - Institute of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, 111 Jiulong Avenue, Hefei, 230601, China
|
30
|
Li Y, Furhang R, Ray A, Duncan T, Soucy J, Mahdi R, Chaitankar V, Gieser L, Poliakov E, Qian H, Liu P, Dong L, Rogozin IB, Redmond TM. Aberrant RNA splicing is the major pathogenic effect in a knock-in mouse model of the dominantly inherited c.1430A>G human RPE65 mutation. Hum Mutat 2019; 40:426-443. [PMID: 30628748 DOI: 10.1002/humu.23706] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/31/2018] [Revised: 12/14/2018] [Accepted: 01/06/2019] [Indexed: 01/03/2023]
Abstract
Human RPE65 mutations cause a spectrum of retinal dystrophies that result in blindness. While RPE65 mutations have been almost invariably recessively inherited, a c.1430A>G (p.(D477G)) mutation has been reported to cause autosomal dominant retinitis pigmentosa (adRP). To study the pathogenesis of this human mutation, we have replicated the mutation in a knock-in (KI) mouse model using CRISPR/Cas9-mediated genome editing. Significantly, in contrast to human patients, heterozygous KI mice do not exhibit any phenotypes in visual function tests. When raised in regular vivarium conditions, homozygous KI mice display relatively undisturbed visual functions with minimal retinal structural changes. However, KI/KI mouse retinae are more sensitive to light exposure and exhibit signs of degenerative features when subjected to light stress. We find that instead of merely producing a missense mutant protein, the A>G nucleotide substitution greatly affects appropriate splicing of Rpe65 mRNA by generating an ectopic splice site in comparable context to the canonical one, thereby disrupting RPE65 protein expression. Similar splicing defects were also confirmed for the human RPE65 c.1430G mutant in an in vitro Exontrap assay. Our data demonstrate that a splicing defect is associated with c.1430G pathogenesis, and therefore provide insights in the therapeutic strategy for human patients.
Affiliation(s)
- Yan Li
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
- Rachel Furhang
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
- Amanda Ray
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
- Todd Duncan
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
- Joseph Soucy
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
- Rashid Mahdi
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
- Vijender Chaitankar
  - Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, NIH, Bethesda, Maryland
- Linn Gieser
  - Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, NIH, Bethesda, Maryland
- Eugenia Poliakov
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
- Haohua Qian
  - Visual Function Core, National Eye Institute, NIH, Bethesda, Maryland
- Pinghu Liu
  - Genetic Engineering Core, National Eye Institute, NIH, Bethesda, Maryland
- Lijin Dong
  - Genetic Engineering Core, National Eye Institute, NIH, Bethesda, Maryland
- Igor B Rogozin
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, Maryland
- T Michael Redmond
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, NIH, Bethesda, Maryland
|
31
|
Savisaar R, Hurst LD. Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 2018; 28:1442-1454. [PMID: 30143596 PMCID: PMC6169883 DOI: 10.1101/gr.233999.117] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 01/11/2018] [Accepted: 07/31/2018] [Indexed: 01/17/2023]
Abstract
What proportion of coding sequence nucleotides have roles in splicing, and how strong is the selection that maintains them? Despite a large body of research into exonic splice regulatory signals, these questions have not been answered. This is because, to our knowledge, previous investigations have not explicitly disentangled the frequency of splice regulatory elements from the strength of the evolutionary constraint under which they evolve. Current data are consistent both with a scenario of weak and diffuse constraint, enveloping large swaths of sequence, as well as with well-defined pockets of strong purifying selection. In the former case, natural selection on exonic splice enhancers (ESEs) might primarily act as a slight modifier of codon usage bias. In the latter, mutations that disrupt ESEs are likely to have large fitness and, potentially, clinical effects. To distinguish between these scenarios, we used several different methods to determine the distribution of selection coefficients for new mutations within ESEs. The analyses converged to suggest that ∼15%-20% of fourfold degenerate sites are part of functional ESEs. Most of these sites are under strong evolutionary constraint. Therefore, exonic splice regulation does not simply impose a weak bias that gently nudges coding sequence evolution in a particular direction. Rather, the selection to preserve these motifs is a strong force that severely constrains the evolution of a substantial proportion of coding nucleotides. Thus synonymous mutations that disrupt ESEs should be considered as a potentially common cause of single-locus genetic disorders.
Affiliation(s)
- Rosina Savisaar
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
|