1
Zhou Y, Lauschke VM. Next-generation sequencing in pharmacogenomics - fit for clinical decision support? Expert Rev Clin Pharmacol 2024; 17:213-223. [PMID: 38247431 DOI: 10.1080/17512433.2024.2307418] [Received: 10/16/2023] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
INTRODUCTION: The technological advances of sequencing methods during the past 20 years have fuelled the generation of large amounts of sequencing data that comprise common variations, as well as millions of rare and personal variants that would not be identified by conventional genotyping. While comprehensive sequencing is technically feasible, its clinical utility for guiding personalized treatment decisions remains controversial. AREAS COVERED: We discuss the opportunities and challenges of comprehensive sequencing compared to targeted genotyping for pharmacogenomic applications. Current pharmacogenomic sequencing panels are heterogeneous and clinical actionability of the included genes is not a major focus. We provide a current overview and critical discussion of how current studies utilize sequencing data either retrospectively from biobanks, databases or repurposed diagnostic sequencing, or prospectively using pharmacogenomic sequencing. EXPERT OPINION: While sequencing-based pharmacogenomics has provided important insights into genetic variations underlying the safety and efficacy of a multitude of pharmacological treatments, important hurdles for the clinical implementation of pharmacogenomic sequencing remain. We identify gaps in the interpretation of pharmacogenetic variants, technical challenges pertaining to complex loci and variant phasing, as well as unclear cost-effectiveness and incomplete reimbursement. It is critical to address these challenges in order to realize the promising prospects of pharmacogenomic sequencing.
Affiliation(s)
- Yitian Zhou
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Volker M Lauschke
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
  - Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
  - University of Tübingen, Tübingen, Germany
2
Cankara F, Doğan T. ASCARIS: Positional feature annotation and protein structure-based representation of single amino acid variations. Comput Struct Biotechnol J 2023; 21:4743-4758. [PMID: 37822561 PMCID: PMC10562615 DOI: 10.1016/j.csbj.2023.09.017] [Received: 04/16/2023] [Revised: 09/15/2023] [Accepted: 09/15/2023] [Indexed: 10/13/2023] [Open access]
Abstract
Background: Genomic variations may cause deleterious effects on protein functionality and perturb biological processes. Elucidating the effects of variations is critical for developing novel treatment strategies for diseases of genetic origin. Computational approaches have been aiding the work in this field by modeling and analyzing the mutational landscape. However, new approaches are required, especially for accurate representation and data-centric analysis of sequence variations. Method: In this study, we propose ASCARIS (Annotation and StruCture-bAsed RepresentatIon of Single amino acid variations), a method for the featurization (i.e., quantitative representation) of single amino acid variations (SAVs), which could be used for a variety of purposes, such as predicting their functional effects or building multi-omics-based integrative models. ASCARIS utilizes the direct and spatial correspondence between the location of the SAV on the sequence/structure and 30 different types of positional feature annotations (e.g., active/lipidation/glycosylation sites; calcium/metal/DNA binding, inter/transmembrane regions, etc.), along with structural features and physicochemical properties. The main novelty of this method lies in constructing reusable numerical representations of SAVs via functional annotations. Results: We statistically analyzed the relationship between these features and the consequences of variations and found that each carries information in this regard. To investigate potential applications of ASCARIS, we trained variant effect prediction models that utilize our SAV representations as input. We carried out an ablation study and a comparison against state-of-the-art methods and observed that ASCARIS performs competitively with, and complementarily to, widely used predictors. ASCARIS can be used alone or in combination with other approaches to represent SAVs from a functional perspective.
ASCARIS is available as a programmatic tool at https://github.com/HUBioDataLab/ASCARIS and as a web-service at https://huggingface.co/spaces/HUBioDataLab/ASCARIS.
Affiliation(s)
- Fatma Cankara
  - Biological Data Science Laboratory, Dept. of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
  - Department of Computational Sciences and Engineering, Koc University, Istanbul, Turkey
- Tunca Doğan
  - Biological Data Science Laboratory, Dept. of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Institute of Informatics, Hacettepe University, Ankara, Turkey
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
3
Sun Y, Shen Y. Structure-Informed Protein Language Models are Robust Predictors for Variant Effects. Res Sq 2023:rs.3.rs-3219092. [PMID: 37577664 PMCID: PMC10418537 DOI: 10.21203/rs.3.rs-3219092/v1] [Indexed: 08/15/2023]
Abstract
Predicting protein variant effects through machine learning is often challenged by the scarcity of experimentally measured effect labels. Recently, protein language models (pLMs) have emerged as zero-shot predictors that need no effect labels, by modeling the evolutionary distribution of functional protein sequences. However, biological contexts important to variant effects are only implicitly modeled and effectively marginalized. By assessing the sequence awareness and the structure awareness of pLMs, we find that improvements in either often correlate with better variant effect prediction, but the tradeoff between them can present a barrier, as observed in over-finetuning to specific family sequences. We introduce a framework of structure-informed pLMs (SI-pLMs) to inject protein structural contexts purposely and controllably, by extending masked sequence denoising in conventional pLMs to cross-modality denoising. Our SI-pLMs are applicable to revising any sequence-only pLM through model architecture and training objectives. They do not require structure data as model inputs for variant effect prediction and only use structures as context provider and model regularizer during training. Numerical results over deep mutagenesis scanning benchmarks show that our SI-pLMs, despite relatively compact sizes, are robustly top performers against competing methods including other pLMs, regardless of the target protein family's evolutionary information content or the tendency toward overfitting or over-finetuning. Learned distributions in structural contexts could enhance sequence distributions in predicting variant effects. Ablation studies reveal major contributing factors, and analyses of sequence embeddings provide further insights. The data and scripts are available at https://github.com/Stephen2526/Structure-informed_PLM.git.
Affiliation(s)
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, Texas, USA
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, Texas, USA
  - Department of Computer Science and Engineering, Texas A&M University, College Station, 77843, Texas, USA
  - Institute of Biosciences and Technology and Department of Translational Medical Sciences, Texas A&M University, Houston, 77030, Texas, USA
4
Himmelreich N, Bertoldi M, Alfadhel M, Alghamdi MA, Anikster Y, Bao X, Bashiri FA, Zeev BB, Bisello G, Ceylan AC, Chien YH, Choy YS, Elsea SH, Flint L, García-Cazorla À, Gijavanekar C, Gümüş EY, Hamad MH, Hişmi B, Honzik T, Hübschmann OK, Hwu WL, Ibáñez-Micó S, Jeltsch K, Juliá-Palacios N, Kasapkara ÇS, Kurian MA, Kusmierska K, Liu N, Ngu LH, Odom JD, Ong WP, Opladen T, Oppeboen M, Pearl PL, Pérez B, Pons R, Rygiel AM, Shien TE, Spaull R, Sykut-Cegielska J, Tabarki B, Tangeraas T, Thöny B, Wassenberg T, Wen Y, Yakob Y, Yin JGC, Zeman J, Blau N. Prevalence of DDC genotypes in patients with aromatic L-amino acid decarboxylase (AADC) deficiency and in silico prediction of structural protein changes. Mol Genet Metab 2023; 139:107624. [PMID: 37348148 DOI: 10.1016/j.ymgme.2023.107624] [Received: 03/14/2023] [Revised: 05/29/2023] [Accepted: 05/30/2023] [Indexed: 06/24/2023]
Abstract
Aromatic L-amino acid decarboxylase (AADC) deficiency is a rare autosomal recessive genetic disorder affecting the biosynthesis of dopamine, a precursor of both norepinephrine and epinephrine, and serotonin. Diagnosis is based on the analysis of CSF or plasma metabolites, AADC activity in plasma and genetic testing for variants in the DDC gene. The exact prevalence of AADC deficiency, the number of patients, and the variant and genotype prevalence are not known. Here, we present the DDC variant (n = 143) and genotype (n = 151) prevalence of 348 patients with AADC deficiency, 121 of whom were previously not reported. In addition, we report 26 new DDC variants, classify them according to the ACMG/AMP/ACGS recommendations for pathogenicity and score them based on the predicted structural effect. The splice variant c.714+4A>T, with a founder effect in Taiwan and China, was the most common variant (allele frequency = 32.4%), and c.[714+4A>T];[714+4A>T] was the most common genotype (genotype frequency = 21.3%). Approximately 90% of genotypes had variants classified as pathogenic or likely pathogenic, while 7% had one VUS allele and 3% had two VUS alleles. Only one benign variant was reported. Homozygous and compound heterozygous genotypes were interpreted in terms of AADC protein and categorized as: i) devoid of full-length AADC, ii) bearing one type of AADC homodimeric variant or iii) producing an AADC protein population composed of two homodimeric and one heterodimeric variant. Based on structural features, a score was attributed for all homodimers, and a tentative prediction was advanced for the heterodimer. Almost all AADC protein variants were pathogenic or likely pathogenic.
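The allele and genotype frequencies quoted in this abstract are proportions: in a cohort of biallelic genotypes each patient contributes two alleles but only one genotype. The following Python sketch illustrates that arithmetic under stated assumptions. The function name `variant_frequencies`, the placeholder labels `variant_B` and `variant_C`, and the four-patient cohort are all hypothetical and are not data from the study; the study's exact counting rules are not given in the abstract.

```python
from collections import Counter


def variant_frequencies(genotypes):
    """Return (allele_freq, genotype_freq) for a list of biallelic genotypes.

    Each genotype is a pair of variant labels, one per allele.
    Hypothetical helper for illustration only.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    allele_counts = Counter()
    genotype_counts = Counter()
    for allele_1, allele_2 in genotypes:
        allele_counts.update([allele_1, allele_2])
        # Sort so that (A, B) and (B, A) count as the same genotype.
        genotype_counts[tuple(sorted((allele_1, allele_2)))] += 1
    n_patients = len(genotypes)
    # Two alleles per patient, one genotype per patient.
    allele_freq = {a: c / (2 * n_patients) for a, c in allele_counts.items()}
    genotype_freq = {g: c / n_patients for g, c in genotype_counts.items()}
    return allele_freq, genotype_freq


# Toy cohort of four patients (illustrative labels only, not study data).
toy_genotypes = [
    ("c.714+4A>T", "c.714+4A>T"),
    ("c.714+4A>T", "variant_B"),
    ("variant_B", "variant_C"),
    ("c.714+4A>T", "c.714+4A>T"),
]
allele_freq, genotype_freq = variant_frequencies(toy_genotypes)
print(allele_freq["c.714+4A>T"])                    # 5 of 8 alleles
print(genotype_freq[("c.714+4A>T", "c.714+4A>T")])  # 2 of 4 patients
```

In this toy cohort the most common allele and the most common genotype have different frequencies, which is why the abstract reports the two figures separately.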
Affiliation(s)
- Nastassja Himmelreich
  - Dietmar-Hopp Metabolic Center and Centre for Pediatrics and Adolescent Medicine, University Children's Hospital, Heidelberg, Germany
- Mariarita Bertoldi
  - Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
- Majid Alfadhel
  - Medical Genomic Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia; Genetics and Precision Medicine Department, King Abdullah Specialized Children's Hospital, King Abdulaziz Medical City, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
- Malak Ali Alghamdi
  - Medical Genetic Division, Pediatric Department, College of Medicine, King Saud University, Riyadh, SA, Saudi Arabia
- Yair Anikster
  - Metabolic Disease Unit, The Edmond and Lily Safra Childrens Hospital, Sheba Medical Center, Tel Hashomer, Sackler School of Medicine, Tel Aviv University, Israel
- Xinhua Bao
  - Department of Pediatrics, Peking University First Hospital, Beijing, China
- Fahad A Bashiri
  - Division of Neurology, Department of Pediatrics, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Bruria Ben Zeev
  - Pediatric Neurology, Safra Pediatric Hospital, Sheba Medical Center, Sackler School of Medicine, Tel Aviv University, Ramat Gan, Israel
- Giovanni Bisello
  - Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
- Ahmet Cevdet Ceylan
  - Ankara Yıldırım Beyazıt University, Department of Medical Genetics, Ankara Bilkent City Hospital, Ankara, Turkey
- Yin-Hsiu Chien
  - Department of Medical Genetics & Pediatrics, National Taiwan University Hospital, Taipei, Taiwan
- Sarah H Elsea
  - Dept. of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Àngels García-Cazorla
  - Neurometabolic Unit, Department of Neurology, Hospital Sant Joan de Déu, CIBERER, Barcelona, Spain
- Charul Gijavanekar
  - Dept. of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Emel Yılmaz Gümüş
  - Department of Pediatrics and Inherited Metabolic Diseases, Marmara University School of Medicine, Istanbul, Turkey
- Muddathir H Hamad
  - Neurology Division, Pediatric Department, King Saud University Medical City, Riyadh, SA, Saudi Arabia
- Burcu Hişmi
  - Department of Pediatrics and Inherited Metabolic Diseases, Marmara University School of Medicine, Istanbul, Turkey
- Tomas Honzik
  - Dept. of Pediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Oya Kuseyri Hübschmann
  - Division of Neuropediatrics and Metabolic Medicine, University Children's Hospital Heidelberg, Heidelberg, Germany
- Wuh-Liang Hwu
  - Department of Medical Genetics & Pediatrics, National Taiwan University Hospital, Taipei, Taiwan
- Kathrin Jeltsch
  - Division of Neuropediatrics and Metabolic Medicine, University Children's Hospital Heidelberg, Heidelberg, Germany
- Natalia Juliá-Palacios
  - Neurometabolic Unit, Department of Neurology, Hospital Sant Joan de Déu, CIBERER, Barcelona, Spain
- Çiğdem Seher Kasapkara
  - Department of Pediatric Metabolism, Ankara Yıldırım Beyazıt University, Ankara Bilkent City Hospital, Ankara, Turkey
- Manju A Kurian
  - Developmental Neurosciences, Zayed Centre for Research, UCL GOS-Institute of Child Health & Department of Neurology, Great Ormond Street Hospital, London, United Kingdom
- Katarzyna Kusmierska
  - Department of Screening and Metabolic Diagnostics, Institute of Mother and Child, Warsaw, Poland
- Ning Liu
  - Dept. of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Lock Hock Ngu
  - Department of Genetics, Hospital Kuala Lumpur, Ministry of Health, Malaysia
- John D Odom
  - Dept. of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Winnie Peitee Ong
  - Department of Genetics, Hospital Kuala Lumpur, Ministry of Health, Malaysia
- Thomas Opladen
  - Division of Neuropediatrics and Metabolic Medicine, University Children's Hospital Heidelberg, Heidelberg, Germany
- Mari Oppeboen
  - Children's Department, Division of Child Neurology and Norwegian National Unit for Newborn Screening, Division of Paediatric and Adolescent Medicine, Oslo University Hospital, Oslo, Norway
- Phillip L Pearl
  - Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Belén Pérez
  - Centro de Diagnostico de Enfermedades Moleculares, CIBERER, IdiPAZ, Universidad Autonoma de Madrid, Madrid, Spain
- Roser Pons
  - First Department of Pediatrics, Aghia Sophia Children's Hospital, University of Athens, Athens, Greece
- Agnieszka Magdalena Rygiel
  - Department of Medical Genetics, Laboratory of Hereditary Diseases, Institute of Mother and Child, Warsaw, Poland
- Tan Ee Shien
  - Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, Singapore
- Robert Spaull
  - Developmental Neurosciences, Zayed Centre for Research, UCL GOS-Institute of Child Health & Department of Neurology, Great Ormond Street Hospital, London, United Kingdom
- Jolanta Sykut-Cegielska
  - Department of Inborn Errors of Metabolism and Paediatrics, The Institute of Mother and Child, Warsaw, Poland
- Brahim Tabarki
  - Division of Neurology, Department of Pediatrics, Prince Sultan Military Medical City, Riyadh, Saudi Arabia
- Trine Tangeraas
  - Norwegian National Unit for Newborn Screening, Division of Paediatric and Adolescent Medicine, Oslo University Hospital, Oslo, Norway
- Beat Thöny
  - Divisions of Metabolism, University Children's Hospital, Zürich, Switzerland
- Yongxin Wen
  - Medical Genetic Division, Pediatric Department, College of Medicine, King Saud University, Riyadh, SA, Saudi Arabia
- Yusnita Yakob
  - Molecular Diagnostics Unit, Specialised Diagnostics Centre, Institute for Medical Research, National Institute of Health, Ministry of Health, Malaysia
- Jasmine Goh Chew Yin
  - Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, Singapore
- Jiri Zeman
  - Dept. of Pediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Nenad Blau
  - Divisions of Metabolism, University Children's Hospital, Zürich, Switzerland
5
Salz R, Saraiva-Agostinho N, Vorsteveld E, van der Made CI, Kersten S, Stemerdink M, Allen J, Volders PJ, Hunt SE, Hoischen A, 't Hoen PAC. SUsPECT: a pipeline for variant effect prediction based on custom long-read transcriptomes for improved clinical variant annotation. BMC Genomics 2023; 24:305. [PMID: 37280537 PMCID: PMC10245480 DOI: 10.1186/s12864-023-09391-5] [Received: 01/18/2023] [Accepted: 05/19/2023] [Indexed: 06/08/2023] [Open access]
Abstract
Our incomplete knowledge of the human transcriptome impairs the detection of disease-causing variants, in particular if they affect transcripts only expressed under certain conditions. These transcripts are often lacking from reference transcript sets, such as Ensembl/GENCODE and RefSeq, and could be relevant for establishing genetic diagnoses. We present SUsPECT (Solving Unsolved Patient Exomes/gEnomes using Custom Transcriptomes), a pipeline based on the Ensembl Variant Effect Predictor (VEP) to predict variant impact on custom transcript sets, such as those generated by long-read RNA-sequencing, for downstream prioritization. Our pipeline predicts the functional consequence and likely deleteriousness scores for missense variants in the context of novel open reading frames predicted from any transcriptome. We demonstrate the utility of SUsPECT by uncovering potential mutational mechanisms of pathogenic variants in ClinVar that are not predicted to be pathogenic using the reference transcript annotation. In further support of SUsPECT's utility, we identified an enrichment of immune-related variants predicted to have a more severe molecular consequence when annotating with a newly generated transcriptome from stimulated immune cells instead of the reference transcriptome. Our pipeline outputs crucial information for further prioritization of potentially disease-causing variants for any disease and will become increasingly useful as more long-read RNA sequencing datasets become available.
Affiliation(s)
- Renee Salz
  - Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
- Nuno Saraiva-Agostinho
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Emil Vorsteveld
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
- Caspar I van der Made
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
  - Department of Internal Medicine, Radboud Institute for Molecular Life Sciences, and Radboud Expertise Center for Immunodeficiency and Autoinflammation, Radboud University Medical Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
- Simone Kersten
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
- Merel Stemerdink
  - Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Jamie Allen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Pieter-Jan Volders
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  - Laboratory of Molecular Diagnostics, Department of Clinical Biology, Jessa Hospital, Hasselt, 3500, Belgium
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Alexander Hoischen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
  - Department of Internal Medicine, Radboud Institute for Molecular Life Sciences, and Radboud Expertise Center for Immunodeficiency and Autoinflammation, Radboud University Medical Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
- Peter A C 't Hoen
  - Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
6
Dunham AS, Beltrao P, AlQuraishi M. High-throughput deep learning variant effect prediction with Sequence UNET. Genome Biol 2023; 24:110. [PMID: 37161576 PMCID: PMC10169183 DOI: 10.1186/s13059-023-02948-3] [Received: 08/10/2022] [Accepted: 04/20/2023] [Indexed: 05/11/2023] [Open access]
Abstract
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture. It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
Affiliation(s)
- Alistair S Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1RQ, UK
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093, Zurich, Switzerland
7
Schröter J, Dattner T, Hüllein J, Jayme A, Heuveline V, Hoffmann GF, Kölker S, Lenz D, Opladen T, Popp B, Schaaf CP, Staufner C, Syrbe S, Uhrig S, Hübschmann D, Brennenstuhl H. aRgus: Multilevel visualization of non-synonymous single nucleotide variants & advanced pathogenicity score modeling for genetic vulnerability assessment. Comput Struct Biotechnol J 2023; 21:1077-83. [PMID: 36789265 DOI: 10.1016/j.csbj.2023.01.027] [Received: 10/14/2022] [Revised: 01/18/2023] [Accepted: 01/18/2023] [Indexed: 01/26/2023] [Open access]
Abstract
The widespread use of high-throughput sequencing techniques is leading to a rapidly increasing number of disease-associated variants of unknown significance and candidate genes. Integration of knowledge concerning their genetic, protein as well as functional and conservational aspects is necessary for an exhaustive assessment of their relevance and for prioritization of further clinical and functional studies investigating their role in human disease. To collect the necessary information, a multitude of different databases has to be accessed, and data extraction from the original sources is commonly not user-friendly and requires advanced bioinformatics skills. This reduces data accessibility for a relevant number of potential users such as clinicians, geneticists, and clinical researchers. Here, we present aRgus (https://argus.urz.uni-heidelberg.de/), a standalone webtool for simple extraction and intuitive visualization of multi-layered gene, protein, variant, and variant effect prediction data. aRgus provides interactive exploitation of these data within seconds for any known gene of the human genome. In contrast to existing online platforms for compilation of variant data, aRgus complements visualization of chromosomal exon-intron structure and protein domain annotation with ClinVar and gnomAD variant distributions as well as position-specific variant effect prediction score modeling. aRgus thereby enables timely assessment of protein regions vulnerable to variation with single amino acid resolution and provides numerous applications in variant and protein domain interpretation as well as in the design of in vitro experiments.
8
Faure AJ, Schmiedel JM, Baeza-Centurion P, Lehner B. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol 2020; 21:207. [PMID: 32799905 DOI: 10.1186/s13059-020-02091-3] [Received: 12/04/2019] [Accepted: 07/05/2020] [Indexed: 12/30/2022] [Open access]
Abstract
Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods. DiMSum is available as an R/Bioconda package and provides summary reports to help researchers diagnose common DMS pathologies and take remedial steps in their analyses.
9
Abstract
BACKGROUND: Deep mutational scanning (DMS) studies exploit the mutational landscape of sequence variation by systematically and comprehensively assaying the effect of single amino acid variants (SAVs; also referred to as missense mutations, or non-synonymous Single Nucleotide Variants - missense SNVs or nsSNVs) for particular proteins. We assembled SAV annotations from 22 different DMS experiments and normalized the effect scores to evaluate variant effect prediction methods. We evaluated five methods: three trained on traditional variant effect data (PolyPhen-2, SIFT, SNAP2), a regression method optimized on DMS data (Envision), and a naïve prediction using conservation information from homologs. RESULTS: On a set of 32,981 SAVs, all methods captured some aspects of the experimental effect scores, albeit not the same ones. Traditional methods such as SNAP2 correlated slightly more with measurements and better classified binary states (effect or neutral). Envision appeared to better estimate the precise degree of effect. Most surprising was that the simple naïve conservation approach using PSI-BLAST in many cases outperformed other methods. All methods captured beneficial effects (gain-of-function) significantly worse than deleterious ones (loss-of-function). For the few proteins with multiple independent experimental measurements, experiments differed substantially, but agreed more with each other than with predictions. CONCLUSIONS: DMS provides a new powerful experimental means of understanding the dynamics of the protein sequence space. As always, promising new beginnings have to overcome challenges. While our results demonstrated that DMS will be crucial to improve variant effect prediction methods, data diversity hindered simplification and generalization.
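The evaluation described in this abstract, normalizing effect scores and then asking how well predictions correlate with measurements, can be illustrated with a short Python sketch. The abstract does not state which normalization or correlation measure the authors used, so min-max scaling and Spearman rank correlation are assumptions chosen for illustration, and the function names and toy numbers are hypothetical.

```python
import math


def min_max(values):
    """Rescale scores to [0, 1] (an assumed normalization, not the paper's)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize constant scores")
    return [(v - lo) / (hi - lo) for v in values]


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: measured DMS effects and one predictor's scores (made up).
measured = min_max([-2.0, -0.5, 0.0, 1.5])
predicted = [0.1, 0.3, 0.2, 0.9]
print(spearman(measured, predicted))
```

A rank-based measure is used here because experimental and predicted scores typically live on different scales, so only their ordering is directly comparable.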
Affiliation(s)
- Jonas Reeb
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany
- Theresa Wirth
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany
- Burkhard Rost
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr 2a, 85748, Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
  - Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
10
Mahmood K, Jung CH, Philip G, Georgeson P, Chung J, Pope BJ, Park DJ. Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics. Hum Genomics 2017; 11:10. [PMID: 28511696 PMCID: PMC5433009 DOI: 10.1186/s40246-017-0104-8] [Received: 04/03/2017] [Accepted: 05/04/2017] [Indexed: 11/16/2022] [Open access]
Abstract
Background: Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. Results: Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. Conclusions: These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as previously reported. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools. Electronic supplementary material: The online version of this article (doi:10.1186/s40246-017-0104-8) contains supplementary material, which is available to authorized users.
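The benchmark metric quoted in this abstract, the area under the receiver operating characteristic curve (AUC), equals the probability that a randomly chosen damaging variant receives a higher predictor score than a randomly chosen neutral one, with ties counted as one half. The Python sketch below computes it directly from that definition. The function name `roc_auc`, the label convention (1 for damaging, 0 for neutral) and the toy scores are illustrative assumptions, not the authors' pipeline or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for functionally damaging, 0 for neutral (assumed convention).
    scores: predictor outputs, where higher means "more likely damaging".
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Toy benchmark of six variants (made-up labels and scores).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.6]
print(round(roc_auc(toy_labels, toy_scores), 3))  # 7 of 9 pairs -> 0.778
```

Read this way, an AUC near 0.5, such as the lower end of the ranges reported above, means the predictor orders damaging and neutral variants little better than chance. The pairwise loop is quadratic in the number of variants; for large benchmarks a rank-based computation or a library routine would be the usual choice.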
Affiliation(s)
- Khalid Mahmood
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, Australia
- Chol-Hee Jung
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, Australia
- Gayle Philip
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, Australia
- Peter Georgeson
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, Australia
- Jessica Chung
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, Australia
- Bernard J Pope
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, Australia
- Daniel J Park
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, Australia
11
Abstract
Background: Accurate methods capable of predicting the impact of single nucleotide variants (SNVs) are assuming ever increasing importance. There exists a plethora of in silico algorithms designed to help identify and prioritize SNVs across the human genome for further investigation. However, no tool exists to visualize the predicted tolerance of the genome to mutation, or the similarities between these methods. Results: We present the Genome Tolerance Browser (GTB, http://gtb.biocompute.org.uk): an online genome browser for visualizing the predicted tolerance of the genome to mutation. The server summarizes several in silico prediction algorithms and conservation scores, including 13 genome-wide prediction algorithms and conservation scores, 12 non-synonymous prediction algorithms and four cancer-specific algorithms. Conclusion: The GTB enables users to visualize the similarities and differences between several prediction algorithms and to upload their own data as additional tracks, thereby facilitating the rapid identification of potential regions of interest. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1436-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Hashem A Shihab
  - MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol, BS8 2BN, UK
- Mark F Rogers
  - Intelligent Systems Laboratory, University of Bristol, Bristol, BS8 1UB, UK
- Michael Ferlaino
  - Intelligent Systems Laboratory, University of Bristol, Bristol, BS8 1UB, UK
- Colin Campbell
  - Intelligent Systems Laboratory, University of Bristol, Bristol, BS8 1UB, UK
- Tom R Gaunt
  - MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol, BS8 2BN, UK