1
|
Chen X, Yu X. Toward a universal approach for predicting variant pathogenicity in diverse disease landscapes. J Genet Genomics 2024:S1673-8527(24)00193-0. [PMID: 39043334 DOI: 10.1016/j.jgg.2024.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 07/02/2024] [Accepted: 07/14/2024] [Indexed: 07/25/2024]
Affiliation(s)
- Xiang Chen
- Liangzhu Laboratory of Zhejiang University, Hangzhou, Zhejiang 310058, China; Department of Rheumatology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
| | - Xiaomin Yu
- Liangzhu Laboratory of Zhejiang University, Hangzhou, Zhejiang 310058, China; Department of Rheumatology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China.
| |
|
2
|
Moldenhauer HJ, Tammen K, Meredith AL. Structural mapping of patient-associated KCNMA1 gene variants. Biophys J 2024; 123:1984-2000. [PMID: 38042986 PMCID: PMC11309989 DOI: 10.1016/j.bpj.2023.11.3404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 11/30/2023] [Accepted: 11/30/2023] [Indexed: 12/04/2023]
Abstract
KCNMA1-linked channelopathy is a neurological disorder characterized by seizures, motor abnormalities, and neurodevelopmental disabilities. The disease mechanisms are predicted to result from alterations in KCNMA1-encoded BK K+ channel activity; however, only a subset of the patient-associated variants have been functionally studied. The localization of these variants within the tertiary structure or evaluation by pathogenicity algorithms has not been systematically assessed. In this study, 82 nonsynonymous patient-associated KCNMA1 variants were mapped within the BK channel protein. Fifty-three variants localized within cryoelectron microscopy-resolved structures, including 21 classified as either gain of function (GOF) or loss of function (LOF) in BK channel activity. Clusters of LOF variants were identified in the pore, the AC region (RCK1), and near the Ca2+ bowl (RCK2), overlapping with sites of pharmacological or endogenous modulation. However, no clustering was found for GOF variants. To further understand variants of uncertain significance (VUSs), assessments by multiple standard pathogenicity algorithms were compared, and new thresholds for sensitivity and specificity were established from confirmed GOF and LOF variants. An ensemble algorithm was constructed (KCNMA1 meta score (KMS)), consisting of a weighted summation of this trained dataset combined with a structural component derived from the Ca2+-bound and unbound BK channels. KMS assessment differed from the highest-performing individual algorithm (REVEL) at 10 VUS residues, and a subset were studied further by electrophysiology in HEK293 cells. M578T, E656A, and D965V (KMS+;REVEL-) were confirmed to alter BK channel properties in voltage-clamp recordings, and D800Y (KMS-;REVEL+) was assessed as benign under the test conditions. However, KMS failed to accurately assess K457E. 
These combined results reveal the distribution of potentially disease-causing KCNMA1 variants within BK channel functional domains and pathogenicity evaluation for VUSs, suggesting strategies for improving channel-level predictions in future studies by building on ensemble algorithms such as KMS.
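A weighted-summation ensemble of the kind described above can be sketched as follows. This is only an illustration: the predictor names, weights, and the 0-to-1 structural term are hypothetical placeholders, not the published KMS parameters or formula.

```python
from typing import Mapping


def meta_score(
    scores: Mapping[str, float],
    weights: Mapping[str, float],
    structural: float,
    structural_weight: float,
) -> float:
    """Weighted average of predictor scores plus a structural term.

    All inputs are assumed to be pre-scaled to the range 0..1.
    The weights are hypothetical; a real ensemble would fit them
    on variants with confirmed functional effects.
    """
    total_weight = sum(weights.values()) + structural_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sum each predictor's score multiplied by its weight.
    weighted = sum(weights[name] * scores[name] for name in weights)
    return (weighted + structural_weight * structural) / total_weight


# Hypothetical variant: two predictor scores and one structural score.
example = meta_score(
    scores={"REVEL": 0.4, "other_predictor": 0.8},
    weights={"REVEL": 2.0, "other_predictor": 1.0},
    structural=1.0,
    structural_weight=1.0,
)
```

Dividing by the total weight keeps the combined score on the same 0-to-1 scale as its inputs, so a single threshold can be applied to it.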
Affiliation(s)
- Hans J Moldenhauer
- Department of Physiology, University of Maryland School of Medicine, Baltimore, Maryland
| | - Kelly Tammen
- Department of Physiology, University of Maryland School of Medicine, Baltimore, Maryland
| | - Andrea L Meredith
- Department of Physiology, University of Maryland School of Medicine, Baltimore, Maryland.
| |
|
3
|
Huang S, Wu Z, Wang T, Yu R, Song Z, Wang H. MmisAT and MmisP: an efficient and accurate suite of variant analysis toolkit for primary mitochondrial diseases. Hum Genomics 2023; 17:108. [PMID: 38012712 PMCID: PMC10683248 DOI: 10.1186/s40246-023-00557-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 11/22/2023] [Indexed: 11/29/2023]
Abstract
Recent advances in next-generation sequencing (NGS) technology have greatly accelerated the need for efficient annotation to accurately interpret clinically relevant genetic variants in human diseases. Therefore, it is crucial to develop appropriate analytical tools to improve the interpretation of disease variants. Given the unique genetic characteristics of mitochondria, including haplogroup, heteroplasmy, and maternal inheritance, we developed a suite of variant analysis toolkits specifically designed for primary mitochondrial diseases: the Mitochondrial Missense Variant Annotation Tool (MmisAT) and the Mitochondrial Missense Variant Pathogenicity Predictor (MmisP). MmisAT can handle protein-coding variants from both nuclear DNA and mtDNA and generate 349 annotation types across six categories. It processes 4.78 million variants in 76 min, making it a valuable resource for clinical and research applications. Additionally, MmisP provides pathogenicity scores to predict the pathogenicity of genetic variations in mitochondrial disease. It has been validated using cross-validation and external datasets and demonstrated higher overall discriminant accuracy, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.94, outperforming existing pathogenicity predictors. In conclusion, MmisAT is an efficient tool that greatly facilitates the process of variant annotation, expanding the scope of variant annotation information. Furthermore, the development of MmisP provides valuable insights into the creation of disease-specific, phenotype-specific, and even gene-specific predictors of pathogenicity, further advancing our understanding of specific fields.
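The AUC reported above can be read as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. A minimal sketch of that pairwise definition follows; the scores are invented for illustration and are not MmisP outputs.

```python
from typing import Sequence


def pairwise_auc(pathogenic: Sequence[float], benign: Sequence[float]) -> float:
    """ROC AUC computed as the fraction of correctly ordered pairs.

    Each (pathogenic, benign) pair counts 1 if the pathogenic score is
    higher, 0.5 if the two scores tie, and 0 otherwise.
    """
    if not pathogenic or not benign:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))


# Invented toy scores: one pathogenic variant (0.4) ranks below one benign (0.7).
toy_auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The quadratic loop is fine for small benchmark sets; larger sets would normally use a rank-based formula or a library routine instead.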
Affiliation(s)
- Shuangshuang Huang
- Department of Clinical Laboratory, Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
| | - Zhaoyu Wu
- Department of Clinical Laboratory, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
| | - Tong Wang
- Department of Clinical Laboratory, Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
| | - Rui Yu
- Department of Ophthalmology, Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
| | - Zhijian Song
- OrigiMed, 5th Floor, Building 3, No.115 Xin Jun Huan Road, Minhang District, Shanghai, China.
| | - Hao Wang
- Department of Clinical Laboratory, Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China.
| |
|
4
|
Moldenhauer HJ, Tammen K, Meredith AL. Structural mapping of patient-associated KCNMA1 gene variants. bioRxiv 2023:2023.07.27.550850. [PMID: 37546746 PMCID: PMC10402178 DOI: 10.1101/2023.07.27.550850] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/08/2023]
Abstract
KCNMA1-linked channelopathy is a neurological disorder characterized by seizures, motor abnormalities, and neurodevelopmental disabilities. The disease mechanisms are predicted to result from alterations in KCNMA1-encoded BK K+ channel activity; however, only a subset of the patient-associated variants have been functionally studied. The localization of these variants within the tertiary structure or evaluation by pathogenicity algorithms has not been systematically assessed. In this study, 82 nonsynonymous patient-associated KCNMA1 variants were mapped within the BK channel protein. Fifty-three variants localized within cryo-EM resolved structures, including 21 classified as either gain-of-function (GOF) or loss-of-function (LOF) in BK channel activity. Clusters of LOF variants were identified in the pore, the AC region (RCK1), and near the Ca2+ bowl (RCK2), overlapping with sites of pharmacological or endogenous modulation. However, no clustering was found for GOF variants. To further understand variants of uncertain significance (VUS), assessments by multiple standard pathogenicity algorithms were compared, and new thresholds for sensitivity and specificity were established from confirmed GOF and LOF variants. An ensemble algorithm was constructed (KCNMA1 Meta Score, KMS), consisting of a weighted summation of this trained dataset combined with a structural component derived from the Ca2+-bound and unbound BK channels. KMS assessment differed from the highest performing individual algorithm (REVEL) at 10 VUS residues, and a subset were studied further by electrophysiology in HEK293 cells. M578T, E656A, and D965V (KMS+;REVEL-) were confirmed to alter BK channel properties in voltage-clamp recordings, and D800Y (KMS-;REVEL+) was assessed as benign under the test conditions. However, KMS failed to accurately assess K457E.
These combined results reveal the distribution of potentially disease-causing KCNMA1 variants within BK channel functional domains and pathogenicity evaluation for VUS, suggesting strategies for improving channel-level predictions in future studies by building on ensemble algorithms such as KMS.
|
5
|
Kumaran M, Devarajan B. eyeVarP: A computational framework for the identification of pathogenic variants specific to eye disease. Genet Med 2023; 25:100862. [PMID: 37092535 DOI: 10.1016/j.gim.2023.100862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 04/11/2023] [Accepted: 04/13/2023] [Indexed: 04/25/2023]
Abstract
PURPOSE Prediction tools that differentiate pathogenic variants from benign ones have recently been improved through disease specificity. However, they have not been evaluated on disease-specific pathogenic variants compared with those of other diseases, which would help to prioritize disease-specific variants from several genes or novel genes. Thus, we hypothesize that features of pathogenic variants alone would provide a better model. METHODS We developed an eye disease-specific variant prioritization tool (eyeVarP), which applied the random forest algorithm to the data set of pathogenic variants of eye diseases and other diseases. We also developed the VarP tool and a generalized pipeline to filter missense and insertion-deletion variants and predict their pathogenicity from exome or genome sequencing data, thereby providing a complete computational procedure. RESULTS eyeVarP outperformed pan disease-specific tools in identifying eye disease-specific pathogenic variants within the top 10. VarP outperformed 12 pathogenicity prediction tools with an accuracy of 95% in correctly identifying the pathogenicity of missense and insertion-deletion variants. The complete pipeline would help to develop disease-specific tools for other genetic disorders. CONCLUSION eyeVarP performs better in identifying eye disease-specific pathogenic variants using pathogenic variant features and gene features. Implementing such a complete computational procedure would significantly improve clinical variant interpretation for specific diseases.
Affiliation(s)
- Manojkumar Kumaran
- Department of Bioinformatics, Aravind Medical Research Foundation, Madurai, Tamil Nadu, India; School of Chemical and Biotechnology, SASTRA (Deemed to be a university), Thanjavur, Tamil Nadu, India
| | - Bharanidharan Devarajan
- Department of Bioinformatics, Aravind Medical Research Foundation, Madurai, Tamil Nadu, India.
| |
|
6
|
Kang M, Kim S, Lee DB, Hong C, Hwang KB. Gene-specific machine learning for pathogenicity prediction of rare BRCA1 and BRCA2 missense variants. Sci Rep 2023; 13:10478. [PMID: 37380723 DOI: 10.1038/s41598-023-37698-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 06/26/2023] [Indexed: 06/30/2023]
Abstract
Machine learning-based pathogenicity prediction helps interpret rare missense variants of BRCA1 and BRCA2, which are associated with hereditary cancers. Recent studies have shown that classifiers trained using variants of a specific gene or a set of genes related to a particular disease perform better than those trained using all variants, due to their higher specificity, despite the smaller training dataset size. In this study, we further investigated the advantages of "gene-specific" machine learning compared to "disease-specific" machine learning. We used 1068 rare (gnomAD minor allele frequency (MAF) < 0.005) missense variants of 28 genes associated with hereditary cancers for our investigation. Popular machine learning classifiers were employed: regularized logistic regression, extreme gradient boosting, random forests, support vector machines, and deep neural networks. As features, we used MAFs from multiple populations, functional prediction and conservation scores, and positions of variants. The disease-specific training dataset included the gene-specific training dataset and was > 7 × larger. However, we observed that gene-specific training variants were sufficient to produce the optimal pathogenicity predictor if a suitable machine learning classifier was employed. Therefore, we recommend gene-specific over disease-specific machine learning as an efficient and effective method for predicting the pathogenicity of rare BRCA1 and BRCA2 missense variants.
Affiliation(s)
- Moonjong Kang
- Research Center, Software Division, NGeneBio, Seoul, 08390, Korea
| | - Seonhwa Kim
- Research Center, Software Division, NGeneBio, Seoul, 08390, Korea
| | - Da-Bin Lee
- Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, 06978, Korea
| | - Changbum Hong
- Research Center, Software Division, NGeneBio, Seoul, 08390, Korea.
| | - Kyu-Baek Hwang
- Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, 06978, Korea.
| |
|
7
|
Hasenahuer MA, Sanchis-Juan A, Laskowski RA, Baker JA, Stephenson JD, Orengo CA, Raymond FL, Thornton JM. Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins. J Mol Biol 2023; 435:167892. [PMID: 36410474 PMCID: PMC9875310 DOI: 10.1016/j.jmb.2022.167892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 11/23/2022]
Abstract
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein-protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder-order transitions upon binding with other protein partners and liquid-liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
Affiliation(s)
- Marcia A. Hasenahuer
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK; Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK. Corresponding author at: European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK. @MarHasenahuer
| | - Alba Sanchis-Juan
- Department of Haematology, NHS Blood and Transplant Centre, University of Cambridge, Cambridge CB2 0XY, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
| | - Roman A. Laskowski
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - James A. Baker
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - James D. Stephenson
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - F. Lucy Raymond
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
| | - Janet M. Thornton
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| |
|
8
|
Zhong G, Shen Y. Statistical models of the genetic etiology of congenital heart disease. Curr Opin Genet Dev 2022; 76:101967. [PMID: 35939966 PMCID: PMC10586490 DOI: 10.1016/j.gde.2022.101967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 06/29/2022] [Accepted: 07/08/2022] [Indexed: 11/03/2022]
Abstract
Congenital heart disease (CHD) is a collection of anatomically and clinically heterogeneous structural anomalies of the heart present at birth. Finding genetic causes of CHD can not only shed light on the developmental biology of the heart, but also provide a basis for improving clinical care and interventions. The optimal study design and analytical approaches to identify genetic causes depend on the underlying genetic architecture. A few well-known syndromes with CHD as a core condition, such as Noonan and CHARGE, have known monogenic causes. The genetic causes in most CHD patients, however, are unknown and likely to be complex. In this review, we highlight recent studies that assume a complex genetic architecture of CHD, following two main approaches. One is genomic sequencing studies that aim to identify rare or de novo risk variants with large genetic effect. The other is genome-wide association studies optimized for common variants with moderate genetic effect.
Affiliation(s)
- Guojie Zhong
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Integrated Program in Cellular, Molecular, and Biological Studies, Columbia University Irving Medical Center, New York, NY, USA
| | - Yufeng Shen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA.
| |
|
9
|
Li B, Jin B, Capra JA, Bush WS. Integration of Protein Structure and Population-Scale DNA Sequence Data for Disease Gene Discovery and Variant Interpretation. Annu Rev Biomed Data Sci 2022; 5:141-161. [PMID: 35508071 DOI: 10.1146/annurev-biodatasci-122220-112147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation.
Affiliation(s)
- Bian Li
- Department of Biological Sciences and Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA
| | - Bowen Jin
- Graduate Program in Systems Biology and Bioinformatics, Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
| | - John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
| | - William S Bush
- Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
| |
|
10
|
Wilcox EH, Sarmady M, Wulf B, Wright MW, Rehm HL, Biesecker LG, Abou Tayoun AN. Evaluating the impact of in silico predictors on clinical variant classification. Genet Med 2022; 24:924-930. [PMID: 34955381 PMCID: PMC9164215 DOI: 10.1016/j.gim.2021.11.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/02/2021] [Revised: 11/18/2021] [Accepted: 11/19/2021] [Indexed: 12/29/2022]
Abstract
PURPOSE According to the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines, in silico evidence is applied at the supporting strength level for pathogenic (PP3) and benign (BP4) evidence. Although PP3 is commonly used, less is known about the effect of these criteria on variant classification outcomes. METHODS A total of 727 missense variants curated by Clinical Genome Resource expert groups were analyzed to determine how often PP3 and BP4 were applied and their impact on variant classification. The ACMG/AMP categorical system of variant classification was compared with a quantitative point-based system. The pathogenicity likelihood ratios of REVEL, VEST, FATHMM, and MPC were calibrated using a gold standard set of 237 pathogenic and benign variants (classified independent of the PP3/BP4 criteria). RESULTS The PP3 and BP4 criteria were applied by Variant Curation Expert Panels to 55% of missense variants. Application of those criteria changed the classification of 15% of missense variants for which either criterion was applied. The point-based system resolved borderline classifications. REVEL and VEST performed best at a strength level consistent with moderate evidence. CONCLUSION We show that in silico criteria are commonly applied and often affect the final variant classifications. When appropriate thresholds for in silico predictors are established, our results show that PP3 and BP4 can be used at a moderate strength.
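To make the calibration step concrete, the sketch below computes a positive likelihood ratio for a predictor threshold from labelled variants and maps it to an evidence strength. The toy scores and the 0.5 threshold are invented. The strength cut-offs (2.08, 4.33, 18.7, 350) are the odds-of-pathogenicity values commonly cited for the Bayesian adaptation of the ACMG/AMP framework and are an assumption here, not values taken from this abstract.

```python
from typing import Sequence


def positive_likelihood_ratio(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> float:
    """LR+ = sensitivity / (1 - specificity) for calls with score >= threshold.

    labels: 1 for pathogenic, 0 for benign.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one pathogenic and one benign variant")
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    if false_positive_rate == 0:
        return float("inf")
    return sensitivity / false_positive_rate


def evidence_strength(lr: float) -> str:
    """Map a likelihood ratio to an evidence level (assumed cut-offs)."""
    if lr >= 350:
        return "very strong"
    if lr >= 18.7:
        return "strong"
    if lr >= 4.33:
        return "moderate"
    if lr >= 2.08:
        return "supporting"
    return "none"


# Invented toy set: four pathogenic and four benign variants.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_lr = positive_likelihood_ratio(toy_scores, toy_labels, threshold=0.5)
toy_level = evidence_strength(toy_lr)
```

With a gold-standard set of realistic size, the same calculation would be repeated across candidate thresholds to find where a predictor reaches a given strength level.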
Affiliation(s)
- Emma H Wilcox
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | | | - Bryan Wulf
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA
| | - Matt W Wright
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
| | - Leslie G Biesecker
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Ahmad N Abou Tayoun
- Al Jalila Genomics Center, Al Jalila Children's Specialty Hospital, Dubai, United Arab Emirates; Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates.
| |
|
11
|
DVPred: a disease-specific prediction tool for variant pathogenicity classification for hearing loss. Hum Genet 2022; 141:401-411. [PMID: 35182233 DOI: 10.1007/s00439-022-02440-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/12/2021] [Accepted: 02/06/2022] [Indexed: 02/08/2023]
Abstract
Numerous computational prediction tools have been introduced to estimate the functional impact of variants in the human genome based on evolutionary constraints and biochemical metrics. However, their implementation in diagnostic settings to classify variants has faced challenges with accuracy and validity. Most existing tools are pan-genome and pan-disease, neglecting gene- and disease-specific properties and limiting the accessibility of curated data. As a proof of concept, we developed a disease-specific prediction tool named Deafness Variant deleteriousness Prediction tool (DVPred) that focuses on the 157 genes reportedly causing genetic hearing loss (HL). DVPred applied the gradient boosting decision tree (GBDT) algorithm to a dataset consisting of expert-curated pathogenic and benign variants from a large in-house HL patient cohort and public databases. With the incorporation of variant-level and gene-level features, DVPred outperformed the existing universal tools. It achieved an area under the curve (AUC) of 0.98 and showed consistent performance (AUC = 0.985) in an independent assessment dataset. We further demonstrated that multiple gene-level metrics, including low complexity genomic regions and substitution intolerance scores, were the top features of the model. A comprehensive analysis of missense variants showed a gene-specific ratio of predicted deleterious and neutral variants, implying varied tolerance or intolerance to variation in different genes. DVPred explored the utility of a disease-specific strategy in improving deafness variant prediction. It can improve the prioritization of pathogenic variants among the massive number of variants identified by high-throughput sequencing of HL genes. It also sheds light on the development of variant prediction tools for other genetic disorders.
|
12
|
Ruscheinski A, Reimler AL, Ewald R, Uhrmacher AM. VPMBench: a test bench for variant prioritization methods. BMC Bioinformatics 2021; 22:543. [PMID: 34749640 PMCID: PMC8576923 DOI: 10.1186/s12859-021-04458-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Accepted: 10/23/2021] [Indexed: 11/18/2022]
Abstract
Background Clinical diagnostics of whole-exome and whole-genome sequencing data requires geneticists to consider thousands of genetic variants for each patient. Various variant prioritization methods have been developed over the last years to aid clinicians in identifying variants that are likely disease-causing. Each time a new method is developed, its effectiveness must be evaluated and compared to other approaches based on the most recently available evaluation data. Doing so in an unbiased, systematic, and replicable manner requires significant effort. Results The open-source test bench “VPMBench” automates the evaluation of variant prioritization methods. VPMBench introduces a standardized interface for prioritization methods and provides a plugin system that makes it easy to evaluate new methods. It supports different input data formats and custom output data preparation. VPMBench exploits declaratively specified information about the methods, e.g., the variants supported by the methods. Plugins may also be provided in a technology-agnostic manner via containerization. Conclusions VPMBench significantly simplifies the evaluation of both custom and published variant prioritization methods. As we expect variant prioritization methods to become ever more critical with the advent of whole-genome sequencing in clinical diagnostics, such tool support is crucial to facilitate methodological research.
Affiliation(s)
- Andreas Ruscheinski
- Modeling and Simulation Group, Institute for Visual and Analytic Computing, University of Rostock, Albert-Einstein-Straße 22, 18051, Rostock, Germany.
| | - Anna Lena Reimler
- Modeling and Simulation Group, Institute for Visual and Analytic Computing, University of Rostock, Albert-Einstein-Straße 22, 18051, Rostock, Germany
| | - Roland Ewald
- Limbus Medical Technologies GmbH, Lindenstraße 2, 18055, Rostock, Germany
| | - Adelinde M Uhrmacher
- Modeling and Simulation Group, Institute for Visual and Analytic Computing, University of Rostock, Albert-Einstein-Straße 22, 18051, Rostock, Germany
| |
|
13
|
Chen HC, Wang J, Liu Q, Shyr Y. A domain damage index to prioritizing the pathogenicity of missense variants. Hum Mutat 2021; 42:1503-1517. [PMID: 34350656 PMCID: PMC8511099 DOI: 10.1002/humu.24269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2020] [Revised: 07/08/2021] [Accepted: 07/30/2021] [Indexed: 11/09/2022]
Abstract
Prioritizing causal variants is a major challenge for the clinical application of sequencing data. Prompted by the observation that 74.3% of pathogenic missense variants are located in protein domains, we developed an approach named domain damage index (DDI). DDI identifies protein domains depleted of rare missense variation in the general population, which can be further used as a metric to prioritize variants. DDI is significantly correlated with phylogenetic conservation, variant-level metrics, and reported pathogenicity. DDI performed well in distinguishing pathogenic variants from benign ones in three benchmark datasets. Combining DDI with the two other best approaches improved the performance of each individual method considerably, suggesting that DDI provides a powerful and complementary means of variant prioritization.
Affiliation(s)
- Hua-Chang Chen
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jing Wang
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Qi Liu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Yu Shyr
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
14
Huang YF. Unified inference of missense variant effects and gene constraints in the human genome. PLoS Genet 2020; 16:e1008922. [PMID: 32667917 PMCID: PMC7384676 DOI: 10.1371/journal.pgen.1008922] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/03/2019] [Revised: 07/27/2020] [Accepted: 06/09/2020] [Indexed: 01/25/2023] Open
Abstract
A challenge in medical genomics is to identify variants and genes associated with severe genetic disorders. Based on the premise that severe, early-onset disorders often result in a reduction of evolutionary fitness, several statistical methods have been developed to predict pathogenic variants or constrained genes based on the signatures of negative selection in human populations. However, we currently lack a statistical framework to jointly predict deleterious variants and constrained genes from both variant-level features and gene-level selective constraints. Here we present such a unified approach, UNEECON, based on deep learning and population genetics. UNEECON treats the contributions of variant-level features and gene-level constraints as a variant-level fixed effect and a gene-level random effect, respectively. The sum of the fixed and random effects is then combined with an evolutionary model to infer the strength of negative selection at both variant and gene levels. Compared with previously published methods, UNEECON shows improved performance in predicting missense variants and protein-coding genes associated with autosomal dominant disorders, and feature importance analysis suggests that both gene-level selective constraints and variant-level predictors are important for accurate variant prioritization. Furthermore, based on UNEECON, we observe a low correlation between gene-level intolerance to missense mutations and that to loss-of-function mutations, which can be partially explained by the prevalence of disordered protein regions that are highly tolerant to missense mutations. Finally, we show that genes intolerant to both missense and loss-of-function mutations play key roles in the central nervous system and the autism spectrum disorders. Overall, UNEECON is a promising framework for both variant and gene prioritization. 
Numerous statistical methods have been developed to predict deleterious missense variants or constrained genes in the human genome, but unified prioritization methods that utilize both variant- and gene-level information are underdeveloped. Here we present UNEECON, an evolution-based deep learning framework for unified variant and gene prioritization. By integrating variant-level predictors and gene-level selective constraints, UNEECON outperforms existing methods in predicting missense variants and protein-coding genes associated with dominant disorders. Based on UNEECON, we show that disordered proteins are tolerant to missense mutations but not to loss-of-function mutations. In addition, we find that genes under strong selective constraints at both missense and loss-of-function levels are strongly associated with the central nervous system and the autism spectrum disorders, highlighting the need to investigate the function of these highly constrained genes in future studies.
Affiliation(s)
- Yi-Fei Huang
  - Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
15
Pauly R, Schwartz CE. The Future of Clinical Diagnosis: Moving Functional Genomics Approaches to the Bedside. Clin Lab Med 2020; 40:221-230. [PMID: 32439070 DOI: 10.1016/j.cll.2020.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Affiliation(s)
- Rini Pauly
  - Greenwood Genetic Center, JC Self Research Institute, 113 Gregor Mendel Circle, Greenwood, SC 29646, USA
- Charles E Schwartz
  - Greenwood Genetic Center, JC Self Research Institute, 113 Gregor Mendel Circle, Greenwood, SC 29646, USA