1
|
Andreoletti G, Pal LR, Moult J, Brenner SE. Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation. Hum Mutat 2019; 40:1197-1201. [PMID: 31334884 PMCID: PMC7329230 DOI: 10.1002/humu.23876] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Accepted: 07/19/2019] [Indexed: 12/20/2022]
Abstract
Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'kā-jē/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 was comprised of 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org.
Collapse
|
Congress |
6 |
42 |
2
|
Pal LR, Moult J. Genetic Basis of Common Human Disease: Insight into the Role of Missense SNPs from Genome-Wide Association Studies. J Mol Biol 2015; 427:2271-89. [PMID: 25937569 PMCID: PMC4893807 DOI: 10.1016/j.jmb.2015.04.014] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2014] [Revised: 03/28/2015] [Accepted: 04/27/2015] [Indexed: 01/03/2023]
Abstract
Recent genome-wide association studies (GWAS) have led to the reliable identification of single nucleotide polymorphisms (SNPs) at a number of loci associated with increased risk of specific common human diseases. Each such locus implicates multiple possible candidate SNPs for involvement in disease mechanism. A variety of mechanisms may link the presence of an SNP to altered in vivo gene product function and hence contribute to disease risk. Here, we report an analysis of the role of one of these mechanisms, missense SNPs (msSNPs) in proteins in seven complex trait diseases. Linkage disequilibrium information was used to identify possible candidate msSNPs associated with increased disease risk at each of 356 loci for the seven diseases. Two computational methods were used to estimate which of these SNPs has a significant impact on in vivo protein function. 69% of the loci have at least one candidate msSNP and 33% have at least one predicted high-impact msSNP. In some cases, these SNPs are in well-established disease-related proteins, such as MST1 (macrophage stimulating 1) for Crohn's disease. In others, they are in proteins identified by GWAS as likely candidates for disease relevance, but previously without known mechanism, such as ADAMTS13 (ADAM metallopeptidase with thrombospondin type 1 motif, 13) for coronary artery disease. In still other cases, the missense SNPs are in proteins not previously suggested as disease candidates, such as TUBB1 (tubulin, beta 1, class VI) for hypertension. Together, these data support a substantial role for this class of SNPs in susceptibility to common human disease.
Collapse
|
Research Support, N.I.H., Extramural |
10 |
39 |
3
|
Yu CH, Pal LR, Moult J. Consensus Genome-Wide Expression Quantitative Trait Loci and Their Relationship with Human Complex Trait Disease. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2017; 20:400-14. [PMID: 27428252 DOI: 10.1089/omi.2016.0063] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Most of the risk loci identified from genome-wide association (GWA) studies do not provide direct information on the biological basis of a disease or on the underlying mechanisms. Recent expression quantitative trait locus (eQTL) association studies have provided information on genetic factors associated with gene expression variation. These eQTLs might contribute to phenotype diversity and disease susceptibility, but interpretation is handicapped by low reproducibility of the expression results. To address this issue, we have generated a set of consensus eQTLs by integrating publicly available data for specific human populations and cell types. Overall, we find over 4000 genes that are involved in high-confidence eQTL relationships. To elucidate the role that eQTLs play in human common diseases, we matched the high-confidence eQTLs to a set of 335 disease risk loci identified from the Wellcome Trust Case Control Consortium GWA study and follow-up studies for 7 human complex trait diseases-bipolar disorder (BD), coronary artery disease (CAD), Crohn's disease (CD), hypertension (HT), rheumatoid arthritis (RA), type 1 diabetes (T1D), and type 2 diabetes (T2D). The results show that the data are consistent with ∼50% of these disease loci arising from an underlying expression change mechanism.
Collapse
|
Research Support, N.I.H., Extramural |
8 |
35 |
4
|
Daneshjou R, Wang Y, Bromberg Y, Bovo S, Martelli PL, Babbi G, Lena PD, Casadio R, Edwards M, Gifford D, Jones DT, Sundaram L, Bhat RR, Li X, Pal LR, Kundu K, Yin Y, Moult J, Jiang Y, Pejaver V, Pagel KA, Li B, Mooney SD, Radivojac P, Shah S, Carraro M, Gasparini A, Leonardi E, Giollo M, Ferrari C, Tosatto SCE, Bachar E, Azaria JR, Ofran Y, Unger R, Niroula A, Vihinen M, Chang B, Wang MH, Franke A, Petersen BS, Pirooznia M, Zandi P, McCombie R, Potash JB, Altman RB, Klein TE, Hoskins RA, Repo S, Brenner SE, Morgan AA. Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges. Hum Mutat 2017. [PMID: 28634997 DOI: 10.1002/humu.23280] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.
Collapse
|
Research Support, Non-U.S. Gov't |
8 |
34 |
5
|
Galkin A, Li Z, Li L, Kulakova L, Pal LR, Dunaway-Mariano D, Herzberg O. Structural insights into the substrate binding and stereoselectivity of giardia fructose-1,6-bisphosphate aldolase. Biochemistry 2009; 48:3186-96. [PMID: 19236002 DOI: 10.1021/bi9001166] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Giardia lamblia fructose-1,6-bisphosphate aldolase (FBPA) is a member of the class II zinc-dependent aldolase family that catalyzes the cleavage of d-fructose 1,6-bisphosphate (FBP) into dihydroxyacetone phosphate (DHAP) and d-glyceraldehyde 3-phosphate (G3P). In addition to the active site zinc, the catalytic apparatus of FBPA employs an aspartic acid, Asp83 in the G. lamblia enzyme, which when replaced with an alanine residue renders the enzyme inactive. A comparison of the crystal structures of D83A FBPA in complex with FBP and of wild-type FBPA in the unbound state revealed a substrate-induced conformational transition of loops in the vicinity of the active site and a shift in the location of Zn(2+). When FBP binds, the Zn(2+) shifts up to 4.6 A toward the catalytic Asp83, which brings the metal within coordination distance of the Asp83 carboxylate group. In addition, the structure of wild-type FBPA was determined in complex with the competitive inhibitor d-tagatose 1,6-bisphosphate (TBP), a FBP stereoisomer. In this structure, the zinc binds in a site close to that previously seen in the structure of FBPA in complex with phosphoglycolohydroxamate, an analogue of the postulated DHAP ene-diolate intermediate. Together, the ensemble of structures suggests that the zinc mobility is necessary to orient the Asp83 side chain and to polarize the substrate for proton transfer from the FBP C(4) hydroxyl group to the Asp83 carboxyl group. In the absence of FBP, the alternative zinc position is too remote for coordinating the Asp83. We propose a modification of the catalytic mechanism that incorporates the novel features observed in the FBPA-FBP structure. The mechanism invokes coordination and coplanarity of the Zn(2+) with the FBP's O-C(3)-C(4)-O group concomitant with coordination of the Asp83 carboxylic group. Catalysis is accompanied by movement of Zn(2+) to a site coplanar with the O-C(2)-C(3)-O group of the DHAP. glFBPA exhibits strict substrate specificity toward FBP and does not cleave TBP. The active sites of FBPAs contain an aspartate residue equivalent to Asp255 of glFBPA, whereas tagatose-1,6-bisphosphate aldolase contains an alanine in this position. We and others hypothesized that this aspartic acid is a likely determinant of FBP versus TBP specificity. Replacement of Asp255 with an alanine resulted in an enzyme that possesses double specificity, now cleaving TBP (albeit with low efficacy; k(cat)/K(m) = 80 M(-1) s(-1)) while maintaining activity toward FBP at a 50-fold lower catalytic efficacy compared with that of wild-type FBPA. The collection of structures and sequence analyses highlighted additional residues that may be involved in substrate discrimination.
Collapse
|
Research Support, N.I.H., Extramural |
16 |
27 |
6
|
Gorlatova N, Chao K, Pal LR, Araj RH, Galkin A, Turko I, Moult J, Herzberg O. Protein characterization of a candidate mechanism SNP for Crohn's disease: the macrophage stimulating protein R689C substitution. PLoS One 2011; 6:e27269. [PMID: 22087277 PMCID: PMC3210151 DOI: 10.1371/journal.pone.0027269] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2011] [Accepted: 10/13/2011] [Indexed: 12/14/2022] Open
Abstract
High throughput genome wide associations studies (GWAS) are now identifying a large number of genome loci related to risk of common human disease. Each such locus presents a challenge in identifying the relevant underlying mechanism. Here we report the experimental characterization of a proposed causal single nucleotide polymorphism (SNP) in a locus related to risk of Crohn's disease and ulcerative colitis. The SNP lies in the MST1 gene encoding Macrophage Stimulating Protein (MSP), and results in an R689C amino acid substitution within the β-chain of MSP (MSPβ). MSP binding to the RON receptor tyrosine kinase activates signaling pathways involved in the inflammatory response. We have purified wild-type and mutant MSPβ proteins and compared biochemical and biophysical properties that might impact the MSP/RON signaling pathway. Surface plasmon resonance (SPR) binding studies showed that MSPβ R689C affinity to RON is approximately 10-fold lower than that of the wild-type MSPβ and differential scanning fluorimetry (DSF) showed that the thermal stability of the mutant MSPβ was slightly lower than that of wild-type MSPβ, by 1.6 K. The substitution was found not to impair the specific Arg483-Val484 peptide bond cleavage by matriptase-1, required for MSP activation, and mass spectrometry of tryptic fragments of the mutated protein showed that the free thiol introduced by the R689C mutation did not form an aberrant disulfide bond. Together, the studies indicate that the missense SNP impairs MSP function by reducing its affinity to RON and perhaps through a secondary effect on in vivo concentration arising from reduced thermodynamic stability, resulting in down-regulation of the MSP/RON signaling pathway.
Collapse
|
Research Support, N.I.H., Extramural |
14 |
23 |
7
|
Cheng K, Martin‐Sancho L, Pal LR, Pu Y, Riva L, Yin X, Sinha S, Nair NU, Chanda SK, Ruppin E. Genome-scale metabolic modeling reveals SARS-CoV-2-induced metabolic changes and antiviral targets. Mol Syst Biol 2021; 17:e10260. [PMID: 34709707 PMCID: PMC8552660 DOI: 10.15252/msb.202110260] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2021] [Revised: 09/29/2021] [Accepted: 09/30/2021] [Indexed: 12/15/2022] Open
Abstract
Tremendous progress has been made to control the COVID-19 pandemic caused by the SARS-CoV-2 virus. However, effective therapeutic options are still rare. Drug repurposing and combination represent practical strategies to address this urgent unmet medical need. Viruses, including coronaviruses, are known to hijack host metabolism to facilitate viral proliferation, making targeting host metabolism a promising antiviral approach. Here, we describe an integrated analysis of 12 published in vitro and human patient gene expression datasets on SARS-CoV-2 infection using genome-scale metabolic modeling (GEM), revealing complicated host metabolism reprogramming during SARS-CoV-2 infection. We next applied the GEM-based metabolic transformation algorithm to predict anti-SARS-CoV-2 targets that counteract the virus-induced metabolic changes. We successfully validated these targets using published drug and genetic screen data and by performing an siRNA assay in Caco-2 cells. Further generating and analyzing RNA-sequencing data of remdesivir-treated Vero E6 cell samples, we predicted metabolic targets acting in combination with remdesivir, an approved anti-SARS-CoV-2 drug. Our study provides clinical data-supported candidate anti-SARS-CoV-2 targets for future evaluation, demonstrating host metabolism targeting as a promising antiviral strategy.
Collapse
|
Meta-Analysis |
4 |
23 |
8
|
Pal LR, Kundu K, Yin Y, Moult J. CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease. Hum Mutat 2017; 38:1225-1234. [PMID: 28512778 PMCID: PMC5576730 DOI: 10.1002/humu.23256] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2016] [Revised: 05/09/2017] [Accepted: 05/10/2017] [Indexed: 12/18/2022]
Abstract
Understanding the basis of complex trait disease is a fundamental problem in human genetics. The CAGI Crohn's Exome challenges are providing insight into the adequacy of current disease models by requiring participants to identify which of a set of individuals has been diagnosed with the disease, given exome data. For the CAGI4 round, we developed a method that used the genotypes from exome sequencing data only to impute the status of genome wide association studies marker SNPs. We then used the imputed genotypes as input to several machine learning methods that had been trained to predict disease status from marker SNP information. We achieved the best performance using Naïve Bayes and with a consensus machine learning method, obtaining an area under the curve of 0.72, larger than other methods used in CAGI4. We also developed a model that incorporated the contribution from rare missense variants in the exome data, but this performed less well. Future progress is expected to come from the use of whole genome data rather than exomes.
Collapse
|
Research Support, N.I.H., Extramural |
8 |
14 |
9
|
Pal LR, Yu CH, Mount SM, Moult J. Insights from GWAS: emerging landscape of mechanisms underlying complex trait disease. BMC Genomics 2015; 16 Suppl 8:S4. [PMID: 26110739 PMCID: PMC4480957 DOI: 10.1186/1471-2164-16-s8-s4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND There are now over 2000 loci in the human genome where genome wide association studies (GWAS) have found one or more SNPs to be associated with altered risk of a complex trait disease. At each of these loci, there must be some molecular level mechanism relevant to the disease. What are these mechanisms and how do they contribute to disease? RESULTS Here we consider the roles of three primary mechanism classes: changes that directly alter protein function (missense SNPs), changes that alter transcript abundance as a consequence of variants close-by in sequence, and changes that affect splicing. Missense SNPs are divided into those predicted to have a high impact on in vivo protein function, and those with a low impact. Splicing is divided into SNPs with a direct impact on splice sites, and those with a predicted effect on auxiliary splicing signals. The analysis was based on associations found for seven complex trait diseases in the classic Wellcome Trust Case Control Consortium (WTCCC1) GWA study and subsequent studies and meta-analyses, collected from the GWAS catalog. Linkage disequilibrium information was used to identify possible candidate SNPs for involvement in disease mechanism in each of the 356 loci associated with these seven diseases. With the parameters used, we find that 76% of loci have at least of these mechanisms. Overall, except for the low incidence of direct impact on splice sites, the mechanisms are found at similar frequencies, with changes in transcript abundance the most common. But the distribution of mechanisms over diseases varies markedly, as does the fraction of loci with assigned mechanisms. Many of the implicated proteins have previously been suggested as relevant, but the specific mechanism assignments are new. In addition, a number of new disease relevant proteins are proposed. CONCLUSIONS The high fraction of GWAS loci with proposed mechanisms suggests that these classes of mechanism play a major role. Other mechanism types, such as variants affecting expression of genes remote in the DNA sequence, will contribute in other loci. Each of the identified putative mechanisms provides a hypothesis for further investigation.
Collapse
|
Research Support, N.I.H., Extramural |
10 |
13 |
10
|
Guda C, King BR, Pal LR, Guda P. A top-down approach to infer and compare domain-domain interactions across eight model organisms. PLoS One 2009; 4:e5096. [PMID: 19333396 PMCID: PMC2659750 DOI: 10.1371/journal.pone.0005096] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2008] [Accepted: 02/10/2009] [Indexed: 11/22/2022] Open
Abstract
Knowledge of specific domain-domain interactions (DDIs) is essential to understand the functional significance of protein interaction networks. Despite the availability of an enormous amount of data on protein-protein interactions (PPIs), very little is known about specific DDIs occurring in them. Here, we present a top-down approach to accurately infer functionally relevant DDIs from PPI data. We created a comprehensive, non-redundant dataset of 209,165 experimentally-derived PPIs by combining datasets from five major interaction databases. We introduced an integrated scoring system that uses a novel combination of a set of five orthogonal scoring features covering the probabilistic, evolutionary, evidence-based, spatial and functional properties of interacting domains, which can map the interacting propensity of two domains in many dimensions. This method outperforms similar existing methods both in the accuracy of prediction and in the coverage of domain interaction space. We predicted a set of 52,492 high-confidence DDIs to carry out cross-species comparison of DDI conservation in eight model species including human, mouse, Drosophila, C. elegans, yeast, Plasmodium, E. coli and Arabidopsis. Our results show that only 23% of these DDIs are conserved in at least two species and only 3.8% in at least 4 species, indicating a rather low conservation across species. Pair-wise analysis of DDI conservation revealed a ‘sliding conservation’ pattern between the evolutionarily neighboring species. Our methodology and the high-confidence DDI predictions generated in this study can help to better understand the functional significance of PPIs at the modular level, thus can significantly impact further experimental investigations in systems biology research.
Collapse
|
Research Support, Non-U.S. Gov't |
16 |
12 |
11
|
Pejaver V, Babbi G, Casadio R, Folkman L, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Miller M, Moult J, Pal LR, Savojardo C, Yin Y, Zhou Y, Radivojac P, Bromberg Y. Assessment of methods for predicting the effects of PTEN and TPMT protein variants. Hum Mutat 2019; 40:1495-1506. [PMID: 31184403 PMCID: PMC6744362 DOI: 10.1002/humu.23838] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2019] [Revised: 05/27/2019] [Accepted: 06/06/2019] [Indexed: 01/16/2023]
Abstract
Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerging as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear as to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.
Collapse
|
Research Support, N.I.H., Extramural |
6 |
12 |
12
|
Carraro M, Monzon AM, Chiricosta L, Reggiani F, Aspromonte MC, Bellini M, Pagel K, Jiang Y, Radivojac P, Kundu K, Pal LR, Yin Y, Limongelli I, Andreoletti G, Moult J, Wilson SJ, Katsonis P, Lichtarge O, Chen J, Wang Y, Hu Z, Brenner SE, Ferrari C, Murgia A, Tosatto SC, Leonardi E. Assessment of patient clinical descriptions and pathogenic variants from gene panel sequences in the CAGI-5 intellectual disability challenge. Hum Mutat 2019; 40:1330-1345. [PMID: 31144778 PMCID: PMC7341177 DOI: 10.1002/humu.23823] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 05/27/2019] [Indexed: 12/15/2022]
Abstract
The Critical Assessment of Genome Interpretation-5 intellectual disability challenge asked to use computational methods to predict patient clinical phenotypes and the causal variant(s) based on an analysis of their gene panel sequence data. Sequence data for 74 genes associated with intellectual disability (ID) and/or autism spectrum disorders (ASD) from a cohort of 150 patients with a range of neurodevelopmental manifestations (i.e. ID, autism, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) have been made available for this challenge. For each patient, predictors had to report the causative variants and which of the seven phenotypes were present. Since neurodevelopmental disorders are characterized by strong comorbidity, tested individuals often present more than one pathological condition. Considering the overall clinical manifestation of each patient, the correct phenotype has been predicted by at least one group for 93 individuals (62%). ID and ASD were the best predicted among the seven phenotypic traits. Also, causative or potentially pathogenic variants were predicted correctly by at least one group. However, the prediction of the correct causative variant seems to be insufficient to predict the correct phenotype. In some cases, the correct prediction has been supported by rare or common variants in genes different from the causative one.
Collapse
|
Research Support, N.I.H., Extramural |
6 |
11 |
13
|
Carraro M, Minervini G, Giollo M, Bromberg Y, Capriotti E, Casadio R, Dunbrack R, Elefanti L, Fariselli P, Ferrari C, Gough J, Katsonis P, Leonardi E, Lichtarge O, Menin C, Martelli PL, Niroula A, Pal LR, Repo S, Scaini MC, Vihinen M, Wei Q, Xu Q, Yang Y, Yin Y, Zaucha J, Zhao H, Zhou Y, Brenner SE, Moult J, Tosatto SCE. Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI. Hum Mutat 2017; 38:1042-1050. [PMID: 28440912 PMCID: PMC5561474 DOI: 10.1002/humu.23235] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2016] [Revised: 04/17/2017] [Accepted: 04/19/2017] [Indexed: 12/31/2022]
Abstract
Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.
Collapse
|
research-article |
8 |
11 |
14
|
Clark WT, Kasak L, Bakolitsa C, Hu Z, Andreoletti G, Babbi G, Bromberg Y, Casadio R, Dunbrack R, Folkman L, Ford CT, Jones D, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Nodzak C, Pal LR, Radivojac P, Savojardo C, Shi X, Zhou Y, Uppal A, Xu Q, Yin Y, Pejaver V, Wang M, Wei L, Moult J, Yu GK, Brenner SE, LeBowitz JH. Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016. Hum Mutat 2019; 40:1519-1529. [PMID: 31342580 PMCID: PMC7156275 DOI: 10.1002/humu.23875] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2019] [Revised: 06/27/2019] [Accepted: 07/15/2019] [Indexed: 12/25/2022]
Abstract
The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016, invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to .61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
Collapse
|
Research Support, N.I.H., Extramural |
6 |
10 |
15
|
Yin Y, Kundu K, Pal LR, Moult J. Ensemble variant interpretation methods to predict enzyme activity and assign pathogenicity in the CAGI4 NAGLU (Human N-acetyl-glucosaminidase) and UBE2I (Human SUMO-ligase) challenges. Hum Mutat 2017; 38:1109-1122. [PMID: 28544272 DOI: 10.1002/humu.23267] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2016] [Revised: 05/09/2017] [Accepted: 05/18/2017] [Indexed: 11/07/2022]
Abstract
CAGI (Critical Assessment of Genome Interpretation) conducts community experiments to determine the state of the art in relating genotype to phenotype. Here, we report results obtained using newly developed ensemble methods to address two CAGI4 challenges: enzyme activity for population missense variants found in NAGLU (Human N-acetyl-glucosaminidase) and random missense mutations in Human UBE2I (Human SUMO E2 ligase), assayed in a high-throughput competitive yeast complementation procedure. The ensemble methods are effective, ranked second for SUMO-ligase and third for NAGLU, according to the CAGI independent assessors. However, in common with other methods used in CAGI, there are large discrepancies between predicted and experimental activities for a subset of variants. Analysis of the structural context provides some insight into these. Post-challenge analysis shows that the ensemble methods are also effective at assigning pathogenicity for the NAGLU variants. In the clinic, providing an estimate of the reliability of pathogenic assignments is the key. We have also used the NAGLU dataset to show that ensemble methods have considerable potential for this task, and are already reliable enough for use with a subset of mutations.
Collapse
|
Research Support, N.I.H., Extramural |
8 |
10 |
16
|
McInnes G, Daneshjou R, Katsonis P, Lichtarge O, Srinivasan R, Rana S, Radivojac P, Mooney SD, Pagel KA, Stamboulian M, Jiang Y, Capriotti E, Wang Y, Bromberg Y, Bovo S, Savojardo C, Martelli PL, Casadio R, Pal LR, Moult J, Brenner SE, Altman R. Predicting venous thromboembolism risk from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges. Hum Mutat 2019; 40:1314-1320. [PMID: 31140652 DOI: 10.1002/humu.23825] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2019] [Revised: 05/07/2019] [Accepted: 05/27/2019] [Indexed: 01/14/2023]
Abstract
Genetics play a key role in venous thromboembolism (VTE) risk, however established risk factors in European populations do not translate to individuals of African descent because of the differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for non-VTE causes or VTE and asked to predict which disease each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best performing method using only VTE related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.
Collapse
|
Research Support, N.I.H., Extramural |
6 |
9 |
17
|
Cai B, Li B, Kiga N, Thusberg J, Bergquist T, Chen YC, Niknafs N, Carter H, Tokheim C, Beleva-Guthrie V, Douville C, Bhattacharya R, Yeo HTG, Fan J, Sengupta S, Kim D, Cline M, Turner T, Diekhans M, Zaucha J, Pal LR, Cao C, Yu CH, Yin Y, Carraro M, Giollo M, Ferrari C, Leonardi E, Tosatto SC, Bobe J, Ball M, Hoskins RA, Repo S, Church G, Brenner SE, Moult J, Gough J, Stanke M, Karchin R, Mooney SD. Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges. Hum Mutat 2017; 38:1266-1276. [PMID: 28544481 PMCID: PMC5645203 DOI: 10.1002/humu.23265] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2016] [Revised: 03/24/2017] [Accepted: 05/17/2017] [Indexed: 01/08/2023]
Abstract
The advent of next-generation sequencing has dramatically decreased the cost for whole-genome sequencing and increased the viability for its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.
Collapse
|
Research Support, N.I.H., Extramural |
8 |
9 |
18
|
Kasak L, Hunter JM, Udani R, Bakolitsa C, Hu Z, Adhikari AN, Babbi G, Casadio R, Gough J, Guerrero RF, Jiang Y, Joseph T, Katsonis P, Kotte S, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Moult J, Pal LR, Poitras J, Radivojac P, Rao A, Sivadasan N, Sunderam U, VG S, Yin Y, Zaucha J, Brenner SE, Meyn MS. CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases. Hum Mutat 2019; 40:1373-1391. [PMID: 31322791 PMCID: PMC7318886 DOI: 10.1002/humu.23874] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2019] [Revised: 07/15/2019] [Accepted: 07/15/2019] [Indexed: 01/02/2023]
Abstract
Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.
Collapse
|
Research Support, N.I.H., Extramural |
6 |
8 |
19
|
Darden L, Kundu K, Pal LR, Moult J. Harnessing formal concepts of biological mechanism to analyze human disease. PLoS Comput Biol 2018; 14:e1006540. [PMID: 30586388 PMCID: PMC6306204 DOI: 10.1371/journal.pcbi.1006540] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
Mechanism is a widely used concept in biology. In 2017, more than 10% of PubMed abstracts used the term. Therefore, searching for and reasoning about mechanisms is fundamental to much of biomedical research, but until now there has been almost no computational infrastructure for this purpose. Recent work in the philosophy of science has explored the central role that the search for mechanistic accounts of biological phenomena plays in biomedical research, providing a conceptual basis for representing and analyzing biological mechanism. The foundational categories for components of mechanisms-entities and activities-guide the development of general, abstract types of biological mechanism parts. Building on that analysis, we have developed a formal framework for describing and representing biological mechanism, MecCog, and applied it to describing mechanisms underlying human genetic disease. Mechanisms are depicted using a graphical notation. Key features are assignment of mechanism components to stages of biological organization and classes; visual representation of uncertainty, ignorance, and ambiguity; and tight integration with literature sources. The MecCog framework facilitates analysis of many aspects of disease mechanism, including the prioritization of future experiments, probing of gene-drug and gene-environment interactions, identification of possible new drug targets, personalized drug choice, analysis of nonlinear interactions between relevant genetic loci, and classification of diseases based on mechanism.
Collapse
|
Editorial |
7 |
7 |
20
|
Pal LR, Kundu K, Yin Y, Moult J. CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants. Hum Mutat 2017; 38:1169-1181. [PMID: 28512736 PMCID: PMC5577808 DOI: 10.1002/humu.23257] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2016] [Revised: 05/09/2017] [Accepted: 05/10/2017] [Indexed: 12/21/2022]
Abstract
Compared with earlier more restricted sequencing technologies, identification of rare disease variants using whole-genome sequence has the possibility of finding all causative variants, but issues of data quality and an overwhelming level of background variants complicate the analysis. The CAGI4 SickKids clinical genome challenge provided an opportunity to assess the landscape of variants found in a difficult set of 25 unsolved rare disease cases. To address the challenge, we developed a three-stage pipeline, first carefully analyzing data quality, then classifying high-quality gene-specific variants into seven categories, and finally examining each candidate variant for compatibility with the often complex phenotypes of these patients for final prioritization. Variants consistent with the phenotypes were found in 24 out of the 25 cases, and in a number of these, there are prioritized variants in multiple genes. Data quality analysis suggests that some of the selected variants are likely incorrect calls, complicating interpretation. The data providers followed up on three suggested variants with Sanger sequencing, and in one case, a prioritized variant was confirmed as likely causative by the referring physician, providing a diagnosis in a previously intractable case.
Collapse
|
Research Support, N.I.H., Extramural |
8 |
7 |
21
|
Voskanian A, Katsonis P, Lichtarge O, Pejaver V, Radivojac P, Mooney SD, Capriotti E, Bromberg Y, Wang Y, Miller M, Martelli PL, Savojardo C, Babbi G, Casadio R, Cao Y, Sun Y, Shen Y, Garg A, Pal D, Yu Y, Huff CD, Tavtigian SV, Young E, Neuhausen SL, Ziv E, Pal LR, Andreoletti G, Brenner S, Kann MG. Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer. Hum Mutat 2019; 40:1612-1622. [PMID: 31241222 PMCID: PMC6744287 DOI: 10.1002/humu.23849] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2019] [Revised: 05/23/2019] [Accepted: 06/21/2019] [Indexed: 01/22/2023]
Abstract
The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.
Collapse
|
Research Support, N.I.H., Extramural |
6 |
6 |
22
|
Chandonia JM, Adhikari A, Carraro M, Chhibber A, Cutting GR, Fu Y, Gasparini A, Jones DT, Kramer A, Kundu K, Lam HYK, Leonardi E, Moult J, Pal LR, Searls DB, Shah S, Sunyaev S, Tosatto SCE, Yin Y, Buckley BA. Lessons from the CAGI-4 Hopkins clinical panel challenge. Hum Mutat 2017; 38:1155-1168. [PMID: 28397312 DOI: 10.1002/humu.23225] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2017] [Revised: 03/24/2017] [Accepted: 03/29/2017] [Indexed: 12/17/2022]
Abstract
The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication.
Collapse
|
Research Support, U.S. Gov't, Non-P.H.S. |
8 |
6 |
23
|
Kasak L, Bakolitsa C, Hu Z, Yu C, Rine J, Dimster-Denk DF, Pandey G, Baets GD, Bromberg Y, Cao C, Capriotti E, Casadio R, Durme JV, Giollo M, Karchin R, Katsonis P, Leonardi E, Lichtarge O, Martelli PL, Masica D, Mooney SD, Olatubosun A, Pal LR, Radivojac P, Rousseau F, Savojardo C, Schymkowitz J, Thusberg J, Tosatto SC, Vihinen M, Väliaho J, Repo S, Moult J, Brenner SE, Friedberg I. Assessing computational predictions of the phenotypic effect of cystathionine-beta-synthase variants. Hum Mutat 2019; 40:1530-1545. [PMID: 31301157 PMCID: PMC7325732 DOI: 10.1002/humu.23868] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2019] [Revised: 06/22/2019] [Accepted: 07/09/2019] [Indexed: 12/28/2022]
Abstract
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.
Collapse
|
Research Support, N.I.H., Extramural |
6 |
5 |
24
|
Guda C, Pal LR, Shindyalov IN. DMAPS: a database of multiple alignments for protein structures. Nucleic Acids Res 2006; 34:D273-6. [PMID: 16381863 PMCID: PMC1347381 DOI: 10.1093/nar/gkj018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The database of multiple alignments for protein structures (DMAPS) provides instant access to pre-computed multiple structure alignments for all protein structure families in the Protein Data Bank (PDB). Protein structure families have been obtained from four distinct classification methods including SCOP, CATH, ENZYME and CE, and multiple structure alignments have been built for all families containing at least three members, using CE-MC software. Currently, multiple structure alignments are available for 3050 SCOP-, 3087 CATH-, 664 ENZYME- and 1707 CE-based families. A web-based query system has been developed to retrieve multiple alignments for these families using the PDB chain ID of any member of a family. Multiple alignments can be viewed or downloaded in six different formats, including JOY/html, TEXT, FASTA, PDB (superimposed coordinates), JOY/postscript and JOY/rtf. DMAPS is accessible online at .
Collapse
|
Research Support, Non-U.S. Gov't |
19 |
4 |
25
|
Kundu K, Pal LR, Yin Y, Moult J. Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge. Hum Mutat 2017; 38:1201-1216. [PMID: 28497567 PMCID: PMC5576720 DOI: 10.1002/humu.23249] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2016] [Revised: 03/30/2017] [Accepted: 04/28/2017] [Indexed: 01/06/2023]
Abstract
The use of gene panel sequence for diagnostic and prognostic testing is now widespread, but there are so far few objective tests of methods to interpret these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data of 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of an alternative disease to that tested. Many of the potentially causative variants are missense, with no previous association with disease, and these proved the hardest to correctly assign pathogenicity or otherwise. Post analysis showed that three-dimensional structure data could have helped for up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign probabilities of pathogenicity for each variant, and there is much work still to be done in this area.
Collapse
|
Research Support, N.I.H., Extramural |
8 |
4 |