1
Tabet DR, Kuang D, Lancaster MC, Li R, Liu K, Weile J, Coté AG, Wu Y, Hegele RA, Roden DM, Roth FP. Benchmarking computational variant effect predictors by their ability to infer human traits. Genome Biol 2024; 25:172. [PMID: 38951922 PMCID: PMC11218265 DOI: 10.1186/s13059-024-03314-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 06/17/2024] [Indexed: 07/03/2024] Open
Abstract
BACKGROUND: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts.
RESULTS: AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation.
CONCLUSION: We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
Affiliation(s)
- Daniel R Tabet
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Da Kuang
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Megan C Lancaster
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Roujia Li
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Karen Liu
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Jochen Weile
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Atina G Coté
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Yingzhou Wu
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Robert A Hegele
  - Department of Medicine, Department of Biochemistry, Schulich School of Medicine and Dentistry, Robarts Research Institute, Western University, London, ON, Canada
- Dan M Roden
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Frederick P Roth
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA

2
Curtis D. Weighted burden analysis of rare coding variants in 470,000 exome-sequenced UK Biobank participants characterises effects on hyperlipidaemia risk. J Hum Genet 2024; 69:255-262. [PMID: 38454133 PMCID: PMC11126377 DOI: 10.1038/s10038-024-01235-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/02/2024] [Accepted: 02/20/2024] [Indexed: 03/09/2024]
Abstract
A previous study of 200,000 exome-sequenced UK Biobank participants investigating the association between rare coding variants and hyperlipidaemia had implicated four genes, LDLR, PCSK9, APOC3 and IFITM5, at exome-wide significance. In addition, a further 43 protein-coding genes were significant with an uncorrected p value of <0.001. Exome sequence data has become available for a further 270,000 participants and weighted burden analysis to test for association with hyperlipidaemia was carried out in this sample for the 47 genes highlighted by the previous study. There was no evidence to implicate IFITM5 but LDLR, PCSK9, APOC3, ANGPTL3, ABCG5 and NPC1L1 were all statistically significant after correction for multiple testing. These six genes were also all exome-wide significant in the combined sample of 470,000 participants. Variants impairing function of LDLR and ABCG5 were associated with increased risk whereas variants in the other genes were protective. Variant categories associated with large effect sizes are cumulatively very rare and the main benefit of this kind of study seems to be to throw light on the molecular mechanisms impacting hyperlipidaemia risk, hopefully supporting attempts to develop improved therapies.
Affiliation(s)
- David Curtis
  - UCL Genetics Institute, University College London, London, UK

3
Curtis D. Investigation of Recessive Effects of Coding Variants on Common Clinical Phenotypes in Exome-Sequenced UK Biobank Participants. Hum Hered 2024; 89:1-7. [PMID: 38342085 DOI: 10.1159/000537771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/07/2024] [Indexed: 02/13/2024] Open
Abstract
INTRODUCTION: Previous studies have demonstrated effects of rare coding variants on common, clinically relevant phenotypes although the additive burden of these variants makes only a small contribution to overall trait variance. Although recessive effects of individual homozygous variants have been studied, little work has been done to elucidate the impact of rare coding variants occurring together as compound heterozygotes.
METHODS: In this study, attempts were made to identify pairs of variants likely to be occurring as compound heterozygotes using 200,000 exome-sequenced subjects from the UK Biobank. Pairs of variants, which were seen together in the same subject more often than would be expected by chance, were excluded as it was assumed that these might be present in the same haplotype. Attention was restricted to variants with minor allele frequency ≤0.05 and to those predicted to alter amino acid sequence or prevent normal gene expression. For each gene, compound heterozygotes were assigned scores based on the rarity and predicted functional consequences of the constituent variants and the scores were used in a logistic regression analysis to test for association with hypertension, hyperlipidaemia, and type 2 diabetes.
RESULTS: No statistically significant associations were observed and the results conformed to the distribution, which would be expected under the null hypothesis. The average number of apparently compound heterozygous subjects for each gene was only 282.2.
CONCLUSION: It seems difficult to detect an effect of compound heterozygotes on the risk of these phenotypes. Even if recessive effects from compound heterozygotes do occur, they would only affect a small number of people and overall would not make a substantial contribution to phenotypic variance. This research has been conducted using the UK Biobank Resource.
Affiliation(s)
- David Curtis
  - UCL Genetics Institute, University College London, London, UK

4
Markel KA, Curtis D. Study of variants in genes implicated in rare familial migraine syndromes and their association with migraine in 200,000 exome-sequenced UK Biobank participants. Ann Hum Genet 2022; 86:353-360. [PMID: 36044383 DOI: 10.1111/ahg.12484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 01/07/2023]
Abstract
BACKGROUND: A number of genes have been implicated in rare familial syndromes which have migraine as part of their phenotype but these genes have not previously been implicated in the common form of migraine.
METHODS: Among exome-sequenced participants in the UK Biobank, we identified 7194 migraine cases with the remaining 193,433 participants classified as controls. We investigated rare variants in 10 genes previously reported to be implicated in conditions with migraine as a prominent part of the phenotype and carried out gene- and variant-based tests for association.
RESULTS: We found no evidence for association of these genes or variants with the common form of migraine seen in our subjects. In particular, a frameshift variant in KCNK18, p.(Phe139Trpfs*24), which had been shown to segregate with migraine with aura in a multiply affected pedigree, was found in 196 (0.10%) controls as well as in 10 (0.14%) cases (χ² = 0.96, 1 df, p = 0.33).
CONCLUSIONS: Since there is no other reported evidence to implicate KCNK18, we conclude that this gene and its product, TRESK, should no longer be regarded as being involved in migraine aetiology. Overall, we do not find that rare, functional variants in genes previously implicated to be involved in familial syndromes including migraine as part of the phenotype make a contribution to the commoner forms of migraine observed in this population.
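The KCNK18 comparison reported in the abstract above can be checked from the stated counts alone. The sketch below assumes an uncorrected Pearson chi-square test on a 2x2 carrier-by-status table; the abstract does not say whether a continuity correction was applied, but the uncorrected statistic agrees with the reported value. The function name `pearson_chi2_2x2` is illustrative and not taken from the study.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts as reported in the abstract.
cases_total, controls_total = 7194, 193433
cases_carriers, controls_carriers = 10, 196

chi2, p = pearson_chi2_2x2(
    cases_carriers, cases_total - cases_carriers,
    controls_carriers, controls_total - controls_carriers,
)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 0.96, p = 0.33
```

The carrier frequencies follow from the same counts: 10/7194 is about 0.14% of cases and 196/193,433 is about 0.10% of controls.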
Affiliation(s)
- David Curtis
  - UCL Genetics Institute, University College London, London, UK
  - Centre for Psychiatry, Queen Mary University of London, London, UK

5
Lewis MA, Schulte BA, Dubno JR, Steel KP. Investigating the characteristics of genes and variants associated with self-reported hearing difficulty in older adults in the UK Biobank. BMC Biol 2022; 20:150. [PMID: 35761239 PMCID: PMC9238072 DOI: 10.1186/s12915-022-01349-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/31/2022] [Accepted: 06/10/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Age-related hearing loss is a common, heterogeneous disease with a strong genetic component. More than 100 loci have been reported to be involved in human hearing impairment to date, but most of the genes underlying human adult-onset hearing loss remain unknown. Most genetic studies have focussed on very rare variants (such as family studies and patient cohort screens) or very common variants (genome-wide association studies). However, the contribution of variants present in the human population at intermediate frequencies is hard to quantify using these methods, and as a result, the landscape of variation associated with adult-onset hearing loss remains largely unknown.
RESULTS: Here we present a study based on exome sequencing and self-reported hearing difficulty in the UK Biobank, a large-scale biomedical database. We have carried out variant load analyses using different minor allele frequency and impact filters, and compared the resulting gene lists to a manually curated list of nearly 700 genes known to be involved in hearing in humans and/or mice. An allele frequency cutoff of 0.1, combined with a high predicted variant impact, was found to be the most effective filter setting for our analysis. We also found that separating the participants by sex produced markedly different gene lists. The gene lists obtained were investigated using gene ontology annotation, functional prioritisation and expression analysis, and this identified good candidates for further study.
CONCLUSIONS: Our results suggest that relatively common as well as rare variants with a high predicted impact contribute to age-related hearing impairment and that the genetic contributions to adult hearing difficulty may differ between the sexes. Our manually curated list of deafness genes is a useful resource for candidate gene prioritisation in hearing loss.
Affiliation(s)
- Morag A Lewis
  - Wolfson Centre for Age-Related Diseases, King's College London, London, SE1 1UL, UK
- Judy R Dubno
  - The Medical University of South Carolina, Charleston, SC, USA
- Karen P Steel
  - Wolfson Centre for Age-Related Diseases, King's College London, London, SE1 1UL, UK

7
Curtis D. Weighted burden analysis in 200,000 exome-sequenced subjects characterises rare variant effects on BMI. Int J Obes (Lond) 2022; 46:782-792. [PMID: 35067685 DOI: 10.1038/s41366-021-01053-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2021] [Revised: 11/29/2021] [Accepted: 12/13/2021] [Indexed: 11/09/2022]
Abstract
INTRODUCTION: A number of genes have been identified in which rare variants can cause obesity. Here we analyse a sample of exome sequenced subjects from UK Biobank using BMI as a phenotype with the aims of identifying genes in which rare, functional variants influence BMI and characterising the effects of different categories of variant.
METHODS: There were 199,807 exome sequenced subjects for whom BMI was recorded. Weighted burden analysis of rare, functional variants was carried out, incorporating population principal components and sex as covariates. For selected genes, additional analyses were carried out to clarify the contribution of different categories of variant. Statistical significance was summarised as the signed log10 of the p value (SLP), given a positive sign if the weighted burden score was positively correlated with BMI.
RESULTS: Two genes were exome-wide significant, MC4R (SLP = 15.79) and PCSK1 (SLP = 6.61). In MC4R, disruptive variants were associated with an increase in BMI of 2.72 units and probably damaging nonsynonymous variants with an increase of 2.02 units. In PCSK1, disruptive variants were associated with a BMI increase of 2.29 and protein-altering variants with an increase of 0.34. Results for other genes were not formally significant after correction for multiple testing, although SIRT1, ZBED6 and NPC2 were noted to be of potential interest.
CONCLUSION: Because the UK Biobank consists of a self-selected sample of relatively healthy volunteers, the effect sizes noted may be underestimates. The results demonstrate the effects of very rare variants on BMI and suggest that other genes and variants will be definitively implicated when the sequence data for additional subjects becomes available.
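The SLP summary statistic described in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the magnitude is -log10(p) and the sign follows the direction of the correlation between weighted burden score and BMI, as the abstract states. The function names `slp` and `p_from_slp` are hypothetical and are not taken from the study's software.

```python
import math


def slp(p_value, positive_correlation):
    """Signed log10 p value: positive when burden score and trait correlate positively."""
    magnitude = -math.log10(p_value)
    return magnitude if positive_correlation else -magnitude


def p_from_slp(slp_value):
    """Recover the p value from an SLP; the sign carries direction only."""
    return 10 ** (-abs(slp_value))


# Same p value, opposite directions of effect.
print(slp(1e-3, True))
print(slp(1e-3, False))

# p value implied by the SLP reported for MC4R.
print(f"{p_from_slp(15.79):.2e}")
```

Under this convention, the SLP of 15.79 reported for MC4R corresponds to a p value of roughly 1.6e-16, and the SLP of 6.61 for PCSK1 to roughly 2.5e-7.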
Affiliation(s)
- David Curtis
  - UCL Genetics Institute, UCL, Darwin Building, Gower Street, London, WC1E 6BT, UK
  - Centre for Psychiatry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK

8
Curtis D. Analysis of rare coding variants in 200,000 exome-sequenced subjects reveals novel genetic risk factors for type 2 diabetes. Diabetes Metab Res Rev 2022; 38:e3482. [PMID: 34216101 DOI: 10.1002/dmrr.3482] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/03/2021] [Revised: 04/27/2021] [Accepted: 06/21/2021] [Indexed: 12/26/2022]
Abstract
AIMS: The study aimed to elucidate the effects of rare genetic variants on the risk of type 2 diabetes (T2D).
MATERIALS AND METHODS: Weighted burden analysis of rare variants was applied to a sample of 200,000 exome-sequenced participants in the UK Biobank project, of whom over 13,000 were identified as having T2D. Variant weights were allocated based on allele frequency and predicted effect, as informed by a previous analysis of hyperlipidaemia.
RESULTS: There was an exome-wide significant increased burden of rare, functional variants in three genes, GCK, HNF4A and GIGYF1. GIGYF1 has not previously been identified as a diabetes risk gene and its product appears to be involved in the modification of insulin signalling. A number of other genes did not attain exome-wide significance but were highly ranked and potentially of interest, including ALAD, PPARG, GYG1 and GHRL. Loss of function (LOF) variants were associated with T2D in GCK and GIGYF1 whereas nonsynonymous variants annotated as probably damaging were associated in GCK and HNF4A. Overall, fewer than 1% of T2D cases carried one of these variants. In HNF1A and HNF1B there was an excess of LOF variants among cases but the small numbers of these fell short of statistical significance.
CONCLUSIONS: Rare genetic variants make an identifiable contribution to T2D in a small number of cases but these may provide valuable insights into disease mechanisms. As larger samples become available it is likely that additional genetic factors will be identified.
Affiliation(s)
- David Curtis
  - UCL Genetics Institute, University College London, London, UK
  - Centre for Psychiatry, Queen Mary University of London, London, UK

9
Curtis D. Analysis of 200,000 Exome-Sequenced UK Biobank Subjects Implicates Genes Involved in Increased and Decreased Risk of Hypertension. Pulse (Basel) 2021; 9:17-29. [PMID: 34722352 PMCID: PMC8527905 DOI: 10.1159/000517419] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/24/2021] [Accepted: 05/10/2021] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Previous analyses have identified common variants along with some specific genes and rare variants which are associated with risk of hypertension, but much remains to be discovered.
METHODS AND RESULTS: Exome-sequenced UK Biobank participants were phenotyped based on having a diagnosis of hypertension or taking anti-hypertensive medication to produce a sample of 66,123 cases and 134,504 controls. Variants with minor allele frequency (MAF) <0.01 were subjected to a gene-wise weighted burden analysis, with higher weights assigned to variants which are rarer and/or predicted to have more severe effects. Of 20,384 genes analysed, 2 genes were exome-wide significant, DNMT3A and FES. Also strongly implicated were GUCY1A1 and GUCY1B1, which code for the subunits of soluble guanylate cyclase. There was further support for the previously reported effects of variants in NPR1 and protective effects of variants in DBH. An inframe deletion in CACNA1D with MAF = 0.005, rs72556363, is associated with modestly increased risk of hypertension. Other biologically plausible genes highlighted consist of CSK, AGTR1, ZYX, and PREP. All variants implicated were rare, and cumulatively they are not predicted to make a large contribution to the population risk of hypertension.
CONCLUSIONS: This approach confirms and clarifies previously reported findings and also offers novel insights into biological processes influencing hypertension risk, potentially facilitating the development of improved therapeutic interventions. This research has been conducted using the UK Biobank Resource.
Affiliation(s)
- David Curtis
  - UCL Genetics Institute, University College London, London, United Kingdom
  - Centre for Psychiatry, Queen Mary University of London, London, United Kingdom