1
|
Combining full-length gene assay and SpliceAI to interpret the splicing impact of all possible SPINK1 coding variants. Hum Genomics 2024; 18:21. [PMID: 38414044 PMCID: PMC10898081 DOI: 10.1186/s40246-024-00586-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 02/13/2024] [Indexed: 02/29/2024] Open
Abstract
BACKGROUND: Single-nucleotide variants (SNVs) within gene coding sequences can significantly impact pre-mRNA splicing, bearing profound implications for pathogenic mechanisms and precision medicine. In this study, we aim to harness the well-established full-length gene splicing assay (FLGSA) in conjunction with SpliceAI to prospectively interpret the splicing effects of all potential coding SNVs within the four-exon SPINK1 gene, a gene associated with chronic pancreatitis.
RESULTS: Our study began with a retrospective analysis of 27 SPINK1 coding SNVs previously assessed using FLGSA, proceeded with a prospective analysis of 35 new FLGSA-tested SPINK1 coding SNVs, followed by data extrapolation, and ended with further validation. In total, we analyzed 67 SPINK1 coding SNVs, which account for 9.3% of the 720 possible coding SNVs. Among these 67 FLGSA-analyzed SNVs, 12 were found to impact splicing. Through detailed comparison of FLGSA results and SpliceAI predictions, we inferred that the remaining 653 untested coding SNVs in the SPINK1 gene are unlikely to significantly affect splicing. Of the 12 splice-altering events, nine produced both normally spliced and aberrantly spliced transcripts, while the remaining three only generated aberrantly spliced transcripts. These splice-impacting SNVs were found solely in exons 1 and 2, notably at the first and/or last coding nucleotides of these exons. Among the 12 splice-altering events, 11 were missense variants (2.17% of 506 potential missense variants), and one was synonymous (0.61% of 164 potential synonymous variants). Notably, adjusting the SpliceAI cut-off to 0.30 instead of the conventional 0.20 would improve specificity without reducing sensitivity.
CONCLUSIONS: By integrating FLGSA with SpliceAI, we have determined that less than 2% (1.67%) of all possible coding SNVs in SPINK1 significantly influence splicing outcomes. Our findings emphasize the critical importance of conducting splicing analysis within the broader genomic sequence context of the study gene and highlight the inherent uncertainties associated with intermediate SpliceAI scores (0.20 to 0.80). This study contributes to the field by being the first to prospectively interpret all potential coding SNVs in a disease-associated gene with a high degree of accuracy, representing a meaningful attempt at shifting from retrospective to prospective variant analysis in the era of exome and genome sequencing.
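The proportions quoted in this abstract can be rechecked from the counts it gives. The short sketch below only redoes that arithmetic; the constant and function names are illustrative and do not come from the paper.

```python
# Counts as quoted in the abstract; names are illustrative only.
TOTAL_CODING_SNVS = 720
FLGSA_TESTED = 67
SPLICE_ALTERING = 12
MISSENSE_ALTERING, MISSENSE_TOTAL = 11, 506
SYNONYMOUS_ALTERING, SYNONYMOUS_TOTAL = 1, 164


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


tested_pct = percent(FLGSA_TESTED, TOTAL_CODING_SNVS)
altering_pct = percent(SPLICE_ALTERING, TOTAL_CODING_SNVS)
missense_pct = percent(MISSENSE_ALTERING, MISSENSE_TOTAL)
synonymous_pct = percent(SYNONYMOUS_ALTERING, SYNONYMOUS_TOTAL)
untested = TOTAL_CODING_SNVS - FLGSA_TESTED

print(tested_pct, altering_pct, missense_pct, synonymous_pct, untested)
```

The results agree with the figures in the abstract (about 9.3%, 1.67%, 2.17% and 0.61%, with 653 untested SNVs).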
2
|
Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes. Genet Epidemiol 2023; 47:450-460. [PMID: 37158367 DOI: 10.1002/gepi.22529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 03/27/2023] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
Current software packages for the analysis and the simulations of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run in the whole genome thanks to C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on the CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on Github at https://github.com/genostats/Ravages.
3
|
Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score. PLoS Genet 2022; 18:e1009923. [PMID: 36112662 PMCID: PMC9518893 DOI: 10.1371/journal.pgen.1009923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 09/28/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022] Open
Abstract
Rare variant association tests (RVAT) have been developed to study the contribution of rare variants widely accessible through high-throughput sequencing technologies. RVAT require aggregating rare variants into testing units and filtering variants to retain only the most likely causal ones. In the exome, genes are natural testing units and variants are usually filtered based on their functional consequences. However, when dealing with whole-genome sequence (WGS) data, both steps are challenging. No natural biological unit is available for aggregating rare variants. Sliding-window procedures have been proposed to circumvent this difficulty; however, they are blind to biological information and result in a large number of tests. We propose a new strategy to perform RVAT on WGS data: “RAVA-FIRST” (RAre Variant Association using Functionally-InfoRmed STeps), comprising three steps. (1) New testing units are defined genome-wide based on functionally-adjusted Combined Annotation Dependent Depletion (CADD) scores of variants observed in the gnomAD populations, which are referred to as “CADD regions”. (2) A region-dependent filtering of rare variants is applied in each CADD region. (3) A functionally-informed burden test is performed with sub-scores computed for each genomic category within each CADD region. In both simulations and real data, RAVA-FIRST was found to outperform other WGS-based RVAT. Applying it to a WGS dataset of venous thromboembolism patients, we identified an intergenic region on chromosome 18 enriched for rare variants in early-onset patients. This region, which was missed by standard sliding-window procedures, lies within a topologically associating domain (TAD) that contains a strong candidate gene. RAVA-FIRST enables new investigations of rare non-coding variants in complex diseases, facilitated by its implementation in the R package Ravages.
4
|
RAVAQ: An integrative pipeline from quality control to region-based rare variant association analysis. Genet Epidemiol 2022; 46:256-265. [PMID: 35419876 DOI: 10.1002/gepi.22450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 02/04/2022] [Accepted: 03/15/2022] [Indexed: 11/07/2022]
Abstract
Next-generation sequencing technologies have opened up the possibility of sequencing large samples of cases and controls to test for association with rare variants. To limit cost and increase sample sizes, data from controls could be used in multiple studies and might thus be generated on different sequencing platforms. This could pose problems of comparability between cases and controls, because batch effects can act as confounding factors and lead to false-positive association signals. To limit batch effects and ensure comparability of datasets, stringent quality controls are required. We propose an integrative five-step pipeline, RAVAQ, that (a) performs a specific three-step quality control taking into account the case-control status to ensure data comparability, (b) selects qualifying variants as defined by the user, and (c) performs rare variant association tests per genomic region. The RAVAQ pipeline is wrapped in an R package. It is user-friendly and flexible in its arguments to adapt to the specificity of each research project. We provide examples showing how RAVAQ improves rare variant association tests. The default RAVAQ quality control outperformed the widely used Variant Quality Score Recalibration method, removing inflation due to spurious signals. RAVAQ is open source and freely available at https://gitlab.com/gmarenne/ravaq.
5
|
Abstract
Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10⁻⁸), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.
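To make the notion of a 99% credible set concrete, here is a minimal sketch that assumes per-variant posterior probabilities of causality are already available for a locus. The function name and the posterior values are hypothetical and are not taken from the study; the fine-mapping method that produces such posteriors is not shown.

```python
def credible_set(posteriors, coverage=0.99):
    """Return the smallest list of variants, taken in decreasing order of
    posterior probability, whose probabilities sum to at least `coverage`."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen


# Hypothetical posteriors for one locus under two analyses.
single_ancestry = {"v1": 0.40, "v2": 0.30, "v3": 0.20, "v4": 0.095, "v5": 0.005}
trans_ancestry = {"v1": 0.90, "v2": 0.095, "v3": 0.005}

single_set = credible_set(single_ancestry)
trans_set = credible_set(trans_ancestry)
reduction_pct = 100.0 * (len(single_set) - len(trans_set)) / len(single_set)
print(single_set, trans_set, reduction_pct)
```

In this toy locus the set shrinks from four variants to two. The abstract's median reduction of 37.5% summarises the same kind of comparison across many loci.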
6
|
Exome Sequencing Identifies Genes and Gene Sets Contributing to Severe Childhood Obesity, Linking PHIP Variants to Repressed POMC Transcription. Cell Metab 2020; 31:1107-1119.e12. [PMID: 32492392 PMCID: PMC7267775 DOI: 10.1016/j.cmet.2020.05.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/17/2019] [Revised: 03/06/2020] [Accepted: 05/09/2020] [Indexed: 12/12/2022]
Abstract
Obesity is genetically heterogeneous with monogenic and complex polygenic forms. Using exome and targeted sequencing in 2,737 severely obese cases and 6,704 controls, we identified three genes (PHIP, DGKI, and ZMYM4) with an excess burden of very rare predicted deleterious variants in cases. In cells, we found that nuclear PHIP (pleckstrin homology domain interacting protein) directly enhances transcription of pro-opiomelanocortin (POMC), a neuropeptide that suppresses appetite. Obesity-associated PHIP variants repressed POMC transcription. Our demonstration that PHIP is involved in human energy homeostasis through transcriptional regulation of central melanocortin signaling has potential diagnostic and therapeutic implications for patients with obesity and developmental delay. Additionally, we found an excess burden of predicted deleterious variants involving genes nearest to loci from obesity genome-wide association studies. Genes and gene sets influencing obesity with variable penetrance provide compelling evidence for a continuum of causality in the genetic architecture of obesity, and explain some of its missing heritability.
7
|
Publisher Correction: Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nat Genet 2019; 51:1191-1192. [PMID: 31160809 DOI: 10.1038/s41588-019-0447-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
8
|
Rare variant association testing for multicategory phenotype. Genet Epidemiol 2019; 43:646-656. [PMID: 31087445 DOI: 10.1002/gepi.22210] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/26/2018] [Revised: 04/03/2019] [Accepted: 04/17/2019] [Indexed: 01/09/2023]
Abstract
Genetic association studies have provided new insights into the genetic variability of human complex traits with a focus mainly on continuous or binary traits. Methods have been proposed to take into account disease heterogeneity between subgroups of patients when studying common variants but none was specifically designed for rare variants. Because rare variants are expected to have stronger effects and to be more heterogeneously distributed among cases than common ones, subgroup analyses might be particularly attractive in this context. To address this issue, we propose an extension of burden tests by using a multinomial regression model, which enables association tests between rare variants and multicategory phenotypes. We evaluated the type I error and the power of two burden tests, CAST and WSS, by simulating data under different scenarios. In the case of genetic heterogeneity between case subgroups, we showed an advantage of multinomial regression over logistic regression, which considers all the cases against the controls. We replicated these results on real data from Moyamoya disease where the burden tests performed better when cases were stratified according to age-of-onset. We implemented the functions for association tests in the R package "Ravages" available on Github.
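As a rough illustration of the burden scores that such tests start from, the sketch below computes a CAST-style indicator (carrier of at least one rare allele in the region) and a weighted-sum score per individual, then averages them within phenotype categories. Everything here is an assumption made for illustration: the genotype matrix is toy data, the weight 1/sqrt(q(1-q)) with a pooled, smoothed allele frequency q is a simplification in the spirit of WSS rather than the exact published weighting, and the multinomial regression step described in the abstract is not shown. This is not the Ravages API.

```python
from math import sqrt


def cast_scores(genotypes):
    """CAST-style score: 1 if the individual carries any rare allele, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def wss_like_scores(genotypes):
    """Weighted sum of rare-allele counts; rarer variants get larger weights.
    Simplified: q is a smoothed frequency pooled over all individuals."""
    n = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        minor = sum(row[j] for row in genotypes)
        q = (minor + 1) / (2 * n + 2)
        weights.append(1.0 / sqrt(q * (1.0 - q)))
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def mean_by_group(scores, groups):
    """Average score within each phenotype category."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    return {g: sum(v) / len(v) for g, v in by_group.items()}


# Toy data: 6 individuals x 3 rare variants (minor-allele counts).
genotypes = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 1], [0, 0, 0]]
groups = ["control", "control", "control", "early", "early", "late"]

cast = cast_scores(genotypes)
wss = wss_like_scores(genotypes)
cast_means = mean_by_group(cast, groups)
print(cast, cast_means)
```

In a multicategory analysis, scores like these would serve as the predictor in a regression whose outcome has more than two levels (here controls, early-onset and late-onset cases).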
9
|
Bi-allelic Loss-of-Function CACNA1B Mutations in Progressive Epilepsy-Dyskinesia. Am J Hum Genet 2019; 104:948-956. [PMID: 30982612 DOI: 10.1016/j.ajhg.2019.03.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/26/2018] [Accepted: 03/04/2019] [Indexed: 12/11/2022] Open
Abstract
The occurrence of non-epileptic hyperkinetic movements in the context of developmental epileptic encephalopathies is an increasingly recognized phenomenon. Identification of causative mutations provides an important insight into common pathogenic mechanisms that cause both seizures and abnormal motor control. We report bi-allelic loss-of-function CACNA1B variants in six children from three unrelated families whose affected members present with a complex and progressive neurological syndrome. All affected individuals presented with epileptic encephalopathy, severe neurodevelopmental delay (often with regression), and a hyperkinetic movement disorder. Additional neurological features included postnatal microcephaly and hypotonia. Five children died in childhood or adolescence (mean age of death: 9 years), mainly as a result of secondary respiratory complications. CACNA1B encodes the pore-forming subunit of the pre-synaptic neuronal voltage-gated calcium channel Cav2.2/N-type, crucial for SNARE-mediated neurotransmission, particularly in the early postnatal period. Bi-allelic loss-of-function variants in CACNA1B are predicted to cause disruption of Ca2+ influx, leading to impaired synaptic neurotransmission. The resultant effect on neuronal function is likely to be important in the development of involuntary movements and epilepsy. Overall, our findings provide further evidence for the key role of Cav2.2 in normal human neurodevelopment.
10
|
Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution. Nat Genet 2019; 51:452-469. [PMID: 30778226 PMCID: PMC6560635 DOI: 10.1038/s41588-018-0334-2] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 11/17/2017] [Accepted: 12/17/2018] [Indexed: 02/02/2023]
Abstract
Body-fat distribution is a risk factor for adverse cardiovascular health consequences. We analyzed the association of body-fat distribution, assessed by waist-to-hip ratio adjusted for body mass index, with 228,985 predicted coding and splice site variants available on exome arrays in up to 344,369 individuals from five major ancestries (discovery) and 132,177 European-ancestry individuals (validation). We identified 15 common (minor allele frequency, MAF ≥5%) and nine low-frequency or rare (MAF <5%) coding novel variants. Pathway/gene set enrichment analyses identified lipid particle, adiponectin, abnormal white adipose tissue physiology and bone development and morphology as important contributors to fat distribution, while cross-trait associations highlight cardiometabolic traits. In functional follow-up analyses, specifically in Drosophila RNAi-knockdowns, we observed a significant increase in the total body triglyceride levels for two genes (DNAH10 and PLXND1). We implicate novel genes in fat distribution, stressing the importance of interrogating low-frequency and protein-coding variants.
11
|
EPS4.02 Porphyromonas, a candidate biomarker for detection of Pseudomonas aeruginosa pulmonary infection in cystic fibrosis. J Cyst Fibros 2018. [DOI: 10.1016/s1569-1993(18)30254-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
12
|
Publisher Correction: Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nat Genet 2018; 50:766-767. [PMID: 29549330 DOI: 10.1038/s41588-018-0082-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/02/2023]
Abstract
In the version of this article originally published, one of the two authors with the name Wei Zhao was omitted from the author list and the affiliations for both authors were assigned to the single Wei Zhao in the author list. In addition, the ORCID for Wei Zhao (Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA) was incorrectly assigned to author Wei Zhou. The errors have been corrected in the HTML and PDF versions of the article.
13
|
Publisher Correction: Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nat Genet 2018; 50:765-766. [PMID: 29549329 DOI: 10.1038/s41588-018-0050-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/02/2023]
Abstract
In the published version of this paper, the name of author Emanuele Di Angelantonio was misspelled. This error has now been corrected in the HTML and PDF versions of the article.
14
|
Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nat Genet 2018; 50:26-41. [PMID: 29273807 PMCID: PMC5945951 DOI: 10.1038/s41588-017-0011-x] [Citation(s) in RCA: 220] [Impact Index Per Article: 36.7] [Received: 08/14/2017] [Accepted: 11/15/2017] [Indexed: 02/02/2023]
Abstract
Genome-wide association studies (GWAS) have identified >250 loci for body mass index (BMI), implicating pathways related to neuronal biology. Most GWAS loci represent clusters of common, noncoding variants from which pinpointing causal genes remains challenging. Here we combined data from 718,734 individuals to discover rare and low-frequency (minor allele frequency (MAF) < 5%) coding variants associated with BMI. We identified 14 coding variants in 13 genes, of which 8 variants were in genes (ZBTB7B, ACHE, RAPGEF3, RAB21, ZFHX3, ENTPD6, ZFR2 and ZNF169) newly implicated in human obesity, 2 variants were in genes (MC4R and KSR2) previously observed to be mutated in extreme obesity and 2 variants were in GIPR. The effect sizes of rare variants are ~10 times larger than those of common variants, with the largest effect observed in carriers of an MC4R mutation introducing a stop codon (p.Tyr35Ter, MAF = 0.01%), who weighed ~7 kg more than non-carriers. Pathway analyses based on the variants associated with BMI confirm enrichment of neuronal genes and provide new evidence for adipocyte and energy expenditure biology, widening the potential of genetically supported therapeutic targets in obesity.
15
|
Genetic aetiology of glycaemic traits: approaches and insights. Hum Mol Genet 2017; 26:R172-R184. [PMID: 28977447 PMCID: PMC5886471 DOI: 10.1093/hmg/ddx293] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/05/2017] [Revised: 07/18/2017] [Accepted: 07/21/2017] [Indexed: 12/17/2022] Open
Abstract
Glycaemic traits such as fasting and post-challenge glucose and insulin measures, as well as glycated haemoglobin (HbA1c), are used to diagnose and monitor diabetes. These traits are risk factors for cardiovascular disease even below the diabetic threshold, and their study can additionally yield insights into the pathophysiology of type 2 diabetes. To date, a diverse set of genetic approaches have led to the discovery of over 97 loci influencing glycaemic traits. In this review, we will focus on recent advances in the genetic aetiology of glycaemic traits, and the resulting biological insights. We will provide a brief overview of results ranging from common, to low- and rare-frequency variant-trait association studies, studies leveraging the diversity across populations, and studies harnessing the power of genetic and genomic approaches to gain insights into the biological underpinnings of these traits.
16
|
Rare Variant Analysis of Human and Rodent Obesity Genes in Individuals with Severe Childhood Obesity. Sci Rep 2017; 7:4394. [PMID: 28663568 PMCID: PMC5491520 DOI: 10.1038/s41598-017-03054-8] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 11/11/2016] [Accepted: 04/10/2017] [Indexed: 11/30/2022] Open
Abstract
Obesity is a genetically heterogeneous disorder. Using targeted and whole-exome sequencing, we studied 32 human and 87 rodent obesity genes in 2,548 severely obese children and 1,117 controls. We identified 52 variants contributing to obesity in 2% of cases including multiple novel variants in GNAS, which were sometimes found with accelerated growth rather than short stature as described previously. Nominally significant associations were found for rare functional variants in BBS1, BBS9, GNAS, MKKS, CLOCK and ANGPTL6. The p.S284X variant in ANGPTL6 drives the association signal (rs201622589, MAF ~0.1%, odds ratio = 10.13, p-value = 0.042) and results in complete loss of secretion in cells. Further analysis including additional case-control studies and population controls (N = 260,642) did not support association of this variant with obesity (odds ratio = 2.34, p-value = 2.59 × 10⁻³), highlighting the challenges of testing rare variant associations and the need for very large sample sizes. Further validation in cohorts with severe obesity and engineering the variants in model organisms will be needed to explore whether human variants in ANGPTL6 and other genes that lead to obesity when deleted in mice, do contribute to obesity. Such studies may yield druggable targets for weight loss therapies.
17
|
Rare and low-frequency coding variants alter human adult height. Nature 2017; 542:186-190. [PMID: 28146470 PMCID: PMC5302847 DOI: 10.1038/nature21039] [Citation(s) in RCA: 373] [Impact Index Per Article: 53.3] [Received: 07/11/2016] [Accepted: 12/04/2016] [Indexed: 02/07/2023]
Abstract
Height is a highly heritable, classic polygenic trait with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1-4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, STC2, AR and CRISPLD2), greater than ten times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (giving an increase of 1-2 centimetres per allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes that are mutated in monogenic growth disorders and highlight new biological candidates (such as ADAMTS3, IL11RA and NOX4) and pathways (such as proteoglycan and glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate-to-large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.
18
|
A genomic approach to therapeutic target validation identifies a glucose-lowering GLP1R variant protective for coronary heart disease. Sci Transl Med 2016; 8:341ra76. [PMID: 27252175 PMCID: PMC5219001 DOI: 10.1126/scitranslmed.aad3744] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 09/09/2015] [Accepted: 05/10/2016] [Indexed: 02/06/2023]
Abstract
Regulatory authorities have indicated that new drugs to treat type 2 diabetes (T2D) should not be associated with an unacceptable increase in cardiovascular risk. Human genetics may be able to guide development of antidiabetic therapies by predicting cardiovascular and other health endpoints. We therefore investigated the association of variants in six genes that encode drug targets for obesity or T2D with a range of metabolic traits in up to 11,806 individuals by targeted exome sequencing and follow-up in 39,979 individuals by targeted genotyping, with additional in silico follow-up in consortia. We used these data to first compare associations of variants in genes encoding drug targets with the effects of pharmacological manipulation of those targets in clinical trials. We then tested the association of those variants with disease outcomes, including coronary heart disease, to predict cardiovascular safety of these agents. A low-frequency missense variant (Ala316Thr; rs10305492) in the gene encoding glucagon-like peptide-1 receptor (GLP1R), the target of GLP1R agonists, was associated with lower fasting glucose and T2D risk, consistent with GLP1R agonist therapies. The minor allele was also associated with protection against heart disease, thus providing evidence that GLP1R agonists are not likely to be associated with an unacceptable increase in cardiovascular risk. Our results provide an encouraging signal that these agents may be associated with benefit, a question currently being addressed in randomized controlled trials. Genetic variants associated with metabolic traits and multiple disease outcomes can be used to validate therapeutic targets at an early stage in the drug development process.
19
|
Next generation modeling in GWAS: comparing different genetic architectures. Hum Genet 2014; 133:1235-53. [DOI: 10.1007/s00439-014-1461-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/11/2014] [Accepted: 06/05/2014] [Indexed: 12/14/2022]
20
|
Whole genome prediction of bladder cancer risk with the Bayesian LASSO. Genet Epidemiol 2014; 38:467-76. [PMID: 24796258 DOI: 10.1002/gepi.21809] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/13/2013] [Revised: 03/05/2014] [Accepted: 03/20/2014] [Indexed: 11/11/2022]
Abstract
To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve allowed evaluating the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUCtest = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions.
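The area under the ROC curve reported here can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that pairwise definition follows; the risk scores are made up for illustration and the function is not taken from the study's code.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs in which the case has the
    higher score, counting ties as half a win."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from one held-out cross-validation fold.
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.3, 0.4]
fold_auc = auc(cases, controls)
print(round(fold_auc, 3))
```

An AUC of 0.5 corresponds to no discrimination, which is why values such as 0.53 for the genetic variants indicate very limited predictive ability.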
21
|
Genome-wide CNV analysis replicates the association between GSTM1 deletion and bladder cancer: a support for using continuous measurement from SNP-array data. BMC Genomics 2012; 13:326. [PMID: 22817656 PMCID: PMC3425254 DOI: 10.1186/1471-2164-13-326] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 11/25/2011] [Accepted: 07/20/2012] [Indexed: 12/15/2022] Open
Abstract
Background: Structural variations such as copy number variants (CNV) influence the expression of different phenotypic traits. Algorithms to identify CNVs through SNP-array platforms are available. The ability to evaluate well-characterized CNVs such as GSTM1 (1p13.3) deletion provides an important opportunity to assess their performance.
Results: 773 cases and 759 controls from the SBC/EPICURO Study were genotyped in the GSTM1 region using TaqMan, Multiplex Ligation-dependent Probe Amplification (MLPA), and Illumina Infinium 1M SNP-array platforms. CNV calls provided by TaqMan and MLPA were highly concordant and replicated the association between GSTM1 and bladder cancer. This was not the case when CNVs were called from Illumina 1M data using available algorithms, since no deletion was detected across the study samples. In contrast, when the Log R Ratio (LRR) was used as a continuous measure for the 5 probes contained in this locus, we were able to detect their association with bladder cancer using simple regression models or more sophisticated methods such as the ones implemented in the CNVtools package.
Conclusions: This study highlights an important limitation of CNV calling from SNP-array data in regions of common aberrations and suggests that there may be an added advantage in using LRR as a continuous measure in association tests rather than relying on calling algorithms.
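The idea of using LRR as a continuous measure, rather than a discrete CNV call, can be sketched as follows: average the LRR over the probes in the locus for each sample, then compare the averages between cases and controls. The LRR values below are invented, and a Welch t statistic stands in for the regression models mentioned in the abstract, so this is only an assumption-laden illustration and not the study's analysis.

```python
from math import sqrt
from statistics import mean, variance


def per_sample_mean_lrr(samples):
    """Average the Log R Ratio across the probes of the locus, per sample."""
    return [mean(probes) for probes in samples]


def welch_t(group_a, group_b):
    """Welch t statistic for a difference in means (unequal variances)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical LRR values for 5 probes in 3 cases and 3 controls.
cases_lrr = [
    [-0.9, -1.1, -1.0, -0.8, -1.2],
    [-1.0, -1.0, -0.9, -1.1, -1.0],
    [0.0, 0.1, -0.1, 0.0, 0.0],
]
controls_lrr = [
    [0.1, 0.0, -0.1, 0.0, 0.0],
    [0.0, 0.2, -0.2, 0.1, -0.1],
    [-1.0, -0.9, -1.1, -1.0, -1.0],
]

case_means = per_sample_mean_lrr(cases_lrr)
control_means = per_sample_mean_lrr(controls_lrr)
t_stat = welch_t(case_means, control_means)
print(case_means, control_means, t_stat)
```

A strongly negative mean LRR suggests a copy-number loss at the locus, so a lower average among cases would be consistent with the deletion being more frequent in cases, without any discrete call being made.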
|
22
|
Assessment of copy number variation using the Illumina Infinium 1M SNP-array: a comparison of methodological approaches in the Spanish Bladder Cancer/EPICURO study. Hum Mutat 2011; 32:240-8. [PMID: 21089066 DOI: 10.1002/humu.21398] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2010] [Accepted: 10/13/2010] [Indexed: 12/13/2022]
Abstract
High-throughput single nucleotide polymorphism (SNP)-array technologies allow the investigation of copy number variants (CNVs) in genome-wide scans, and specific calling algorithms have been developed to determine CNV location and copy number. We report the results of a reliability analysis comparing data from 96 pairs of samples processed with CNVpartition, PennCNV, and QuantiSNP for Infinium Illumina Human 1Million probe chip data. We also performed a validity assessment with multiplex ligation-dependent probe amplification (MLPA) as a reference standard. The number of CNVs per individual varied according to the calling algorithm. Higher numbers of CNVs were detected in saliva than in blood DNA samples regardless of the algorithm used. All algorithms presented low agreement with mean Kappa Index (KI) <66. PennCNV was the most reliable algorithm (KIw = 98.96) when assessing the number of copies. The agreement observed in detecting CNVs was higher in blood than in saliva samples. When compared with MLPA, all algorithms poorly identified known copy aberrations (sensitivity = 0.19-0.28). In contrast, specificity was very high (0.97-0.99). Once a CNV was detected, the number of copies was correctly assessed (sensitivity >0.62). Our results indicate that the current calling algorithms should be improved for high performance CNV analysis in genome-wide scans. Further refinement is required to assess CNVs as risk factors in complex diseases.
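For readers less familiar with the agreement and validity measures quoted above, the following minimal sketch shows how sensitivity, specificity, and an unweighted Cohen's kappa are computed from 2x2 counts. It assumes the "Kappa Index" is a Cohen-type kappa, which the abstract does not state; the function names and all counts are hypothetical.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Validity of a caller against a reference standard such as MLPA."""
    sensitivity = tp / (tp + fn)   # known aberrations that were detected
    specificity = tn / (tn + fp)   # normal loci correctly left uncalled
    return sensitivity, specificity


def cohen_kappa(both_pos, only_first, only_second, both_neg):
    """Unweighted Cohen's kappa for two binary calls on the same loci."""
    n = both_pos + only_first + only_second + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance from the marginal call rates
    expected = (
        (both_pos + only_first) * (both_pos + only_second)
        + (only_second + both_neg) * (only_first + both_neg)
    ) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts, for illustration only
sens, spec = sensitivity_specificity(tp=20, fp=10, fn=80, tn=890)
kappa = cohen_kappa(both_pos=40, only_first=10, only_second=10, both_neg=40)
print(sens, spec, kappa)
```

A caller can combine low sensitivity with very high specificity, as reported here, when it rarely makes a call but is usually right when it does.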
|
23
|
Mosaic uniparental disomies and aneuploidies as large structural variants of the human genome. Am J Hum Genet 2010; 87:129-38. [PMID: 20598279 DOI: 10.1016/j.ajhg.2010.06.002] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2010] [Revised: 05/28/2010] [Accepted: 06/08/2010] [Indexed: 10/19/2022] Open
Abstract
Mosaicism is defined as the coexistence of cells with different genetic composition within an individual, caused by postzygotic somatic mutation. Although somatic mosaicism for chromosomal abnormalities is a well-established cause of developmental and somatic disorders and has also been detected in different tissues, its frequency and extent in the adult normal population are still unknown. We provide here a genome-wide survey of mosaic genomic variation obtained by analyzing Illumina 1M SNP array data from blood or buccal DNA samples of 1991 adult individuals from the Spanish Bladder Cancer/EPICURO genome-wide association study. We found mosaic abnormalities in autosomes in 1.7% of samples, including 23 segmental uniparental disomies, 8 complete trisomies, and 11 large (1.5-37 Mb) copy-number variants. Alterations were observed across the different autosomes with recurrent events in chromosomes 9 and 20. No case-control differences were found in the frequency of events or the percentage of cells affected, thus indicating that most rearrangements found are not central to the development of bladder cancer. However, five out of six events tested were detected in both blood and bladder tissue from the same individual, indicating an early developmental origin. The high cellular frequency of the anomalies detected and their presence in normal adult individuals suggest that this type of mosaicism is a widespread phenomenon in the human genome. Somatic mosaicism should be considered in the expanding repertoire of inter- and intraindividual genetic variation, some of which may cause somatic human diseases but also contribute to modifying inherited disorders and/or late-onset multifactorial traits.
|
24
|
Impaired performance of FDR-based strategies in whole-genome association studies when SNPs are excluded prior to the analysis. Genet Epidemiol 2009; 33:45-53. [PMID: 18618761 DOI: 10.1002/gepi.20355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
With recent advances in genomewide microarray technologies, whole-genome association (WGA) studies have aimed at identifying susceptibility genes for complex human diseases using hundreds of thousands of single nucleotide polymorphisms (SNPs) genotyped at the same time. In this context and to take into account multiple testing, false discovery rate (FDR)-based strategies are now used frequently. However, a critical aspect of these strategies is that they are applied to a collection or a family of hypotheses and, thus, critically depend on these precise hypotheses. We investigated how modifying the family of hypotheses to be tested affected the performance of FDR-based procedures in WGA studies. We showed that FDR-based procedures performed more poorly when excluding SNPs with high prior probability of being associated. Results of simulation studies mimicking WGA studies according to three scenarios are reported, and show the extent to which SNP elimination (family contraction) prior to the analysis impairs the performance of FDR-based procedures. To illustrate this situation, we used the data from a recent WGA study on type-1 diabetes (Clayton et al. [2005] Nat. Genet. 37:1243-1246) and report the results obtained with and without excluding SNPs located inside the human leukocyte antigen region. Based on our findings, excluding markers with high prior probability of being associated cannot be recommended for the analysis of WGA data with FDR-based strategies.
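The dependence of FDR-based decisions on the family of hypotheses can be seen in a small sketch. The abstract does not say which FDR procedure was studied; the Benjamini-Hochberg step-up rule is used here as an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    return sorted(order[:k_max])


# Hypothetical p-values: three strong signals, one moderate signal, six nulls
full_family = [0.0001, 0.0002, 0.0003, 0.012, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9]
# Family contraction: the three strongest markers are excluded beforehand
contracted_family = full_family[3:]

print(benjamini_hochberg(full_family))        # moderate signal (index 3) is rejected
print(benjamini_hochberg(contracted_family))  # the same p-value 0.012 is no longer rejected
```

In this toy case the moderate signal is declared significant only while the strong signals remain in the family, because their small p-values raise the rank at which the step-up threshold is met.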
|
25
|
Application of a two-stage sampling design in pharmacoepidemiological studies: representativeness of the study population, representativeness of the response, and adjustment of results by generalized calibration methods. Rev Epidemiol Sante Publique 2007. [DOI: 10.1016/j.respe.2007.07.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
|