1
Li R, Li M, Zhao N. A Mixed-Effect Kernel Machine Regression Model for Integrative Analysis of Alpha Diversity in Microbiome Studies. Genet Epidemiol 2025; 49:e22596. [PMID: 39350346 DOI: 10.1002/gepi.22596] [Received: 11/10/2023] [Revised: 08/22/2024] [Accepted: 09/05/2024] [Indexed: 12/20/2024]
Abstract
Increasing evidence suggests that the human microbiota plays a crucial role in many diseases. Alpha diversity, a commonly used summary statistic that captures the richness and/or evenness of the microbial community, has been associated with many clinical conditions. However, individual studies that assess the association between alpha diversity and clinical conditions often provide inconsistent results due to insufficient sample size, heterogeneous study populations and technical variability. In practice, meta-analysis tools have been applied to integrate data from multiple studies. However, these methods do not consider the heterogeneity caused by sequencing protocols, and the contribution of each study to the final model depends mainly on its sample size (or variance estimate). To combine studies with distinct sequencing protocols, a robust statistical framework for integrative analysis of microbiome datasets is needed. Here, we propose a mixed-effect kernel machine regression model to assess the association of alpha diversity with a phenotype of interest. Our approach readily incorporates study-specific characteristics (including sequencing protocols) to allow flexible modeling of microbiome effects via a kernel similarity matrix. Within the proposed framework, we provide three hypothesis testing approaches to answer different questions of interest to researchers. We evaluate the model performance through extensive simulations based on two distinct data generation mechanisms. We also apply our framework to data from the HIV reanalysis consortium to investigate gut dysbiosis in HIV infection.
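To make "richness and/or evenness" concrete, the sketch below computes two common alpha diversity indices from a vector of taxon read counts: observed richness (the number of taxa with nonzero counts) and the Shannon index, H = -Σ p_i ln p_i. It is a generic illustration only, not the authors' mixed-effect kernel machine model, and the helper names and sample counts are hypothetical.

```python
import math
from typing import Sequence


def observed_richness(counts: Sequence[int]) -> int:
    """Richness: number of taxa observed with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Two hypothetical samples with the same richness but different evenness.
even = [25, 25, 25, 25]
uneven = [97, 1, 1, 1]
print(observed_richness(even), observed_richness(uneven))  # 4 4
print(round(shannon_index(even), 4))    # 1.3863 (= ln 4)
print(round(shannon_index(uneven), 4))  # 0.1677
```

Both samples contain four taxa, so richness alone cannot tell them apart; the Shannon index is much lower for the uneven sample because a single taxon dominates it.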
Affiliation(s)
- Runzhe Li
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
- Mo Li
- Department of Mathematics, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
2
Delplanque J, Le Collen L, Loiselle H, Leloire A, Toussaint B, Vaillant E, Charpentier G, Franc S, Balkau B, Marre M, Henriques E, Buse Falay E, Derhourhi M, Froguel P, Bonnefond A. Monoallelic pathogenic variants in LEPR do not cause obesity. Am J Hum Genet 2024; 111:2668-2674. [PMID: 39561769 PMCID: PMC11639077 DOI: 10.1016/j.ajhg.2024.10.014] [Received: 08/19/2024] [Revised: 10/23/2024] [Accepted: 10/23/2024] [Indexed: 11/21/2024]
Abstract
Individuals with obesity caused by biallelic pathogenic LEPR (leptin receptor) variants can benefit from setmelanotide, a novel MC4R agonist. An ongoing phase 3 clinical trial (NCT05093634) includes individuals with obesity who carry a heterozygous LEPR variant, although the obesogenic impact of these variants remains incompletely evaluated. The aim of this study was to functionally assess heterozygous variants in LEPR and to evaluate their effect on obesity. We sequenced LEPR in ∼10,000 participants from the French RaDiO study and found 86 rare heterozygous variants. Each identified variant was then investigated in vitro using luciferase and western blot assays. Using the criteria of the American College of Medical Genetics and Genomics (ACMG), including the strong criterion related to functional assays, we found 12 pathogenic LEPR variants. Most heterozygotes did not present with obesity, and we found no association between these pathogenic variants and body mass index (BMI). This lack of association between pathogenic LEPR variants and obesity risk or BMI was confirmed using exome data from 200,000 individuals in the UK Biobank. In the literature, among 55 reported heterozygotes for a rare pathogenic LEPR variant, only 27% had obesity. In conclusion, although these monoallelic LEPR variants were functionally tested and classified as pathogenic, they do not increase obesity risk or BMI. This raises questions about the use of setmelanotide, a costly drug with potential side effects, based solely on the presence of a heterozygous LEPR variant.
Affiliation(s)
- Jérôme Delplanque
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Lauriane Le Collen
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France; Department of Molecular Medicine, Division of Biochemistry, Molecular Biology, Nutrition, Nancy University Hospital, Nancy, France; Department of Metabolism, Nancy University Hospital, Nancy, France
- Hélène Loiselle
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Audrey Leloire
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Bénédicte Toussaint
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Emmanuel Vaillant
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Guillaume Charpentier
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Sylvia Franc
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Beverley Balkau
- Paris-Saclay University, Paris-Sud University, University of Versailles Saint-Quentin-en-Yvelines, Center for Research in Epidemiology and Population Health, Inserm U1018 Clinical Epidemiology, Villejuif, France
- Michel Marre
- Necker-Enfants Malades Institute, Inserm, University of Paris, Paris, France; Ambroise Paré Clinic, Neuilly-sur-Seine, France
- Emma Henriques
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Emmanuel Buse Falay
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Mehdi Derhourhi
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France
- Philippe Froguel
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France; Department of Metabolism, Imperial College London, Hammersmith Hospital, London, UK
- Amélie Bonnefond
- Inserm/CNRS UMR 1283/8199, Institut Pasteur de Lille, EGID, Lille University Hospital, Lille, France; University of Lille, Lille, France; Department of Metabolism, Imperial College London, Hammersmith Hospital, London, UK
3
Maurin L, Marselli L, Boissel M, Ning L, Boutry R, Fernandes J, Suleiman M, De Luca C, Leloire A, Pascat V, Toussaint B, Amanzougarene S, Derhourhi M, Jörns A, Lenzen S, Pattou F, Kerr-Conte J, Canouil M, Marchetti P, Bonnefond A, Froguel P, Khamis A. PNLIPRP1 Hypermethylation in Exocrine Pancreas Links Type 2 Diabetes and Cholesterol Metabolism. Diabetes 2024; 73:1908-1918. [PMID: 39137110 DOI: 10.2337/db24-0215] [Received: 03/14/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024]
Abstract
We postulated that type 2 diabetes (T2D) predisposes patients to exocrine pancreatic diseases through (epi)genetic mechanisms. We explored the methylome (using MethylationEPIC arrays) of the exocrine pancreas in 141 donors, assessing the impact of T2D. An epigenome-wide association study of T2D identified hypermethylation in an enhancer of the pancreatic lipase-related protein 1 (PNLIPRP1) gene, associated with decreased PNLIPRP1 expression. PNLIPRP1 null variants (in 191,000 UK Biobank participants) were associated with elevated glycemia and LDL cholesterol. Mendelian randomization using 2.5M SNP Omni arrays in 111 donors revealed that T2D had a causal effect on PNLIPRP1 hypermethylation, which in turn had a causal effect on LDL cholesterol. Additional analyses in AR42J rat exocrine cells demonstrated that Pnliprp1 knockdown induced acinar-to-ductal metaplasia, a known precancerous state of the pancreas, and increased cholesterol levels, reversible with statin treatment. This (epi)genetic study suggests a role for PNLIPRP1 in human metabolism and exocrine pancreatic function, with potential implications for pancreatic diseases.
Affiliation(s)
- Lucas Maurin
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Lorella Marselli
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Mathilde Boissel
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Lijiao Ning
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Raphael Boutry
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Justine Fernandes
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Mara Suleiman
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Carmela De Luca
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Audrey Leloire
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Vincent Pascat
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Bénédicte Toussaint
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Souhila Amanzougarene
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Mehdi Derhourhi
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Anne Jörns
- Institute of Clinical Biochemistry, Hannover Medical School, Hannover, Germany
- Sigurd Lenzen
- Institute of Clinical Biochemistry, Hannover Medical School, Hannover, Germany
- François Pattou
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Julie Kerr-Conte
- University of Lille, Inserm, Centre Hospitalier Universitaire Lille, Lille Pasteur Institute, U1190, EGID, Lille, France
- Mickaël Canouil
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Piero Marchetti
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Amélie Bonnefond
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Section of Genomics of Common Disease, Department of Metabolism, Imperial College London, London, UK
- Philippe Froguel
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Section of Genomics of Common Disease, Department of Metabolism, Imperial College London, London, UK
- Amna Khamis
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- Lille University Hospital, University of Lille, Lille, France
- Section of Genomics of Common Disease, Department of Metabolism, Imperial College London, London, UK
4
de Abreu Nunes L, Hooper R, McGettigan P, Phillips R. Statistical methods leveraging the hierarchical structure of adverse events for signal detection in clinical trials: a scoping review of the methodological literature. BMC Med Res Methodol 2024; 24:253. [PMID: 39468481 PMCID: PMC11514772 DOI: 10.1186/s12874-024-02369-1] [Received: 05/07/2024] [Accepted: 10/14/2024] [Indexed: 10/30/2024]
Abstract
BACKGROUND In randomised controlled trials with efficacy-related primary outcomes, adverse events are collected to monitor potential intervention harms. The analysis of adverse event data is challenging due to the complex nature of the data and the large number of unprespecified outcomes. This is compounded by a lack of guidance on best analysis approaches, which has resulted in widespread inadequate practice and the use of overly simplistic methods, leading to sub-optimal exploitation of these rich datasets. To address the complexities of adverse event analysis, statistical methods have been proposed that leverage existing structures within the data, for instance by considering groupings of adverse events based on biological or clinical relationships. METHODS We conducted a methodological scoping review of the literature to identify all existing methods using structures within the data to detect signals for adverse reactions in a trial. Embase, MEDLINE, Scopus and Web of Science databases were systematically searched. We reviewed the analysis approaches of each method, extracted methodological characteristics and constructed a narrative summary of the findings. RESULTS We identified 18 different methods from 14 sources. These were categorised as either Bayesian approaches (n=11), which flagged events based on posterior estimates of treatment effects, or error controlling procedures (n=7), which flagged events based on adjusted p-values while controlling for some type of error rate. We identified 5 defining methodological characteristics: the type of outcomes considered (e.g. binary outcomes), the nature of the data (e.g. summary data), the timing of the analysis (e.g. final analysis), the restrictions on the events considered (e.g. rare events) and the grouping systems used. CONCLUSIONS We found a large number of analysis methods that use the group structures of adverse events.
Continuous methodological developments in this area highlight the growing awareness that better practices are needed. The use of more adequate analysis methods could help trialists obtain a better picture of the safety-risk profile of an intervention. The results of this review can be used by statisticians to better understand the current methodological landscape and identify suitable methods for data analysis, although further research is needed to determine which methods are best suited and to create adequate recommendations.
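The "error controlling procedures" above flag adverse events using adjusted p-values. As a generic, hedged illustration of that idea (not one of the 18 reviewed methods, and deliberately ignoring the grouping of events that those methods exploit), the sketch below applies the Benjamini-Hochberg step-up procedure, which controls the false discovery rate across per-event p-values. The function name, event names and p-values are hypothetical.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: Dict[str, float], q: float = 0.05) -> List[str]:
    """Return events flagged by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    # Rank events by ascending p-value.
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    # Find the largest rank k whose p-value satisfies p_(k) <= k * q / m.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k * q / m:
            cutoff = k
    # Flag every event ranked at or below that k.
    return [event for event, _ in ranked[:cutoff]]


# Hypothetical per-event p-values comparing two trial arms.
pvals = {
    "nausea": 0.001,
    "headache": 0.015,
    "rash": 0.035,
    "fatigue": 0.038,
    "dizziness": 0.800,
}
print(benjamini_hochberg(pvals, q=0.05))  # ['nausea', 'headache', 'rash', 'fatigue']
```

The example shows the step-up behaviour: "rash" misses its own threshold (0.035 > 3 × 0.05 / 5 = 0.03) but is still flagged, because the next-ranked event, "fatigue", passes its threshold (0.038 ≤ 0.04). Methods that use the adverse event hierarchy refine this kind of adjustment by sharing information within groups of related events.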
Affiliation(s)
- Richard Hooper
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Patricia McGettigan
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Rachel Phillips
- Imperial Clinical Trials Unit, School of Public Health, Imperial College London, London, UK
5
Clarke B, Holtkamp E, Öztürk H, Mück M, Wahlberg M, Meyer K, Munzlinger F, Brechtmann F, Hölzlwimmer FR, Lindner J, Chen Z, Gagneur J, Stegle O. Integration of variant annotations using deep set networks boosts rare variant association testing. Nat Genet 2024; 56:2271-2280. [PMID: 39322779 PMCID: PMC11525182 DOI: 10.1038/s41588-024-01919-z] [Received: 07/10/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
Rare genetic variants can have strong effects on phenotypes, yet accounting for rare variants in genetic analyses is statistically challenging due to the limited number of allele carriers and the burden of multiple testing. While rich variant annotations promise to enable well-powered rare variant association tests, methods integrating variant annotations in a data-driven manner are lacking. Here we propose deep rare variant association testing (DeepRVAT), a model based on set neural networks that learns a trait-agnostic gene impairment score from rare variant annotations and phenotypes, enabling both gene discovery and trait prediction. On 34 quantitative and 63 binary traits, using whole-exome-sequencing data from UK Biobank, we find that DeepRVAT yields substantial gains in gene discoveries and improved detection of individuals at high genetic risk. Finally, we demonstrate how DeepRVAT enables calibrated and computationally efficient rare variant tests at biobank scale, aiding the discovery of genetic risk factors for human disease traits.
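The defining property of a set neural network ("deep set") is permutation invariance: each element of the set (here, each rare variant's annotation vector) passes through a shared function phi, the outputs are summed, and a second function rho maps the pooled vector to a score. The NumPy sketch below shows only that generic structure with random, untrained weights. The layer sizes, ReLU activation, sum pooling and the names `phi`, `rho` and `gene_score` are assumptions for illustration, not DeepRVAT's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ANNOT, HIDDEN = 5, 8  # hypothetical: 5 annotations per variant, 8 hidden units

# Untrained, randomly initialised weights (illustration only).
W_phi = rng.normal(size=(N_ANNOT, HIDDEN))
W_rho = rng.normal(size=HIDDEN)


def phi(variants: np.ndarray) -> np.ndarray:
    """Embed each variant (one row of annotations) with a shared ReLU layer."""
    return np.maximum(variants @ W_phi, 0.0)


def rho(pooled: np.ndarray) -> float:
    """Map the pooled gene-level embedding to a scalar impairment-like score."""
    return float(pooled @ W_rho)


def gene_score(variants: np.ndarray) -> float:
    """Sum-pool variant embeddings, so the score does not depend on variant order.

    An individual with no variants (shape (0, N_ANNOT)) pools to the zero vector.
    """
    pooled = phi(variants).sum(axis=0)
    return rho(pooled)


# A hypothetical individual carrying three rare variants in one gene.
x = rng.normal(size=(3, N_ANNOT))
print(np.isclose(gene_score(x), gene_score(x[::-1])))  # True: same set, reversed order
```

Sum pooling is what lets the same model handle genes with any number of carried variants; in a trained model the weights would be fitted to phenotypes rather than drawn at random.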
Affiliation(s)
- Brian Clarke
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Eva Holtkamp
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association-Munich School for Data Science (MUDS), Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Hakime Öztürk
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Marcel Mück
- AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Magnus Wahlberg
- AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Kayla Meyer
- AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Felix Munzlinger
- AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Felix Brechtmann
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Center for Machine Learning, Munich, Germany
- Florian R Hölzlwimmer
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Jonas Lindner
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Zhifen Chen
- Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Munich, Germany
- Deutsches Zentrum für Herz- und Kreislaufforschung (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
- Julien Gagneur
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Munich Center for Machine Learning, Munich, Germany
- Institute of Human Genetics, School of Medicine and Health, Technical University of Munich, Munich, Germany
- Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
6
Meulebrouck S, Merrheim J, Queniat G, Bourouh C, Derhourhi M, Boissel M, Yi X, Badreddine A, Boutry R, Leloire A, Toussaint B, Amanzougarene S, Vaillant E, Durand E, Loiselle H, Huyvaert M, Dechaume A, Scherrer V, Marchetti P, Balkau B, Charpentier G, Franc S, Marre M, Roussel R, Scharfmann R, Cnop M, Canouil M, Baron M, Froguel P, Bonnefond A. Functional genetics reveals the contribution of delta opioid receptor to type 2 diabetes and beta-cell function. Nat Commun 2024; 15:6627. [PMID: 39103322 PMCID: PMC11300616 DOI: 10.1038/s41467-024-51004-6] [Received: 01/23/2024] [Accepted: 07/29/2024] [Indexed: 08/07/2024]
Abstract
Functional genetics has identified drug targets for metabolic disorders. Opioid use impacts metabolic homeostasis, although mechanisms remain elusive. Here, we explore the OPRD1 gene (encoding delta opioid receptor, DOP) to understand its impact on type 2 diabetes. Large-scale sequencing of OPRD1 and in vitro analysis reveal that loss-of-function variants are associated with higher adiposity and lower hyperglycemia risk, whereas gain-of-function variants are associated with lower adiposity and higher type 2 diabetes risk. These findings align with studies of opium addicts. OPRD1 is expressed in human islets and beta cells, with decreased expression under type 2 diabetes conditions. DOP inhibition by an antagonist enhances insulin secretion from human beta cells and islets. RNA-sequencing identifies pathways regulated by DOP antagonism, including nerve growth factor, circadian clock, and nuclear receptor pathways. Our study highlights DOP as a key player between opioids and metabolic homeostasis, suggesting its potential as a therapeutic target for type 2 diabetes.
Grants
- This study was funded by the French National Research Agency (ANR-10-LABX-46 [European Genomics Institute for Diabetes] to PF and AB), the French National Research Agency (ANR-10-EQPX-07-01 [LIGAN-PM] to PF and AB), the European Research Council (ERC Reg-Seq – 715575 and ERC OpiO – 101043671, to AB), the EFSD New Targets for Diabetes or Obesity-related Metabolic Diseases Programme supported by an educational research grant from MSD (to AB) and the National Center for Precision Diabetic Medicine – PreciDIAB, which is jointly supported by the French National Agency for Research (ANR-18-IBHU-0001), by the European Union (FEDER), by the Hauts-de-France Regional Council and by the European Metropolis of Lille (MEL). The study was also supported by the "France Génomique" consortium (ANR-10-INBS-009). XY was supported by the Fondation ULB and the China Scholarship Council. MCnop acknowledges support from the Walloon Region SPW-EER (Win2Wal project BetaSource), the Fonds National de la Recherche Scientifique (FRS-FNRS) and the Francophone Foundation for Diabetes Research (FFRD, which is sponsored by the French Diabetes Federation, Abbott, Eli Lilly, Merck Sharp & Dohme and Novo Nordisk).
Affiliation(s)
- Sarah Meulebrouck
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Judith Merrheim
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Gurvan Queniat
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Cyril Bourouh
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Mehdi Derhourhi
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Mathilde Boissel
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Xiaoyan Yi
- ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels, Belgium
- Alaa Badreddine
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Raphaël Boutry
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Audrey Leloire
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Bénédicte Toussaint
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Souhila Amanzougarene
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Emmanuel Vaillant
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Emmanuelle Durand
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Hélène Loiselle
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Marlène Huyvaert
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Aurélie Dechaume
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Victoria Scherrer
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Piero Marchetti
- Islet Cell Laboratory, Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Beverley Balkau
- Paris-Saclay University, Paris-Sud University, UVSQ, Center for Research in Epidemiology and Population Health, Inserm U1018 Clinical Epidemiology, Villejuif, France
- Guillaume Charpentier
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Sylvia Franc
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Department of Diabetes, Sud-Francilien Hospital, Paris-Sud University, Corbeil-Essonnes, France
- Michel Marre
- Institut Necker-Enfants Malades, Inserm, Université de Paris, Paris, France
- Clinique Ambroise Paré, Neuilly-sur-Seine, France
- Ronan Roussel
- Institut Necker-Enfants Malades, Inserm, Université de Paris, Paris, France
- Department of Diabetology Endocrinology Nutrition, Hôpital Bichat, DHU FIRE, Assistance Publique Hôpitaux de Paris, Paris, France
- Raphaël Scharfmann
- Institut Cochin, Inserm U1016, CNRS UMR8104, Université de Paris, Paris, France
- Miriam Cnop
- ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels, Belgium
- Division of Endocrinology, ULB Erasmus Hospital, Université Libre de Bruxelles, Brussels, Belgium
- Mickaël Canouil
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Morgane Baron
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Philippe Froguel
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Department of Metabolism, Imperial College London, London, UK
- Amélie Bonnefond
- Université de Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Department of Metabolism, Imperial College London, London, UK
7
Liu M, Su YR, Liu Y, Hsu L, He Q. Structured testing of genetic association with mixed clinical outcomes. Genet Epidemiol 2024; 48:226-237. [PMID: 38606632 PMCID: PMC11470132 DOI: 10.1002/gepi.22560] [Received: 04/06/2023] [Revised: 02/15/2024] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
Genetic factors play a fundamental role in disease development. Studying the genetic association with clinical outcomes is critical for understanding disease biology and devising novel treatment targets. However, many genetic variants are infrequent, which makes it difficult to examine them one by one. Moreover, clinical outcomes are complex, including patients' survival time and other binary or continuous outcomes such as recurrence and lymph node count, and how to effectively analyze genetic association with these outcomes remains unclear. In this article, we propose a structured test statistic for testing genetic association with mixed types of survival, binary, and continuous outcomes. The structured testing incorporates known biological information about variants while allowing for their heterogeneous effects, and is a powerful strategy for analyzing infrequent genetic factors. Simulation studies show that the proposed test statistic controls the type I error rate and is highly effective in detecting significant genetic variants. We applied our approach to a uterine corpus endometrial carcinoma study and identified several genetic pathways associated with the clinical outcomes.
Affiliation(s)
- Meiling Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Yu-Ru Su
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Yang Liu
- Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, USA
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| |
Collapse
|
8
|
Alfayyadh MM, Maksemous N, Sutherland HG, Lea RA, Griffiths LR. Unravelling the Genetic Landscape of Hemiplegic Migraine: Exploring Innovative Strategies and Emerging Approaches. Genes (Basel) 2024; 15:443. [PMID: 38674378 PMCID: PMC11049430 DOI: 10.3390/genes15040443] [Received: 03/12/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
Migraine is a severe, debilitating neurovascular disorder. Hemiplegic migraine (HM) is a rare and debilitating neurological condition with a strong genetic basis. Sequencing technologies have improved the diagnosis and our understanding of the molecular pathophysiology of HM. Linkage analysis and sequencing studies in HM families have identified pathogenic variants in ion channels and related genes, including CACNA1A, ATP1A2, and SCN1A, that cause HM. However, approximately 75% of HM patients are negative for these mutations, indicating there are other genes involved in disease causation. In this review, we explored our current understanding of the genetics of HM. The evidence presented herein summarises the current knowledge of the genetics of HM, which can be expanded further to explain the remaining heritability of this debilitating condition. Innovative bioinformatics and computational strategies to cover the entire genetic spectrum of HM are also discussed in this review.
Affiliation(s)
- Lyn R. Griffiths
- Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD 4059, Australia; (M.M.A.); (N.M.); (H.G.S.); (R.A.L.)
9
Zhang S, Jiang Z, Zeng P. Incorporating genetic similarity of auxiliary samples into eGene identification under the transfer learning framework. J Transl Med 2024; 22:258. [PMID: 38461317 PMCID: PMC10924384 DOI: 10.1186/s12967-024-05053-6] [Received: 01/27/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024]
Abstract
BACKGROUND The term eGene has been applied to define a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). It is both theoretically and empirically important to identify eQTLs and eGenes in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently leverage knowledge acquired from auxiliary samples in target studies. METHODS We propose a multilocus-based eGene identification method called TLegene, which integrates shared genetic similarity information available from auxiliary studies under the statistical framework of transfer learning. We apply TLegene to eGene identification in ten TCGA cancers that have an explicitly relevant tissue in the GTEx project, learning the genetic effects of variants in TCGA from GTEx. We also apply TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies. RESULTS We observed substantial genetic effect correlation of cis-variants between TCGA and GTEx for a large number of genes. Furthermore, consistent with the results of our simulations, we found that TLegene was more powerful than existing methods and identified 169 distinct candidate eGenes, many more than the approach that did not consider knowledge transfer across target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical evidence supporting the associations of the discovered eGenes and also showed evidence of allelic heterogeneity of gene expression. Furthermore, TLegene identified more eGenes in Geuvadis and revealed that these eGenes were mainly enriched in EBV-transformed lymphocytes. CONCLUSION Overall, TLegene represents a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.
Affiliation(s)
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
10
Folon L, Baron M, Scherrer V, Toussaint B, Vaillant E, Loiselle H, Dechaume A, De Pooter F, Boutry R, Boissel M, Diallo A, Ning L, Balkau B, Charpentier G, Franc S, Marre M, Derhourhi M, Froguel P, Bonnefond A. Pathogenic, Total Loss-of-Function DYRK1B Variants Cause Monogenic Obesity Associated With Type 2 Diabetes. Diabetes Care 2024; 47:444-451. [PMID: 38170957 DOI: 10.2337/dc23-1851] [Received: 10/02/2023] [Accepted: 12/11/2023] [Indexed: 01/05/2024]
Abstract
OBJECTIVE Rare variants in DYRK1B have been described in some patients with central obesity, type 2 diabetes, and early-onset coronary disease. Owing to the limited number of conducted studies, the broader impact of DYRK1B variants on a larger scale has yet to be investigated. RESEARCH DESIGN AND METHODS DYRK1B was sequenced in 9,353 participants from a case-control study for obesity and type 2 diabetes. Each DYRK1B variant was functionally assessed in vitro. Variant pathogenicity was determined using criteria from the American College of Medical Genetics and Genomics (ACMG). The effect of pathogenic or likely pathogenic (P/LP) variants on metabolic traits was assessed using adjusted mixed-effects score tests. RESULTS Sixty-five rare, heterozygous DYRK1B variants were identified and were not associated with obesity or type 2 diabetes. Following functional analyses, 20 P/LP variants were pinpointed, including 6 variants that exhibited a fully inhibitory effect (P/LP-null) on DYRK1B activity. P/LP and P/LP-null DYRK1B variants were associated with increased BMI and obesity risk; however, the impact was notably more pronounced for the P/LP-null variants (effect of 8.0 ± 3.2 and odds ratio of 7.9 [95% CI 1.2-155]). Furthermore, P/LP-null variants were associated with higher fasting glucose and type 2 diabetes risk (effect of 2.9 ± 1.0 and odds ratio of 4.8 [95% CI 0.85-37]), while P/LP variants had no effect on glucose homeostasis. CONCLUSIONS P/LP, total loss-of-function DYRK1B variants cause monogenic obesity associated with type 2 diabetes. This study underscores the significance of conducting functional assessments in order to accurately ascertain the tangible effects of P/LP DYRK1B variants.
Affiliation(s)
- Lise Folon
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Morgane Baron
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Victoria Scherrer
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Bénédicte Toussaint
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Emmanuel Vaillant
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Hélène Loiselle
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Aurélie Dechaume
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Frédérique De Pooter
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Raphaël Boutry
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Mathilde Boissel
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Aboubacar Diallo
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Lijiao Ning
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Beverley Balkau
- Paris-Saclay University, Paris-Sud University, UVSQ, Center for Research in Epidemiology and Population Health, Inserm U1018 Clinical Epidemiology, Villejuif, France
- Guillaume Charpentier
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Sylvia Franc
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Department of Diabetes, Sud-Francilien Hospital, Paris-Sud University, Corbeil-Essonnes, France
- Michel Marre
- Institut Necker-Enfants Malades, INSERM, Université de Paris, Paris, France
- Clinique Ambroise Paré, Neuilly-sur-Seine, France
- Mehdi Derhourhi
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Philippe Froguel
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Department of Metabolism, Digestion and Reproduction, Imperial College London, London, U.K.
- Amélie Bonnefond
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Université de Lille, Lille, France
- Department of Metabolism, Digestion and Reproduction, Imperial College London, London, U.K.
11
Cao C, Shao M, Zuo C, Kwok D, Liu L, Ge Y, Zhang Z, Cui F, Chen M, Fan R, Ding Y, Jiang H, Wang G, Zou Q. RAVAR: a curated repository for rare variant-trait associations. Nucleic Acids Res 2024; 52:D990-D997. [PMID: 37831073 PMCID: PMC10767942 DOI: 10.1093/nar/gkad876] [Received: 08/15/2023] [Revised: 09/20/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023]
Abstract
Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank-scale datasets and accurate gene-level rare variant-trait association testing methods has dramatically increased the number of rare variant associations detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address this issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits, manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information is provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.
Affiliation(s)
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Mengting Shao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Chunman Zuo
- Institute of Artificial Intelligence, Donghua University, Shanghai, China
- Devin Kwok
- School of Computer Science, McGill University, Montreal, Canada
- Lin Liu
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Yuli Ge
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Mingshuai Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Rui Fan
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou, China
- Guishen Wang
- College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
12
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023]
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive tag variants in the ±3-centimorgan flanking regions from GWAS.
Affiliation(s)
- Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Ardalan Naseri
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Degui Zhi
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
13
Mahmood K, Thomas M, Qu C, Hsu L, Buchanan DD, Peters U. Elucidating the Risk of Colorectal Cancer for Variants in Hereditary Colorectal Cancer Genes. Gastroenterology 2023; 165:1070-1076.e3. [PMID: 37453563 PMCID: PMC10866455 DOI: 10.1053/j.gastro.2023.06.032] [Received: 12/20/2022] [Revised: 06/07/2023] [Accepted: 06/27/2023] [Indexed: 07/18/2023]
Affiliation(s)
- Khalid Mahmood
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, Victoria, Australia; University of Melbourne Center for Cancer Research, Victorian Comprehensive Cancer Center, Parkville, Victoria, Australia; Melbourne Bioinformatics, The University of Melbourne, Parkville, Victoria, Australia
- Minta Thomas
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Conghui Qu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Biostatistics, University of Washington, Seattle, Washington
- Daniel D Buchanan
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, Victoria, Australia; University of Melbourne Center for Cancer Research, Victorian Comprehensive Cancer Center, Parkville, Victoria, Australia; Genomic Medicine and Family Cancer Clinic, The Royal Melbourne Hospital, Parkville, Victoria, Australia
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Epidemiology, University of Washington, Seattle, Washington
14
Gao W, Liu L, Huh E, Gbahou F, Cecon E, Oshima M, Houzé L, Katsonis P, Hegron A, Fan Z, Hou G, Charpentier G, Boissel M, Derhourhi M, Marre M, Balkau B, Froguel P, Scharfmann R, Lichtarge O, Dam J, Bonnefond A, Liu J, Jockers R. Human GLP1R variants affecting GLP1R cell surface expression are associated with impaired glucose control and increased adiposity. Nat Metab 2023; 5:1673-1684. [PMID: 37709961 PMCID: PMC11610247 DOI: 10.1038/s42255-023-00889-6] [Received: 09/28/2022] [Accepted: 08/09/2023] [Indexed: 09/16/2023]
Abstract
The glucagon-like peptide 1 receptor (GLP1R) is a major drug target, with several agonists prescribed to individuals with type 2 diabetes and obesity [1,2]. The impact of genetic variability of GLP1R on receptor function and its association with metabolic traits remain unclear, with conflicting reports. Here, after performing functional profiling of 60 GLP1R variants across four signalling pathways, we show an unexpected diversity of phenotypes ranging from defective cell surface expression to complete or pathway-specific gain of function (GoF) and loss of function (LoF). The defective insulin secretion of GLP1R LoF variants is rescued by allosteric GLP1R ligands or high concentrations of exendin-4/semaglutide in INS-1 832/3 cells. Genetic association studies in 200,000 participants from the UK Biobank show that impaired GLP1R cell surface expression contributes to poor glucose control and increased adiposity, with increased glycated haemoglobin A1c and body mass index. This study defines impaired GLP1R cell surface expression as a risk factor for traits associated with type 2 diabetes and obesity and provides potential treatment options for GLP1R LoF variant carriers.
Affiliation(s)
- Wenwen Gao
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, Institute of Zoonosis, and College of Veterinary Medicine, Jilin University, Changchun, China
- Lei Liu
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Eunna Huh
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, TX, USA
- Florence Gbahou
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Erika Cecon
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Masaya Oshima
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Ludivine Houzé
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Alan Hegron
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Institute for Research in Immunology and Cancer, University of Montreal, Montreal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
- Zhiran Fan
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Guofei Hou
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Guillaume Charpentier
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Mathilde Boissel
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Mehdi Derhourhi
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Michel Marre
- Institut Necker-Enfants Malades, INSERM, Université Paris Cité, Paris, France
- Clinique Ambroise Paré, Neuilly-sur-Seine, France
- Beverley Balkau
- Inserm U1018, Center for Research in Epidemiology and Population Health, Villejuif, France
- University Paris-Saclay, University Paris-Sud, Villejuif, France
- Philippe Froguel
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Department of Metabolism, Imperial College London, London, UK
- Olivier Lichtarge
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Julie Dam
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Amélie Bonnefond
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Department of Metabolism, Imperial College London, London, UK
- Jianfeng Liu
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Ralf Jockers
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
15
Liang X, Sun H. Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set. J Comput Biol 2023; 30:1075-1088. [PMID: 37871292 DOI: 10.1089/cmb.2022.0487] [Indexed: 10/25/2023]
Abstract
Rare variant association studies with multiple traits or diseases have drawn considerable attention, since association signals of rare variants can be boosted if more than one phenotype is associated with the same rare variants. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, in which pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within the genetic region. In this article, we propose new statistical methods to prioritize rare variants within a genetic region when a group test for that region identifies a statistical association with multiple phenotypes. The proposed approach computes the weighted selection probability (WSP) of each rare variant and ranks the variants in decreasing order of WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids, where we identified some potentially susceptible rare variants on chromosome 19.
Affiliation(s)
- Xianglong Liang
- Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
- Department of Statistics, Pusan National University, Busan, Korea
16
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023]
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting associations with such variants, numerous statistical methods have recently been proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic loci) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanisms. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
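As an illustration of the "collapse" idea summarized in this abstract, the sketch below implements a basic count-based burden test: rare variants in a region (minor-allele frequency below a hypothetical 1% cutoff) are summed into one score per individual, and that score is tested against a binary trait with a logistic-regression score test under an intercept-only null. The function names, the cutoff, the toy data, and the absence of covariates and variant weights are all assumptions made for illustration; this is not code from the cited review or any of its methods.

```python
import math
from typing import List, Sequence, Tuple


def burden_scores(
    genotypes: Sequence[Sequence[int]],
    mafs: Sequence[float],
    maf_threshold: float = 0.01,  # hypothetical rare-variant cutoff
) -> List[int]:
    """Collapse rare variants into one burden score per individual.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at variant j.
    Only variants whose supplied minor-allele frequency is below maf_threshold are summed.
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def burden_score_test(burden: Sequence[float], phenotype: Sequence[int]) -> Tuple[float, float]:
    """Score test of H0: beta = 0 in logit P(y = 1) = alpha + beta * burden.

    No covariates are modeled. Returns the 1-df chi-square statistic and its p-value.
    """
    n = len(burden)
    y_bar = sum(phenotype) / n  # null estimate of P(y = 1) under the intercept-only model
    b_bar = sum(burden) / n
    # Score for beta evaluated at the null estimate (beta = 0, intercept fitted).
    u = sum((y - y_bar) * b for y, b in zip(phenotype, burden))
    # Variance of the score after adjusting for the estimated intercept.
    v = y_bar * (1.0 - y_bar) * sum((b - b_bar) ** 2 for b in burden)
    if v == 0:
        raise ValueError("burden or phenotype has no variation; test is undefined")
    t = u * u / v
    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t / 2.0))
    return t, p_value


if __name__ == "__main__":
    # Toy data (hypothetical): 8 individuals, 3 variants; MAFs supplied externally.
    genotypes = [
        [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1],
        [0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0],
    ]
    mafs = [0.005, 0.008, 0.2]  # the third variant is common and is excluded
    phenotype = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = case, 0 = control
    burden = burden_scores(genotypes, mafs)
    stat, p = burden_score_test(burden, phenotype)
    print(burden, round(stat, 3), p)
```

Because a simple burden score adds all rare alleles together, it loses power when variants in the same region act in opposite directions; that limitation is the usual motivation for the variance-component and omnibus classes listed in the abstract.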
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
17
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023]
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have led to a variety of novel discoveries. Despite their usefulness, no single test fits all needs, and each suffers from specific drawbacks. Selecting the right aggregation test when the underlying genetic model of the disease is unknown remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests, created after an in-depth study of the limitations of each test and their impact on the quality of results. Our findings demonstrate that our method controls type I error and offers the best average power across all scenarios. The proposed method can handle a wide range of association models and provides researchers with an optimal aggregation analysis for the genetic regions of interest, enabling novel advances in whole-exome/whole-genome sequencing association studies.
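The abstract does not spell out how Excalibur combines its 36 tests, so the following is only a generic illustration of merging several aggregation-test p-values for one region. It uses the Cauchy combination (ACAT) rule as a stand-in technique and should not be read as Excalibur's algorithm; the function name and the p-values are hypothetical.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values from several tests of the same region (ACAT-style rule).

    Each p-value is mapped to a standard Cauchy quantile, tan((0.5 - p) * pi).
    Under the null, a weighted average of these quantiles is approximately
    standard Cauchy even when the tests are correlated, so its upper-tail
    probability gives the combined p-value."""
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    k = len(pvalues)
    if weights is None:
        weights = [1.0] * k
    total = float(sum(weights))
    t = sum((wt / total) * math.tan((0.5 - p) * math.pi) for wt, p in zip(weights, pvalues))
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values for one gene from a burden, a variance-component and an omnibus test.
combined = cauchy_combination([0.004, 0.03, 0.20])
print(f"combined p = {combined:.4f}")
```

A single very small p-value dominates the combination, which is why rules of this kind keep power when only one of the underlying tests matches the true genetic model.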
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
18
Saeed S, Ning L, Badreddine A, Mirza MU, Boissel M, Khanam R, Manzoor J, Janjua QM, Khan WI, Toussaint B, Vaillant E, Amanzougarene S, Derhourhi M, Trant JF, Siegert AM, Lam BYH, Yeo GSH, Chabraoui L, Touzani A, Kulkarni A, Farooqi IS, Bonnefond A, Arslan M, Froguel P. Biallelic Mutations in P4HTM Cause Syndromic Obesity. Diabetes 2023; 72:1228-1234. [PMID: 37083980 DOI: 10.2337/db22-1017] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/12/2022] [Accepted: 03/30/2023] [Indexed: 04/22/2023]
Abstract
We previously demonstrated that 50% of children with obesity from consanguineous families in Pakistan carry pathogenic variants in known monogenic obesity genes. Here, we discovered a novel monogenic recessive form of severe childhood obesity using an in-house staged computational approach. The analysis included whole-exome sequencing data from 366 children with severe obesity, 1,000 individuals from the Pakistan Risk of Myocardial Infarction Study (PROMIS), and 200,000 participants of the UK Biobank, used to prioritize genes harboring rare homozygous variants with putative effects on human obesity. We identified five rare or novel homozygous missense mutations, predicted to be deleterious, in P4HTM (encoding prolyl 4-hydroxylase transmembrane [P4H-TM]) in five consanguineous families. We further found two additional homozygous missense mutations in children with severe obesity of Indian and Moroccan origin. Molecular dynamics simulation suggested that these mutations destabilized the active conformation of the substrate binding domain. Most carriers also presented with hypotonia, cognitive impairment, and/or developmental delay. Three of the five probands died of pneumonia during the first 2 years of follow-up. P4HTM deficiency is a novel form of syndromic obesity that affects 1.5% of the children with obesity in our cohort and is associated with high mortality. P4H-TM is a hypoxia-inducible factor that is necessary for survival and adaptation under oxygen deprivation, but the role of this pathway in energy homeostasis and obesity pathophysiology remains to be elucidated.
Affiliation(s)
- Sadia Saeed
- Department of Metabolism, Digestion and Reproduction, Imperial College London, London, U.K
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Lijiao Ning
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Alaa Badreddine
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Muhammad Usman Mirza
- Department of Chemistry and Biochemistry, University of Windsor, Windsor, Ontario, Canada
- Mathilde Boissel
- Department of Metabolism, Digestion and Reproduction, Imperial College London, London, U.K
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Roohia Khanam
- School of Life Sciences, Forman Christian College, Lahore, Pakistan
- Jaida Manzoor
- Department of Paediatric Endocrinology, Children's Hospital, Lahore, Pakistan
- Qasim M Janjua
- Department of Physiology and Biophysics, National University of Science and Technology, Sohar, Oman
- Waqas I Khan
- The Children Hospital and the Institute of Child Health, Multan, Pakistan
- Bénédicte Toussaint
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Emmanuel Vaillant
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Souhila Amanzougarene
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Mehdi Derhourhi
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- John F Trant
- Department of Chemistry and Biochemistry, University of Windsor, Windsor, Ontario, Canada
- Anna-Maria Siegert
- Medical Research Council Metabolic Diseases Unit, Wellcome-MRC Institute of Metabolic Science, Metabolic Research Laboratories, University of Cambridge, Cambridge, U.K
- Brian Y H Lam
- Medical Research Council Metabolic Diseases Unit, Wellcome-MRC Institute of Metabolic Science, Metabolic Research Laboratories, University of Cambridge, Cambridge, U.K
- Giles S H Yeo
- Medical Research Council Metabolic Diseases Unit, Wellcome-MRC Institute of Metabolic Science, Metabolic Research Laboratories, University of Cambridge, Cambridge, U.K
- Layachi Chabraoui
- Laboratory of Biochemistry and Molecular Biology, Faculty of Medicine and Pharmacy, Mohammed V University, Rabat, Morocco
- Asmae Touzani
- Children's Hospital of Rabat and Laboratory of Biochemistry and Molecular Biology, Faculty of Medicine and Pharmacy, Mohammed V University, Rabat, Morocco
- Abhishek Kulkarni
- Department of Paediatric Endocrinology, Sir H. N. Reliance Foundation, SRCC Children's Hospital, Mumbai, India
- I Sadaf Farooqi
- Medical Research Council Metabolic Diseases Unit, Wellcome-MRC Institute of Metabolic Science, Metabolic Research Laboratories, University of Cambridge, Cambridge, U.K
- Amélie Bonnefond
- Department of Metabolism, Digestion and Reproduction, Imperial College London, London, U.K
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
- Muhammad Arslan
- School of Life Sciences, Forman Christian College, Lahore, Pakistan
- Philippe Froguel
- Department of Metabolism, Digestion and Reproduction, Imperial College London, London, U.K
- INSERM UMR 1283, CNRS UMR 8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille, France
- University of Lille, Lille University Hospital, Lille, France
19
Le Collen L, Delemer B, Poitou C, Vaxillaire M, Toussaint B, Dechaume A, Badreddine A, Boissel M, Derhourhi M, Clément K, Petit JM, Mau-Them FT, Bruel AL, Thauvin-Robinet C, Saveanu A, Cherifi BG, Le Beyec-Le Bihan J, Froguel P, Bonnefond A. Heterozygous pathogenic variants in POMC are not responsible for monogenic obesity: Implication for MC4R agonist use. Genet Med 2023; 25:100857. [PMID: 37092539 DOI: 10.1016/j.gim.2023.100857] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/09/2022] [Revised: 04/15/2023] [Accepted: 04/16/2023] [Indexed: 04/25/2023]
Abstract
PURPOSE Recessive deficiency of proopiomelanocortin (POMC) causes childhood-onset severe obesity. Affected individuals can now benefit from the melanocortin 4 receptor agonist setmelanotide. Furthermore, a phase 3 clinical trial is evaluating setmelanotide in POMC heterozygotes. We performed a large-scale genetic analysis to assess the effect of heterozygous pathogenic POMC variants on obesity. METHODS A genetic analysis was performed in a family including 2 cousins with childhood-onset obesity. We analyzed the obesity status of heterozygotes for pathogenic POMC variants in the Human Gene Mutation Database. The association between heterozygous pathogenic POMC variants and obesity risk was assessed using 190,000 exome samples from UK Biobank. RESULTS The 2 cousins carried compound heterozygous pathogenic variants in POMC. Six siblings were heterozygotes; only 1 of them had obesity. In the Human Gene Mutation Database, we identified 60 heterozygotes for pathogenic POMC variants, of whom 14 had obesity. In UK Biobank, heterozygous pathogenic POMC variants were not associated with obesity risk, but they modestly increased body mass index. CONCLUSION Heterozygous pathogenic POMC variants do not cause monogenic obesity, but they slightly increase body mass index. Prescribing setmelanotide to patients with obesity solely on the basis of a heterozygous POMC variant is therefore questionable.
Affiliation(s)
- Lauriane Le Collen
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; Department of Endocrinology Diabetology, University Hospital Center of Reims, Reims, France; Department of Clinical Genetic, University Hospital Center of Reims, Reims, France; University of Lille, Lille, France.
- Brigitte Delemer
- Department of Endocrinology Diabetology, University Hospital Center of Reims, Reims, France
- Christine Poitou
- Assistance Publique Hôpitaux de Paris, Nutrition Department, Pitié-Salpêtrière Hospital, Paris, France; Sorbonne Université, INSERM, Nutrition and Obesities: Systemic Approaches Research Unit (NutriOmics), Paris, France
- Martine Vaxillaire
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France
- Bénédicte Toussaint
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France
- Aurélie Dechaume
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France
- Alaa Badreddine
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France
- Mathilde Boissel
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France
- Mehdi Derhourhi
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France
- Karine Clément
- Assistance Publique Hôpitaux de Paris, Nutrition Department, Pitié-Salpêtrière Hospital, Paris, France; Sorbonne Université, INSERM, Nutrition and Obesities: Systemic Approaches Research Unit (NutriOmics), Paris, France
- Jean M Petit
- Department of Endocrinology Diabetology, University Hospital Central of F. Mitterrand Dijon-Bourgogne, Dijon, France
- Frédéric Tran Mau-Them
- Unité Fonctionnelle Innovation en Diagnostic Génomique des maladies rares, CHU Dijon Bourgogne, Dijon, France; INSERM UMR1231 GAD, Dijon, France
- Ange-Line Bruel
- Unité Fonctionnelle Innovation en Diagnostic Génomique des maladies rares, CHU Dijon Bourgogne, Dijon, France; INSERM UMR1231 GAD, Dijon, France
- Christel Thauvin-Robinet
- Unité Fonctionnelle Innovation en Diagnostic Génomique des maladies rares, CHU Dijon Bourgogne, Dijon, France; INSERM UMR1231 GAD, Dijon, France; Centre de Référence Maladies Rares "Anomalies du développement et syndromes malformatifs," Centre de Génétique, FHU TRANSLAD et Institut GIMI, CHU Dijon Bourgogne, Dijon, France
- Alexandru Saveanu
- Aix-Marseille University, Institut National de la Santé et de la Recherche Médicale (INSERM), U1251, Marseille Medical Genetics (MMG), Marseille, France; Assistance Publique Hôpitaux de Marseille, Reference Center for Rare Pituitary Diseases HYPO, Marseille, France; Assistance-Publique des Hôpitaux de Marseille, Laboratory of Molecular Biology, Conception Hospital, Marseille, France
- Blandine Gatta Cherifi
- CHU Bordeaux, Endocrinology, Diabetology & Nutrition, Bordeaux, France; University of Bordeaux, Bordeaux, France; INSERM U1215 Neurocentre Magendie, University of Bordeaux, Bordeaux, France
- Johanne Le Beyec-Le Bihan
- Assistance Publique Hôpitaux de Paris, Endocrine and Oncological Biochemistry Department, Pitié-Salpêtrière Hospital, Sorbonne University, Paris, France; INSERM U1149, Centre de recherche sur l'inflammation, Paris, France
- Philippe Froguel
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France; Department of Metabolism, Imperial College London, London, United Kingdom
- Amélie Bonnefond
- Inserm/CNRS UMR 1283/8199, Pasteur Institute of Lille, EGID, Lille, France; University of Lille, Lille, France; Department of Metabolism, Imperial College London, London, United Kingdom.
20
Zhou Z, Ku HC, Manning SE, Zhang M, Xing C. A Varying Coefficient Model to Jointly Test Genetic and Gene-Environment Interaction Effects. Behav Genet 2023; 53:374-382. [PMID: 36622576 PMCID: PMC10277225 DOI: 10.1007/s10519-022-10131-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Accepted: 12/18/2022] [Indexed: 01/10/2023]
Abstract
Most human traits are influenced by the interplay between genetic and environmental factors. Many statistical methods have been proposed to screen for gene-environment interaction (GxE) in the post-genome-wide association study era. However, most existing methods assume that genetic and environmental factors interact linearly in their effect on phenotypic variation, which reduces statistical power when the GxE is nonlinear. In this paper, we present a flexible statistical procedure to detect GxE regardless of whether the underlying relationship is linear. By modeling the joint genetic and GxE effects as a varying-coefficient function of the environmental factor, the proposed model can capture dynamic trajectories of GxE. We employ a likelihood ratio test with a fast Monte Carlo algorithm for hypothesis testing. We conducted simulations to evaluate the validity and power of the proposed model in various settings, and a real data analysis illustrates its power, particularly for nonlinear GxE.
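The joint test can be illustrated with a minimal sketch under assumptions that are not taken from the paper: the varying coefficient β1(E) is approximated by an unpenalized cubic polynomial in E rather than the authors' smoother, the trait is Gaussian, and the null distribution comes from permuting genotypes (valid only if G is independent of E) instead of the paper's fast Monte Carlo algorithm. All function names are hypothetical.

```python
import numpy as np


def poly_basis(e, degree):
    """Columns 1, e, ..., e**degree: a simple stand-in for a spline basis in E."""
    return np.vander(e, degree + 1, increasing=True)


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def joint_lrt(y, g, e, degree=3):
    """Gaussian LRT of H0: beta1(E) == 0 in y = beta0(E) + beta1(E) * g + noise.

    Setting beta1(E) to zero removes the genetic main effect and every GxE
    term at once, which is what makes the test 'joint'."""
    B = poly_basis(e, degree)
    X0 = B                                  # null: environment-only curve beta0(E)
    X1 = np.hstack([B, B * g[:, None]])     # alternative: adds beta1(E) * g
    return len(y) * np.log(rss(X0, y) / rss(X1, y))


def permutation_pvalue(y, g, e, degree=3, n_perm=199, seed=0):
    """Observed LRT and a Monte Carlo p-value from permuting the genotypes."""
    rng = np.random.default_rng(seed)
    observed = joint_lrt(y, g, e, degree)
    null = [joint_lrt(y, rng.permutation(g), e, degree) for _ in range(n_perm)]
    return observed, (1 + sum(s >= observed for s in null)) / (n_perm + 1)


# Hypothetical data with a nonlinear GxE: the genetic effect follows sin(3E).
rng = np.random.default_rng(2)
n = 400
e = rng.uniform(-1, 1, size=n)
g = rng.binomial(2, 0.3, size=n).astype(float)
y = 0.5 * e + 0.8 * g * np.sin(3 * e) + rng.normal(size=n)

stat, pval = permutation_pvalue(y, g, e)
print(f"LRT = {stat:.1f}, permutation p = {pval:.3f}")
```

A linear-interaction model would only fit g and g·E, so the curvature of sin(3E) would be partly lost; the polynomial basis is one cheap way to let β1(E) bend.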
Affiliation(s)
- Zhengyang Zhou
- Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX, USA.
- Hung-Chih Ku
- Department of Mathematical Sciences, DePaul University, Chicago, IL, USA
- Sydney E Manning
- Department of Pharmacotherapy, University of North Texas Health Science Center, Fort Worth, TX, USA
- Ming Zhang
- Department of Statistical Science, Southern Methodist University, Dallas, TX, USA
- Chao Xing
- McDermott Center for Human Growth and Development and Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA.
21
Lu H, Zhang S, Jiang Z, Zeng P. Leveraging trans-ethnic genetic risk scores to improve association power for complex traits in underrepresented populations. Brief Bioinform 2023:bbad232. [PMID: 37332016 DOI: 10.1093/bib/bbad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 05/06/2023] [Accepted: 06/04/2023] [Indexed: 06/20/2023]
Abstract
Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations are reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis has been less investigated for traits in underrepresented populations. Here we propose a statistical framework, the trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), which hierarchically models single-nucleotide polymorphism effects in the target population as a function of the effects for the same trait in well-studied populations. GAMM integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM by applying it to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared with Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes that existing methods had missed, and revealed that the trans-ethnic information contributed substantially, if indirectly, to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps reduce health inequities in current genetics research for people of minority populations.
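One way to write down the hierarchical idea, assuming for illustration only that the target-population SNP effects are a linear function of the external effects plus a random deviation (the symbols are introduced here and are not taken from the paper): let $\mathbf{G}$ be the $n \times p$ genotype matrix of a gene in the target population, $\mathbf{X}$ the covariates, and $\mathbf{B}$ the $p \times 2$ matrix of external SNP effect estimates from Europeans and East Asians.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\gamma} + \mathbf{G}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
  \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}_n),\\
\boldsymbol{\beta} &= \mathbf{B}\boldsymbol{\alpha} + \mathbf{b},
  \qquad \mathbf{B} = [\boldsymbol{\beta}_{\mathrm{EUR}}, \boldsymbol{\beta}_{\mathrm{EAS}}],
  \quad \mathbf{b} \sim N(\mathbf{0}, \tau \mathbf{I}_p),\\
\Rightarrow\quad \mathbf{y} &= \mathbf{X}\boldsymbol{\gamma} + (\mathbf{G}\mathbf{B})\boldsymbol{\alpha} + \mathbf{G}\mathbf{b} + \boldsymbol{\varepsilon},
  \qquad \operatorname{Var}(\mathbf{y}) = \tau\,\mathbf{G}\mathbf{G}^{\top} + \sigma^2 \mathbf{I}_n.
\end{aligned}
```

In this sketch each column of $\mathbf{G}\mathbf{B}$ is a trans-ethnic genetic risk score, so external information enters through the low-dimensional fixed effect $\boldsymbol{\alpha}$, while $\mathbf{b}$ absorbs target-specific deviations; a gene-level null would be $\boldsymbol{\alpha} = \mathbf{0}$ and $\tau = 0$.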
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
22
Wang N, Yu B, Jun G, Qi Q, Durazo-Arvizu RA, Lindstrom S, Morrison AC, Kaplan RC, Boerwinkle E, Chen H. StocSum: stochastic summary statistics for whole genome sequencing studies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535886. [PMID: 37066281 PMCID: PMC10104122 DOI: 10.1101/2023.04.06.535886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, have been widely used to advance the genetics field in a wide range of applications. Applications that involve multiple genetic variants also require their correlations, or linkage disequilibrium (LD) information, which are often obtained from an external reference panel. In practice, it is usually difficult to find suitable external reference panels that represent the LD structure of underrepresented and admixed populations, or of rare genetic variants from whole genome sequencing (WGS) studies, which limits the scope of applications for genomic summary statistics. Here we introduce StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vectors. We develop various downstream applications using StocSum, including single-variant tests, conditional association tests, gene-environment interaction tests, variant set tests, as well as meta-analysis and LD score regression tools. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine Program. StocSum will facilitate sharing and utilization of genomic summary statistics from WGS studies, especially for underrepresented and admixed populations.
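The abstract only says that StocSum builds summary statistics from random vectors. The sketch below shows the basic reason such a construction can stand in for an external LD reference panel: random projections preserve inner products in expectation. It assumes standardized genotypes and i.i.d. standard normal random vectors; the function names and data are hypothetical, and StocSum's actual construction (for example, covariate adjustment or relatedness) is not reproduced.

```python
import numpy as np


def stochastic_summary(G, n_vectors, seed=0):
    """Project standardized genotypes onto random N(0, 1) vectors.

    Returns S = G^T R with shape (variants, n_vectors); S can be shared
    without sharing the individual-level matrix G."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((G.shape[0], n_vectors))
    return G.T @ R


def ld_from_summary(S, n_samples):
    """Because E[R R^T] = n_vectors * I, E[S S^T] = n_vectors * G^T G, so
    S S^T / (n_vectors * n_samples) estimates the LD (correlation) matrix."""
    return S @ S.T / (S.shape[1] * n_samples)


# Hypothetical correlated genotypes (0/1/2 codes) for 5 variants in 300 people.
rng = np.random.default_rng(3)
n, m = 300, 5
latent = rng.standard_normal((n, 1))
raw = latent + 0.7 * rng.standard_normal((n, m))
G_raw = np.digitize(raw, [-0.5, 0.8]).astype(float)
G = (G_raw - G_raw.mean(axis=0)) / G_raw.std(axis=0)   # standardize each variant

ld_hat = ld_from_summary(stochastic_summary(G, n_vectors=5000), n)
ld_true = np.corrcoef(G_raw, rowvar=False)
print(np.round(ld_hat - ld_true, 2))
```

The estimation error shrinks roughly as one over the square root of the number of random vectors, which is the trade-off between storage and accuracy in this kind of sketch.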
Affiliation(s)
- Nannan Wang
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Bing Yu
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Goo Jun
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Qibin Qi
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Ramon A. Durazo-Arvizu
- The Saban Research Institute, Children’s Hospital Los Angeles, Los Angeles, California
- Department of Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Sara Lindstrom
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, School of Public Health, University of Washington, 3980 15th Ave NE, Seattle, WA, USA
- Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Robert C. Kaplan
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
23
Knutson KA, Pan W. MATS: a novel multi-ancestry transcriptome-wide association study to account for heterogeneity in the effects of cis-regulated gene expression on complex traits. Hum Mol Genet 2023; 32:1237-1251. [PMID: 36179104 PMCID: PMC10077507 DOI: 10.1093/hmg/ddac247] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2022] [Revised: 09/16/2022] [Accepted: 09/28/2022] [Indexed: 01/16/2023]
Abstract
The Transcriptome-Wide Association Study (TWAS) is a widely used approach that integrates gene expression and Genome-Wide Association Study (GWAS) data to study the role of cis-regulated gene expression (GEx) in complex traits. However, the genetic architecture of GEx varies across populations, and recent findings point to possible ancestral heterogeneity in the effects of GEx on complex traits, which may be amplified in TWAS by modeling GEx as a function of cis-eQTLs. Here, we present a novel extension to TWAS to account for heterogeneity in the effects of cis-regulated GEx that are correlated with ancestry. Our proposed Multi-Ancestry TwaS (MATS) framework jointly analyzes samples from multiple populations and distinguishes between shared, ancestry-specific and/or subject-specific expression-trait associations. As such, MATS increases power to detect shared GEx associations over ancestry-stratified TWAS through larger sample sizes, and facilitates the detection of genes with subgroup-specific associations that may be masked by standard TWAS. Our simulations highlight the improved Type I error control and power of MATS compared with competing approaches. Our real data applications to Alzheimer's disease (AD) case-control genotypes from the Alzheimer's Disease Sequencing Project (ADSP) and continuous phenotypes from the UK Biobank (UKBB) identify a number of unique gene-trait associations that were not discovered through standard and/or ancestry-stratified TWAS. Ultimately, these findings promote MATS as a powerful method for detecting and estimating significant gene expression effects on complex traits within multi-ancestry cohorts and corroborate the mounting evidence for inter-population heterogeneity in gene-trait associations.
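The distinction between shared and ancestry-specific expression-trait effects can be sketched as a fixed-effect regression with an expression-by-ancestry interaction. This is a simplified illustration under stated assumptions: predicted expression x is taken as already imputed from cis-eQTL weights, there are only two ancestry groups, and the subject-specific (mixed-model) component of MATS is left out entirely. The function name and data are hypothetical.

```python
import numpy as np


def shared_and_specific_effects(y, x, ancestry):
    """Least-squares fit of y = b0 + b1*c + beta_shared*x + delta*(x*c),
    where c = -0.5 / +0.5 codes a two-group ancestry label.

    beta_shared is the expression-trait effect averaged over the two groups;
    delta is the difference in that effect between groups (ancestry-specific part)."""
    c = np.where(ancestry == 1, 0.5, -0.5)
    X = np.column_stack([np.ones_like(x), c, x, x * c])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[2], coef[3]


# Hypothetical data: predicted expression x has effect 0.3 in group 0 and 0.7 in group 1.
rng = np.random.default_rng(4)
n = 5000
ancestry = rng.integers(0, 2, size=n)
x = rng.standard_normal(n)
slope = np.where(ancestry == 1, 0.7, 0.3)
y = 0.1 + slope * x + rng.normal(size=n)

beta_shared, delta = shared_and_specific_effects(y, x, ancestry)
print(f"shared effect ~ {beta_shared:.2f}, ancestry difference ~ {delta:.2f}")
```

With this coding the group-specific slopes are beta_shared ± delta/2, so the simulated values correspond to a shared effect of 0.5 and a difference of 0.4; pooling both groups is what gives the shared term its extra precision over stratified fits.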
Affiliation(s)
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
24
Folon L, Baron M, Toussaint B, Vaillant E, Boissel M, Scherrer V, Loiselle H, Leloire A, Badreddine A, Balkau B, Charpentier G, Franc S, Marre M, Aboulouard S, Salzet M, Canouil M, Derhourhi M, Froguel P, Bonnefond A. Contribution of heterozygous PCSK1 variants to obesity and implications for precision medicine: a case-control study. Lancet Diabetes Endocrinol 2023; 11:182-190. [PMID: 36822744 DOI: 10.1016/s2213-8587(22)00392-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 08/29/2022] [Revised: 12/16/2022] [Accepted: 12/19/2022] [Indexed: 02/24/2023]
Abstract
BACKGROUND Rare biallelic pathogenic mutations in PCSK1 (encoding proprotein convertase subtilisin/kexin type 1 [PC1/3]) cause early-onset obesity associated with various endocrinopathies. Setmelanotide has been approved for carriers of these biallelic mutations in the past 3 years. We aimed to perform a large-scale functional genomic study focusing on rare heterozygous variants of PCSK1 to decipher their putative impact on obesity risk. METHODS This case-control study included all participants with overweight and obesity (ie, cases) or healthy weight (ie, controls) from the RaDiO study, which comprises three community-based cohorts and one hospital-based cohort in France recruited between Jan 1, 1995, and Dec 31, 2000. In adults older than 18 years, healthy weight was defined as BMI of less than 25·0 kg/m2, overweight as 25·0-29·9 kg/m2, and obesity as 30·0 kg/m2 or higher. Participants with type 2 diabetes had fasting glucose of 7·0 mmol/L or higher or used treatment for hyperglycaemia (or both) and were negative for islet or insulin autoantibodies. Functional assessment of rare missense variants of PCSK1 was performed. Pathogenicity clusters of variants were determined with machine learning. The effect of each cluster of PCSK1 variants on obesity was assessed using the adjusted mixed-effects score test. FINDINGS All 13 coding exons of PCSK1 were sequenced in 9320 participants (including 7260 adults and 2060 children and adolescents) recruited from the RaDiO study. We detected 65 rare heterozygous PCSK1 variants, including four null variants and 61 missense variants that were analysed in vitro and clustered into five groups (A-E) according to enzymatic activity. Compared with the wild type, 15 missense variants led to complete PC1/3 loss of function (group A; reference), and the rare exome variant ensemble learner (REVEL) yielded 15 (25%) false positives and four (7%) false negatives. Carrying complete loss-of-function or null PCSK1 variants was significantly associated with obesity (six [86%] of seven carriers vs 1518 [35%] of 4395 non-carriers; OR 9·3 [95% CI 1·5-177·4]; p=0·014) and higher BMI (32·0 kg/m2 [SD 9·3] in carriers vs 27·3 kg/m2 [6·5] in non-carriers; mean effect β 6·94 [SE 1·95]; p=0·00029). Clusters of PCSK1 variants with a partial or neutral effect on PC1/3 activity had no effect on obesity, overweight, or BMI. INTERPRETATION Only carriers of heterozygous null or complete loss-of-function PCSK1 variants have monogenic obesity and, therefore, might be eligible for setmelanotide. In silico tests were unable to accurately detect these variants, which suggests that in vitro assays are necessary to determine variant pathogenicity for genetic diagnosis and precision medicine purposes. FUNDING Agence Nationale de la Recherche, European Research Council, National Center for Precision Diabetic Medicine, European Regional Development Fund, Hauts-de-France Regional Council, and the European Metropolis of Lille.
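The percentages and an unadjusted odds ratio can be checked directly from the obesity counts quoted in the findings. The sketch below computes the crude 2×2 odds ratio with a Woolf (log-scale) interval; it is not the authors' analysis. The reported OR of 9·3 and its interval come from their own model, which the abstract does not fully describe (presumably with covariate adjustment), so the crude value of about 11.4 is not expected to match. The function name is hypothetical.

```python
import math


def crude_odds_ratio(carrier_cases, n_carriers, noncarrier_cases, n_noncarriers):
    """Unadjusted 2x2 odds ratio with a Woolf (log-scale) 95% confidence interval."""
    a = carrier_cases                        # carriers with obesity
    b = n_carriers - carrier_cases           # carriers without obesity
    c = noncarrier_cases                     # non-carriers with obesity
    d = n_noncarriers - noncarrier_cases     # non-carriers without obesity
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)


# Counts quoted in the abstract: 6 of 7 carriers vs 1518 of 4395 non-carriers had obesity.
odds_ratio, (lower, upper) = crude_odds_ratio(6, 7, 1518, 4395)
print(f"crude OR = {odds_ratio:.1f}, Woolf 95% CI {lower:.1f}-{upper:.1f}")
```

With only one carrier without obesity, the 1/b term dominates the standard error, which is why any interval for this comparison, crude or adjusted, is very wide.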
Affiliation(s)
- Lise Folon
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
- Morgane Baron
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
- Bénédicte Toussaint
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
- Emmanuel Vaillant
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
- Mathilde Boissel
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
- Victoria Scherrer
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
| | - Hélène Loiselle
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
| | - Audrey Leloire
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
| | - Alaa Badreddine
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
| | - Beverley Balkau
- Paris-Saclay University, Paris-Sud University, Université de Versailles Saint-Quentin-en-Yvelines, Center for Research in Epidemiology and Population Health, Inserm U1018 Clinical Epidemiology, Villejuif, France
| | - Guillaume Charpentier
- Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète, Evry, France
| | - Sylvia Franc
- Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète, Evry, France; Department of Diabetes, Sud-Francilien Hospital, Paris-Sud University, Corbeil-Essonnes, France
| | - Michel Marre
- Institut Necker-Enfants Malades, INSERM, Université de Paris, Paris, France; Clinique Ambroise Paré, Neuilly-sur-Seine, France
| | - Soulaimane Aboulouard
- Université de Lille, Lille, France; Inserm U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse, Lille, France
| | - Michel Salzet
- Université de Lille, Lille, France; Inserm U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse, Lille, France
| | - Mickaël Canouil
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
| | - Mehdi Derhourhi
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France
| | - Philippe Froguel
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France; Department of Metabolism, Imperial College London, London, UK
| | - Amélie Bonnefond
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes, Institut Pasteur de Lille, Lille University Hospital, Lille, France; Université de Lille, Lille, France; Department of Metabolism, Imperial College London, London, UK.
25
Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. [PMID: 35924480 PMCID: PMC9669229 DOI: 10.1002/gepi.22497] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/18/2022] [Revised: 07/06/2022] [Accepted: 07/19/2022] [Indexed: 01/07/2023]
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial for pursuing the goals of precision medicine, discovering novel disease biomarkers, and identifying treatment targets. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population-level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate the distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ryan J. Urbanowicz
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Adam C. Naj
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
26
Astiazaran-Symonds E, Graham C, Kim J, Tucker MA, Ingvar C, Helgadottir H, Pastorino L, van Doorn R, Sampson JN, Zhu B, Bruno W, Queirolo P, Fornarini G, Sciallero S, Carter B, Hicks B, Hutchinson A, Jones K, Stewart DR, Chanock SJ, Freedman ND, Landi MT, Höiom V, Puig S, Gruis N, Yang XR, Ghiorzo P, Goldstein AM. Gene-Level Associations in Patients With and Without Pathogenic Germline Variants in CDKN2A and Pancreatic Cancer. JCO Precis Oncol 2022; 6:e2200145. [PMID: 36409970 PMCID: PMC10166474 DOI: 10.1200/po.22.00145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/07/2022] [Revised: 06/28/2022] [Accepted: 10/03/2022] [Indexed: 11/22/2022]
Abstract
PURPOSE Pancreatic ductal adenocarcinoma (PDAC) is a component of familial melanoma due to germline pathogenic variants (GPVs) in CDKN2A. However, it is unclear what role this gene or other genes play in its etiology. MATERIALS AND METHODS We analyzed 189 cancer predisposition genes using parametric rare-variant association (RVA) tests and nonparametric permutation tests to identify gene-level associations in PDAC for patients with (CDKN2A+) and without (CDKN2A-) GPV. Exome sequencing was performed on 84 patients with PDAC, 47 CDKN2A+ and 37 CDKN2A-. After variant filtering, various RVA tests and permutation tests were run separately by CDKN2A status. Genes with the strongest nominal associations were evaluated in patients with PDAC from The Cancer Genome Atlas and the UK Biobank (UKB). A secondary analysis including only GPV from UKB was also performed. RESULTS In RVA tests, ERCC4 and RET showed the most compelling evidence as plausible PDAC candidate genes for CDKN2A+ patients. In contrast, the findings in CDKN2A- patients provided evidence for HMBS, EPCAM, and MRE11 as potential new candidate genes and confirmed ATM, BRCA2, and PALB2 as PDAC genes, consistent with findings in The Cancer Genome Atlas and the UKB. As expected, CDKN2A- patients were more likely to harbor GPVs from the 189 genes investigated. When including only GPVs from UKB, significant associations with PDAC were seen for ATM, BRCA2, and CDKN2A. CONCLUSION These results suggest that variants in other genes likely play a role in PDAC in all patients and that PDAC in CDKN2A+ patients has a distinct etiology from PDAC in CDKN2A- patients.
Affiliation(s)
- Esteban Astiazaran-Symonds
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- National Human Genome Research Institute, NIH, Bethesda, MD
- Department of Medicine, College of Medicine-Tucson, University of Arizona, Tucson, AZ
- Cole Graham
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Jung Kim
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Hildur Helgadottir
- Department of Oncology Pathology, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Lorenza Pastorino
- Genetics of Rare Cancers, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
- Remco van Doorn
- Department of Dermatology, Leiden University Medical Center, Leiden, the Netherlands
- Joshua N. Sampson
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Bin Zhu
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Cancer Genomics Research Laboratory, Leidos Biomedical Research Inc, Frederick National Laboratory for Cancer Research, Frederick, MD
- William Bruno
- Genetics of Rare Cancers, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
- Paola Queirolo
- Melanoma Sarcoma and Rare Tumors, IEO European Institute of Oncology, Milano, Italy
- Giuseppe Fornarini
- Medical Oncology Unit 1, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Stefania Sciallero
- Medical Oncology Unit 1, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Belynda Hicks
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Cancer Genomics Research Laboratory, Leidos Biomedical Research Inc, Frederick National Laboratory for Cancer Research, Frederick, MD
- Amy Hutchinson
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Cancer Genomics Research Laboratory, Leidos Biomedical Research Inc, Frederick National Laboratory for Cancer Research, Frederick, MD
- Kristine Jones
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Cancer Genomics Research Laboratory, Leidos Biomedical Research Inc, Frederick National Laboratory for Cancer Research, Frederick, MD
- Neal D. Freedman
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Veronica Höiom
- Department of Oncology Pathology, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Susana Puig
- Melanoma Unit, Hospital Clínic de Barcelona, IDIBAPS, Universitat de Barcelona and CIBERER, Barcelona, Spain
- Nelleke Gruis
- Department of Dermatology, Leiden University Medical Center, Leiden, the Netherlands
- Xiaohong R. Yang
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD
- Paola Ghiorzo
- Genetics of Rare Cancers, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
27
Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023]
Abstract
BACKGROUND Multilocus analysis of a set of single nucleotide polymorphisms (SNPs) pre-assigned to a gene is a valuable complement to single-marker analysis because it aggregates data on complex traits in a biologically meaningful way. However, despite the wide variety of SNP-set methods, few comprehensive studies have compared their effectiveness. RESULTS We sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods effectively controlled the type I error, and that these well-calibrated approaches differed in power across the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., the sequence kernel association test) were much more powerful under a polygenic genetic architecture in both common and rare variant association analyses. We further found that two linkage-disequilibrium-free P value combination methods (the harmonic mean P value method and the aggregated Cauchy association test) performed very well under a sparse genetic architecture, both in simulations and in real-data applications to common and rare variant association analyses and to expression quantitative trait loci weighted integrative analysis. We also assessed scalability by recording computational time and found that all of these methods scale to biobank-scale data, although some are relatively slow. CONCLUSION We hope that our findings offer useful guidance on choosing appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package MCA, which is freely available at https://github.com/biostatpzeng/.
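The two P value combination methods named in this abstract can be sketched in a few lines. This is an illustrative sketch, not the MCA package's code: `acat` follows the published Cauchy-combination formula, and `harmonic_mean_p` returns the raw weighted harmonic mean, which for small values is only approximately a valid p-value (the asymptotically exact HMP applies a Landau-distribution correction that is not implemented here). Neither function uses the linkage-disequilibrium matrix. The function names and the equal default weights are assumptions.

```python
import math


def _normalise(weights, n):
    """Default to equal weights and rescale so the weights sum to 1."""
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    return [w / total for w in weights]


def acat(pvalues, weights=None):
    """Aggregated Cauchy association test (Cauchy combination of p-values).

    Assumes 0 < p <= 1. Each p-value is mapped to a standard Cauchy variate
    tan((0.5 - p) * pi); under the null the weighted sum is approximately
    standard Cauchy, even when the individual tests are correlated.
    """
    w = _normalise(weights, len(pvalues))
    t = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, pvalues))
    if t > 1e15:
        # For very large t the upper tail is approximately 1 / (t * pi).
        return 1.0 / (t * math.pi)
    # Upper-tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(pvalues, weights=None):
    """Raw weighted harmonic mean p-value (no Landau correction); 0 < p <= 1."""
    w = _normalise(weights, len(pvalues))
    return 1.0 / sum(wi / p for wi, p in zip(w, pvalues))


# One strong signal among null-looking SNPs (illustrative values only).
pvals = [1e-6, 0.9, 0.9]
print(acat(pvals), harmonic_mean_p(pvals))
```

Both combined values are dominated by the single small p-value, which illustrates why these methods suit the sparse architecture described in the abstract.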
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuchen Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
28
Li B, Jin B, Capra JA, Bush WS. Integration of Protein Structure and Population-Scale DNA Sequence Data for Disease Gene Discovery and Variant Interpretation. Annu Rev Biomed Data Sci 2022; 5:141-161. [PMID: 35508071 DOI: 10.1146/annurev-biodatasci-122220-112147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Bian Li
- Department of Biological Sciences and Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA
- Bowen Jin
- Graduate Program in Systems Biology and Bioinformatics, Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- William S Bush
- Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
29
Barfield R, Huyghe JR, Lemire M, Dong X, Su YR, Brezina S, Buchanan DD, Figueiredo JC, Gallinger S, Giannakis M, Gsur A, Gunter MJ, Hampel H, Harrison TA, Hopper JL, Hudson TJ, Li CI, Moreno V, Newcomb PA, Pai RK, Pharoah PDP, Phipps AI, Qu C, Steinfelder RS, Sun W, Win AK, Zaidi SH, Campbell PT, Peters U, Hsu L. Genetic Regulation of DNA Methylation Yields Novel Discoveries in GWAS of Colorectal Cancer. Cancer Epidemiol Biomarkers Prev 2022; 31:1068-1076. [PMID: 35247911 PMCID: PMC9081265 DOI: 10.1158/1055-9965.epi-21-0724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Revised: 10/05/2021] [Accepted: 02/23/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND Colorectal cancer has a strong epigenetic component, with frequent DNA methylation (DNAm) alterations in addition to heritable genetic risk. It is of interest to understand the interrelationship of germline genetics, DNAm, and colorectal cancer risk. METHODS We performed a genome-wide methylation quantitative trait locus (meQTL) analysis in 1,355 people, assessing the pairwise associations between genetic variants and lymphocyte methylation data. In addition, we used penalized regression with cis-genetic variants within ±1 Mb of each methylation site to identify genome-wide heritable DNAm. We evaluated the association of genetically predicted methylation with colorectal cancer risk based on genome-wide association studies (GWAS) of over 125,000 cases and controls, using the multivariate sMiST and, univariately, by examining marginal associations with colorectal cancer risk. RESULTS Of the 142 known colorectal cancer GWAS loci, 47 were identified as meQTLs. We identified four novel colorectal cancer-associated loci (NID2, ATXN10, KLHDC10, and CEP41) that lie more than 1 Mb outside known colorectal cancer loci, and 10 secondary signals within 1 Mb of known loci. CONCLUSIONS Incorporating information on DNAm regulation into genetic association analyses of colorectal cancer risk reveals novel pathways in colorectal cancer tumorigenesis. Our summary statistics-based framework, sMiST, provides a powerful approach by combining information on the effect through methylation with the residual direct effects of the meQTLs on disease risk. Further validation and functional follow-up of these novel pathways are needed. IMPACT Using genotype, DNAm, and GWAS data, we identified four new colorectal cancer risk loci. We studied the landscape of genetic regulation of DNAm via single-SNP and multi-SNP meQTL analyses.
Affiliation(s)
- Richard Barfield
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Jeroen R Huyghe
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Mathieu Lemire
- Neurosciences & Mental Health Program, Hospital for Sick Children, Toronto, ON, Canada
- Xinyuan Dong
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Yu-Ru Su
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Stefanie Brezina
- Institute of Cancer Research, Department of Medicine I, Medical University Vienna, Vienna, Austria
- Daniel D Buchanan
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, Victoria 3010, Australia
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Parkville, Victoria 3010, Australia
- Genomic Medicine and Family Cancer Clinic, The Royal Melbourne Hospital, Parkville, Victoria, Australia
- Jane C Figueiredo
- Department of Medicine, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Steven Gallinger
- Lunenfeld Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Ontario, Canada
- Marios Giannakis
- Department of Medical Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Andrea Gsur
- Institute of Cancer Research, Department of Medicine I, Medical University Vienna, Vienna, Austria
- Marc J Gunter
- International Agency for Research on Cancer (IARC/WHO), Nutrition and Metabolism Branch, Lyon, France
- Heather Hampel
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Tabitha A Harrison
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- John L Hopper
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Victoria, Australia
- Department of Epidemiology, School of Public Health and Institute of Health and Environment, Seoul National University, Seoul, South Korea
- Thomas J Hudson
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Christopher I Li
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology-IDIBELL, L’Hospitalet de Llobregat, Barcelona, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- ONCOBEL Program, Bellvitge Biomedical Research Institute (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Polly A Newcomb
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- School of Public Health, University of Washington, Seattle, Washington, USA
- Rish K Pai
- Department of Laboratory Medicine and Pathology, Mayo Clinic Arizona, Scottsdale, Arizona, USA
- Paul D P Pharoah
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Amanda I Phipps
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Epidemiology, University of Washington, Seattle, Washington, USA
- Conghui Qu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Robert S Steinfelder
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Wei Sun
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA
- Aung Ko Win
- Department of Epidemiology, School of Public Health and Institute of Health and Environment, Seoul National University, Seoul, South Korea
- Syed H Zaidi
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Peter T Campbell
- Department of Population Science, American Cancer Society, Atlanta, Georgia, USA
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Epidemiology, University of Washington, Seattle, Washington, USA
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
30
Liu Y, Sun W, Hsu L, He Q. Statistical inference for high-dimensional pathway analysis with multiple responses. Comput Stat Data Anal 2022; 169. [PMID: 35125572 PMCID: PMC8813039 DOI: 10.1016/j.csda.2021.107418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Pathway analysis, i.e., grouping analysis, has important applications in genomic studies. Existing pathway analysis approaches mostly focus on a single response and are not suitable for analyzing complex diseases, which are often related to multiple response variables. Although a handful of approaches have been developed for multiple responses, they are mainly designed for pathways with a moderate number of features. A multi-response pathway analysis approach is introduced that can conduct statistical inference when the dimension is potentially higher than the sample size. Asymptotic properties of the test statistic are established, and the statistical power is investigated theoretically. Simulation studies and real data analysis show that the proposed approach performs well in identifying important pathways that influence multiple expression quantitative trait loci (eQTL).
31
Liu M, Goo J, Liu Y, Sun W, Wu MC, Hsu L, He Q. TCR-L: an analysis tool for evaluating the association between the T-cell receptor repertoire and clinical phenotypes. BMC Bioinformatics 2022; 23:152. [PMID: 35484495 PMCID: PMC9052542 DOI: 10.1186/s12859-022-04690-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Accepted: 04/13/2022] [Indexed: 11/10/2022]
Abstract
Background T cell receptors (TCRs) play critical roles in adaptive immune responses, and recent advances in genome technology have made it possible to examine the TCR repertoire at the individual sequence level. Analyzing the TCR repertoire with respect to clinical phenotypes can yield novel insights into the etiology and progression of immune-mediated diseases. However, methods for association analysis of the TCR repertoire have not been well developed. Methods We introduce an analysis tool, TCR-L, for evaluating the association between the TCR repertoire and disease outcomes. Our approach is built on a mixed-effect model, in which the fixed effect represents features that can be explicitly extracted from TCR sequences, while the random effect represents features that are hidden in TCR sequences and difficult to extract. Statistical tests are developed to examine the two types of effects independently, and the resulting p values are then combined. Results Simulation studies demonstrate that (1) the proposed approach controls the type I error well; and (2) it is more powerful than approaches that consider only the fixed effect or only the random effect. Analysis of real data from a skin cutaneous melanoma study identifies an association between the TCR repertoire and the short/long-term survival of patients. Conclusion TCR-L can accommodate both features that can be extracted and features that are hidden in TCR sequences, and it provides a powerful approach for identifying associations between the TCR repertoire and disease outcomes.
Affiliation(s)
- Meiling Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, USA
- Juna Goo
- Department of Mathematics, Boise State University, Boise, USA
- Yang Liu
- Department of Mathematics and Statistics, Wright State University, Dayton, USA
- Wei Sun
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, USA
- Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, USA
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, USA
- Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, USA
32
Wang T, Qiao J, Zhang S, Wei Y, Zeng P. Simultaneous test and estimation of total genetic effect in eQTL integrative analysis through mixed models. Brief Bioinform 2022; 23:6535679. [PMID: 35212359 DOI: 10.1093/bib/bbac038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2021] [Revised: 01/22/2022] [Accepted: 02/07/2021] [Indexed: 11/14/2022]
Abstract
Integrating expression quantitative trait loci (eQTL) into genome-wide association studies (GWASs) is a promising way to reveal the functional roles of associated single-nucleotide polymorphisms (SNPs) in complex phenotypes and has become an active research field in the post-GWAS era. However, how to efficiently incorporate eQTL mapping studies into GWAS to prioritize causal genes remains elusive. We propose a novel method, Mixed transcriptome-wide association studies (TWAS) and mediated Variance estimation (MTV), which models the effects of the cis-SNPs of a gene as a function of eQTL. MTV formulates the integrative method and TWAS within a unified framework via mixed models and therefore includes many prior methods/tests as special cases. We further justify MTV from two additional statistical perspectives: mediation analysis and two-stage Mendelian randomization. Relative to existing methods, MTV has several notable features: it handles direct effects of cis-SNPs on phenotypes; it uses a powerful likelihood ratio test to assess the joint effects of cis-SNPs and genetically regulated gene expression (GReX); it provides two useful quantities that measure the relative genetic contributions of GReX and cis-SNPs to phenotypic variance; and it uses a computationally efficient parameter-expanded expectation-maximization algorithm. In extensive simulations, MTV correctly controlled the type I error in the joint evaluation of the total genetic effect and was more powerful than existing methods at discovering true association signals across various scenarios. We finally applied MTV to 41 complex traits/diseases from three GWASs and discovered many new associated genes that existing methods had missed. We also found that a small but substantial fraction of phenotypic variation was mediated by GReX.
Overall, MTV constructs a robust and realistic modeling foundation for integrative omics analysis and has the advantage of offering more attractive biological interpretations of GWAS results.
Affiliation(s)
- Ting Wang
- Department of Biostatistics at Xuzhou Medical University, China
- Jiahao Qiao
- Department of Biostatistics at Xuzhou Medical University, China
- Shuo Zhang
- Department of Biostatistics at Xuzhou Medical University, China
- Yongyue Wei
- Department of Biostatistics at Nanjing Medical University, China
- Ping Zeng
- Department of Biostatistics, Center for Medical Statistics and Data Analysis and Key Laboratory of Human Genetics and Environmental Medicine at Xuzhou Medical University, China

33
Cheng S, Lyu J, Shi X, Wang K, Wang Z, Deng M, Sun B, Wang C. Rare variant association tests for ancestry-matched case-control data based on conditional logistic regression. Brief Bioinform 2022; 23:6502553. [PMID: 35021184 DOI: 10.1093/bib/bbab572] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/08/2021] [Revised: 11/29/2021] [Accepted: 12/13/2021] [Indexed: 12/13/2022]
Abstract
With the increasing volume of human sequencing data available, analyses incorporating external controls have become a popular and cost-effective approach to boosting statistical power in disease association studies. To prevent spurious associations due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio and may become anti-conservative when the case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to controlling for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type I error under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow for different case-control ratios across strata, a full-matching scheme can be employed to make efficient use of all available cases and controls and accelerate the discovery of disease-associated genes.
Affiliation(s)
- Shanshan Cheng
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Jingjing Lyu
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Xian Shi
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Kai Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Zengmiao Wang
- Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China
- Minghua Deng
- Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China
- LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, P. R. China
- Center for Statistical Sciences, Peking University, Beijing 100871, P. R. China
- Baoluo Sun
- Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore
- Chaolong Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Department of Orthopedic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China

34
Arthur VL, Li Z, Cao R, Oetting WS, Israni AK, Jacobson PA, Ritchie MD, Guan W, Chen J. A Multi-Marker Test for Analyzing Paired Genetic Data in Transplantation. Front Genet 2021; 12:745773. [PMID: 34721531 PMCID: PMC8548646 DOI: 10.3389/fgene.2021.745773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022]
Abstract
Emerging evidence suggests that donor/recipient matching in non-HLA (human leukocyte antigen) regions of the genome may affect transplant outcomes, and recognizing these matching effects may increase the power of transplant genetics studies. Most available matching scores either account for single-nucleotide polymorphism (SNP) matching only or sum SNP matching scores across multiple gene-coding regions, which makes the association findings difficult to interpret. We propose a multi-marker Joint Score Test (JST) to jointly test the association of recipient genotype SNP effects and a gene-based matching score with transplant outcomes. This method uses eigendecomposition as a dimension reduction technique, potentially increasing statistical power by reducing the degrees of freedom of the test. In addition, JST allows the matching effect and the recipient genotype effect to follow different biological mechanisms, which is not the case for other multi-marker methods. Extensive simulation studies show that JST is competitive with existing methods such as the sequence kernel association test (SKAT), especially when associated SNPs are in low linkage disequilibrium with non-associated SNPs or lie in gene regions containing a large number of SNPs. Applying the method to paired donor/recipient genetic data from kidney transplant studies identifies several gene regions that are potentially associated with the incidence of acute rejection after transplant.
Affiliation(s)
- Victoria L. Arthur
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Zhengbang Li
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Departments of Statistics, Central China Normal University, Wuhan, China
- Rui Cao
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
- William S. Oetting
- Department of Experimental and Clinical Pharmacology, College of Pharmacy, University of Minnesota, Minneapolis, MN, United States
- Ajay K. Israni
- Minneapolis Medical Research Foundation, Minneapolis, MN, United States
- Department of Medicine, Hennepin County Medical Center, Minneapolis, MN, United States
- Department of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN, United States
- Pamala A. Jacobson
- Department of Experimental and Clinical Pharmacology, College of Pharmacy, University of Minnesota, Minneapolis, MN, United States
- Marylyn D. Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Weihua Guan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
- Jinbo Chen
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States

35
Lu H, Wei Y, Jiang Z, Zhang J, Wang T, Huang S, Zeng P. Integrative eQTL-weighted hierarchical Cox models for SNP-set based time-to-event association studies. J Transl Med 2021; 19:418. [PMID: 34627275 PMCID: PMC8502405 DOI: 10.1186/s12967-021-03090-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/09/2021] [Accepted: 09/26/2021] [Indexed: 01/23/2023]
Abstract
BACKGROUND Integrating functional annotations into SNP-set association studies has proven to be a powerful analysis strategy. Statistical methods for such integration have been developed for continuous and binary phenotypes; however, SNP-set integrative approaches for time-to-event or survival outcomes are lacking. METHODS We here propose IEHC, an integrative eQTL (expression quantitative trait loci) hierarchical Cox regression, for SNP-set based survival association analysis, which models the effect sizes of genetic variants as a function of eQTL in a hierarchical manner. Three p-value combination tests are developed to examine the joint effects of eQTL and genetic variants after a novel decorrelating modification of the statistics for the two components. An omnibus test (IEHC-ACAT) is further adapted to aggregate the strengths of all available tests. RESULTS Simulations demonstrated that the IEHC joint tests were more powerful when both eQTL and genetic variants contributed to the association signal, while IEHC-ACAT was robust and often outperformed other approaches across various simulation scenarios. When applying IEHC to ten TCGA cancers by incorporating eQTL from relevant GTEx tissues, we found substantial correlations between the two types of effect sizes of genetic variants from TCGA and GTEx, and identified 21 (9 unique) cancer-associated genes that would otherwise be missed by approaches not incorporating eQTL. CONCLUSION IEHC represents a flexible, robust, and powerful approach to integrating functional omics information to enhance the power of identifying association signals for the survival risk of complex human cancers.
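The omnibus IEHC-ACAT test aggregates several p-values with the Cauchy combination rule on which ACAT is based: each p-value is mapped to a standard Cauchy variate, the variates are summed with weights, and the sum is mapped back to a p-value. The following is a minimal Python sketch of that rule, assuming equal weights by default and p-values strictly between 0 and 1; the function name `acat` and its defaults are illustrative and are not taken from the IEHC software.

```python
import math


def acat(p_values, weights=None):
    """Cauchy combination of p-values (the rule behind ACAT).

    Illustrative sketch: equal weights are assumed when none are given.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    total_w = sum(weights)

    # Map each p-value to a standard Cauchy variate and take the weighted sum.
    stat = 0.0
    for p, w in zip(p_values, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        stat += w * math.tan((0.5 - p) * math.pi)

    # Under the null, stat / total_w is approximately standard Cauchy,
    # so its upper-tail probability gives the combined p-value.
    return 0.5 - math.atan(stat / total_w) / math.pi


print(acat([1e-6, 0.5, 0.9]))  # dominated by the smallest p-value
```

With a single input the rule returns that p-value unchanged, and one very small p-value dominates the combination, which is what makes it attractive when only one of several component tests carries the signal. Implementations commonly switch to the approximation tan((0.5 - p)π) ≈ 1/(pπ) for extremely small p-values to avoid loss of precision; this sketch omits that refinement.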
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yongyue Wei
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jinhui Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China

36
He Q, Liu Y, Liu M, Wu MC, Hsu L. Random effect based tests for multinomial logistic regression in genetic association studies. Genet Epidemiol 2021; 45:736-740. [PMID: 34403161 DOI: 10.1002/gepi.22427] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/03/2021] [Revised: 07/31/2021] [Accepted: 08/01/2021] [Indexed: 11/11/2022]
Affiliation(s)
- Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Yang Liu
- Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, USA
- Meiling Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA

37
Sheng Z, Liu Y, Li P, Qin J. Likelihood ratio test for genetic association study with case–control data under Probit model. J Appl Stat 2021; 49:3717-3731. [DOI: 10.1080/02664763.2021.1962261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Zhen Sheng
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE, Shanghai, People's Republic of China
- School of Statistics, East China Normal University, Shanghai, People's Republic of China
- Yukun Liu
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE, Shanghai, People's Republic of China
- School of Statistics, East China Normal University, Shanghai, People's Republic of China
- Pengfei Li
- Department of Statistics and Actuarial Sciences, University of Waterloo, Waterloo, ON, Canada
- Jing Qin
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA

38
Impact of rare and common genetic variation in the interleukin-1 pathway on human cytokine responses. Genome Med 2021; 13:94. [PMID: 34034819 PMCID: PMC8145796 DOI: 10.1186/s13073-021-00907-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/04/2020] [Accepted: 05/11/2021] [Indexed: 01/26/2023]
Abstract
Background The interleukin (IL)-1 pathway is primarily associated with innate immunological defense and plays a major role in the induction and regulation of inflammation. Both common and rare genetic variation in this pathway underlies various inflammation-mediated diseases, but the role of rare variants relative to common variants in immune response variability in healthy individuals remains unclear. Methods We performed molecular inversion probe sequencing on 48 IL-1 pathway-related genes in 463 healthy individuals from the Human Functional Genomics Project. We functionally grouped common and rare variants at the gene, subpathway, and inflammatory levels and performed the Sequence Kernel Association Test to test for association with in vitro stimulation-induced cytokine responses; specifically, IL-1β and IL-6 cytokine measurements upon stimulations that represent an array of microbial infections: lipopolysaccharide (LPS), phytohaemagglutinin (PHA), Candida albicans (C. albicans), and Staphylococcus aureus (S. aureus). Results We identified an association between a burden of NCF4 rare variants and PHA-induced IL-6 cytokine levels and showed that the respective carriers are among the lowest 1% of IL-6 producers. Collapsing rare variants in IL-1 subpathway genes produces a bidirectional association with LPS-induced IL-1β cytokine levels, which is reflected by a significant Spearman correlation. At the inflammatory level, we identified an association between a burden of rare variants in genes encoding proteins with an anti-inflammatory function and S. aureus-induced IL-6 cytokine levels. In contrast to these rare variant findings, which involved different types of stimuli, common variant associations were identified exclusively with C. albicans-induced cytokines across various levels of grouping, from the gene, to subpathway, to inflammatory level.
Conclusions This study shows that functionally grouping common and rare genetic variants enables the elucidation of IL-1-mediated biological mechanisms, specifically for IL-1β and IL-6 cytokine responses induced by various stimuli. The framework used in this study may allow for the analysis of rare and common genetic variants in a wider variety of (non-immune) complex phenotypes and therefore has the potential to contribute to a better understanding of unresolved complex traits and diseases. Supplementary Information The online version contains supplementary material available at 10.1186/s13073-021-00907-w.

39
Liu M, Liu Y, Wu MC, Hsu L, He Q. A method for subtype analysis with somatic mutations. Bioinformatics 2021; 37:50-56. [PMID: 33416828 PMCID: PMC11394914 DOI: 10.1093/bioinformatics/btaa1090] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/23/2020] [Revised: 12/15/2020] [Accepted: 12/22/2020] [Indexed: 12/11/2022]
Abstract
MOTIVATION Cancer is a highly heterogeneous disease, and virtually all types of cancer have subtypes. Understanding the association between cancer subtypes and genetic variations is fundamental to the development of targeted therapies for patients. Somatic mutation plays important roles in tumor development and has emerged as a new type of genetic variation for studying associations with cancer subtypes. However, the low prevalence of individual mutations poses a tremendous challenge to the related statistical analysis. RESULTS In this article, we propose an approach, subtype analysis with somatic mutations (SASOM), for the association analysis of cancer subtypes with somatic mutations. Our approach tests the association between a set of somatic mutations (from a genetic pathway) and subtypes, while incorporating functional information on the mutations into the analysis. We further propose a robust p-value combination procedure, DAPC, to synthesize statistical significance from different sources. Simulation studies show that the proposed approach correctly controls type I error and tends to be more powerful than possible alternative methods. In a real data application, we examine the somatic mutations from a cutaneous melanoma dataset and identify a genetic pathway that is associated with immune-related subtypes. AVAILABILITY AND IMPLEMENTATION The SASOM R package is available at https://github.com/rksyouyou/SASOM-pkg. R scripts and data are available at https://github.com/rksyouyou/SASOM-analysis. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meiling Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Yang Liu
- Department of Mathematics and Statistics, Wright State University, Dayton, OH 45435, USA
- Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA

40
Zhan X, Banerjee K, Chen J. Variant-set association test for generalized linear mixed model. Genet Epidemiol 2021; 45:402-412. [PMID: 33604919 DOI: 10.1002/gepi.22378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Revised: 01/18/2021] [Accepted: 01/25/2021] [Indexed: 12/22/2022]
Abstract
Advances in high-throughput biotechnologies have culminated in a wide range of omics (such as genomics, epigenomics, transcriptomics, metabolomics, and metagenomics) studies, and increasing evidence from these studies indicates that the biological architecture of complex traits involves a large number of omics variants, each with minor effects but collectively accounting for the full phenotypic variability. Thus, a major challenge in many "ome-wide" association analyses is to achieve adequate statistical power to identify multiple variants of small effect size, which is notoriously difficult for studies with relatively small sample sizes. A small-sample adjustment incorporated into the kernel machine regression framework was proposed to address this problem for association studies in various settings. However, such an adjustment has not yet been attempted in the generalized linear mixed model (GLMM) framework, which accounts for both sample relatedness and non-Gaussian outcomes. In this study, we fill this gap by extending the small-sample adjustment in the kernel machine association test to the GLMM. We propose a new Variant-Set Association Test (VSAT), a powerful and efficient analysis tool in the GLMM framework, to examine the association between a set of omics variants and correlated phenotypes. The usefulness of VSAT is demonstrated using both numerical simulation studies and applications to data collected from multiple association studies. The software implementing the proposed method in R is available at https://www.github.com/jchen1981/SSKAT.
Affiliation(s)
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Kalins Banerjee
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Jun Chen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA

41
He Z, Pan Y, Shao F, Wang H. Identifying Differentially Expressed Genes of Zero Inflated Single Cell RNA Sequencing Data Using Mixed Model Score Tests. Front Genet 2021; 12:616686. [PMID: 33613638 PMCID: PMC7894898 DOI: 10.3389/fgene.2021.616686] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/13/2020] [Accepted: 01/14/2021] [Indexed: 12/13/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) allows quantitative measurement and comparison of gene expression at the resolution of single cells. When the batch effects and zero inflation of scRNA-seq data are ignored, many proposed differential expression (DE) methods may produce biased results. We propose a method, single cell mixed model score tests (scMMSTs), to efficiently identify DE genes in scRNA-seq data with batch effects using the generalized linear mixed model (GLMM). scMMSTs treat the batch effect as a random effect. To handle zero inflation, scMMSTs use a weighting strategy to calculate observational weights for counts independently under zero-inflated and zero-truncated distributions. Count data with the calculated weights were subsequently analyzed using weighted GLMMs. The theoretical null distributions of the score statistics were constructed from mixed chi-square distributions. Intensive simulations and two real datasets were used to compare edgeR-zinbwave, DESeq2-zinbwave, and scMMSTs. Our study demonstrates that scMMSTs, as a supplement to standard methods, are advantageous for identifying DE genes in zero-inflated scRNA-seq data with batch effects.
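Score statistics for variance components in mixed models are often quadratic forms whose null distribution is a weighted sum of independent 1-df chi-square variables, Q ~ Σ_j λ_j χ²_1. Assuming that form (the abstract does not spell out the exact chi-square mixture scMMSTs use), a tail probability can be approximated by simulation. The sketch below is a generic Monte Carlo illustration in Python; the function name `weighted_chisq_tail_mc` is hypothetical, the weights `lambdas` are taken as given, and it is not the scMMSTs implementation. Exact or moment-matching methods such as Davies' algorithm are usually preferred in practice.

```python
import random


def weighted_chisq_tail_mc(q, lambdas, n_sim=200_000, seed=1):
    """Monte Carlo estimate of P(sum_j lambda_j * chi2_1 > q).

    Hypothetical helper for illustration; assumes the null distribution
    is a weighted sum of independent 1-df chi-square variables.
    """
    if n_sim <= 0:
        raise ValueError("n_sim must be positive")
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        # Each chi-square(1) draw is the square of a standard normal draw.
        total = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if total > q:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)


# A single weight of 1 gives an ordinary chi-square(1); 3.84 is its 95th percentile.
print(weighted_chisq_tail_mc(3.841458820694124, [1.0]))
```

With one weight equal to 1 the statistic is an ordinary χ²_1 variable, so a threshold of 3.84 should give a p-value near 0.05. The Monte Carlo error shrinks only as roughly 1/√n_sim, which is why simulation becomes impractical for the very small p-values required at genome-wide significance levels.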
Affiliation(s)
- Zhiqiang He
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Yueyun Pan
- First Clinical Medical College, Nanjing Medical University, Nanjing, China
- Fang Shao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Hui Wang
- Department of Maternal and Child Health, School of Public Health, Peking University Health Science Center, Beijing, China

42
Susak H, Serra-Saurina L, Demidov G, Rabionet R, Domènech L, Bosio M, Muyas F, Estivill X, Escaramís G, Ossowski S. Efficient and flexible Integration of variant characteristics in rare variant association studies using integrated nested Laplace approximation. PLoS Comput Biol 2021; 17:e1007784. [PMID: 33606672 PMCID: PMC7928502 DOI: 10.1371/journal.pcbi.1007784] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/09/2020] [Revised: 03/03/2021] [Accepted: 01/04/2021] [Indexed: 12/02/2022]
Abstract
Rare variants are thought to play an important role in the etiology of complex diseases and may explain a significant fraction of the missing heritability in genetic disease studies. Next-generation sequencing facilitates association testing of rare variants in coding or regulatory regions with complex diseases in large cohorts at genome-wide scale. However, rare variant association studies (RVAS) still lack power when cohorts are small to medium-sized and when genetic variation explains a small fraction of phenotypic variance. Here we present a novel Bayesian rare variant Association Test using Integrated Nested Laplace Approximation (BATI). Unlike existing RVAS tests, BATI allows integration of individual or variant-specific features as covariates, while efficiently performing inference based on full model estimation. We demonstrate that BATI outperforms established RVAS methods on realistic, semi-synthetic whole-exome sequencing cohorts, especially when using meaningful biological context, such as functional annotation. We show that BATI achieves power above 70% in scenarios in which competing tests fail to identify risk genes, e.g. when risk variants together explain less than 0.5% of phenotypic variance. We have integrated BATI, together with five existing RVAS tests, into the 'Rare Variant Genome Wide Association Study' (rvGWAS) framework for data generated by whole-exome or whole-genome sequencing. rvGWAS supports rare variant association testing for genes or any other biological unit, such as promoters, while also providing essential functionality such as quality control and filtering. Applying rvGWAS to a Chronic Lymphocytic Leukemia study, we identified eight candidate predisposition genes, including EHMT2 and COPS7A.
Affiliation(s)
- Hana Susak
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Laura Serra-Saurina
- Biomedical Research Networking Centre consortium of Public Health and Epidemiology (CIBERESP), Madrid, Spain
- Center for research in occupational Health (CiSAL), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Research Group on Statistics, Econometrics and Health (GRECS), Universitat de Girona (UdG), Girona, Spain
- German Demidov
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Raquel Rabionet
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Genetics, Microbiology and Statistics, Faculty of Biology, IBUB, Universitat de Barcelona; CIBERER, IRSJD, Barcelona, Spain
- Laura Domènech
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Biomedical Research Networking Centre consortium of Public Health and Epidemiology (CIBERESP), Madrid, Spain
- Mattia Bosio
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Francesc Muyas
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Xavier Estivill
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Women’s Health Dexeus, Barcelona, Spain
- Geòrgia Escaramís
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Biomedical Research Networking Centre consortium of Public Health and Epidemiology (CIBERESP), Madrid, Spain
- Departament de Biomedicina, Facultat de Medicina i Ciències de la Salut, Institut de Neurociències, Universitat de Barcelona, Spain
- Stephan Ossowski
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany

43
Bonnefond A, Boissel M, Bolze A, Durand E, Toussaint B, Vaillant E, Gaget S, De Graeve F, Dechaume A, Allegaert F, Le Guilcher D, Yengo L, Dhennin V, Borys JM, Lu JT, Cirulli ET, Elhanan G, Roussel R, Balkau B, Marre M, Franc S, Charpentier G, Vaxillaire M, Canouil M, Washington NL, Grzymski JJ, Froguel P. Pathogenic variants in actionable MODY genes are associated with type 2 diabetes. Nat Metab 2020; 2:1126-1134. [PMID: 33046911 DOI: 10.1038/s42255-020-00294-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 05/07/2020] [Accepted: 09/08/2020] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies have identified 240 independent loci associated with type 2 diabetes (T2D) risk, but this knowledge has not advanced precision medicine. In contrast, the genetic diagnosis of monogenic forms of diabetes (including maturity-onset diabetes of the young (MODY)) is a textbook case of genomic medicine. Recent studies trying to bridge the gap between monogenic diabetes and T2D have been inconclusive. Here, we show a significant burden of pathogenic variants in genes linked with monogenic diabetes among people with common T2D, particularly in actionable MODY genes, implying that there should be a substantial change in care for carriers with T2D. We show that, among 74,629 individuals, this burden is probably driven by the pathogenic variants found in GCK, and to a lesser extent in HNF4A, KCNJ11, HNF1B and ABCC8. The carriers with T2D are leaner, which is evidence of a functional metabolic effect of these mutations. Pathogenic variants in actionable MODY genes are more frequent in common T2D than was previously expected. These results open avenues for future interventions assessing the clinical interest of these pathogenic mutations in precision medicine.
Affiliation(s)
- Amélie Bonnefond
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France.
- Department of Metabolism, Imperial College London, London, UK.
- Mathilde Boissel
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Emmanuelle Durand
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Bénédicte Toussaint
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Emmanuel Vaillant
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Stefan Gaget
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Franck De Graeve
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Aurélie Dechaume
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Frédéric Allegaert
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- David Le Guilcher
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Loïc Yengo
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Institute for Molecular Bioscience, the University of Queensland, St Lucia, Australia
- Véronique Dhennin
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Gai Elhanan
- Desert Research Institute, Reno, NV, USA
- Renown Institute of Health Innovation, Reno, NV, USA
- Ronan Roussel
- Department of Diabetology Endocrinology Nutrition, Hôpital Bichat, DHU FIRE, Assistance Publique Hôpitaux de Paris, Paris, France
- Inserm U1138, Centre de Recherche des Cordeliers, Paris, France
- UFR de Médecine, University Paris Diderot, Sorbonne Paris Cité, Paris, France
- Beverley Balkau
- Inserm U1018, Institut Gustave Roussy, Center for Research in Epidemiology and Population Health, Villejuif, France
- University Paris-Saclay, University Paris-Sud, Villejuif, France
| | - Michel Marre
- Inserm U1138, Centre de Recherche des Cordeliers, Paris, France
- CMC Ambroise Paré, Neuilly-sur-Seine, France
| | - Sylvia Franc
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
- Department of Diabetes, Sud-Francilien Hospital, University Paris-Sud, Orsay, Corbeil-Essonnes, France
| | - Guillaume Charpentier
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
| | - Martine Vaxillaire
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
| | - Mickaël Canouil
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France
| | | | - Joseph J Grzymski
- Desert Research Institute, Reno, NV, USA
- Renown Institute of Health Innovation, Reno, NV, USA
| | - Philippe Froguel
- Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Université de Lille, Institut Pasteur de Lille, Lille University Hospital, Lille, France.
- Department of Metabolism, Imperial College London, London, UK.
44
Li X, Li Z, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Aslibekyan S, Ballantyne CM, Bielak LF, Blangero J, Boerwinkle E, Bowden DW, Broome JG, Conomos MP, Correa A, Cupples LA, Curran JE, Freedman BI, Guo X, Hindy G, Irvin MR, Kardia SLR, Kathiresan S, Khan AT, Kooperberg CL, Laurie CC, Liu XS, Mahaney MC, Manichaikul AW, Martin LW, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Moore JE, Morrison AC, O'Connell JR, Palmer ND, Pampana A, Peralta JM, Peyser PA, Psaty BM, Redline S, Rice KM, Rich SS, Smith JA, Tiwari HK, Tsai MY, Vasan RS, Wang FF, Weeks DE, Weng Z, Wilson JG, Yanek LR, Neale BM, Sunyaev SR, Abecasis GR, Rotter JI, Willer CJ, Peloso GM, Natarajan P, Lin X. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet 2020; 52:969-983. [PMID: 32839606 PMCID: PMC7483769 DOI: 10.1038/s41588-020-0676-4] [Citation(s) in RCA: 141] [Impact Index Per Article: 28.2] [Received: 07/17/2019] [Accepted: 07/02/2020] [Indexed: 12/13/2022]
Abstract
Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
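In the STAAR paper, p-values from tests run under different annotation-based weights are combined using the Cauchy combination method (ACAT). A minimal sketch of that combination step follows. It assumes the per-weight p-values have already been computed, the function name is hypothetical, and it is not code from the STAAR package.

```python
import math
from typing import Optional, Sequence


def cauchy_combination(pvalues: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Combine possibly correlated p-values with the Cauchy combination (ACAT) test.

    Each p-value is mapped to a standard Cauchy variate tan((0.5 - p) * pi).
    Under the null, their weighted average is approximately standard Cauchy,
    so the combined p-value is the Cauchy upper-tail probability.
    """
    if len(pvalues) == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    # Normalise weights so they sum to 1.
    w = [wi / total for wi in weights]
    t = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, pvalues))
    # Upper-tail probability of a standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi
```

For example, `cauchy_combination([0.001, 0.5])` returns approximately 0.002, so a single strong signal dominates the combination. For extremely small p-values (roughly below 1e-15), `tan` near pi/2 loses precision, and a practical implementation would switch to the asymptotic form 1 / (p * pi).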
Affiliation(s)
- Xihao Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zilin Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hufeng Zhou
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sheila M Gaynor
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yaowu Liu
  - School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
- Han Chen
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ryan Sun
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rounak Dey
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Donna K Arnett
  - College of Public Health, University of Kentucky, Lexington, KY, USA
- Stella Aslibekyan
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Lawrence F Bielak
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- John Blangero
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Donald W Bowden
  - Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Jai G Broome
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Matthew P Conomos
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Adolfo Correa
  - Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- L Adrienne Cupples
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Joanne E Curran
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Barry I Freedman
  - Department of Internal Medicine, Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Xiuqing Guo
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- George Hindy
  - Department of Population Medicine, Qatar University College of Medicine, QU Health, Doha, Qatar
- Marguerite R Irvin
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Sharon L R Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Sekar Kathiresan
  - Verve Therapeutics, Cambridge, MA, USA
  - Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Alyna T Khan
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Charles L Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Cathy C Laurie
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- X Shirley Liu
  - Department of Data Sciences, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Statistics, Harvard University, Cambridge, MA, USA
- Michael C Mahaney
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Ani W Manichaikul
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Lisa W Martin
  - Division of Cardiology, George Washington School of Medicine and Health Sciences, Washington, DC, USA
- Rasika A Mathias
  - GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Stephen T McGarvey
  - Department of Epidemiology, International Health Institute, Department of Anthropology, Brown University, Providence, RI, USA
- Braxton D Mitchell
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
  - Geriatrics Research and Education Clinical Center, Baltimore VA Medical Center, Baltimore, MD, USA
- May E Montasser
  - Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Jill E Moore
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Alanna C Morrison
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jeffrey R O'Connell
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Nicholette D Palmer
  - Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Akhil Pampana
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Juan M Peralta
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Patricia A Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Bruce M Psaty
  - Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Kenneth M Rice
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Hemant K Tiwari
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Michael Y Tsai
  - Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
- Ramachandran S Vasan
  - Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
  - Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Fei Fei Wang
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Daniel E Weeks
  - Department of Human Genetics and Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Zhiping Weng
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- James G Wilson
  - Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
  - Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Lisa R Yanek
  - GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Benjamin M Neale
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Shamil R Sunyaev
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gonçalo R Abecasis
  - Regeneron Pharmaceuticals, Tarrytown, NY, USA
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Jerome I Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Cristen J Willer
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Gina M Peloso
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Pradeep Natarajan
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Statistics, Harvard University, Cambridge, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
45
Wang X, Lim E, Liu CT, Sung YJ, Rao DC, Morrison AC, Boerwinkle E, Manning AK, Chen H. Efficient gene-environment interaction tests for large biobank-scale sequencing studies. Genet Epidemiol 2020; 44:908-923. [PMID: 32864785 DOI: 10.1002/gepi.22351] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/13/2020] [Revised: 07/22/2020] [Accepted: 08/08/2020] [Indexed: 01/01/2023]
Abstract
Complex human diseases are affected by genetic and environmental risk factors and their interactions. Gene-environment interaction (GEI) tests for aggregate genetic variant sets have been developed in recent years. However, existing statistical methods become rate limiting for large biobank-scale sequencing studies with correlated samples. We propose efficient Mixed-model Association tests for GEne-Environment interactions (MAGEE), for testing GEI between an aggregate variant set and environmental exposures on quantitative and binary traits in large-scale sequencing studies with related individuals. Joint tests for the aggregate genetic main effects and GEI effects are also developed. A null generalized linear mixed model adjusting for covariates but without any genetic effects is fit only once in a whole genome GEI analysis, thereby vastly reducing the overall computational burden. Score tests for variant sets are performed as a combination of genetic burden and variance component tests by accounting for the genetic main effects using matrix projections. The computational complexity is dramatically reduced in a whole genome GEI analysis, which makes MAGEE scalable to hundreds of thousands of individuals. We applied MAGEE to the exome sequencing data of 41,144 related individuals from the UK Biobank, and the analysis of 18,970 protein coding genes finished within 10.4 CPU hours.
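The computational idea described above, fitting a covariates-only null model once and then removing each variant set's genetic main effect by matrix projection, can be illustrated in a much simpler setting. The sketch below assumes unrelated individuals and a continuous trait under ordinary linear regression (no kinship matrix, no GLMM), and it implements only a burden-type interaction score test, not MAGEE's variance-component or joint tests. The function names are hypothetical, and this is not the MAGEE software.

```python
import math

import numpy as np


def fit_null_model(y, X):
    """Fit the covariates-only linear model y ~ X once for the whole scan.

    Assumes unrelated individuals and ordinary least squares; MAGEE itself
    fits a generalized linear mixed model that accounts for relatedness.
    """
    X_pinv = np.linalg.pinv(X)                # reused to project X out of any vector
    resid = y - X @ (X_pinv @ y)              # null-model residuals, P_X y
    n, p = X.shape
    sigma2 = float(resid @ resid) / (n - p)   # residual variance under the null
    return X_pinv, resid, sigma2


def gei_burden_test(G, E, weights, X, X_pinv, resid, sigma2):
    """Burden-type score test of the G x E interaction for one variant set.

    The burden main effect b = G @ weights is removed by projection instead
    of refitting the model for every variant set.
    """
    b = G @ weights                     # weighted burden score per subject
    s = b * E                           # burden-by-exposure interaction covariate
    b_x = b - X @ (X_pinv @ b)          # P_X b
    s_x = s - X @ (X_pinv @ s)          # P_X s
    # Project out the burden main effect too (Frisch-Waugh-Lovell step).
    bb = float(b_x @ b_x)
    s_res = s_x - b_x * (float(b_x @ s_x) / bb)
    y_res = resid - b_x * (float(b_x @ resid) / bb)
    score = float(s_res @ y_res)
    variance = sigma2 * float(s_res @ s_res)
    z = score / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return z, p_value
```

Because the null fit is shared, each additional gene costs only a few matrix-vector products, which is the same reason the abstract gives for MAGEE's scalability. The residual variance is taken from the covariates-only fit, a simplification that can be conservative when genetic main effects are present.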
Affiliation(s)
- Xinyu Wang
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
- Elise Lim
  - Department of Biostatistics, Boston University, Boston, Massachusetts
- Ching-Ti Liu
  - Department of Biostatistics, Boston University, Boston, Massachusetts
- Yun Ju Sung
  - Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri
- Dabeeru C Rao
  - Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri
- Alanna C Morrison
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Alisa K Manning
  - Center for Human Genetics Research, Massachusetts General Hospital, Boston, Massachusetts
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Han Chen
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
  - Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
46
Jiang L, Huguet G, Schramm C, Ciampi A, Main A, Passo C, Jean‐Louis M, Auger M, Schumann G, Porteous D, Jacquemont S, Greenwood CMT. Estimating the effects of copy‐number variants on intelligence using hierarchical Bayesian models. Genet Epidemiol 2020; 44:825-840. [DOI: 10.1002/gepi.22344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/05/2019] [Revised: 06/24/2020] [Accepted: 07/21/2020] [Indexed: 01/01/2023]
Affiliation(s)
- Lai Jiang
  - Lady Davis Institute, Jewish General Hospital, Montreal, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
- Guillaume Huguet
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
  - Universite de Montreal, Montreal, Canada
- Catherine Schramm
  - Lady Davis Institute, Jewish General Hospital, Montreal, Canada
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
  - Universite de Montreal, Montreal, Canada
- Antonio Ciampi
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
- Antoine Main
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
  - Universite de Montreal, Montreal, Canada
  - Department of Decision Sciences, Hautes etudes commerciales de Montreal (HEC), Montreal, Canada
- Claudine Passo
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
  - Universite de Montreal, Montreal, Canada
- Martineau Jean‐Louis
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
  - Universite de Montreal, Montreal, Canada
- Maude Auger
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
  - Universite de Montreal, Montreal, Canada
- Gunter Schumann
  - Institute of Psychiatry, Psychology, and Neuroscience, King's College London, London, UK
- David Porteous
  - Department of Psychology, Lothian Birth Cohorts Group, School of Philosophy, Psychology and Language Sciences, The University of Edinburgh, Edinburgh, UK
  - Medical Genetics Section, Centre for Genomic and Experimental Medicine, MRC Institute of Genetics and Molecular Medicine, Western General Hospital, The University of Edinburgh, Edinburgh, UK
  - Generation Scotland, Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, UK
- Sébastien Jacquemont
  - Centre Hospitalier Universitaire (CHU) Sainte‐Justine, Montreal, Canada
  - Universite de Montreal, Montreal, Canada
- Celia M. T. Greenwood
  - Lady Davis Institute, Jewish General Hospital, Montreal, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
  - Gerald Bronfman Department of Oncology, McGill University, Montreal, Canada
  - Department of Human Genetics, McGill University, Montreal, Canada
47
Dong X, Su YR, Barfield R, Bien SA, He Q, Harrison TA, Huyghe JR, Keku TO, Lindor NM, Schafmayer C, Chan AT, Gruber SB, Jenkins MA, Kooperberg C, Peters U, Hsu L. A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study. PLoS Genet 2020; 16:e1008947. [PMID: 32833970 PMCID: PMC7470748 DOI: 10.1371/journal.pgen.1008947] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/18/2019] [Revised: 09/03/2020] [Accepted: 06/22/2020] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with various phenotypes, but together these variants explain only a fraction of heritability, suggesting that many variants have yet to be discovered. It has recently been recognized that incorporating functional information about genetic variants can improve power to identify novel loci. For example, S-PrediXcan and TWAS test the association of predicted gene expression with phenotypes using GWAS summary statistics, leveraging information on the genetic regulation of gene expression, and have found many novel loci. However, because genetic variants may affect more than one gene and act through different mechanisms, these methods likely capture only part of the total effects of these variants. In this paper, we propose a summary statistics-based mixed effects score test (sMiST) that tests the total effect of the variants, comprising both the effect mediated through genetically predicted gene expression (imputed as in S-PrediXcan and TWAS) and the direct effects of individual variants. It allows for multiple functional annotations and multiple genetically predicted mediators. It can also perform conditional association analysis that adjusts for other genetic variants (e.g., known loci for the phenotype). Extensive simulations and real data analyses demonstrate that sMiST yields p-values that agree well with those obtained from individual-level data, with substantially improved computational speed. Importantly, sMiST can be applied broadly to GWAS, because only summary statistics of genetic variant associations are required. We apply sMiST to a large-scale GWAS of colorectal cancer using summary statistics from ∼120,000 study participants and gene expression data from the Genotype-Tissue Expression (GTEx) project, and identify several novel and secondary independent genetic loci.
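The mediated component described above builds on the S-PrediXcan idea of testing genetically predicted expression using summary statistics alone. A sketch of the S-PrediXcan-style gene-level z-score follows. It covers only that mediated, fixed-effect part, not sMiST's test of direct variant effects or its combination of the two. The expression weights, SNP standard deviations, and LD correlation matrix are assumed inputs, and the function name is hypothetical.

```python
import numpy as np


def predicted_expression_z(z_snp, weights, snp_sd, ld_corr):
    """Gene-level z-score for predicted expression from GWAS summary statistics.

    z_gene = sum_j w_j * sd_j * z_j / sd_gene, where
    sd_gene^2 = (w * sd)' R (w * sd) and R is the SNP correlation (LD) matrix.
    """
    z_snp = np.asarray(z_snp, dtype=float)
    # Expression weights rescaled by each SNP's genotype standard deviation.
    scaled = np.asarray(weights, dtype=float) * np.asarray(snp_sd, dtype=float)
    ld_corr = np.asarray(ld_corr, dtype=float)
    gene_var = float(scaled @ ld_corr @ scaled)   # variance of predicted expression
    return float(scaled @ z_snp) / np.sqrt(gene_var)
```

Because the LD matrix enters the denominator, two SNPs in perfect LD with z = 2 each give a gene-level z of 2, whereas two independent SNPs with the same z-scores give about 2.83.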
Affiliation(s)
- Xinyuan Dong
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yu-Ru Su
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Richard Barfield
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Stephanie A. Bien
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Qianchuan He
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Tabitha A. Harrison
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Jeroen R. Huyghe
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Temitope O. Keku
  - Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, USA
- Noralane M. Lindor
  - Department of Health Science Research, Mayo Clinic, Scottsdale, Arizona, USA
- Clemens Schafmayer
  - Department of General Surgery, University Hospital Rostock, Rostock, Germany
- Andrew T. Chan
  - Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, and Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Stephen B. Gruber
  - City of Hope National Medical Center, Duarte, and Department of Preventive Medicine & USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Mark A. Jenkins
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ulrike Peters
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Li Hsu
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
48
Tan X, Chen BE, Sun J, Patel T, Ibrahim JG. A hierarchical testing approach for detecting safety signals in clinical trials. Stat Med 2020; 39:1541-1557. [PMID: 32050050 PMCID: PMC8258607 DOI: 10.1002/sim.8495] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/19/2018] [Revised: 05/01/2019] [Accepted: 08/16/2019] [Indexed: 11/10/2022]
Abstract
Detecting safety signals in clinical trial safety data is known to be challenging because of high dimensionality, rare occurrence, weak signals, and complex dependence. We propose a new hierarchical testing approach for analyzing safety data from a typical randomized clinical trial. This approach accounts for the hierarchical structure of adverse events (AEs), that is, AEs are categorized by system organ class (SOC). Our approach has two steps: the first step tests, for each SOC, whether any AEs within that SOC are distributed differently between treatment arms; the second step identifies signal AEs within the SOCs that pass the first-step tests. Through simulation studies, we show that the new approach has greater power to detect safety signals than currently available approaches while controlling the false discovery rate. We also demonstrate this approach with two real data examples.
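The two-step structure described above (screen each SOC, then look for signal AEs only inside the SOCs that pass) can be sketched with a generic, swapped-in procedure: a Simes test per SOC, Benjamini-Hochberg (BH) across SOCs, then BH within each selected SOC at level q·R/M, following Benjamini and Bogomolov's selective-FDR adjustment (R selected SOCs out of M). This is not the authors' procedure; it assumes per-AE p-values are already available, and the function names and example SOC labels are hypothetical.

```python
from typing import Dict, List, Sequence


def bh_reject(pvals: Sequence[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank                      # largest rank meeting the BH bound
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def simes(pvals: Sequence[float]) -> float:
    """Simes p-value for the null that no AE in the SOC carries a signal."""
    m = len(pvals)
    return min(p * m / r for r, p in enumerate(sorted(pvals), start=1))


def two_step_ae_screen(soc_pvals: Dict[str, List[float]],
                       q: float) -> Dict[str, List[bool]]:
    """Step 1 selects SOCs; step 2 flags AEs inside the selected SOCs."""
    socs = list(soc_pvals)
    selected = bh_reject([simes(soc_pvals[s]) for s in socs], q)
    n_selected = sum(selected)
    results = {}
    for soc, keep in zip(socs, selected):
        if keep:
            # Selective adjustment: test within the SOC at level q * R / M.
            results[soc] = bh_reject(soc_pvals[soc], q * n_selected / len(socs))
    return results
```

With hypothetical input `{"cardiac": [0.0001, 0.2, 0.5], "skin": [0.4, 0.6], "gi": [0.3, 0.9, 0.8]}` and `q = 0.1`, only the cardiac SOC passes step 1, and within it only the first AE is flagged.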
Affiliation(s)
- Xianming Tan
  - Department of Biostatistics, UNC at Chapel Hill, Chapel Hill, North Carolina
- Bingshu E. Chen
  - Canadian Cancer Trials Group and Department of Public Health Sciences, Queen’s University, Kingston, Ontario, Canada
- Jianping Sun
  - Department of Mathematics and Statistics, UNC at Greensboro, Greensboro, North Carolina
- Tejendra Patel
  - Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, UNC at Chapel Hill, Chapel Hill, North Carolina
- Joseph G. Ibrahim
  - Department of Biostatistics, UNC at Chapel Hill, Chapel Hill, North Carolina
49
Liu Y, Li P, Song L, Yu K, Qin J. Retrospective versus prospective score tests for genetic association with case-control data. Biometrics 2020; 77:102-112. [PMID: 32275064 DOI: 10.1111/biom.13270] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/02/2019] [Revised: 03/09/2020] [Accepted: 03/24/2020] [Indexed: 11/30/2022]
Abstract
Since the seminal work of Prentice and Pyke, the prospective logistic likelihood has become the standard method of analysis for retrospectively collected case-control data, in particular for testing the association between a single genetic marker and a disease outcome in genetic case-control studies. For multiple genetic markers with relatively small effects, especially rare variants, various aggregation approaches based on the same prospective likelihood have been developed to integrate subtle association evidence across all the markers considered. Many commonly used tests are derived from the prospective likelihood under a common-random-effect assumption, which assumes the same random effect for all subjects. We develop the locally most powerful aggregation test based on the retrospective likelihood under an independent-random-effect assumption, which allows the genetic effect to vary among subjects. Although disease prevalence information cannot be used to improve the efficiency of odds ratio estimation in logistic regression models, we show that it can be used to enhance testing power in genetic association studies. Extensive simulations demonstrate the advantages of the proposed method over existing ones. A real genome-wide association study is analyzed for illustration.
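For contrast with the proposed retrospective test, the conventional prospective logistic score test for a single marker has a closed form when there are no covariates: under the intercept-only null, the fitted risk equals the case fraction. A minimal sketch follows, assuming additively coded genotypes (0/1/2) and case status coded 1 for cases and 0 for controls. It is the standard baseline the abstract refers to, not the authors' retrospective aggregation test, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def prospective_score_test(genotypes: Sequence[float],
                           status: Sequence[int]) -> Tuple[float, float]:
    """Logistic-regression score test of H0: beta = 0 for one marker, no covariates.

    Under the intercept-only null the fitted risk is y_bar, so
    U = sum_i g_i (y_i - y_bar) and Var(U) = y_bar (1 - y_bar) sum_i (g_i - g_bar)^2.
    """
    n = len(genotypes)
    if n == 0 or n != len(status):
        raise ValueError("genotypes and status must be non-empty and of equal length")
    y_bar = sum(status) / n
    g_bar = sum(genotypes) / n
    u = sum(g * (y - y_bar) for g, y in zip(genotypes, status))
    var_u = y_bar * (1.0 - y_bar) * sum((g - g_bar) ** 2 for g in genotypes)
    z = u / math.sqrt(var_u)
    return z, math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
```

For genotypes `[0, 0, 1, 1, 2, 2]` and status `[0, 0, 0, 1, 1, 1]`, the score is 2 and its variance is 1, giving z = 2 and a two-sided p-value of about 0.046. This statistic is essentially the Cochran-Armitage trend test.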
Affiliation(s)
- Yukun Liu
  - KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Pengfei Li
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Lei Song
  - National Cancer Institute, National Institutes of Health, Bethesda, Maryland
  - Cancer Genomics Research Laboratory, Leidos Biomedical Research, Inc., Frederick, Maryland
- Kai Yu
  - National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Jing Qin
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland
50
Zhang H, Zhao N, Ahearn TU, Wheeler W, García-Closas M, Chatterjee N. A mixed-model approach for powerful testing of genetic associations with cancer risk incorporating tumor characteristics. Biostatistics 2020; 22:772-788. [PMID: 32112086 DOI: 10.1093/biostatistics/kxz065] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/26/2019] [Revised: 12/17/2019] [Accepted: 12/20/2019] [Indexed: 01/05/2023]
Abstract
Cancers are routinely classified into subtypes according to various features, including histopathological characteristics and molecular markers. Previous genome-wide association studies have reported heterogeneous associations between loci and cancer subtypes. However, the optimal modeling strategy is not evident. Such a strategy must handle correlated tumor features, missing data, and the increased degrees of freedom in the underlying association tests. We propose to test for genetic associations using a mixed-effect two-stage polytomous model score test (MTOP). In the first stage, a standard polytomous model is used to specify all possible subtypes defined by the cross-classification of the tumor characteristics. In the second stage, the subtype-specific case-control odds ratios are specified using a more parsimonious model. This model is based on the case-control odds ratio for a baseline subtype and the case-case parameters associated with tumor markers. Further, to reduce the degrees of freedom, we specify the case-case parameters for additional exploratory markers using a random-effect model. We use the Expectation-Maximization algorithm to account for missing data on tumor markers. We evaluate MTOP through simulations across a range of realistic scenarios and with data from the Polish Breast Cancer Study (PBCS). We show that MTOP outperforms alternative methods for identifying heterogeneous associations between risk loci and tumor subtypes. The proposed methods have been implemented in a user-friendly, fast R package called TOP (https://github.com/andrewhaoyu/TOP).
Affiliation(s)
- Haoyu Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg SPH, 615 N Wolfe St, Baltimore, MD 21205, USA and Division of Cancer Epidemiology and Genetics, National Cancer Institute, Shady Grove, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg SPH, 615 N Wolfe St, Baltimore, MD 21205, USA
- Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Shady Grove, 9609 Medical Center Drive, Rockville, MD 20850, USA
- William Wheeler
- National Cancer Institute, Information Management Service, Inc., 11730 Plaza America Dr, Reston, VA 20190, USA
- Montserrat García-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Shady Grove, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg SPH, 615 N Wolfe St, Baltimore, MD 21205, USA; Department of Oncology, Johns Hopkins University School of Medicine SPH, 733 N Broadway, Baltimore, MD 21205, USA and Department of Epidemiology, Johns Hopkins Bloomberg SPH, 615 N Wolfe St, Baltimore, MD 21205, USA