1
Fu S, Wheeler W, Wang X, Hua X, Godbole D, Duan J, Zhu B, Deng L, Qin F, Zhang H, Shi J, Yu K. A comprehensive framework for trans-ancestry pathway analysis using GWAS summary data from diverse populations. PLoS Genet 2024; 20:e1011322. [PMID: 39441834 PMCID: PMC11534268 DOI: 10.1371/journal.pgen.1011322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 11/04/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024] [Open Access]
Abstract
As more multi-ancestry GWAS summary data become available, we have developed a comprehensive trans-ancestry pathway analysis framework that effectively utilizes this diverse genetic information. Within this framework, we evaluated various strategies for integrating genetic data at different levels (SNP, gene, and pathway) from multiple ancestry groups. Through extensive simulation studies, we have identified robust strategies that demonstrate superior performance across diverse scenarios. Applying these methods, we analyzed 6,970 pathways for their association with schizophrenia, incorporating data from African, East Asian, and European populations. Our analysis identified over 200 pathways significantly associated with schizophrenia, even after excluding genes near genome-wide significant loci. This approach substantially enhances detection efficiency compared to traditional single-ancestry pathway analysis and the conventional approach that amalgamates single-ancestry pathway analysis results across different ancestry groups. Our framework provides a flexible and effective tool for leveraging the expanding pool of multi-ancestry GWAS summary data, thereby improving our ability to identify biologically relevant pathways that contribute to disease susceptibility.
Affiliation(s)
- Sheng Fu
  - School of Statistics and Data Science, Nankai University, Tianjin, China
  - Key Laboratory of Pure Mathematics and Combinatorics, Nankai University, Tianjin, China
- William Wheeler
  - Information Management Services, Inc, Bethesda, Maryland, United States of America
- Xiaoyu Wang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Rockville, Maryland, United States of America
- Xing Hua
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Rockville, Maryland, United States of America
- Devika Godbole
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
  - Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Rockville, Maryland, United States of America
- Jubao Duan
  - Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, United States of America
  - Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, Illinois, United States of America
- Bin Zhu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Lu Deng
  - School of Statistics and Data Science, Nankai University, Tianjin, China
- Fei Qin
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Jianxin Shi
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Kai Yu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
2
Defo J, Awany D, Ramesar R. From SNP to pathway-based GWAS meta-analysis: do current meta-analysis approaches resolve power and replication in genetic association studies? Brief Bioinform 2023; 24:6972298. [PMID: 36611240 DOI: 10.1093/bib/bbac600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 11/30/2022] [Accepted: 12/06/2022] [Indexed: 01/09/2023] [Open Access]
Abstract
Genome-wide association studies (GWAS) have benefited greatly from enhanced high-throughput technology in recent decades. GWAS meta-analysis has become increasingly popular to highlight the genetic architecture of complex traits, informing about the replicability and variability of effect estimations across human ancestries. A wealth of GWAS meta-analysis methodologies have been developed depending on the input data and the outcome information of interest. We present a survey of current approaches from SNP to pathway-based meta-analysis by acknowledging the range of resources and methodologies in the field, and we provide a comprehensive review of different categories of Genome-Wide Meta-analysis methods employed. These methods highlight different levels at which GWAS meta-analysis may be done, including Single Nucleotide Polymorphisms, Genes and Pathways, for which we describe their framework outline. We also discuss the strengths and pitfalls of each approach and make suggestions regarding each of them.
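As a minimal illustration of the SNP-level end of the spectrum this survey covers, the Python sketch below applies the standard inverse-variance-weighted fixed-effect estimator to per-study effect sizes for a single SNP. It is not taken from the review itself: the function name `fixed_effect_meta` and the input values are hypothetical, and the estimator assumes independent studies that share one common effect.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors, same order as betas
    """
    weights = [1.0 / se ** 2 for se in ses]           # w_k = 1 / se_k^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                       # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return beta, se, z, p


# Hypothetical effect sizes for one SNP in three ancestry-specific studies.
beta, se, z, p = fixed_effect_meta([0.12, 0.08, 0.15], [0.04, 0.05, 0.07])
print(f"pooled beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

Studies with smaller standard errors dominate the pooled estimate; when effects are expected to differ across ancestries, a random-effects or ancestry-aware model would be the more appropriate choice.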
Affiliation(s)
- Joel Defo
  - Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, 7925, Observatory, South Africa
  - South African Medical Research Council Genomic and Personalized Medicine Research Unit
- Denis Awany
  - South African Tuberculosis Vaccine Initiative (SATVI), University of Cape Town, 7925, South Africa
- Raj Ramesar
  - Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, 7925, Observatory, South Africa
  - South African Medical Research Council Genomic and Personalized Medicine Research Unit
3
Li X, Quick C, Zhou H, Gaynor SM, Liu Y, Chen H, Selvaraj MS, Sun R, Dey R, Arnett DK, Bielak LF, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Correa A, Cupples LA, Curran JE, de Vries PS, Duggirala R, Freedman BI, Göring HHH, Guo X, Haessler J, Kalyani RR, Kooperberg C, Kral BG, Lange LA, Manichaikul A, Martin LW, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, Naseri T, O'Connell JR, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Reupena MS, Rice KM, Rich SS, Sitlani CM, Smith JA, Taylor KD, Vasan RS, Willer CJ, Wilson JG, Yanek LR, Zhao W, Rotter JI, Natarajan P, Peloso GM, Li Z, Lin X. Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nat Genet 2023; 55:154-164. [PMID: 36564505 PMCID: PMC10084891 DOI: 10.1038/s41588-022-01225-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/16/2021] [Accepted: 10/13/2022] [Indexed: 12/24/2022]
Abstract
Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.
Affiliation(s)
- Xihao Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Corbin Quick
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hufeng Zhou
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sheila M Gaynor
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yaowu Liu
  - School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
- Han Chen
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Margaret Sunitha Selvaraj
  - Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Ryan Sun
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rounak Dey
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Donna K Arnett
  - University of Kentucky, College of Public Health, Lexington, KY, USA
- Lawrence F Bielak
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Joshua C Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Donald W Bowden
  - Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Jennifer A Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E Cade
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Adolfo Correa
  - Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- L Adrienne Cupples
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Joanne E Curran
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Paul S de Vries
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ravindranath Duggirala
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Barry I Freedman
  - Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Harald H H Göring
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Xiuqing Guo
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Jeffrey Haessler
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Rita R Kalyani
  - GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Brian G Kral
  - GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Leslie A Lange
  - Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ani Manichaikul
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Lisa W Martin
  - Division of Cardiology, George Washington School of Medicine and Health Sciences, Washington, DC, USA
- Stephen T McGarvey
  - Department of Epidemiology, International Health Institute, Department of Anthropology, Brown University, Providence, RI, USA
- Braxton D Mitchell
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
  - Geriatrics Research and Education Clinical Center, Baltimore VA Medical Center, Baltimore, MD, USA
- May E Montasser
  - Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Alanna C Morrison
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Take Naseri
  - Ministry of Health, Government of Samoa, Apia, Samoa
- Jeffrey R O'Connell
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Nicholette D Palmer
  - Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Patricia A Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Bruce M Psaty
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
  - Departments of Epidemiology, University of Washington, Seattle, WA, USA
  - Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Laura M Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Alexander P Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
  - Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Kenneth M Rice
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Colleen M Sitlani
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Kent D Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Ramachandran S Vasan
  - Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
  - Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Cristen J Willer
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- James G Wilson
  - Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Lisa R Yanek
  - GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Wei Zhao
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Jerome I Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Pradeep Natarajan
  - Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Gina M Peloso
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Zilin Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Statistics, Harvard University, Cambridge, MA, USA
4
Chen X, Zhang H, Liu M, Deng HW, Wu Z. Simultaneous detection of novel genes and SNPs by adaptive p-value combination. Front Genet 2022; 13:1009428. [DOI: 10.3389/fgene.2022.1009428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 11/03/2022] [Indexed: 11/18/2022] [Open Access]
Abstract
Combining SNP p-values from GWAS summary data is a promising strategy for detecting novel genetic factors. Existing statistical methods for the p-value-based SNP-set testing confront two challenges. First, the statistical power of different methods depends on unknown patterns of genetic effects that could drastically vary over different SNP sets. Second, they do not identify which SNPs primarily contribute to the global association of the whole set. We propose a new signal-adaptive analysis pipeline to address these challenges using the omnibus thresholding Fisher’s method (oTFisher). The oTFisher remains robustly powerful over various patterns of genetic effects. Its adaptive thresholding can be applied to estimate important SNPs contributing to the overall significance of the given SNP set. We develop efficient calculation algorithms to control the type I error rate, which accounts for the linkage disequilibrium among SNPs. Extensive simulations show that the oTFisher has robustly high power and provides a higher balanced accuracy in screening SNPs than the traditional Bonferroni and FDR procedures. We applied the oTFisher to study the genetic association of genes and haplotype blocks of the bone density-related traits using the summary data of the Genetic Factors for Osteoporosis Consortium. The oTFisher identified more novel and literature-reported genetic factors than existing p-value combination methods. Relevant computation has been implemented into the R package TFisher to support similar data analysis.
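To make the idea of a thresholded Fisher combination concrete, here is a minimal Python sketch. It assumes the statistic takes the general truncation-and-weighting form W = Σ −2·log(p_i/τ2)·I(p_i ≤ τ1); that form, the function names, and the Monte Carlo null are assumptions for illustration and are not taken from the abstract. In particular, the simulation treats SNPs as independent, whereas the authors state that their algorithms account for linkage disequilibrium, and the omnibus step over several thresholds is not shown.

```python
import math
import random


def tfisher_stat(pvals, tau1, tau2):
    """Thresholded Fisher statistic: only p-values <= tau1 contribute,
    each rescaled by tau2 before the usual -2*log transform."""
    return sum(-2.0 * math.log(p / tau2) for p in pvals if p <= tau1)


def mc_pvalue(pvals, tau1, tau2, n_sim=20000, seed=0):
    """Monte Carlo p-value under an independence null (assumption: no LD)."""
    rng = random.Random(seed)
    observed = tfisher_stat(pvals, tau1, tau2)
    n = len(pvals)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in range(n)]
        if tfisher_stat(null, tau1, tau2) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one correction keeps p > 0


# Hypothetical SNP p-values for one gene; soft thresholding uses tau1 == tau2.
snp_pvals = [0.0004, 0.003, 0.21, 0.47, 0.88]
print(mc_pvalue(snp_pvals, tau1=0.05, tau2=0.05, n_sim=5000))
```

Setting `tau1 = tau2 = 1` recovers the classical Fisher combination, while smaller thresholds concentrate the test on the few SNPs that carry signal.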
5
Wang X, Gharahkhani P, Levine DM, Fitzgerald RC, Gockel I, Corley DA, Risch HA, Bernstein L, Chow WH, Onstad L, Shaheen NJ, Lagergren J, Hardie LJ, Wu AH, Pharoah PDP, Liu G, Anderson LA, Iyer PG, Gammon MD, Caldas C, Ye W, Barr H, Moayyedi P, Harrison R, Watson RGP, Attwood S, Chegwidden L, Love SB, MacDonald D, deCaestecker J, Prenen H, Ott K, Moebus S, Venerito M, Lang H, Mayershofer R, Knapp M, Veits L, Gerges C, Weismüller J, Reeh M, Nöthen MM, Izbicki JR, Manner H, Neuhaus H, Rösch T, Böhmer AC, Hölscher AH, Anders M, Pech O, Schumacher B, Schmidt C, Schmidt T, Noder T, Lorenz D, Vieth M, May A, Hess T, Kreuser N, Becker J, Ell C, Tomlinson I, Palles C, Jankowski JA, Whiteman DC, MacGregor S, Schumacher J, Vaughan TL, Buas MF, Dai JY. eQTL Set-Based Association Analysis Identifies Novel Susceptibility Loci for Barrett Esophagus and Esophageal Adenocarcinoma. Cancer Epidemiol Biomarkers Prev 2022; 31:1735-1745. [PMID: 35709760 PMCID: PMC9444939 DOI: 10.1158/1055-9965.epi-22-0096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 04/13/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022] [Open Access]
Abstract
BACKGROUND: Over 20 susceptibility single-nucleotide polymorphisms (SNP) have been identified for esophageal adenocarcinoma (EAC) and its precursor, Barrett esophagus (BE), explaining a small portion of heritability. METHODS: Using genetic data from 4,323 BE and 4,116 EAC patients aggregated by international consortia including the Barrett's and Esophageal Adenocarcinoma Consortium (BEACON), we conducted a comprehensive transcriptome-wide association study (TWAS) for BE/EAC, leveraging Genotype Tissue Expression (GTEx) gene-expression data from six tissue types of plausible relevance to EAC etiology: mucosa and muscularis from the esophagus, gastroesophageal (GE) junction, stomach, whole blood, and visceral adipose. Two analytical approaches were taken: standard TWAS using the predicted gene expression from local expression quantitative trait loci (eQTL), and set-based SKAT association using selected eQTLs that predict the gene expression. RESULTS: Although the standard approach did not identify significant signals, the eQTL set-based approach identified eight novel associations, three of which were validated in independent external data (eQTL SNP sets for EXOC3, ZNF641, and HSP90AA1). CONCLUSIONS: This study identified novel genetic susceptibility loci for EAC and BE using an eQTL set-based genetic association approach. IMPACT: This study expanded the pool of genetic susceptibility loci for EAC and BE, suggesting the potential of the eQTL set-based genetic association approach as an alternative method for TWAS analysis.
Affiliation(s)
- Xiaoyu Wang
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Puya Gharahkhani
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - David M. Levine
- Department of Biostatistics, University of Washington, School of Public Health, Seattle, Washington, USA
| | - Rebecca C. Fitzgerald
- Medical Research Council (MRC) Cancer Unit, Hutchison-MRC Research Centre, University of Cambridge, Cambridge, UK
| | - Ines Gockel
- Department of Visceral, Transplant, Thoracic and Vascular Surgery, University Hospital of Leipzig, Leipzig, Germany
| | - Douglas A. Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, California, USA
- San Francisco Medical Center, Kaiser Permanente Northern California, San Francisco, California, USA
| | - Harvey A. Risch
- Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut, USA
| | - Leslie Bernstein
- Department of Population Sciences, Beckman Research Institute and City of Hope Comprehensive Cancer Center, Duarte, California, USA
| | - Wong-Ho Chow
- Department of Epidemiology, MD Anderson Cancer Center, Houston, Texas, USA
| | - Lynn Onstad
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Nicholas J. Shaheen
- Division of Gastroenterology and Hepatology, University of North Carolina School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Jesper Lagergren
- Department of Molecular Medicine and Surgery, Karolinska Institute, Stockholm, Sweden
- School of Cancer and Pharmaceutical Sciences, King’s College London
| | | | - Anna H. Wu
- Department of Population and Public Health Sciences, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, USA
| | - Paul D. P. Pharoah
- Department of Oncology, University of Cambridge, Cambridge, UK
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Geoffrey Liu
- Pharmacogenomic Epidemiology, Ontario Cancer Institute, Toronto, Ontario, Canada
| | - Lesley A. Anderson
- Department of Epidemiology and Public Health, Queen's University of Belfast, Royal Group of Hospitals, Northern Ireland
| | - Prasad G. Iyer
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, Minnesota, USA
| | - Marilie D. Gammon
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Carlos Caldas
- Cancer Research UK, Cambridge Institute, Cambridge, UK
| | - Weimin Ye
- Department of Molecular Medicine and Surgery, Karolinska Institute, Stockholm, Sweden
| | - Hugh Barr
- Department of Upper GI Surgery, Gloucestershire Royal Hospital, Gloucester, UK
| | - Paul Moayyedi
- Farncombe Family Digestive Health Research Institute, Department of Medicine, McMaster University, Hamilton, Ontario, Canada
| | - Rebecca Harrison
- Department of Pathology, Leicester Royal Infirmary, Leicester, UK
| | - RG Peter Watson
- Department of Medicine, Institute of Clinical Science, Royal Victoria Hospital, Belfast, UK
| | - Stephen Attwood
- Department of General Surgery, North Tyneside General Hospital, North Shields, UK
| | - Laura Chegwidden
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust-MRC Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, UK
| | - Sharon B. Love
- Centre for Statistics in Medicine and Oxford Clinical Trials Research Unit, Oxford, UK
| | - David MacDonald
- Department of Oral Biological and Medical Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| | - John deCaestecker
- Digestive Diseases Centre, University Hospitals of Leicester, Leicester, UK
| | - Hans Prenen
- Oncology Department, University Hospital Antwerp, Edegem, Belgium
| | - Katja Ott
- Department of General, Visceral and Transplantation Surgery, University of Heidelberg, Heidelberg, Germany
- Department of General, Visceral and Thorax Surgery, RoMed Klinikum Rosenheim, Rosenheim, Germany
| | - Susanne Moebus
- Institute for Urban Public Health, University Hospitals, University of Duisburg-Essen, Essen, Germany
| | - Marino Venerito
- Department of Gastroenterology, Hepatology and Infectious Diseases, Otto-von-Guericke University Hospital, Magdeburg, Germany
| | - Hauke Lang
- Department of General, Visceral and Transplant Surgery, University Medical Center, University of Mainz, Mainz, Germany
| | | | - Michael Knapp
- Institute for Medical Biometry, Informatics, and Epidemiology, University of Bonn, Bonn, Germany
| | - Lothar Veits
- Institute of Pathology, Friedrich-Alexander-University Erlangen-Nuremberg, Klinikum Bayreuth, Bayreuth, Germany
| | - Christian Gerges
- Department of Internal Medicine, Evangelisches Krankenhaus, Düsseldorf, Germany
| | | | - Matthias Reeh
- Department of General, Visceral and Thoracic Surgery, Asklepios Harzklinik Goslar, Goslar, Germany
| | - Markus M. Nöthen
- Institute of Human Genetics, Medical Faculty, University of Bonn, Bonn, Germany
| | - Jakob R. Izbicki
- General, Visceral and Thoracic Surgery Department and Clinic. University Medical Center Hamburg-Eppendorf. Hamburg. Germany
| | - Hendrik Manner
- Department of Internal Medicine II, Frankfurt Hoechst Hospital, Frankfurt, Germany
| | - Horst Neuhaus
- Department of Internal Medicine, Evangelisches Krankenhaus, Düsseldorf, Germany
| | - Thomas Rösch
- Department of Interdisciplinary Endoscopy, University Hospital Hamburg-Eppendorf, Hamburg, Germany
| | - Anne C. Böhmer
- Institute of Human Genetics, Medical Faculty, University of Bonn, Bonn, Germany
| | - Arnulf H. Hölscher
- Clinic for General, Visceral and Trauma Surgery, Contilia Center for Esophageal Diseases. Elisabeth Hospital Essen, Germany
| | - Mario Anders
- Department of Interdisciplinary Endoscopy, University Hospital Hamburg-Eppendorf, Hamburg, Germany
- Department of Gastroenterology and Interdisciplinary Endoscopy, Vivantes Wenckebach-Klinikum, Berlin, Germany
| | - Oliver Pech
- Department of Gastroenterology and Interventional Endoscopy, St. John of God Hospital, Regensburg, Germany
| | - Brigitte Schumacher
- Department of Internal Medicine, Evangelisches Krankenhaus, Düsseldorf, Germany
  - Department of Internal Medicine and Gastroenterology, Elisabeth Hospital, Essen, Germany
- Claudia Schmidt
  - Department of General, Visceral and Cancer Surgery, University of Cologne, Cologne, Germany
- Thomas Schmidt
  - Department of General, Visceral and Transplantation Surgery, University of Heidelberg, Heidelberg, Germany
- Tania Noder
  - Department of Interdisciplinary Endoscopy, University Hospital Hamburg-Eppendorf, Hamburg, Germany
- Dietmar Lorenz
  - Department of General and Visceral Surgery, Sana Klinikum, Offenbach, Germany
- Michael Vieth
  - Institute of Pathology, Friedrich-Alexander-University Erlangen-Nuremberg, Klinikum Bayreuth, Bayreuth, Germany
- Andrea May
  - Department of Gastroenterology, Oncology and Pneumology, Asklepios Paulinen Klinik, Wiesbaden, Germany
- Timo Hess
  - Center for Human Genetics, University Hospital of Marburg, Marburg, Germany
- Nicole Kreuser
  - Department of Visceral, Transplant, Thoracic and Vascular Surgery, University Hospital of Leipzig, Leipzig, Germany
- Jessica Becker
  - Institute of Human Genetics, Medical Faculty, University of Bonn, Bonn, Germany
- Christian Ell
  - Department of Medicine II, Sana Klinikum, Offenbach, Germany
- Ian Tomlinson
  - Edinburgh Cancer Research Centre, IGMM, University of Edinburgh, UK
- Claire Palles
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- David C. Whiteman
  - Cancer Control, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Stuart MacGregor
  - QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Thomas L. Vaughan
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Epidemiology, University of Washington, School of Public Health, Seattle, Washington, USA
- Matthew F. Buas
  - Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, New York 14263 USA
- James Y. Dai
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, School of Public Health, Seattle, Washington, USA
6
Wang R, Lin DY, Jiang Y. EPIC: Inferring relevant cell types for complex traits by integrating genome-wide association studies and single-cell RNA sequencing. PLoS Genet 2022; 18:e1010251. [PMID: 35709291 PMCID: PMC9242467 DOI: 10.1371/journal.pgen.1010251] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/19/2021] [Revised: 06/29/2022] [Accepted: 05/12/2022] [Indexed: 11/18/2022] Open
Abstract
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.
Affiliation(s)
- Rujin Wang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Dan-Yu Lin
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - * E-mail: (D-YL); (YJ)
- Yuchao Jiang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, United States of America
  - * E-mail: (D-YL); (YJ)
7
Muller YL, Sutherland J, Nair AK, Koroglu C, Kobes S, Knowler WC, Van Hout CV, Shuldiner AR, Hanson RL, Bogardus C, Baier LJ. A missense variant Arg611Cys in LIPE which encodes hormone sensitive lipase decreases lipolysis and increases risk of type 2 diabetes in American Indians. Diabetes Metab Res Rev 2022; 38:e3504. [PMID: 34655148 DOI: 10.1002/dmrr.3504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2021] [Accepted: 10/09/2021] [Indexed: 11/08/2022]
Abstract
AIMS: Hormone sensitive lipase (HSL), encoded by the LIPE gene, is involved in lipolysis. Based on prior animal and human studies, LIPE was analysed as a candidate gene for the development of type 2 diabetes (T2D) in a community-based sample of American Indians. MATERIALS AND METHODS: Whole-exome sequence data from 6782 participants with longitudinal clinical measures were used to identify variation in LIPE. RESULTS: Amongst the 16 missense variants identified, an Arg611Cys variant (rs34052647; Cys-allele frequency = 0.087) significantly associated with T2D (OR [95% CI] = 1.38 [1.17-1.64], p = 0.0002, adjusted for age, sex, birth year, and the first five genetic principal components) and an earlier onset age of T2D (HR = 1.22 [1.09-1.36], p = 0.0005). This variant was further analysed for quantitative traits related to T2D. Amongst non-diabetic American Indians, those with the T2D risk Cys-allele had increased insulin levels during an oral glucose tolerance test (0.07 SD per Cys-allele, p = 0.04) and a mixed meal test (0.08 log10 µU/ml per Cys-allele, p = 0.003), and had increased lipid oxidation rates post-absorptively and during insulin infusion (0.07 mg [kg estimated metabolic body size {EMBS}]⁻¹ min⁻¹ per Cys-allele for both, p = 0.01 and 0.009, respectively), compared to individuals with the non-risk Arg-allele. In vitro functional studies showed that cells expressing the Cys-allele had a 17.2% decrease in lipolysis under isoproterenol stimulation (p = 0.03) and a 21.3% decrease in lipase enzyme activity measured by using p-nitrophenyl butyrate as a substrate (p = 0.04) compared to the Arg-allele. CONCLUSION: The Arg611Cys variant causes a modest impairment in lipolysis, thereby affecting glucose homoeostasis and risk of T2D.
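As a rough consistency check on the reported T2D association (an illustration, not the authors' analysis), the two-sided p-value implied by OR = 1.38 with 95% CI 1.17-1.64 can be approximated by assuming a Wald test on the log-odds scale. The function name `wald_p_from_or_ci` is hypothetical, and the symmetric-interval assumption may not match the adjusted model used in the study.

```python
import math

def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald p-value from an odds ratio and its 95% CI.

    Assumption: the CI is symmetric on the log-odds scale, so the standard
    error can be recovered from the CI width.
    """
    log_or = math.log(odds_ratio)
    # Standard error recovered from the CI width on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Values taken from the abstract above.
z, p = wald_p_from_or_ci(1.38, 1.17, 1.64)
print(f"z = {z:.2f}, p = {p:.1e}")
```

Under these assumptions the implied p-value is on the order of 2 × 10⁻⁴, in line with the reported p = 0.0002.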
Affiliation(s)
- Yunhua L Muller
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Jeff Sutherland
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Anup K Nair
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Cigdem Koroglu
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Sayuko Kobes
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- William C Knowler
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Alan R Shuldiner
  - Regeneron Genetics Centre, Regeneron Pharmaceuticals, Tarrytown, New York, USA
- Robert L Hanson
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Clifton Bogardus
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Leslie J Baier
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
8
Shi J, Boehnke M, Lee S. Trans-ethnic meta-analysis of rare variants in sequencing association studies. Biostatistics 2021; 22:706-722. [PMID: 31883325 DOI: 10.1093/biostatistics/kxz061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2018] [Revised: 11/06/2019] [Accepted: 12/02/2019] [Indexed: 11/15/2022] Open
Abstract
Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among different populations, existing gene-/region-based rare variants meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among different populations. In response, we propose a score test under the modified random effects model for gene-/region-based rare variants associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare variant associations with several traits.
Affiliation(s)
- Jingchunzi Shi
  - Thomas Francis, Jr. School of Public Health II, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Michael Boehnke
  - Thomas Francis, Jr. School of Public Health II, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Seunggeun Lee
  - Thomas Francis, Jr. School of Public Health II, 1420 Washington Heights, Ann Arbor, MI 48109, USA
9
Jin X, Shi G. Variance-component-based meta-analysis of gene-environment interactions for rare variants. G3-GENES GENOMES GENETICS 2021; 11:6298593. [PMID: 34544119 PMCID: PMC8661424 DOI: 10.1093/g3journal/jkab203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022]
Abstract
Complex diseases are often caused by interplay between genetic and environmental factors. Existing gene-environment interaction (G × E) tests for rare variants largely focus on detecting gene-based G × E effects in a single study; thus, their statistical power is limited by the sample size of the study. Meta-analysis methods that synthesize summary statistics of G × E effects from multiple studies for rare variants are still limited. Based on variance component models, we propose four meta-analysis methods of testing G × E effects for rare variants: HOM-INT-FIX, HET-INT-FIX, HOM-INT-RAN, and HET-INT-RAN. Our methods consider homogeneous or heterogeneous G × E effects across studies and treat the main genetic effect as either fixed or random. Through simulations, we show that the empirical distributions of the four meta-statistics under the null hypothesis align with their expected theoretical distributions. When the interaction effect is homogeneous across studies, HOM-INT-FIX and HOM-INT-RAN have as much statistical power as a pooled analysis conducted on a single interaction test with individual-level data from all studies. When the interaction effect is heterogeneous across studies, HET-INT-FIX and HET-INT-RAN provide higher power than pooled analysis. Our methods are further validated via testing 12 candidate gene-age interactions in blood pressure traits using whole-exome sequencing data from UK Biobank.
Affiliation(s)
- Xiaoqin Jin
  - State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China
- Gang Shi
  - State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China
10
Zhu J, Fan Q, Deng W, Wang Y, Guo X. BTOB: Extending the Biased GWAS to Bivariate GWAS. Front Genet 2021; 12:654821. [PMID: 34025719 PMCID: PMC8134661 DOI: 10.3389/fgene.2021.654821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2021] [Accepted: 04/07/2021] [Indexed: 11/13/2022] Open
Abstract
In recent years, a number of publications have reported large-scale genome-wide association studies (GWASs) for human diseases or traits while adjusting for other heritable covariates. However, it is known that these GWASs are biased, which may lead to biased genetic estimates or even false positives. In this study, we provide a method called "BTOB" which extends the biased GWAS to bivariate GWAS by integrating the summary association statistics from the biased GWAS and the GWAS for the adjusted heritable covariate. We employ the proposed BTOB method to analyze the summary association statistics from the large-scale meta-GWASs for waist-to-hip ratio (WHR) and body mass index (BMI), and show that the proposed approach can help identify more susceptibility genes compared with the corresponding univariate GWASs. Theoretical results and simulations also confirm the validity and efficiency of the proposed BTOB method.
Affiliation(s)
- Junxian Zhu
  - Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
- Qiao Fan
  - Center for Quantitative Medicine, Duke-National University of Singapore Medical School, Singapore, Singapore
- Wenying Deng
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, United States
- Yimeng Wang
  - Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
- Xiaobo Guo
  - Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
11
Day SE, Muller YL, Koroglu C, Kobes S, Wiedrich K, Mahkee D, Kim HI, Van Hout C, Gosalia N, Ye B, Shuldiner AR, Knowler WC, Hanson RL, Bogardus C, Baier LJ. Exome Sequencing of 21 Bardet-Biedl Syndrome (BBS) Genes to Identify Obesity Variants in 6,851 American Indians. Obesity (Silver Spring) 2021; 29:748-754. [PMID: 33616283 PMCID: PMC8048836 DOI: 10.1002/oby.23115] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/21/2020] [Revised: 11/30/2020] [Accepted: 12/03/2020] [Indexed: 11/08/2022]
Abstract
OBJECTIVE: In an ongoing effort to identify the genetic variation that contributes to obesity in American Indians, known Bardet-Biedl syndrome (BBS) genes were analyzed for an effect on BMI and leptin signaling. METHODS: Potentially deleterious variants (Combined Annotation Dependent Depletion score > 20) in BBS genes were identified in whole-exome sequence data from 6,851 American Indians informative for BMI. Common variants (detected in ≥ 10 individuals) were analyzed for association with BMI; rare variants (detected in < 10 individuals) were analyzed for mean BMI of carriers. Functional assessment of variants' effect on signal transducer and activator of transcription 3 (STAT3) activity was performed in vitro. RESULTS: One common variant, rs59252892 (Thr549Ile) in BBS9, was associated with BMI (P = 0.0008, β = 25% increase per risk allele). Among rare variants for which carriers had severe obesity (mean BMI > 40 kg/m²), four were in BBS9. In vitro analysis of BBS9 found the Ile allele at Thr549Ile had a 20% increase in STAT3 activity compared with the Thr allele (P = 0.01). Western blot analysis showed the Ile allele had a 15% increase in STAT3 phosphorylation (P = 0.006). Comparable functional results were observed with Ser545Gly and Val209Leu but not Leu665Phe and Lys810Glu. CONCLUSIONS: Potentially functional variants in BBS genes in American Indians are reported. However, functional evidence supporting a causal role for BBS9 in obesity is inconclusive.
Affiliation(s)
- Samantha E. Day
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Yunhua L. Muller
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Cigdem Koroglu
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Sayuko Kobes
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Kim Wiedrich
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Darin Mahkee
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Hye In Kim
  - Regeneron Genetics Center, Tarrytown, New York, USA
- Bin Ye
  - Regeneron Genetics Center, Tarrytown, New York, USA
- William C. Knowler
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Robert L. Hanson
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Clifton Bogardus
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
- Leslie J. Baier
  - Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona, USA
12
Liu Z, Derkach A, Yu KJ, Yeager M, Chang YS, Chen CJ, Gyllensten U, Lan Q, Lee MH, McKay JD, Rothman N, Yang HI, Hildesheim A, Pfeiffer RM. Patterns of Human Leukocyte Antigen Class I and Class II Associations and Cancer. Cancer Res 2021; 81:1148-1152. [PMID: 33272927 PMCID: PMC9986718 DOI: 10.1158/0008-5472.can-20-2292] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/09/2020] [Revised: 10/26/2020] [Accepted: 11/30/2020] [Indexed: 11/16/2022]
Abstract
Human leukocyte antigen (HLA) gene variation is associated with risk of cancers, particularly those with infectious etiology or hematopoietic origin, given its role in immune presentation. Previous studies focused primarily on HLA allele/haplotype-specific associations. To answer whether associations are driven by HLA class I (essential for T-cell cytotoxicity) or class II (important for T-cell helper responses) genes, we analyzed GWAS from 24 case-control studies and consortia comprising 27 cancers (totaling >71,000 individuals). Associations for most cancers with infectious etiology or of hematopoietic origin were driven by multiple HLA regions, suggesting that both cytotoxic and helper T-cell responses are important. Notable exceptions were observed for nasopharyngeal carcinoma, an EBV-associated cancer, and CLL/SLL forms of non-Hodgkin lymphomas; these cancers were associated with HLA class I region only and HLA class II region only, implying the importance of cytotoxic T-cell responses for the former and CD4+ T-cell helper responses for the latter. Our findings suggest that increased understanding of the pattern of HLA associations for individual cancers could lead to better insights into specific mechanisms involved in cancer pathogenesis. SIGNIFICANCE: GWAS of >71,000 individuals across 27 cancer types suggest that patterns of HLA Class I and Class II associations may provide etiologic insights for cancer.
Affiliation(s)
- Zhiwei Liu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Andriy Derkach
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
- Kelly J Yu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Meredith Yeager
  - Division of Cancer Epidemiology and Genetics, Cancer Genomics Research Laboratory, National Cancer Institute, Bethesda, Maryland
  - Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Yu-Sun Chang
  - Department of Microbiology and Immunology and Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
  - Department of Otolaryngology-Head and Neck Surgery, Chang Gung Memorial Hospital, Lin-Kou, Taiwan
- Chien-Jen Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Ulf Gyllensten
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory Uppsala, Uppsala University, Uppsala, Sweden
- Qing Lan
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Mei-Hsuan Lee
  - Institute of Clinical Medicine, National Yang-Ming University, Taipei, Taiwan
- James D McKay
  - International Agency for Research on Cancer, World Health Organization, Lyon, France
- Nathaniel Rothman
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Hwai-I Yang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Institute of Clinical Medicine, National Yang-Ming University, Taipei, Taiwan
  - Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Allan Hildesheim
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Ruth M Pfeiffer
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
13
Dong X, Su YR, Barfield R, Bien SA, He Q, Harrison TA, Huyghe JR, Keku TO, Lindor NM, Schafmayer C, Chan AT, Gruber SB, Jenkins MA, Kooperberg C, Peters U, Hsu L. A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study. PLoS Genet 2020; 16:e1008947. [PMID: 32833970 PMCID: PMC7470748 DOI: 10.1371/journal.pgen.1008947] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/18/2019] [Revised: 09/03/2020] [Accepted: 06/22/2020] [Indexed: 11/18/2022] Open
Abstract
Genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with various phenotypes, but together they explain only a fraction of heritability, suggesting many variants have yet to be discovered. Recently it has been recognized that incorporating functional information of genetic variants can improve power for identifying novel loci. For example, S-PrediXcan and TWAS tested the association of predicted gene expression with phenotypes based on GWAS summary statistics by leveraging the information on genetic regulation of gene expression and found many novel loci. However, as genetic variants may have effects on more than one gene and through different mechanisms, these methods likely only capture part of the total effects of these variants. In this paper, we propose a summary statistics-based mixed effects score test (sMiST) that tests for the total effect of both the effect of the mediator by imputing genetically predicted gene expression, like S-PrediXcan and TWAS, and the direct effects of individual variants. It allows for multiple functional annotations and multiple genetically predicted mediators. It can also perform conditional association analysis while adjusting for other genetic variants (e.g., known loci for the phenotype). Extensive simulation and real data analyses demonstrate that sMiST yields p-values that agree well with those obtained from individual level data but with substantively improved computational speed. Importantly, a broad application of sMiST to GWAS is possible, as only summary statistics of genetic variant associations are required. We apply sMiST to a large-scale GWAS of colorectal cancer using summary statistics from ~120,000 study participants and gene expression data from the Genotype-Tissue Expression (GTEx) project. We identify several novel and secondary independent genetic loci.
Affiliation(s)
- Xinyuan Dong
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yu-Ru Su
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Richard Barfield
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Stephanie A. Bien
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Qianchuan He
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Tabitha A. Harrison
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Jeroen R. Huyghe
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Temitope O. Keku
  - Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, USA
- Noralane M. Lindor
  - Department of Health Science Research, Mayo Clinic, Scottsdale, Arizona, USA
- Clemens Schafmayer
  - Department of General Surgery, University Hospital Rostock, Rostock, Germany
- Andrew T. Chan
  - Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, and Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Stephen B. Gruber
  - City of Hope National Medical Center, Duarte, and Department of Preventive Medicine & USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Mark A. Jenkins
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ulrike Peters
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Li Hsu
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
14
Guo B, Wu B. Integrate multiple traits to detect novel trait-gene association using GWAS summary data with an adaptive test approach. Bioinformatics 2020; 35:2251-2257. [PMID: 30476000 DOI: 10.1093/bioinformatics/bty961] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/01/2018] [Revised: 10/30/2018] [Accepted: 11/22/2018] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Genetics hold great promise to precision medicine by tailoring treatment to the individual patient based on their genetic profiles. Toward this goal, many large-scale genome-wide association studies (GWAS) have been performed in the last decade to identify genetic variants associated with various traits and diseases. They have successfully identified tens of thousands of disease-related variants. However, they have explained only a small proportion of the overall trait heritability for most traits and are of very limited clinical use. This is partly owing to the small effect sizes of most genetic variants, and the common practice of testing association between one trait and one genetic variant at a time in most GWAS, even when multiple related traits are often measured for each individual. Increasing evidence suggests that many genetic variants can influence multiple traits simultaneously, and we can gain more power by testing association of multiple traits simultaneously. It is appealing to develop novel multi-trait association test methods that need only GWAS summary data, since it is generally very hard to access the individual-level GWAS phenotype and genotype data. RESULTS: Many existing GWAS summary data-based association test methods have relied on ad hoc approach or crude Monte Carlo approximation. In this article, we develop rigorous statistical methods for efficient and powerful multi-trait association test. We develop robust and efficient methods to accurately estimate the marginal trait correlation matrix using only GWAS summary data. We construct the principal component (PC)-based association test from the summary statistics. PC-based test has optimal power when the underlying multi-trait signal can be captured by the first PC, and otherwise it will have suboptimal performance. We develop an adaptive test by optimally weighting the PC-based test and the omnibus chi-square test to achieve robust performance under various scenarios. We develop efficient numerical algorithms to compute the analytical P-values for all the proposed tests without the need of Monte Carlo sampling. We illustrate the utility of proposed methods through application to the GWAS meta-analysis summary data for multiple lipids and glycemic traits. We identify multiple novel loci that were missed by individual trait-based association test. AVAILABILITY AND IMPLEMENTATION: All the proposed methods are implemented in an R package available at http://www.github.com/baolinwu/MTAR. The developed R programs are extremely efficient: it takes less than 2 min to compute the list of genome-wide significant single nucleotide polymorphisms (SNPs) for all proposed multi-trait tests for the lipids GWAS summary data with 2.5 million SNPs on a single Linux desktop. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
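The contrast between the omnibus chi-square test and the PC-based test can be sketched for the simplest case of two traits. This is a minimal illustration under stated assumptions, not the MTAR package's implementation: the function name `two_trait_tests` is hypothetical, the null trait correlation `r` is taken as known, and the adaptive weighting of the two tests described in the abstract is not reproduced.

```python
import math

def two_trait_tests(z1, z2, r):
    """Omnibus and first-PC tests for one SNP and two traits.

    z1, z2: per-trait z-scores; r: correlation of the z-scores under
    the null (|r| < 1), assumed known for this sketch.
    """
    # Omnibus statistic z' R^{-1} z with R = [[1, r], [r, 1]];
    # chi-square with 2 degrees of freedom under the null.
    t_omni = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    p_omni = math.exp(-t_omni / 2.0)  # exact chi-square(2) tail probability

    # First PC of R is (1, sign(r)) / sqrt(2) with eigenvalue 1 + |r|.
    sign = 1.0 if r >= 0 else -1.0
    pc_score = (z1 + sign * z2) / math.sqrt(2.0)
    t_pc = pc_score * pc_score / (1.0 + abs(r))
    p_pc = math.erfc(math.sqrt(t_pc / 2.0))  # chi-square(1) tail probability
    return t_omni, p_omni, t_pc, p_pc

# Made-up z-scores pointing in the same direction for both traits.
t_omni, p_omni, t_pc, p_pc = two_trait_tests(3.0, 2.5, 0.4)
print(f"omnibus:  T = {t_omni:.2f}, p = {p_omni:.2e}")
print(f"PC-based: T = {t_pc:.2f}, p = {p_pc:.2e}")
```

In this made-up example the signal lies along the first PC, so the one-degree-of-freedom PC-based test gives the smaller p-value; when the signal is not aligned with the first PC, the omnibus test would be expected to do better, which is the trade-off an adaptive test is meant to balance.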
15
Svishcheva GR, Belonogova NM, Zorkoltseva IV, Kirichenko AV, Axenovich TI. Gene-based association tests using GWAS summary statistics. Bioinformatics 2020; 35:3701-3708. [PMID: 30860568 DOI: 10.1093/bioinformatics/btz172] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 11/15/2018] [Revised: 02/12/2019] [Accepted: 03/11/2019] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION: A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of the GWAS summary statistics as input. RESULTS: We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. AVAILABILITY AND IMPLEMENTATION: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
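To illustrate how a gene-based test can run on summary statistics alone, the following is a minimal sketch of a generic weighted burden test built from per-variant z-scores and an LD correlation matrix. It is not the sumFREGAT code or API: the function name `burden_test`, the equal weights, and the toy LD matrix are assumptions made for illustration, and the sketch assumes the z-scores jointly follow N(0, R) under the null.

```python
import math

def burden_test(z, weights, ld):
    """Weighted burden test from summary statistics.

    z: per-variant z-scores; weights: per-variant weights;
    ld: correlation (LD) matrix among the variants, as nested lists.
    Under the null, sum(w_i * z_i) is normal with variance w' R w.
    """
    m = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    z_burden = numerator / math.sqrt(variance)
    # Two-sided normal p-value.
    p_value = math.erfc(abs(z_burden) / math.sqrt(2.0))
    return z_burden, p_value

# Toy inputs (hypothetical): three variants in one gene.
z_scores = [2.0, 1.5, 2.5]
w = [1.0, 1.0, 1.0]
ld_matrix = [
    [1.0, 0.2, 0.1],
    [0.2, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
z_b, p_b = burden_test(z_scores, w, ld_matrix)
print(f"burden z = {z_b:.3f}, p = {p_b:.4f}")
```

None of the three toy variants reaches a genome-wide threshold on its own, but pooling them in one direction gives a larger combined statistic; in practice the LD matrix would typically come from a reference panel matched to the GWAS population.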
Affiliation(s)
- Gulnara R Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
- Nadezhda M Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Irina V Zorkoltseva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anatoly V Kirichenko
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Tatiana I Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia; Department of Biotechnology, L.K. Ernst Federal Center for Animal Husbandry, Dubrovitsy, Russia
|
16
|
Luo L, Shen J, Zhang H, Chhibber A, Mehrotra DV, Tang ZZ. Multi-trait analysis of rare-variant association summary statistics using MTAR. Nat Commun 2020; 11:2850. [PMID: 32503972 PMCID: PMC7275056 DOI: 10.1038/s41467-020-16591-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/01/2019] [Accepted: 05/09/2020] [Indexed: 12/13/2022]
Abstract
Integrating association evidence across multiple traits can improve the power of gene discovery and reveal pleiotropy. Most multi-trait analysis methods focus on individual common variants in genome-wide association studies. Here, we introduce multi-trait analysis of rare-variant associations (MTAR), a framework for joint analysis of association summary statistics between multiple rare variants and different traits. MTAR achieves substantial power gain by leveraging the genome-wide genetic correlation measure to inform the degree of gene-level effect heterogeneity across traits. We apply MTAR to rare-variant summary statistics for three lipid traits in the Global Lipids Genetics Consortium. 99 genome-wide significant genes were identified in the single-trait-based tests, and MTAR increases this to 139. Among the 11 novel lipid-associated genes discovered by MTAR, 7 are replicated in an independent UK Biobank GWAS analysis. Our study demonstrates that MTAR is substantially more powerful than single-trait-based tests and highlights the value of MTAR for novel gene discovery.
Affiliation(s)
- Lan Luo
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Hong Zhang
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Aparna Chhibber
- Genetics and Pharmacogenomics, Merck & Co., Inc., West Point, Pennsylvania, 19446, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, Pennsylvania, 19454, USA
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, 53715, USA.
- Wisconsin Institute for Discovery, Madison, Wisconsin, 53715, USA.
|
17
|
Lee SP, Ashley EA, Homburger J, Caleshu C, Green EM, Jacoby D, Colan SD, Arteaga-Fernández E, Day SM, Girolami F, Olivotto I, Michels M, Ho CY, Perez MV. Incident Atrial Fibrillation Is Associated With MYH7 Sarcomeric Gene Variation in Hypertrophic Cardiomyopathy. Circ Heart Fail 2019; 11:e005191. [PMID: 30354366 DOI: 10.1161/circheartfailure.118.005191] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/16/2022]
Abstract
Background Although atrial fibrillation (AF) is common in hypertrophic cardiomyopathy (HCM) patients, the relationship between genetic variation and AF has been poorly defined. Characterizing genetic subtypes of HCM and their associations with AF may help to improve personalized medical care. We aimed to investigate the link between sarcomeric gene variation and incident AF in HCM patients. Methods and Results Patients from the multinational Sarcomeric Human Cardiomyopathy Registry were followed for incident AF. Those with likely pathogenic or pathogenic variants in sarcomeric genes were included. The AF incidence was ascertained by review of medical records and electrocardiograms at each investigative site. One thousand forty adult HCM patients, without baseline AF and with likely pathogenic or pathogenic variation in either MYH7 (n=296), MYBPC3 (n=659), or thin filament genes (n=85), were included. Compared with patients with variation in other sarcomeric genes, those with MYH7 variants were younger on first clinical encounter at the Sarcomeric Human Cardiomyopathy Registry site and more likely to be probands than the MYBPC3 variants. During an average follow-up of 7.2 years, 198 incident AF events occurred. Patients with likely pathogenic or pathogenic mutations in MYH7 had the highest incidence of AF after adjusting for age, sex, proband status, left atrial size, maximal wall thickness, and peak pressure gradient (hazard ratio, 1.7; 95% CI, 1.1-2.6; P=0.009). Conclusions During a mean follow-up of 7.2 years, new-onset AF developed in 19% of HCM patients with sarcomeric mutations. Compared with other sarcomeric genes, patients with likely pathogenic or pathogenic variation in MYH7 had a higher rate of incident AF independent of clinical and echocardiographic factors.
Affiliation(s)
- Seung-Pyo Lee
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University, CA (S.-P.L., E.A.A., M.V.P.); Department of Internal Medicine, Seoul National University Hospital, South Korea (S.-P.L.)
- Euan A Ashley
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University, CA (S.-P.L., E.A.A., M.V.P.); Stanford Center for Inherited Cardiovascular Disease, Stanford University, CA (E.A.A., C.C., M.V.P.); Department of Genetics, Stanford University, CA (E.A.A., J.H.)
- Colleen Caleshu
- Stanford Center for Inherited Cardiovascular Disease, Stanford University, CA (E.A.A., C.C., M.V.P.)
- Daniel Jacoby
- Section of Cardiovascular Medicine, Yale School of Medicine, New Haven, CT (D.J.)
- Steven D Colan
- Department of Cardiology, Boston Children's Hospital, MA (S.D.C.)
- Edmundo Arteaga-Fernández
- Laboratory of Genetics and Molecular Cardiology and Cardiomyopathies Unit, Heart Institute (InCor), University of São Paulo, Brazil (E.A.-F.)
- Sharlene M Day
- Division of Cardiovascular Medicine, University of Michigan Medical School, Ann Arbor (S.M.D.)
- Francesca Girolami
- Referral Center for Cardiomyopathies, Careggi University Hospital, Florence, Italy (F.G., I.O.)
- Iacopo Olivotto
- Referral Center for Cardiomyopathies, Careggi University Hospital, Florence, Italy (F.G., I.O.)
- Michelle Michels
- Department of Cardiology, Thoraxcenter, Erasmus Medical Center, Rotterdam, The Netherlands (M.M.)
- Carolyn Y Ho
- Cardiovascular Division, Brigham and Women's Hospital, Boston, MA (C.Y.H.)
- Marco V Perez
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University, CA (S.-P.L., E.A.A., M.V.P.); Stanford Center for Inherited Cardiovascular Disease, Stanford University, CA (E.A.A., C.C., M.V.P.)
|
18
|
Abstract
Background In genome-wide association studies (GWASs), meta-analysis has been widely used to improve statistical power by combining the results of different studies. Meta-analysis can detect phenotype-associated variants that go undetected in single studies. In biomedical sciences especially, meta-analysis is often necessary not only for improving statistical power but also for reducing unavoidable limitations in data collection. With the development of next-generation sequencing (NGS) technology, meta-analysis of rare variants is proceeding briskly alongside meta-analysis of common variants in GWASs. However, the single-variant meta-analysis commonly used in common-variant association tests is improper for rare variants: the sparse signal of a rare variant undermines the association signal, and the large number of rare variants causes a multiple testing problem. To overcome these problems, we propose a meta-analysis method at the gene level rather than the variant level. Results Among the many methods that have been developed, we used the unified quadratic tests (Q-tests); the Q-test is more powerful than or as powerful as other tests such as the Sequence Kernel Association Test (SKAT). Since there are three versions of the Q-test (QTest1, QTest2, QTest3), each assuming a different relationship among multiple rare variants, we extended each of them to the meta-analysis setting. For meta-analysis, we consider two types of approaches: one combines regression coefficients and the other combines test statistics from each single study. We extend the Q-test for meta-analysis, proposing the Meta Quadratic Test (Meta-Qtest). Meta-Qtest avoids the limitations of MetaSKAT. It not only considers genetic heterogeneity among studies, as MetaSKAT does, but also reflects diverse real situations; because we extend the three Q-tests to meta-analysis separately, the flexible Meta-Qtest offers an efficient way to handle gene-level rare variant meta-analysis. In a real data analysis of a blood pressure trait, our meta-analysis successfully discovered the genes KCNA5 and CABIN1, which are already well known for their relevance to hypertension and were not detected by MetaSKAT. Conclusion As exemplified by an application to the T2D Genes project data set, Meta-Qtest identified genes associated with hypertension more effectively than MetaSKAT did.
Affiliation(s)
- Jieun Ka
- Department of Statistics, Seoul National University, Seoul, South Korea
- Jaehoon Lee
- Department of Statistics, Seoul National University, Seoul, South Korea
- Yongkang Kim
- Department of Statistics, Seoul National University, Seoul, South Korea
- Bermseok Oh
- Department of Biochemistry and Molecular Biology, School of Medicine, Kyung Hee University, Seoul, South Korea
- Taesung Park
- Department of Statistics, Seoul National University, Seoul, South Korea; Interdisciplinary program in Bioinformatics, Seoul National University, Seoul, South Korea.
|
19
|
Derkach A, Pfeiffer RM. Subset testing and analysis of multiple phenotypes. Genet Epidemiol 2019; 43:492-505. [PMID: 30920058 DOI: 10.1002/gepi.22199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/18/2018] [Revised: 02/08/2019] [Accepted: 02/19/2019] [Indexed: 11/08/2022]
Abstract
Meta-analysis of multiple genome-wide association studies (GWAS) is effective for detecting single- or multimarker associations with complex traits. We develop a flexible procedure (subset testing and analysis of multiple phenotypes [STAMP]) based on mixture models to perform a region-based meta-analysis of different phenotypes using data from different GWAS and identify subsets of associated phenotypes. Our model framework helps distinguish true associations from between-study heterogeneity. As a measure of association, we compute for each phenotype the posterior probability that the genetic region under investigation is truly associated. Extensive simulations show that STAMP is more powerful than standard approaches for meta-analyses when the proportion of truly associated outcomes is between 25% and 50%. For other settings, the power of STAMP is similar to that of existing methods. We illustrate our method on two examples, the association of a region on chromosome 9p21 with the risk of 14 cancers, and the associations of expression of quantitative trait loci from two genetic regions with their cis-single-nucleotide polymorphisms measured in 17 tissue types using data from The Cancer Genome Atlas.
Affiliation(s)
- Andriy Derkach
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland
- Ruth M Pfeiffer
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland
|
20
|
Zhang H, Wheeler W, Song L, Yu K. Proper joint analysis of summary association statistics requires the adjustment of heterogeneity in SNP coverage pattern. Brief Bioinform 2018; 19:1337-1343. [PMID: 28981575 DOI: 10.1093/bib/bbx072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2017] [Indexed: 11/12/2022]
Abstract
As meta-analysis results published by consortia of genome-wide association studies (GWASs) become increasingly available, many association summary statistics-based multi-locus tests have been developed to jointly evaluate multiple single-nucleotide polymorphisms (SNPs) to reveal novel genetic architectures of various complex traits. The validity of these approaches relies on the accurate estimate of z-score correlations at considered SNPs, which in turn requires knowledge on the set of SNPs assessed by each study participating in the meta-analysis. However, this exact SNP coverage information is usually unavailable from the meta-analysis results published by GWAS consortia. In the absence of the coverage information, researchers typically estimate the z-score correlations by making oversimplified coverage assumptions. We show through real studies that such a practice can generate highly inflated type I errors, and we demonstrate the proper way to incorporate correct coverage information into multi-locus analyses. We advocate that consortia should make SNP coverage information available when posting their meta-analysis results, and that investigators who develop analytic tools for joint analyses based on summary data should pay attention to the variation in SNP coverage and adjust for it appropriately.
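To make the coverage issue concrete, the sketch below works through one simple model that is assumed here and is not taken from the paper: a sample-size-weighted meta-analysis in which every study shares the same LD correlation between two SNPs. Under that assumption, the correlation between the two meta-analysis z-scores is the LD correlation scaled by the share of samples that cover both SNPs. The function name `meta_z_correlation` and all inputs are hypothetical.

```python
import math


def meta_z_correlation(ld_r, sample_sizes, covers_snp1, covers_snp2):
    """Correlation between two meta-analysis z-scores under partial coverage.

    Assumes sample-size-weighted meta-analysis and a common LD correlation
    `ld_r` in every study (an illustrative simplification).

    sample_sizes : per-study sample sizes
    covers_snp1  : per-study booleans, True if the study assessed SNP 1
    covers_snp2  : per-study booleans, True if the study assessed SNP 2
    """
    n1 = sum(n for n, c in zip(sample_sizes, covers_snp1) if c)
    n2 = sum(n for n, c in zip(sample_sizes, covers_snp2) if c)
    # Only studies that assessed both SNPs contribute to the covariance.
    shared = sum(
        n for n, c1, c2 in zip(sample_sizes, covers_snp1, covers_snp2) if c1 and c2
    )
    return ld_r * shared / math.sqrt(n1 * n2)


# Two equal-sized studies; the second one did not assess SNP 2.
full = meta_z_correlation(0.8, [1000, 1000], [True, True], [True, True])
partial = meta_z_correlation(0.8, [1000, 1000], [True, True], [True, False])
```

In this toy setting, assuming full coverage would use a correlation of 0.8, whereas the coverage-aware value is about 0.57, so the oversimplified assumption misstates the dependence between the two statistics.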
Affiliation(s)
- Han Zhang
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, USA
- Lei Song
- Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., USA
- Kai Yu
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, USA
|
21
|
Chien LC, Chiu YF. General retrospective mega-analysis framework for rare variant association tests. Genet Epidemiol 2018; 42:621-635. [PMID: 30188589 DOI: 10.1002/gepi.22147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2017] [Revised: 06/05/2018] [Accepted: 06/05/2018] [Indexed: 11/09/2022]
Abstract
Here, we describe a retrospective mega-analysis framework for gene- or region-based multimarker rare variant association tests. Our proposed mega-analysis association tests allow investigators to combine longitudinal and cross-sectional family- and/or population-based studies. This framework can be applied to a continuous, categorical, or survival trait. In addition to autosomal variants, the tests can be applied to conduct mega-analyses on X-chromosome variants. Tests were built on study-specific region- or gene-level quasiscore statistics and, therefore, do not require estimates of effects of individual rare variants. We used the generalized estimating equation approach to account for complex multiple correlation structures between family members, repeated measurements, and genetic markers. While accounting for multilevel correlations and heterogeneity across studies, the test statistics were computationally efficient and feasible for large-scale sequencing studies. The retrospective aspect of association tests helps alleviate bias due to phenotype-related sampling and type I errors due to misspecification of phenotypic distribution. We evaluated our developed mega-analysis methods through comprehensive simulations with varying sample sizes, covariates, population stratification structures, and study designs across multiple studies. To illustrate application of the proposed framework, we conducted a mega-association analysis combining a longitudinal family study and a cross-sectional case-control study from Genetic Analysis Workshop 19.
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
- Yen-Feng Chiu
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, ROC
|
22
|
Jiang Y, Chen S, McGuire D, Chen F, Liu M, Iacono WG, Hewitt JK, Hokanson JE, Krauter K, Laakso M, Li KW, Lutz SM, McGue M, Pandit A, Zajac GJM, Boehnke M, Abecasis GR, Vrieze SI, Zhan X, Jiang B, Liu DJ. Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes. PLoS Genet 2018; 14:e1007452. [PMID: 30016313 PMCID: PMC6063450 DOI: 10.1371/journal.pgen.1007452] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/11/2017] [Revised: 07/27/2018] [Accepted: 05/25/2018] [Indexed: 11/19/2022]
Abstract
Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants. It is of great interest to estimate the joint effects of multiple variants from large scale meta-analyses, in order to fine-map causal variants and understand the genetic architecture for complex traits. The summary association statistics from participating studies in a meta-analysis often contain missing values at some variant sites, as the imputation methods may not work well and the variants with low imputation quality will be filtered out. Missingness is especially likely when the underlying genetic variant is rare or the participating studies use targeted genotyping array that is not suitable for imputation. Existing methods for conditional meta-analysis do not properly handle missing data, and can incorrectly estimate correlations between score statistics. As a result, they can produce highly inflated type-I errors for conditional analysis, which will result in overestimated phenotypic variance explained and incorrect identification of causal variants. We systematically evaluated this bias and proposed a novel partial correlation based score statistic. The new statistic has valid type-I errors for conditional analysis and much higher power than the existing methods, even when the contributed summary statistics contain a large fraction of missing values. We expect this method to be highly useful in the sequencing age for complex trait genetics.
Affiliation(s)
- Yu Jiang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
- Sai Chen
- Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Daniel McGuire
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
- Fang Chen
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
- Mengzhen Liu
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- William G. Iacono
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- John K. Hewitt
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- John E. Hokanson
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Kenneth Krauter
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- Markku Laakso
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
- Kevin W. Li
- Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Sharon M. Lutz
- Department of Biostatistics and Informatics, University of Colorado, Anschutz Medical Campus, Aurora, Colorado, United States of America
- Matthew McGue
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Anita Pandit
- Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Gregory J. M. Zajac
- Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Michael Boehnke
- Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Goncalo R. Abecasis
- Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Scott I. Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Xiaowei Zhan
- Department of Clinical Science, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Bibo Jiang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
- Dajiang J. Liu
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
|
23
|
Wang J, Liu Q, Pierce BL, Huo D, Olopade OI, Ahsan H, Chen LS. A meta-analysis approach with filtering for identifying gene-level gene-environment interactions. Genet Epidemiol 2018; 42:434-446. [PMID: 29430690 PMCID: PMC6013347 DOI: 10.1002/gepi.22115] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/15/2017] [Revised: 12/13/2017] [Accepted: 01/02/2018] [Indexed: 02/02/2023]
Abstract
There is a growing recognition that gene-environment interaction (G × E) plays a pivotal role in the development and progression of complex diseases. Despite a wealth of genetic data on various complex diseases/traits generated from association and sequencing studies, detecting G × E via genome-wide analysis remains challenging due to power issues. In genome-wide G × E studies, a common strategy to improve power is to first conduct a filtering test and retain only the genetic variants that pass the filtering step for subsequent G × E analyses. Two-stage, multistage, and unified tests have been proposed to jointly consider the filtering statistics in G × E tests. However, such G × E tests based on data from a single study may still be underpowered. Meanwhile, large-scale consortia have been formed to borrow strength across studies and populations. In this work, motivated by existing single-study G × E tests with filtering and the need for meta-analysis G × E approaches based on consortia data, we propose a meta-analysis framework for detecting gene-based G × E effects, and introduce meta-analysis-based filtering statistics in the gene-level G × E tests. Simulations demonstrate the advantages of the proposed method, the ofGEM test. We apply the proposed tests to existing data from two breast cancer consortia to identify the genes harboring genetic variants with age-dependent penetrance (i.e., gene-age interaction effects). We develop an R software package ofGEM for the proposed meta-analysis tests.
Affiliation(s)
- Jiebiao Wang
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Brandon L. Pierce
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA
- Dezheng Huo
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Olufunmilayo I. Olopade
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA
- Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Center for Clinical Cancer Genetics & Global Health, The University of Chicago Medical Center, Chicago, Illinois, USA
- Habibul Ahsan
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA
- Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Lin S. Chen
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
|
24
|
He Q, Liu Y, Peters U, Hsu L. Multivariate association analysis with somatic mutation data. Biometrics 2018; 74:176-184. [PMID: 28722765 PMCID: PMC5967890 DOI: 10.1111/biom.12745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/01/2016] [Revised: 04/01/2017] [Accepted: 05/01/2017] [Indexed: 12/21/2022]
Abstract
Somatic mutations are the driving forces for tumor development, and recent advances in cancer genome sequencing have made it feasible to evaluate the association between somatic mutations and cancer-related traits in large sample sizes. However, despite increasingly large sample sizes, it remains challenging to conduct statistical analysis for somatic mutations, because the vast majority of somatic mutations occur at very low frequencies. Furthermore, cancer is a complex disease and it is often accompanied by multiple traits that reflect various aspects of cancer; how to combine the information of these traits to identify important somatic mutations poses additional challenges. In this article, we introduce a statistical approach, named as SOMAT, for detecting somatic mutations associated with multiple cancer-related traits. Our approach provides a flexible framework for analyzing continuous, binary, or a mixture of both types of traits, and is statistically powerful and computationally efficient. In addition, we propose a data-adaptive procedure, which is grid-search free, for effectively combining test statistics to enhance statistical power. We conduct an extensive study and show that the proposed approach maintains correct type I error and is more powerful than existing approaches under the scenarios considered. We also apply our approach to an exome-sequencing study of liver tumor for illustration.
Affiliation(s)
- Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, U.S.A
- Yang Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, U.S.A
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, U.S.A
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, U.S.A
|
25
|
Guo B, Wu B. Statistical methods to detect novel genetic variants using publicly available GWAS summary data. Comput Biol Chem 2018; 74:76-79. [PMID: 29558699 DOI: 10.1016/j.compbiolchem.2018.02.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/26/2017] [Revised: 01/19/2018] [Accepted: 02/22/2018] [Indexed: 01/09/2023]
Abstract
We propose statistical methods to detect novel genetic variants using only genome-wide association study (GWAS) summary data, without access to raw genotype and phenotype data. With more and more summary data being posted for public access in the post-GWAS era, the proposed methods are of practical use for identifying additional genetic variants of interest and for shedding light on the underlying disease mechanism. We illustrate the utility of our proposed methods with an application to GWAS meta-analysis results for fasting glucose from the international MAGIC consortium. We found several novel genome-wide significant loci that are worth further study.
Affiliation(s)
- Bin Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, United States
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, United States.
|
26
|
Abstract
Meta-analysis is a statistical technique that is widely used for improving the power to detect associations, by synthesizing data from independent studies, and is extensively used in the genomic analyses of complex traits. Estimates from different studies are combined and the results effectively provide the power of a much larger study. Meta-analysis also has the potential of discovering heterogeneity in the effects among the different studies. This chapter provides an overview of the methods used for meta-analysis of common and rare single variants and also for gene/region-based analyses; common variants are mainly identified via genome-wide association studies (GWAS) and rare variants through various types of sequencing experiments.
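The combination of per-study estimates described above can be sketched with the standard inverse-variance fixed-effect estimator, together with Cochran's Q as a simple heterogeneity measure. This is a minimal illustration rather than code from the chapter; the function name `fixed_effect_meta` and the toy effect sizes are assumptions.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of one variant.

    betas : per-study effect estimates (hypothetical inputs)
    ses   : per-study standard errors of those estimates
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate;
    # compared with a chi-square on (number of studies - 1) degrees of freedom.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q}


# Two studies of equal precision with different effect estimates.
result = fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
```

In this toy case the pooled estimate is the midpoint 0.2, and its standard error is smaller than that of either study by a factor of sqrt(2), which is the sense in which the combined result behaves like a larger study.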
Affiliation(s)
- Kyriaki Michailidou
- Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
|
27
|
Lu W, Wang X, Zhan X, Gazdar A. Meta-analysis approaches to combine multiple gene set enrichment studies. Stat Med 2017; 37:659-672. [PMID: 29052247 DOI: 10.1002/sim.7540] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/12/2016] [Revised: 07/02/2017] [Accepted: 09/29/2017] [Indexed: 11/09/2022]
Abstract
In the field of gene set enrichment analysis (GSEA), meta-analysis has been used to integrate information from multiple studies to present a reliable summarization of the expanding volume of individual biomedical research, as well as improve the power of detecting essential gene sets involved in complex human diseases. However, the existing Meta-Analysis for Pathway Enrichment (MAPE) methods may be subject to power loss because of (1) using gross summary statistics for combining end results from component studies and (2) using enrichment scores whose distributions depend on the set sizes. In this paper, we adapt meta-analysis approaches recently developed for genome-wide association studies, which are based on fixed effect (FE) and random effects (RE) models, to integrate multiple GSEA studies. We further develop a mixed strategy via adaptive testing for choosing RE versus FE models to achieve greater statistical efficiency as well as flexibility. In addition, a size-adjusted enrichment score based on a one-sided Kolmogorov-Smirnov statistic is proposed to formally account for varying set sizes when testing multiple gene sets. Our methods tend to have much better performance than the MAPE methods and can be applied to both discrete and continuous phenotypes. Specifically, the performance of the adaptive testing method seems to be the most stable in general situations.
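The contrast between fixed-effect and random-effects combination described in this abstract can be sketched as follows. The code uses the standard DerSimonian-Laird moment estimator of between-study variance on generic effect estimates; it is not the authors' GSEA-specific procedure or their adaptive test, and the function name `random_effects_meta` and the example inputs are hypothetical.

```python
import math


def random_effects_meta(betas, ses):
    """DerSimonian-Laird random-effects meta-analysis (needs >= 2 studies).

    Returns (pooled_beta, pooled_se, tau2), where tau2 is the estimated
    between-study variance. When tau2 == 0 the result coincides with the
    inverse-variance fixed-effect estimate.
    """
    k = len(betas)
    if k < 2:
        raise ValueError("at least two studies are required")

    # Fixed-effect quantities are needed first to measure heterogeneity.
    w = [1.0 / (se * se) for se in ses]
    sum_w = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))

    # Moment estimator of between-study variance, truncated at zero.
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to each study's sampling variance.
    w_re = [1.0 / (se * se + tau2) for se in ses]
    sum_w_re = sum(w_re)
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum_w_re
    se_re = math.sqrt(1.0 / sum_w_re)
    return beta_re, se_re, tau2


if __name__ == "__main__":
    print(random_effects_meta([0.0, 0.4], [0.1, 0.1]))
```

When the studies disagree, the random-effects standard error is wider than the fixed-effect one, which is the trade-off an adaptive choice between the two models would be weighing.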
Affiliation(s)
- Wentao Lu
- Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA
- Xiaowei Zhan
- Quantitative Biomedical Research Center, Center for the Genetics of Host Defense, Department of Clinical Science, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Adi Gazdar
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX 75235, USA
|
28
|
Li L, Wang X, Xiao G, Gazdar A. Integrative gene set enrichment analysis utilizing isoform-specific expression. Genet Epidemiol 2017; 41:498-510. [PMID: 28580727 DOI: 10.1002/gepi.22052] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/25/2016] [Revised: 02/12/2017] [Accepted: 03/14/2017] [Indexed: 01/01/2023]
Abstract
Gene set enrichment analysis (GSEA) aims at identifying essential pathways, or more generally, sets of biologically related genes that are involved in complex human diseases. In the past, many studies have shown that GSEA is a very useful bioinformatics tool that plays critical roles in the innovation of disease prevention and intervention strategies. Despite its tremendous success, it is striking that conclusions of GSEA drawn from isolated studies are often sparse, and different studies may lead to inconsistent and sometimes contradictory results. Further, in the wake of next generation sequencing technologies, it has been made possible to measure genome-wide isoform-specific expression levels, calling for innovations that can utilize the unprecedented resolution. Currently, enormous amounts of data have been created from various RNA-seq experiments. All these give rise to a pressing need for developing integrative methods that allow for explicit utilization of isoform-specific expression, to combine multiple enrichment studies, in order to enhance the power, reproducibility, and interpretability of the analysis. We develop and evaluate integrative GSEA methods, based on two-stage procedures, which, for the first time, allow statistically efficient use of isoform-specific expression from multiple RNA-seq experiments. Through simulation and real data analysis, we show that our methods can greatly improve the performance in identifying essential gene sets compared to existing methods that can only use gene-level expression.
Affiliation(s)
- Lie Li
- Department of Statistical Science, Southern Methodist University, Dallas, Texas, United States of America
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Dallas, Texas, United States of America
- Guanghua Xiao
- Department of Clinical Sciences, The University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Adi Gazdar
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
|
29
|
Martinez-Duarte R. Fabrication challenges and perspectives on the use of carbon-electrode dielectrophoresis in sample preparation. IET Nanobiotechnol 2017; 11:127-133. [PMID: 28476994 PMCID: PMC8676545 DOI: 10.1049/iet-nbt.2016.0154] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/01/2016] [Revised: 09/17/2016] [Accepted: 09/23/2016] [Indexed: 11/20/2022] Open
Abstract
The focus of this review is to assess the current status of three-dimensional (3D) carbon-electrode dielectrophoresis (carbonDEP) and identify the challenges currently preventing its use in high-throughput applications such as sample preparation for diagnostics. The use of 3D electrodes over more traditional planar ones is emphasised here as a way to increase the throughput of DEP devices. Glass-like carbon electrodes are derived through the carbonisation of photoresist structures made using photolithography. These biocompatible carbon electrodes are not ideal electrical conductors but are more electrochemically stable than noble metals such as gold and platinum. They are also significantly less expensive than common electrode materials, both in terms of material cost and fabrication process. CarbonDEP has been demonstrated for the manipulation of microorganisms and biomolecules. This review is divided into three main sections: (i) carbonDEP fabrication process; (ii) applications using 3D carbonDEP; and (iii) challenges and perspectives on the use of carbonDEP for high-throughput applications.
Affiliation(s)
- Rodrigo Martinez-Duarte
- Department of Mechanical Engineering, Multiscale Manufacturing Laboratory, Clemson University, 204 Fluor Daniel, Clemson, SC 29672, USA
|
30
|
Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 2017; 18:117-127. [PMID: 27840428 PMCID: PMC5449190 DOI: 10.1038/nrg.2016.142] [Citation(s) in RCA: 268] [Impact Index Per Article: 33.5] [Indexed: 02/07/2023]
Abstract
During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
Affiliation(s)
- Bogdan Pasaniuc
- Departments of Human Genetics, and Pathology and Laboratory Medicine, University of California, Los Angeles, California 90095, USA
- Alkes L Price
- Departments of Epidemiology and Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA
|
31
|
Chen S, Nunez S, Reilly MP, Foulkes AS. Bayesian variable selection for post-analytic interrogation of susceptibility loci. Biometrics 2016; 73:603-614. [PMID: 27858978 DOI: 10.1111/biom.12620] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/01/2015] [Revised: 09/01/2016] [Accepted: 09/01/2016] [Indexed: 11/26/2022]
Abstract
Understanding the complex interplay among protein coding genes and regulatory elements requires rigorous interrogation with analytic tools designed for discerning the relative contributions of overlapping genomic regions. To this aim, we offer a novel application of Bayesian variable selection (BVS) for classifying genomic class level associations using existing large meta-analysis summary level resources. This approach is applied using the expectation maximization variable selection (EMVS) algorithm to typed and imputed SNPs across 502 protein coding genes (PCGs) and 220 long intergenic non-coding RNAs (lncRNAs) that overlap 45 known loci for coronary artery disease (CAD) using publicly available Global Lipids Genetics Consortium (GLGC) (Teslovich et al., 2010; Willer et al., 2013) meta-analysis summary statistics for low-density lipoprotein cholesterol (LDL-C). The analysis reveals 33 PCGs and three lncRNAs across 11 loci with >50% posterior probabilities for inclusion in an additive model of association. The findings are consistent with previous reports, while providing some new insight into the architecture of LDL-cholesterol to be investigated further. As genomic taxonomies continue to evolve, additional classes, such as enhancer elements and splicing regions, can easily be layered into the proposed analysis framework. Moreover, application of this approach to alternative publicly available meta-analysis resources, or more generally as a post-analytic strategy to further interrogate regions that are identified through single point analysis, is straightforward. All coding examples are implemented in R version 3.2.1 and provided as supplemental material.
Affiliation(s)
- Siying Chen
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts, U.S.A
- Sara Nunez
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts, U.S.A
- Muredach P Reilly
- Department of Medicine, Division of Cardiology, and the Irving Institute for Clinical and Translational Research at Columbia University, New York City, New York, U.S.A
- Andrea S Foulkes
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts, U.S.A
|
32
|
Moreno Uribe LM, Miller SF. Genetics of the dentofacial variation in human malocclusion. Orthod Craniofac Res 2016; 18 Suppl 1:91-9. [PMID: 25865537 DOI: 10.1111/ocr.12083] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Accepted: 10/14/2014] [Indexed: 01/12/2023]
Abstract
Malocclusions affect individuals worldwide, resulting in compromised function and esthetics. Understanding the etiological factors contributing to the variation in dentofacial morphology associated with malocclusions is key to developing novel treatment approaches. Advances in dentofacial phenotyping, which is the comprehensive characterization of hard and soft tissue variation in the craniofacial complex, together with the acquisition of large-scale genomic data, have started to unravel genetic mechanisms underlying facial variation. Knowledge of the genetics of human malocclusion is limited, even though results attained thus far are encouraging, with promising opportunities for future research. This review summarizes the most common dentofacial variations associated with malocclusions and reviews the current knowledge of the roles of genes in the development of malocclusions. Lastly, this review describes ways to advance malocclusion research, following examples from the expanding fields of phenomics and genomic medicine, which aim to better patient outcomes.
Affiliation(s)
- L M Moreno Uribe
- Department of Orthodontics, College of Dentistry, University of Iowa, Iowa City, IA, USA; Dows Institute for Dental Research, College of Dentistry, University of Iowa, Iowa City, IA, USA
|
33
|
Zhang H, Wheeler W, Hyland PL, Yang Y, Shi J, Chatterjee N, Yu K. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations. PLoS Genet 2016; 12:e1006122. [PMID: 27362418 PMCID: PMC4928884 DOI: 10.1371/journal.pgen.1006122] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/22/2016] [Accepted: 05/20/2016] [Indexed: 12/17/2022] Open
Abstract
Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 out of the 43 pathways identified in European populations remained significant in eastern Asians at the false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.
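To illustrate the rank truncated product idea that this family of tests builds on, the sketch below sums the logs of the K smallest SNP-level p-values and calibrates the result by simulation. It is a deliberately simplified illustration, not the sARTP procedure: it assumes independent SNPs (sARTP instead uses genotype correlation estimated from a reference panel) and a single fixed truncation point K (the adaptive version optimizes over several). The function names, the simulation count, and the example p-values are hypothetical.

```python
import math
import random


def rtp_statistic(pvalues, k):
    """Log of the product of the k smallest p-values (more negative = stronger)."""
    smallest = sorted(pvalues)[:k]
    return sum(math.log(p) for p in smallest)


def rtp_pvalue(pvalues, k, n_sim=2000, seed=0):
    """Monte Carlo p-value for the rank truncated product statistic.

    ASSUMPTION: the null distribution is simulated from independent
    Uniform(0, 1] p-values, i.e. SNPs are treated as uncorrelated.
    """
    rng = random.Random(seed)
    observed = rtp_statistic(pvalues, k)
    m = len(pvalues)

    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null_p = [1.0 - rng.random() for _ in range(m)]
        if rtp_statistic(null_p, k) <= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    snp_pvalues = [1e-6, 1e-5, 0.5, 0.8]  # hypothetical SNP-level p-values
    print(rtp_statistic(snp_pvalues, k=2), rtp_pvalue(snp_pvalues, k=2))
```

Truncating at the K smallest p-values focuses the test on the strongest signals within a gene or pathway, which is useful when only a few of many SNPs carry an association.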
Affiliation(s)
- Han Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- William Wheeler
- Information Management Services Inc., Calverton, Maryland, United States of America
- Paula L. Hyland
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Yifan Yang
- Department of Statistics, University of Kentucky, Lexington, Kentucky, United States of America
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- * E-mail: (NC); (KY)
- Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail: (NC); (KY)
|
34
|
Wang S, Zhao JH, An P, Guo X, Jensen RA, Marten J, Huffman JE, Meidtner K, Boeing H, Campbell A, Rice KM, Scott RA, Yao J, Schulze MB, Wareham NJ, Borecki IB, Province MA, Rotter JI, Hayward C, Goodarzi MO, Meigs JB, Dupuis J. General Framework for Meta-Analysis of Haplotype Association Tests. Genet Epidemiol 2016; 40:244-52. [PMID: 27027517 PMCID: PMC4869684 DOI: 10.1002/gepi.21959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2015] [Revised: 11/03/2015] [Accepted: 12/14/2015] [Indexed: 11/24/2022]
Abstract
For complex traits, most associated single nucleotide variants (SNV) discovered to date have a small effect, and detection of association is only possible with large sample sizes. Because of patient confidentiality concerns, it is often not possible to pool genetic data from multiple cohorts, and meta-analysis has emerged as the method of choice to combine results from multiple studies. Many meta-analysis methods are available for single SNV analyses. As new approaches allow the capture of low frequency and rare genetic variation, it is of interest to jointly consider multiple variants to improve power. However, for the analysis of haplotypes formed by multiple SNVs, meta-analysis remains a challenge, because different haplotypes may be observed across studies. We propose a two-stage meta-analysis approach to combine haplotype analysis results. In the first stage, each cohort estimates haplotype effect sizes in a regression framework, accounting for relatedness among observations if appropriate. For the second stage, we use a multivariate generalized least squares meta-analysis approach to combine haplotype effect estimates from multiple cohorts. Haplotype-specific association tests and a global test of independence between haplotypes and traits are obtained within our framework. We demonstrate through simulation studies that we control the type-I error rate, and our approach is more powerful than inverse variance weighted meta-analysis of single SNV analysis when haplotype effects are present. We replicate a published haplotype association between fasting glucose-associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium and we provide more precise haplotype effect estimates.
Affiliation(s)
- Shuai Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America
- Jing Hua Zhao
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Box 285 Institute of Metabolic Science, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Ping An
- Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Xiuqing Guo
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, California, United States of America
- Richard A Jensen
- Cardiovascular Health Research Unit, University of Washington, Seattle, Washington, United States of America; Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Jonathan Marten
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Edinburgh, United Kingdom
- Jennifer E Huffman
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Edinburgh, United Kingdom
- Karina Meidtner
- Department of Molecular Epidemiology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany
- Heiner Boeing
- Department of Epidemiology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany
- Archie Campbell
- Generation Scotland, Centre for Genomic and Experimental Medicine, Institute of Genetic and Molecular Medicine, Western General Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Robert A Scott
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Box 285 Institute of Metabolic Science, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Jie Yao
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, California, United States of America
- Matthias B Schulze
- Department of Molecular Epidemiology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany; German Center for Diabetes Research (DZD), Germany
- Nicholas J Wareham
- MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Box 285 Institute of Metabolic Science, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Ingrid B Borecki
- Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Michael A Province
- Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Jerome I Rotter
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, LABioMed at Harbor-UCLA Medical Center, Torrance, California, United States of America
- Caroline Hayward
- Department of Medicine, University of Washington, Seattle, Washington, United States of America; Generation Scotland, Centre for Genomic and Experimental Medicine, Institute of Genetic and Molecular Medicine, Western General Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Mark O Goodarzi
- Division of Endocrinology, Diabetes and Metabolism, Cedars-Sinai Medical Center, Los Angeles, California, United States of America
- James B Meigs
- General Medicine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America; Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America; National Heart, Lung, Blood Institute (NHLBI), Framingham Heart Study, Framingham, Massachusetts, United States of America
|
35
|
A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci. PLoS One 2016; 11:e0148218. [PMID: 26859766 PMCID: PMC4747495 DOI: 10.1371/journal.pone.0148218] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/22/2015] [Accepted: 01/14/2016] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs.
METHODS: We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1.
RESULTS: We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes.
CONCLUSIONS: We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.
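One generic way to combine SNP-level test statistics while accounting for their correlation is a quadratic-form test. The sketch below handles the simplest case of two SNPs with correlation r, where the statistic T = z'R^{-1}z follows a chi-square distribution with 2 degrees of freedom under the null and its tail probability has the closed form exp(-T/2). This is an illustration of the general idea under those assumptions, not the GenCAT implementation; the function name `two_snp_class_test` and the inputs are hypothetical.

```python
import math


def two_snp_class_test(z1, z2, r):
    """Quadratic-form test for a class containing two correlated SNPs.

    z1, z2: SNP-level z-statistics (hypothetical inputs)
    r:      correlation between the two statistics, strictly in (-1, 1)
    Returns (statistic, p_value).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")

    # For R = [[1, r], [r, 1]], the inverse is [[1, -r], [-r, 1]] / (1 - r^2),
    # so z' R^{-1} z expands to the expression below.
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)

    # Chi-square survival function with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    print(two_snp_class_test(2.0, 2.0, 0.0))  # uncorrelated SNPs
    print(two_snp_class_test(2.0, 2.0, 0.5))  # positively correlated SNPs
```

With positive correlation, two concordant z-statistics carry partly redundant evidence, so the statistic is smaller than in the uncorrelated case; this is the sense in which the correlation structure of a class matters for its test.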
|
36
|
Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models. Genetics 2015; 202:457-70. [PMID: 26715663 DOI: 10.1534/genetics.115.180869] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/18/2015] [Accepted: 12/09/2015] [Indexed: 11/18/2022] Open
Abstract
We developed generalized functional linear models (GFLMs) to perform a meta-analysis of multiple case-control studies to evaluate the relationship of genetic data to dichotomous traits adjusting for covariates. Unlike the previously developed meta-analysis for sequence kernel association tests (MetaSKATs), which are based on mixed-effect models to make the contributions of major gene loci random, GFLMs are fixed models; i.e., genetic effects of multiple genetic variants are fixed. Based on GFLMs, we developed chi-squared-distributed Rao's efficient score test and likelihood-ratio test (LRT) statistics to test for an association between a complex dichotomous trait and multiple genetic variants. We then performed extensive simulations to evaluate the empirical type I error rates and power performance of the proposed tests. The Rao's efficient score test statistics of GFLMs are very conservative and have higher power than MetaSKATs when some causal variants are rare and some are common. When the causal variants are all rare [i.e., minor allele frequencies (MAF) < 0.03], the Rao's efficient score test statistics have similar or slightly lower power than MetaSKATs. The LRT statistics generate accurate type I error rates for homogeneous genetic-effect models and may inflate type I error rates for heterogeneous genetic-effect models owing to the large numbers of degrees of freedom and have similar or slightly higher power than the Rao's efficient score test statistics. GFLMs were applied to analyze genetic data of 22 gene regions of type 2 diabetes data from a meta-analysis of eight European studies and detected significant association for 18 genes (P < 3.10 × 10^-6), tentative association for 2 genes (HHEX and HMGA2; P ≈ 10^-5), and no association for 2 genes, while MetaSKATs detected none. In addition, the traditional additive-effect model detects association at gene HHEX. GFLMs and related tests can analyze rare or common variants or a combination of the two and can be useful in whole-genome and whole-exome association studies.
|
37
|
Zhan X, Liu DJ. SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations. Genet Epidemiol 2015; 39:619-23. [PMID: 26394715 PMCID: PMC4794281 DOI: 10.1002/gepi.21918] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 05/20/2015] [Revised: 07/01/2015] [Accepted: 07/17/2015] [Indexed: 11/23/2022]
Abstract
Next-generation sequencing has enabled the study of a comprehensive catalogue of genetic variants for their impact on various complex diseases. Numerous consortia studies of complex traits have publicly released their summary association statistics, which have become an invaluable resource for learning the underlying biology, understanding the genetic architecture, and guiding clinical translations. There is great interest in the field in developing novel statistical methods for analyzing and interpreting results from these genotype-phenotype association studies. One popular platform for method development and data analysis is R. In order to enable these analyses in R, it is necessary to develop packages that can efficiently query files of summary association statistics, explore the linkage disequilibrium structure between variants, and integrate various bioinformatics databases. The complexity and scale of sequence datasets and databases pose significant computational challenges for method developers. To address these challenges and facilitate method development, we developed the R package SEQMINER for annotating and querying files of sequence variants (e.g., VCF/BCF files) and summary association statistics (e.g., METAL/RAREMETAL files), and for integrating bioinformatics databases. SEQMINER provides an infrastructure where novel methods can be distributed and applied to analyzing sequence datasets in practice. We illustrate the performance of SEQMINER using datasets from the 1000 Genomes Project. We show that SEQMINER is highly efficient and easy to use. It will greatly accelerate the process of applying statistical innovations to analyze and interpret sequence-based associations. The R package, its source code and documentation are available from http://cran.r-project.org/web/packages/seqminer and http://seqminer.genomic.codes/.
Affiliation(s)
- Xiaowei Zhan
- Department of Clinical Sciences, Quantitative Biomedical Research Center, Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Dajiang J Liu
- Institute for Personalized Medicine, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, United States of America; Division of Biostatistics, Department of Public Health Sciences, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, United States of America
|
38
|
He Q, Zhang HH, Avery CL, Lin DY. Sparse meta-analysis with high-dimensional data. Biostatistics 2015; 17:205-20. [PMID: 26395907 DOI: 10.1093/biostatistics/kxv038] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 03/04/2015] [Accepted: 08/31/2015] [Indexed: 01/10/2023] Open
Abstract
Meta-analysis plays an important role in summarizing and synthesizing scientific evidence derived from multiple studies. With high-dimensional data, the incorporation of variable selection into meta-analysis improves model interpretation and prediction. Existing variable selection methods require direct access to raw data, which may not be available in practical situations. We propose a new approach, sparse meta-analysis (SMA), in which variable selection for meta-analysis is based solely on summary statistics and the effect sizes of each covariate are allowed to vary among studies. We show that the SMA enjoys the oracle property if the estimated covariance matrix of the parameter estimators from each study is available. We also show that our approach achieves selection consistency and estimation consistency even when summary statistics include only the variance estimators or no variance/covariance information at all. Simulation studies and applications to high-throughput genomics studies demonstrate the usefulness of our approach.
Affiliation(s)
- Qianchuan He
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Hao Helen Zhang
- Department of Mathematics, The University of Arizona, Tucson, AZ 85721, USA
| | - Christy L Avery
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27599, USA
| | - D Y Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
|
39
|
Reed E, Nunez S, Kulp D, Qian J, Reilly MP, Foulkes AS. A guide to genome-wide association analysis and post-analytic interrogation. Stat Med 2015; 34:3769-92. [PMID: 26343929 PMCID: PMC5019244 DOI: 10.1002/sim.6605] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 02/28/2015] [Revised: 06/09/2015] [Accepted: 07/06/2015] [Indexed: 01/14/2023]
Abstract
This tutorial is a learning resource that outlines the basic process and provides specific software tools for implementing a complete genome‐wide association analysis. Approaches to post‐analytic visualization and interrogation of potentially novel findings are also presented. Applications are illustrated using the free and open‐source R statistical computing and graphics software environment, Bioconductor software for bioinformatics and the UCSC Genome Browser. Complete genome‐wide association data on 1401 individuals across 861,473 typed single nucleotide polymorphisms from the PennCATH study of coronary artery disease are used for illustration. All data and code, as well as additional instructional resources, are publicly available through the Open Resources in Statistical Genomics project: http://www.stat-gen.org. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Affiliation(s)
- Eric Reed
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| | - Sara Nunez
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| | - David Kulp
- Department of Computer Science, University of Massachusetts, Amherst, MA, U.S.A
| | - Jing Qian
- Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, U.S.A
| | - Muredach P Reilly
- Department of Medicine, University of Pennsylvania, Philadelphia, PA, U.S.A
| | - Andrea S Foulkes
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
|
40
|
Schmidt EM, Willer CJ. Insights into blood lipids from rare variant discovery. Curr Opin Genet Dev 2015; 33:25-31. [PMID: 26241468 DOI: 10.1016/j.gde.2015.06.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/19/2015] [Revised: 06/19/2015] [Accepted: 06/22/2015] [Indexed: 12/18/2022]
Abstract
Large-scale genome wide screens have discovered over 160 common variants associated with plasma lipids, which are risk factors often linked to heart disease. A large fraction of lipid heritability remains unexplained, and it is hypothesized that rare variants of functional consequence may account for some of the missing heritability. Finding lipid-associated variants that occur less frequently in the human population poses a challenge, primarily due to lack of power and difficulties to identify and test them. Interrogation of the protein-coding regions of the genome using array and sequencing techniques has led to important discoveries of rare variants that affect lipid levels and related disease risk. Here, we summarize the latest methods and findings that contribute to our current understanding of rare variant lipid genetics.
Affiliation(s)
- Ellen M Schmidt
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Cristen J Willer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
|
41
|
Meta-analysis for Discovering Rare-Variant Associations: Statistical Methods and Software Programs. Am J Hum Genet 2015; 97:35-53. [PMID: 26094574 DOI: 10.1016/j.ajhg.2015.05.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 02/05/2015] [Accepted: 05/01/2015] [Indexed: 01/01/2023] Open
Abstract
There is heightened interest in using next-generation sequencing technologies to identify rare variants that influence complex human diseases and traits. Meta-analysis is essential to this endeavor because large sample sizes are required for detecting associations with rare variants. In this article, we provide a comprehensive overview of statistical methods for meta-analysis of sequencing studies for discovering rare-variant associations. Specifically, we discuss the calculation of relevant summary statistics from participating studies, the construction of gene-level association tests, the choice of transformation for quantitative traits, the use of fixed-effects versus random-effects models, and the removal of shadow association signals through conditional analysis. We also show that meta-analysis based on properly calculated summary statistics is as powerful as joint analysis of individual-participant data. In addition, we demonstrate the performance of different meta-analysis methods by using both simulated and empirical data. We then compare four major software packages for meta-analysis of rare-variant associations (MASS, RAREMETAL, MetaSKAT, and seqMeta) in terms of the underlying statistical methodology, analysis pipeline, and software interface. Finally, we present PreMeta, a software interface that integrates the four meta-analysis packages and allows a consortium to combine otherwise incompatible summary statistics.
|
42
|
Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models. Genetics 2015; 200:1089-104. [PMID: 26058849 DOI: 10.1534/genetics.115.178343] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 01/21/2015] [Accepted: 06/05/2015] [Indexed: 11/18/2022] Open
Abstract
Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.
|
43
|
Wang Q, Lu Q, Zhao H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front Genet 2015; 6:149. [PMID: 25941534 PMCID: PMC4403555 DOI: 10.3389/fgene.2015.00149] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 01/25/2015] [Accepted: 03/30/2015] [Indexed: 12/22/2022] Open
Abstract
Results from numerous linkage and association studies have greatly deepened scientists’ understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies in the past 10 years, it is challenging to interpret these results as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors only explain a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of using linkage disequilibrium to detect association signals based on a set of pre-set probes, NGS allows researchers to directly study all the variants in each individual, therefore promises opportunities for identifying functional variants and a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited due to the high cost, the success of several recent studies suggests the great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneer applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Affiliation(s)
- Qian Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Veterans Affairs Cooperative Studies Program Coordinating Center, West Haven, CT, USA
|
44
|
Zhang Q. Associating rare genetic variants with human diseases. Front Genet 2015; 6:133. [PMID: 25904936 PMCID: PMC4389536 DOI: 10.3389/fgene.2015.00133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/15/2015] [Accepted: 03/19/2015] [Indexed: 11/20/2022] Open
Affiliation(s)
- Qunyuan Zhang
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA
|
45
|
Integrative analysis of sequencing and array genotype data for discovering disease associations with rare mutations. Proc Natl Acad Sci U S A 2015; 112:1019-24. [PMID: 25583502 DOI: 10.1073/pnas.1406143112] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022] Open
Abstract
In the large cohorts that have been used for genome-wide association studies (GWAS), it is prohibitively expensive to sequence all cohort members. A cost-effective strategy is to sequence subjects with extreme values of quantitative traits or those with specific diseases. By imputing the sequencing data from the GWAS data for the cohort members who are not selected for sequencing, one can dramatically increase the number of subjects with information on rare variants. However, ignoring the uncertainties of imputed rare variants in downstream association analysis will inflate the type I error when sequenced subjects are not a random subset of the GWAS subjects. In this article, we provide a valid and efficient approach to combining observed and imputed data on rare variants. We consider commonly used gene-level association tests, all of which are constructed from the score statistic for assessing the effects of individual variants on the trait of interest. We show that the score statistic based on the observed genotypes for sequenced subjects and the imputed genotypes for nonsequenced subjects is unbiased. We derive a robust variance estimator that reflects the true variability of the score statistic regardless of the sampling scheme and imputation quality, such that the corresponding association tests always have correct type I error. We demonstrate through extensive simulation studies that the proposed tests are substantially more powerful than the use of accurately imputed variants only and the use of sequencing data alone. We provide an application to the Women's Health Initiative. The relevant software is freely available.
|
46
|
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 689] [Impact Index Per Article: 62.6] [Received: 01/02/2014] [Indexed: 12/30/2022] Open
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
|
47
|
Pasaniuc B, Zaitlen N, Shi H, Bhatia G, Gusev A, Pickrell J, Hirschhorn J, Strachan DP, Patterson N, Price AL. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 2014; 30:2906-14. [PMID: 24990607 DOI: 10.1093/bioinformatics/btu416] [Citation(s) in RCA: 121] [Impact Index Per Article: 11.0] [Indexed: 12/17/2022]
Abstract
MOTIVATION: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available.
RESULTS: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ² association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses.
AVAILABILITY AND IMPLEMENTATION: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/.
CONTACT: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu
SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Noah Zaitlen
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Huwenbo Shi
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Gaurav Bhatia
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Alexander Gusev
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Joseph Pickrell
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Joel Hirschhorn
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - David P Strachan
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Nick Patterson
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Alkes L Price
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
|
48
|
Moutsianas L, Morris AP. Methodology for the analysis of rare genetic variation in genome-wide association and re-sequencing studies of complex human traits. Brief Funct Genomics 2014; 13:362-70. [PMID: 24916163 PMCID: PMC4168660 DOI: 10.1093/bfgp/elu012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022] Open
Abstract
Genome-wide association studies have been successful in identifying common variants that impact complex human traits and diseases. However, despite this success, the joint effects of these variants explain only a small proportion of the genetic variance in these phenotypes, leading to speculation that rare genetic variation might account for much of the ‘missing heritability’. Consequently, there has been an exciting period of research and development into the methodology for the analysis of rare genetic variants, typically by considering their joint effects on complex traits within the same functional unit or genomic region. In this review, we describe a general framework for modelling the joint effects of rare genetic variants on complex traits in association studies of unrelated individuals. We summarise a range of widely used association tests that have been developed from this model and provide an overview of the relative performance of these approaches from published simulation studies.
|
49
|
Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 2014; 15:335-46. [PMID: 24739678 DOI: 10.1038/nrg3706] [Citation(s) in RCA: 377] [Impact Index Per Article: 34.3] [Indexed: 12/13/2022]
Abstract
Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics of the phenotype and the putative genetic variant, as well as the study design. Here, we review the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
Affiliation(s)
- Pak C Sham
- Centre for Genomic Sciences, Jockey Club Building for Interdisciplinary Research; State Key Laboratory of Brain and Cognitive Sciences, and Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shaun M Purcell
- [1] Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York 10029-6574, USA. [2] Center for Human Genetic Research, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
|
50
|
Liu JZ, Anderson CA. Genetic studies of Crohn's disease: past, present and future. Best Pract Res Clin Gastroenterol 2014; 28:373-86. [PMID: 24913378 PMCID: PMC4075408 DOI: 10.1016/j.bpg.2014.04.009] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 02/24/2014] [Revised: 04/14/2014] [Accepted: 04/24/2014] [Indexed: 01/31/2023]
Abstract
The exact aetiology of Crohn's disease is unknown, though it is clear from early epidemiological studies that a combination of genetic and environmental risk factors contributes to an individual's disease susceptibility. Here, we review the history of gene-mapping studies of Crohn's disease, from the linkage-based studies that first implicated the NOD2 locus, through to modern-day genome-wide association studies that have discovered over 140 loci associated with Crohn's disease and yielded novel insights into the biological pathways underlying pathogenesis. We describe on-going and future gene-mapping studies that utilise next generation sequencing technology to pinpoint causal variants and identify rare genetic variation underlying Crohn's disease risk. We comment on the utility of genetic markers for predicting an individual's disease risk and discuss their potential for identifying novel drug targets and influencing disease management. Finally, we describe how these studies have shaped and continue to shape our understanding of the genetic architecture of Crohn's disease.
Affiliation(s)
- Jimmy Z Liu
- The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
|