51
Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, Khera AV, Okada Y, Martin AR, Finucane HK, Price AL. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet 2022; 54:450-458. [PMID: 35393596 PMCID: PMC9009299 DOI: 10.1038/s41588-022-01036-9] [Citation(s) in RCA: 98] [Impact Index Per Article: 49.0] [Received: 01/10/2021] [Accepted: 02/25/2022] [Indexed: 01/25/2023]
Abstract
Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
Affiliation(s)
- Omer Weissbrod
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Huwenbo Shi
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- OMNI Bioinformatics, San Francisco, CA, USA
- Steven Gazal
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Wouter J Peyrot
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands
- Amit V Khera
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Verve Therapeutics, Cambridge, MA, USA
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Alkes L Price
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
52
Dareng EO, Tyrer JP, Barnes DR, Jones MR, Yang X, Aben KKH, Adank MA, Agata S, Andrulis IL, Anton-Culver H, Antonenkova NN, Aravantinos G, Arun BK, Augustinsson A, Balmaña J, Bandera EV, Barkardottir RB, Barrowdale D, Beckmann MW, Beeghly-Fadiel A, Benitez J, Bermisheva M, Bernardini MQ, Bjorge L, Black A, Bogdanova NV, Bonanni B, Borg A, Brenton JD, Budzilowska A, Butzow R, Buys SS, Cai H, Caligo MA, Campbell I, Cannioto R, Cassingham H, Chang-Claude J, Chanock SJ, Chen K, Chiew YE, Chung WK, Claes KBM, Colonna S, Cook LS, Couch FJ, Daly MB, Dao F, Davies E, de la Hoya M, de Putter R, Dennis J, DePersia A, Devilee P, Diez O, Ding YC, Doherty JA, Domchek SM, Dörk T, du Bois A, Dürst M, Eccles DM, Eliassen HA, Engel C, Evans GD, Fasching PA, Flanagan JM, Fortner RT, Machackova E, Friedman E, Ganz PA, Garber J, Gensini F, Giles GG, Glendon G, Godwin AK, Goodman MT, Greene MH, Gronwald J, Hahnen E, Haiman CA, Håkansson N, Hamann U, Hansen TVO, Harris HR, Hartman M, Heitz F, Hildebrandt MAT, Høgdall E, Høgdall CK, Hopper JL, Huang RY, Huff C, Hulick PJ, Huntsman DG, Imyanitov EN, Isaacs C, Jakubowska A, James PA, Janavicius R, Jensen A, Johannsson OT, John EM, Jones ME, Kang D, Karlan BY, Karnezis A, Kelemen LE, Khusnutdinova E, Kiemeney LA, Kim BG, Kjaer SK, Komenaka I, Kupryjanczyk J, Kurian AW, Kwong A, Lambrechts D, Larson MC, Lazaro C, Le ND, Leslie G, Lester J, Lesueur F, Levine DA, Li L, Li J, Loud JT, Lu KH, Lubiński J, Mai PL, Manoukian S, Marks JR, Matsuno RK, Matsuo K, May T, McGuffog L, McLaughlin JR, McNeish IA, Mebirouk N, Menon U, Miller A, Milne RL, Minlikeeva A, Modugno F, Montagna M, Moysich KB, Munro E, Nathanson KL, Neuhausen SL, Nevanlinna H, Yie JNY, Nielsen HR, Nielsen FC, Nikitina-Zake L, Odunsi K, Offit K, Olah E, Olbrecht S, Olopade OI, Olson SH, Olsson H, Osorio A, Papi L, Park SK, Parsons MT, Pathak H, Pedersen IS, Peixoto A, Pejovic T, Perez-Segura P, Permuth JB, Peshkin B, Peterlongo P, Piskorz A, Prokofyeva D, Radice P, Rantala J, Riggan 
MJ, Risch HA, Rodriguez-Antona C, Ross E, Rossing MA, Runnebaum I, Sandler DP, Santamariña M, Soucy P, Schmutzler RK, Setiawan VW, Shan K, Sieh W, Simard J, Singer CF, Sokolenko AP, Song H, Southey MC, Steed H, Stoppa-Lyonnet D, Sutphen R, Swerdlow AJ, Tan YY, Teixeira MR, Teo SH, Terry KL, Terry MB, Thomassen M, Thompson PJ, Thomsen LCV, Thull DL, Tischkowitz M, Titus L, Toland AE, Torres D, Trabert B, Travis R, Tung N, Tworoger SS, Valen E, van Altena AM, van der Hout AH, Van Nieuwenhuysen E, van Rensburg EJ, Vega A, Edwards DV, Vierkant RA, Wang F, Wappenschmidt B, Webb PM, Weinberg CR, Weitzel JN, Wentzensen N, White E, Whittemore AS, Winham SJ, Wolk A, Woo YL, Wu AH, Yan L, Yannoukakos D, Zavaglia KM, Zheng W, Ziogas A, Zorn KK, Kleibl Z, Easton D, Lawrenson K, DeFazio A, Sellers TA, Ramus SJ, Pearce CL, Monteiro AN, Cunningham J, Goode EL, Schildkraut JM, Berchuck A, Chenevix-Trench G, Gayther SA, Antoniou AC, Pharoah PDP. Polygenic risk modeling for prediction of epithelial ovarian cancer risk. Eur J Hum Genet 2022; 30:349-362. [PMID: 35027648 PMCID: PMC8904525 DOI: 10.1038/s41431-021-00987-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/07/2020] [Revised: 08/09/2021] [Accepted: 09/27/2021] [Indexed: 12/14/2022]
Abstract
Polygenic risk scores (PRS) for epithelial ovarian cancer (EOC) have the potential to improve risk stratification. Joint estimation of Single Nucleotide Polymorphism (SNP) effects in models could improve predictive performance over standard approaches of PRS construction. Here, we implemented computationally efficient, penalized, logistic regression models (lasso, elastic net, stepwise) to individual level genotype data and a Bayesian framework with continuous shrinkage, "select and shrink for summary statistics" (S4), to summary level data for epithelial non-mucinous ovarian cancer risk prediction. We developed the models in a dataset consisting of 23,564 non-mucinous EOC cases and 40,138 controls participating in the Ovarian Cancer Association Consortium (OCAC) and validated the best models in three populations of different ancestries: prospective data from 198,101 women of European ancestries; 7,669 women of East Asian ancestries; 1,072 women of African ancestries, and in 18,915 BRCA1 and 12,337 BRCA2 pathogenic variant carriers of European ancestries. In the external validation data, the model with the strongest association for non-mucinous EOC risk derived from the OCAC model development data was the S4 model (27,240 SNPs) with odds ratios (OR) of 1.38 (95% CI: 1.28-1.48, AUC: 0.588) per unit standard deviation, in women of European ancestries; 1.14 (95% CI: 1.08-1.19, AUC: 0.538) in women of East Asian ancestries; 1.38 (95% CI: 1.21-1.58, AUC: 0.593) in women of African ancestries; hazard ratios of 1.36 (95% CI: 1.29-1.43, AUC: 0.592) in BRCA1 pathogenic variant carriers and 1.49 (95% CI: 1.35-1.64, AUC: 0.624) in BRCA2 pathogenic variant carriers. Incorporation of the S4 PRS in risk prediction models for ovarian cancer may have clinical utility in ovarian cancer prevention programs.
Affiliation(s)
- Eileen O Dareng
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- Jonathan P Tyrer
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Oncology, Cambridge, UK
- Daniel R Barnes
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- Michelle R Jones
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Xin Yang
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- Katja K H Aben
- Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
- Netherlands Comprehensive Cancer Organisation, Utrecht, The Netherlands
- Muriel A Adank
- The Netherlands Cancer Institute-Antoni van Leeuwenhoek hospital, Family Cancer Clinic, Amsterdam, The Netherlands
- Simona Agata
- Veneto Institute of Oncology IOV-IRCCS, Immunology and Molecular Oncology Unit, Padua, Italy
- Irene L Andrulis
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Fred A. Litwin Center for Cancer Genetics, Toronto, ON, Canada
- University of Toronto, Department of Molecular Genetics, Toronto, ON, Canada
- Hoda Anton-Culver
- University of California Irvine, Department of Epidemiology, Genetic Epidemiology Research Institute, Irvine, CA, USA
- Natalia N Antonenkova
- N.N. Alexandrov Research Institute of Oncology and Medical Radiology, Minsk, Belarus
- Banu K Arun
- University of Texas MD Anderson Cancer Center, Department of Breast Medical Oncology, Houston, TX, USA
- Annelie Augustinsson
- Lund University, Department of Cancer Epidemiology, Clinical Sciences, Lund, Sweden
- Judith Balmaña
- Vall d'Hebron Institute of Oncology, Hereditary cancer Genetics Group, Barcelona, Spain
- University Hospital of Vall d'Hebron, Department of Medical Oncology, Barcelona, Spain
- Elisa V Bandera
- Rutgers Cancer Institute of New Jersey, Cancer Prevention and Control Program, New Brunswick, NJ, USA
- Rosa B Barkardottir
- Landspitali University Hospital, Department of Pathology, Reykjavik, Iceland
- University of Iceland, BMC (Biomedical Centre), Faculty of Medicine, Reykjavik, Iceland
- Daniel Barrowdale
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- Matthias W Beckmann
- University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nuremberg, Department of Gynecology and Obstetrics, Comprehensive Cancer Center ER-EMN, Erlangen, Germany
- Alicia Beeghly-Fadiel
- Vanderbilt University School of Medicine, Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Nashville, TN, USA
- Javier Benitez
- Biomedical Network on Rare Diseases (CIBERER), Madrid, Spain
- Spanish National Cancer Research Centre (CNIO), Human Cancer Genetics Programme, Madrid, Spain
- Marina Bermisheva
- Ufa Federal Research Centre of the Russian Academy of Sciences, Institute of Biochemistry and Genetics, Ufa, Russia
- Marcus Q Bernardini
- Princess Margaret Hospital, Division of Gynecologic Oncology, University Health Network, Toronto, ON, Canada
- Line Bjorge
- Haukeland University Hospital, Department of Obstetrics and Gynecology, Bergen, Norway
- University of Bergen, Centre for Cancer Biomarkers CCBIO, Department of Clinical Science, Bergen, Norway
- Amanda Black
- National Cancer Institute, Division of Cancer Epidemiology and Genetics, Bethesda, MD, USA
- Natalia V Bogdanova
- N.N. Alexandrov Research Institute of Oncology and Medical Radiology, Minsk, Belarus
- Hannover Medical School, Department of Radiation Oncology, Hannover, Germany
- Hannover Medical School, Gynaecology Research Unit, Hannover, Germany
- Bernardo Bonanni
- IEO, European Institute of Oncology IRCCS, Division of Cancer Prevention and Genetics, Milan, Italy
- Ake Borg
- Lund University and Skåne University Hospital, Department of Oncology, Lund, Sweden
- James D Brenton
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Agnieszka Budzilowska
- Maria Sklodowska-Curie National Research Institute of Oncology, Department of Pathology and Laboratory Diagnostics, Warsaw, Poland
- Ralf Butzow
- University of Helsinki, Department of Pathology, Helsinki University Hospital, Helsinki, Finland
- Saundra S Buys
- Huntsman Cancer Institute, Department of Medicine, Salt Lake City, UT, USA
- Hui Cai
- Vanderbilt University School of Medicine, Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Nashville, TN, USA
- Maria A Caligo
- University Hospital, SOD Genetica Molecolare, Pisa, Italy
- Ian Campbell
- Peter MacCallum Cancer Center, Melbourne, VIC, Australia
- The University of Melbourne, Sir Peter MacCallum Department of Oncology, Melbourne, VIC, Australia
- Rikki Cannioto
- Roswell Park Cancer Institute, Cancer Pathology & Prevention, Division of Cancer Prevention and Population Sciences, Buffalo, NY, USA
- Hayley Cassingham
- Division of Human Genetics, The Ohio State University, Department of Internal Medicine, Columbus, OH, USA
- Jenny Chang-Claude
- German Cancer Research Center (DKFZ), Division of Cancer Epidemiology, Heidelberg, Germany
- University Medical Center Hamburg-Eppendorf, Cancer Epidemiology Group, University Cancer Center Hamburg (UCCH), Hamburg, Germany
- Stephen J Chanock
- National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Division of Cancer Epidemiology and Genetics, Bethesda, MD, USA
- Kexin Chen
- Tianjin Medical University Cancer Institute and Hospital, Department of Epidemiology, Tianjin, China
- Yoke-Eng Chiew
- The University of Sydney, Centre for Cancer Research, The Westmead Institute for Medical Research, Sydney, NSW, Australia
- Westmead Hospital, Department of Gynaecological Oncology, Sydney, NSW, Australia
- Wendy K Chung
- Columbia University, Departments of Pediatrics and Medicine, New York, NY, USA
- Sarah Colonna
- Huntsman Cancer Institute, Department of Medicine, Salt Lake City, UT, USA
- Linda S Cook
- University of New Mexico, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Alberta Health Services, Department of Cancer Epidemiology and Prevention Research, Calgary, AB, Canada
- Fergus J Couch
- Mayo Clinic, Department of Laboratory Medicine and Pathology, Rochester, MN, USA
- Mary B Daly
- Fox Chase Cancer Center, Department of Clinical Genetics, Philadelphia, PA, USA
- Fanny Dao
- Memorial Sloan Kettering Cancer Center, Gynecology Service, Department of Surgery, New York, NY, USA
- Miguel de la Hoya
- CIBERONC, Hospital Clinico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Molecular Oncology Laboratory, Madrid, Spain
- Robin de Putter
- Ghent University, Centre for Medical Genetics, Gent, Belgium
- Joe Dennis
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- Allison DePersia
- NorthShore University Health System, Center for Medical Genetics, Evanston, IL, USA
- The University of Chicago Pritzker School of Medicine, Chicago, IL, USA
- Peter Devilee
- Leiden University Medical Center, Department of Pathology, Leiden, The Netherlands
- Leiden University Medical Center, Department of Human Genetics, Leiden, The Netherlands
- Orland Diez
- Vall d'Hebron Institute of Oncology (VHIO), Oncogenetics Group, Barcelona, Spain
- University Hospital Vall d'Hebron, Clinical and Molecular Genetics Area, Barcelona, Spain
- Yuan Chun Ding
- Beckman Research Institute of City of Hope, Department of Population Sciences, Duarte, CA, USA
- Jennifer A Doherty
- University of Utah, Huntsman Cancer Institute, Department of Population Health Sciences, Salt Lake City, UT, USA
- Susan M Domchek
- University of Pennsylvania, Basser Center for BRCA, Abramson Cancer Center, Philadelphia, PA, USA
- Thilo Dörk
- Hannover Medical School, Gynaecology Research Unit, Hannover, Germany
- Andreas du Bois
- Ev. Kliniken Essen-Mitte (KEM), Department of Gynecology and Gynecologic Oncology, Essen, Germany
- Dr. Horst Schmidt Kliniken Wiesbaden, Department of Gynecology and Gynecologic Oncology, Wiesbaden, Germany
- Matthias Dürst
- Jena University Hospital-Friedrich Schiller University, Department of Gynaecology, Jena, Germany
- Diana M Eccles
- University of Southampton, Faculty of Medicine, Southampton, UK
- Heather A Eliassen
- Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- Brigham and Women's Hospital and Harvard Medical School, Channing Division of Network Medicine, Boston, MA, USA
- Christoph Engel
- University of Leipzig, Institute for Medical Informatics, Statistics and Epidemiology, Leipzig, Germany
- University of Leipzig, LIFE-Leipzig Research Centre for Civilization Diseases, Leipzig, Germany
- Gareth D Evans
- University of Manchester, Manchester Academic Health Science Centre, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester, UK
- St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, North West Genomics Laboratory Hub, Manchester Centre for Genomic Medicine, Manchester, UK
- Peter A Fasching
- University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nuremberg, Department of Gynecology and Obstetrics, Comprehensive Cancer Center ER-EMN, Erlangen, Germany
- University of California at Los Angeles, David Geffen School of Medicine, Department of Medicine Division of Hematology and Oncology, Los Angeles, CA, USA
- James M Flanagan
- Imperial College London, Division of Cancer and Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, London, UK
- Renée T Fortner
- German Cancer Research Center (DKFZ), Division of Cancer Epidemiology, Heidelberg, Germany
- Eva Machackova
- Masaryk Memorial Cancer Institute, Department of Cancer Epidemiology and Genetics, Brno, Czech Republic
- Eitan Friedman
- Chaim Sheba Medical Center, The Susanne Levy Gertner Oncogenetics Unit, Ramat Gan, Israel
- Tel Aviv University, Sackler Faculty of Medicine, Ramat Aviv, Israel
- Patricia A Ganz
- Jonsson Comprehensive Cancer Centre, UCLA, Schools of Medicine and Public Health, Division of Cancer Prevention & Control Research, Los Angeles, CA, USA
- Judy Garber
- Dana-Farber Cancer Institute, Cancer Risk and Prevention Clinic, Boston, MA, USA
- Francesca Gensini
- University of Florence, Department of Experimental and Clinical Biomedical Sciences 'Mario Serio', Medical Genetics Unit, Florence, Italy
- Graham G Giles
- Cancer Council Victoria, Cancer Epidemiology Division, Melbourne, VIC, Australia
- The University of Melbourne, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Melbourne, VIC, Australia
- Monash University, Precision Medicine, School of Clinical Sciences at Monash Health, Clayton, VIC, Australia
- Gord Glendon
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Fred A. Litwin Center for Cancer Genetics, Toronto, ON, Canada
- Andrew K Godwin
- University of Kansas Medical Center, Department of Pathology and Laboratory Medicine, Kansas City, KS, USA
- Marc T Goodman
- Cedars-Sinai Medical Center, Samuel Oschin Comprehensive Cancer Institute, Cancer Prevention and Genetics Program, Los Angeles, CA, USA
- Mark H Greene
- National Cancer Institute, Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, Bethesda, MD, USA
- Jacek Gronwald
- Pomeranian Medical University, Department of Genetics and Pathology, Szczecin, Poland
- Eric Hahnen
- Faculty of Medicine and University Hospital Cologne, University of Cologne, Center for Familial Breast and Ovarian Cancer, Cologne, Germany
- Faculty of Medicine and University Hospital Cologne, University of Cologne, Center for Integrated Oncology (CIO), Cologne, Germany
- Christopher A Haiman
- University of Southern California, Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA, USA
- Niclas Håkansson
- Karolinska Institutet, Institute of Environmental Medicine, Stockholm, Sweden
- Ute Hamann
- German Cancer Research Center (DKFZ), Molecular Genetics of Breast Cancer, Heidelberg, Germany
- Thomas V O Hansen
- Rigshospitalet, Copenhagen University Hospital, Department of Clinical Genetics, Copenhagen, Denmark
- Holly R Harris
- Fred Hutchinson Cancer Research Center, Program in Epidemiology, Division of Public Health Sciences, Seattle, WA, USA
- University of Washington, Department of Epidemiology, Seattle, WA, USA
- Mikael Hartman
- National University of Singapore and National University Health System, Saw Swee Hock School of Public Health, Singapore, Singapore
- National University Health System, Department of Surgery, Singapore, Singapore
- Florian Heitz
- Ev. Kliniken Essen-Mitte (KEM), Department of Gynecology and Gynecologic Oncology, Essen, Germany
- Dr. Horst Schmidt Kliniken Wiesbaden, Department of Gynecology and Gynecologic Oncology, Wiesbaden, Germany
- Humboldt-Universität zu Berlin, and Berlin Institute of Health, Department for Gynecology with the Center for Oncologic Surgery Charité Campus Virchow-Klinikum, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Berlin, Germany
- Estrid Høgdall
- Danish Cancer Society Research Center, Department of Virus, Lifestyle and Genes, Copenhagen, Denmark
- University of Copenhagen, Molecular Unit, Department of Pathology, Herlev Hospital, Copenhagen, Denmark
- Claus K Høgdall
- University of Copenhagen, Department of Gynaecology, Rigshospitalet, Copenhagen, Denmark
- John L Hopper
- The University of Melbourne, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Melbourne, VIC, Australia
- Ruea-Yea Huang
- Roswell Park Cancer Institute, Center For Immunotherapy, Buffalo, NY, USA
- Chad Huff
- University of Texas MD Anderson Cancer Center, Department of Epidemiology, Houston, TX, USA
- Peter J Hulick
- NorthShore University Health System, Center for Medical Genetics, Evanston, IL, USA
- The University of Chicago Pritzker School of Medicine, Chicago, IL, USA
- David G Huntsman
- BC Cancer, Vancouver General Hospital, and University of British Columbia, British Columbia's Ovarian Cancer Research (OVCARE) Program, Vancouver, BC, Canada
- University of British Columbia, Department of Pathology and Laboratory Medicine, Vancouver, BC, Canada
- University of British Columbia, Department of Obstetrics and Gynecology, Vancouver, BC, Canada
- BC Cancer Research Centre, Department of Molecular Oncology, Vancouver, BC, Canada
- Claudine Isaacs
- Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, USA
- Anna Jakubowska
- Pomeranian Medical University, Department of Genetics and Pathology, Szczecin, Poland
- Pomeranian Medical University, Independent Laboratory of Molecular Biology and Genetic Diagnostics, Szczecin, Poland
- Paul A James
- The University of Melbourne, Sir Peter MacCallum Department of Oncology, Melbourne, VIC, Australia
- Peter MacCallum Cancer Center, Parkville Familial Cancer Centre, Melbourne, VIC, Australia
- Ramunas Janavicius
- Vilnius University Hospital Santariskiu Clinics, Hematology, oncology and transfusion medicine center, Dept. of Molecular and Regenerative Medicine, Vilnius, Lithuania
- State Research Institute Centre for Innovative Medicine, Vilnius, Lithuania
- Allan Jensen
- Danish Cancer Society Research Center, Department of Virus, Lifestyle and Genes, Copenhagen, Denmark
- Esther M John
- Stanford University School of Medicine, Department of Epidemiology & Population Health, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Department of Medicine, Division of Oncology, Stanford, CA, USA
- Michael E Jones
- The Institute of Cancer Research, Division of Genetics and Epidemiology, London, UK
- Daehee Kang
- Seoul National University College of Medicine, Department of Preventive Medicine, Seoul, Korea
- Seoul National University Graduate School, Department of Biomedical Sciences, Seoul, Korea
- Seoul National University, Cancer Research Institute, Seoul, Korea
- Beth Y Karlan
- University of California at Los Angeles, David Geffen School of Medicine, Department of Obstetrics and Gynecology, Los Angeles, CA, USA
- Anthony Karnezis
- UC Davis Medical Center, Department of Pathology and Laboratory Medicine, Sacramento, CA, USA
- Linda E Kelemen
- Medical University of South Carolina, Hollings Cancer Center, Charleston, SC, USA
- Elza Khusnutdinova
- Ufa Federal Research Centre of the Russian Academy of Sciences, Institute of Biochemistry and Genetics, Ufa, Russia
- Saint Petersburg State University, Saint Petersburg, Russia
- Lambertus A Kiemeney
- Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
- Byoung-Gie Kim
- Sungkyunkwan University School of Medicine, Department of Obstetrics and Gynecology, Samsung Medical Center, Seoul, Korea
- Susanne K Kjaer
- Danish Cancer Society Research Center, Department of Virus, Lifestyle and Genes, Copenhagen, Denmark
- University of Copenhagen, Department of Gynaecology, Rigshospitalet, Copenhagen, Denmark
- Ian Komenaka
- City of Hope Clinical Cancer Genetics Community Research Network, Duarte, CA, USA
- Jolanta Kupryjanczyk
- Maria Sklodowska-Curie National Research Institute of Oncology, Department of Pathology and Laboratory Diagnostics, Warsaw, Poland
- Allison W Kurian
- Stanford University School of Medicine, Department of Epidemiology & Population Health, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Department of Medicine, Division of Oncology, Stanford, CA, USA
- Ava Kwong
- Cancer Genetics Centre, Hong Kong Hereditary Breast Cancer Family Registry, Happy Valley, Hong Kong
- The University of Hong Kong, Department of Surgery, Pok Fu Lam, Hong Kong
- Hong Kong Sanatorium and Hospital, Department of Surgery, Happy Valley, Hong Kong
- Diether Lambrechts
- VIB Center for Cancer Biology, Leuven, Belgium
- University of Leuven, Laboratory for Translational Genetics, Department of Human Genetics, Leuven, Belgium
- Melissa C Larson
- Mayo Clinic, Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Rochester, MN, USA
- Conxi Lazaro
- ONCOBELL-IDIBELL-IGTP, Catalan Institute of Oncology, CIBERONC, Hereditary Cancer Program, Barcelona, Spain
- Nhu D Le
- BC Cancer, Cancer Control Research, Vancouver, BC, Canada
- Goska Leslie
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- Jenny Lester
- University of California at Los Angeles, David Geffen School of Medicine, Department of Obstetrics and Gynecology, Los Angeles, CA, USA
- Fabienne Lesueur
- Institut Curie, Paris, France
- Mines ParisTech, Fontainebleau, France
- Inserm U900, Genetic Epidemiology of Cancer team, Paris, France
- Douglas A Levine
- Memorial Sloan Kettering Cancer Center, Gynecology Service, Department of Surgery, New York, NY, USA
- NYU Langone Medical Center, Gynecologic Oncology, Laura and Isaac Perlmutter Cancer Center, New York, NY, USA
- Lian Li
- Tianjin Medical University Cancer Institute and Hospital, Department of Epidemiology, Tianjin, China
- Jingmei Li
- Genome Institute of Singapore, Human Genetics Division, Singapore, Singapore
- Jennifer T Loud
- National Cancer Institute, Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, Bethesda, MD, USA
- Karen H Lu
- University of Texas MD Anderson Cancer Center, Department of Gynecologic Oncology and Clinical Cancer Genetics Program, Houston, TX, USA
- Jan Lubiński
- Pomeranian Medical University, Department of Genetics and Pathology, Szczecin, Poland
- Phuong L Mai
- Magee-Womens Hospital, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Siranoush Manoukian
- Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Unit of Medical Genetics, Department of Medical Oncology and Hematology, Milan, Italy
- Jeffrey R Marks
- Duke University Hospital, Department of Surgery, Durham, NC, USA
- Rayna Kim Matsuno
- University of Hawaii Cancer Center, Cancer Epidemiology Program, Honolulu, HI, USA
- Keitaro Matsuo
- Aichi Cancer Center Research Institute, Division of Cancer Epidemiology and Prevention, Nagoya, Japan
- Nagoya University Graduate School of Medicine, Division of Cancer Epidemiology, Nagoya, Japan
- Taymaa May
- Princess Margaret Hospital, Division of Gynecologic Oncology, University Health Network, Toronto, ON, Canada
- Lesley McGuffog
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- John R McLaughlin
- Samuel Lunenfeld Research Institute, Public Health Ontario, Toronto, ON, Canada
- Iain A McNeish
- Imperial College London, Division of Cancer and Ovarian Cancer Action Research Centre, Department of Surgery & Cancer, London, UK
- University of Glasgow, Institute of Cancer Sciences, Glasgow, UK
- Noura Mebirouk
- Institut Curie, Paris, France
- Mines ParisTech, Fontainebleau, France
- Inserm U900, Genetic Epidemiology of Cancer team, Paris, France
- Usha Menon
- University College London, MRC Clinical Trials Unit at UCL, Institute of Clinical Trials & Methodology, London, UK
- Austin Miller
- Roswell Park Cancer Institute, NRG Oncology, Statistics and Data Management Center, Buffalo, NY, USA
- Roger L Milne
- Cancer Council Victoria, Cancer Epidemiology Division, Melbourne, VIC, Australia
- The University of Melbourne, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Melbourne, VIC, Australia
- Monash University, Precision Medicine, School of Clinical Sciences at Monash Health, Clayton, VIC, Australia
- Albina Minlikeeva
- Roswell Park Cancer Institute, Division of Cancer Prevention and Control, Buffalo, NY, USA
- Francesmary Modugno
- Magee-Womens Research Institute and Hillman Cancer Center, Women's Cancer Research Center, Pittsburgh, PA, USA
- University of Pittsburgh School of Medicine, Division of Gynecologic Oncology, Department of Obstetrics, Gynecology and Reproductive Sciences, Pittsburgh, PA, USA
- Marco Montagna
- Veneto Institute of Oncology IOV-IRCCS, Immunology and Molecular Oncology Unit, Padua, Italy
- Kirsten B Moysich
- Roswell Park Cancer Institute, Division of Cancer Prevention and Control, Buffalo, NY, USA
| | - Elizabeth Munro
- Oregon Health & Science University, Department of Obstetrics and Gynecology, Portland, OR, USA
- Oregon Health & Science University, Knight Cancer Institute, Portland, OR, USA
| | - Katherine L Nathanson
- University of Pennsylvania, Basser Center for BRCA, Abramson Cancer Center, Philadelphia, PA, USA
| | - Susan L Neuhausen
- Beckman Research Institute of City of Hope, Department of Population Sciences, Duarte, CA, USA
| | - Heli Nevanlinna
- University of Helsinki, Department of Obstetrics and Gynecology, Helsinki University Hospital, Helsinki, Finland
| | - Joanne Ngeow Yuen Yie
- National Cancer Centre, Cancer Genetics Service, Singapore, Singapore
- Nanyang Technological University, Lee Kong Chian School of Medicine, Singapore, Singapore
| | - Finn C Nielsen
- Rigshospitalet, Copenhagen University Hospital, Department of Clinical Genetics, Copenhagen, Denmark
| | - Kunle Odunsi
- Roswell Park Cancer Institute, Department of Gynecologic Oncology, Buffalo, NY, USA
| | - Kenneth Offit
- Memorial Sloan Kettering Cancer Center, Clinical Genetics Research Lab, Department of Cancer Biology and Genetics, New York, NY, USA
- Memorial Sloan Kettering Cancer Center, Clinical Genetics Service, Department of Medicine, New York, NY, USA
| | - Edith Olah
- National Institute of Oncology, Department of Molecular Genetics, Budapest, Hungary
| | - Siel Olbrecht
- University Hospitals Leuven, Division of Gynecologic Oncology, Department of Obstetrics and Gynaecology and Leuven Cancer Institute, Leuven, Belgium
| | - Sara H Olson
- Memorial Sloan-Kettering Cancer Center, Department of Epidemiology and Biostatistics, New York, NY, USA
| | - Håkan Olsson
- Lund University, Department of Cancer Epidemiology, Clinical Sciences, Lund, Sweden
| | - Ana Osorio
- Spanish National Cancer Research Centre (CNIO), Human Cancer Genetics Programme, Madrid, Spain
- Centro de Investigación en Red de Enfermedades Raras (CIBERER), Madrid, Spain
| | - Laura Papi
- University of Florence, Department of Experimental and Clinical Biomedical Sciences 'Mario Serio', Medical Genetics Unit, Florence, Italy
| | - Sue K Park
- Seoul National University College of Medicine, Department of Preventive Medicine, Seoul, Korea
- Seoul National University Graduate School, Department of Biomedical Sciences, Seoul, Korea
- Seoul National University, Cancer Research Institute, Seoul, Korea
| | - Michael T Parsons
- QIMR Berghofer Medical Research Institute, Department of Genetics and Computational Biology, Brisbane, QLD, Australia
| | - Harsha Pathak
- University of Kansas Medical Center, Department of Pathology and Laboratory Medicine, Kansas City, KS, USA
| | - Inge Sokilde Pedersen
- Aalborg University Hospital, Molecular Diagnostics, Aalborg, Denmark
- Aalborg University Hospital, Clinical Cancer Research Center, Aalborg, Denmark
- Aalborg University, Department of Clinical Medicine, Aalborg, Denmark
| | - Ana Peixoto
- Portuguese Oncology Institute, Department of Genetics, Porto, Portugal
| | - Tanja Pejovic
- Oregon Health & Science University, Department of Obstetrics and Gynecology, Portland, OR, USA
- Oregon Health & Science University, Knight Cancer Institute, Portland, OR, USA
| | - Pedro Perez-Segura
- CIBERONC, Hospital Clinico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Molecular Oncology Laboratory, Madrid, Spain
| | - Jennifer B Permuth
- Moffitt Cancer Center, Department of Cancer Epidemiology, Tampa, FL, USA
| | - Beth Peshkin
- Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, USA
| | - Paolo Peterlongo
- IFOM-the FIRC Institute of Molecular Oncology, Genome Diagnostics Program, Milan, Italy
| | - Anna Piskorz
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Darya Prokofyeva
- Bashkir State University, Department of Genetics and Fundamental Medicine, Ufa, Russia
| | - Paolo Radice
- Fondazione IRCCS Istituto Nazionale dei Tumori (INT), Unit of Molecular Bases of Genetic Risk and Genetic Testing, Department of Research, Milan, Italy
| | - Marjorie J Riggan
- Duke University Hospital, Department of Gynecologic Oncology, Durham, NC, USA
| | - Harvey A Risch
- Yale School of Public Health, Chronic Disease Epidemiology, New Haven, CT, USA
| | - Cristina Rodriguez-Antona
- Biomedical Network on Rare Diseases (CIBERER), Madrid, Spain
- Spanish National Cancer Research Centre (CNIO), Human Cancer Genetics Programme, Madrid, Spain
| | - Eric Ross
- Fox Chase Cancer Center, Population Studies Facility, Philadelphia, PA, USA
| | - Mary Anne Rossing
- Fred Hutchinson Cancer Research Center, Program in Epidemiology, Division of Public Health Sciences, Seattle, WA, USA
- University of Washington, Department of Epidemiology, Seattle, WA, USA
| | - Ingo Runnebaum
- Jena University Hospital-Friedrich Schiller University, Department of Gynaecology, Jena, Germany
| | - Dale P Sandler
- National Institute of Environmental Health Sciences, NIH, Epidemiology Branch, Research Triangle Park, NC, USA
| | - Marta Santamariña
- Centro de Investigación en Red de Enfermedades Raras (CIBERER), Madrid, Spain
- Fundación Pública Galega Medicina Xenómica, Santiago De Compostela, Spain
- Instituto de Investigación Sanitaria de Santiago de Compostela, Santiago De Compostela, Spain
| | - Penny Soucy
- Centre Hospitalier Universitaire de Québec - Université Laval Research Center, Genomics Center, Québec City, QC, Canada
| | - Rita K Schmutzler
- Faculty of Medicine and University Hospital Cologne, University of Cologne, Center for Familial Breast and Ovarian Cancer, Cologne, Germany
- Faculty of Medicine and University Hospital Cologne, University of Cologne, Center for Integrated Oncology (CIO), Cologne, Germany
- Faculty of Medicine and University Hospital Cologne, University of Cologne, Center for Molecular Medicine Cologne (CMMC), Cologne, Germany
| | - V Wendy Setiawan
- University of Southern California, Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA, USA
| | - Kang Shan
- Hebei Medical University, Fourth Hospital, Department of Obstetrics and Gynaecology, Shijiazhuang, China
| | - Weiva Sieh
- Icahn School of Medicine at Mount Sinai, Department of Population Health Science and Policy, New York, NY, USA
- Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, New York, NY, USA
| | - Jacques Simard
- Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Genomic Center, Québec City, QC, Canada
| | - Christian F Singer
- Medical University of Vienna, Dept of OB/GYN and Comprehensive Cancer Center, Vienna, Austria
| | - Honglin Song
- University of Cambridge, Department of Public Health and Primary Care, Cambridge, UK
| | - Melissa C Southey
- Cancer Council Victoria, Cancer Epidemiology Division, Melbourne, VIC, Australia
- Monash University, Precision Medicine, School of Clinical Sciences at Monash Health, Clayton, VIC, Australia
- The University of Melbourne, Department of Clinical Pathology, Melbourne, VIC, Australia
| | - Helen Steed
- Royal Alexandra Hospital, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Edmonton, AB, Canada
| | - Dominique Stoppa-Lyonnet
- INSERM U830, Department of Tumour Biology, Paris, France
- Institut Curie, Service de Génétique, Paris, France
- Université Paris Descartes, Paris, France
| | - Rebecca Sutphen
- University of South Florida, Epidemiology Center, College of Medicine, Tampa, FL, USA
| | - Anthony J Swerdlow
- The Institute of Cancer Research, Division of Genetics and Epidemiology, London, UK
- The Institute of Cancer Research, Division of Breast Cancer Research, London, UK
| | - Yen Yen Tan
- Medical University of Vienna, Dept of OB/GYN and Comprehensive Cancer Center, Vienna, Austria
| | - Manuel R Teixeira
- Portuguese Oncology Institute, Department of Genetics, Porto, Portugal
- University of Porto, Biomedical Sciences Institute (ICBAS), Porto, Portugal
| | - Soo Hwang Teo
- Cancer Research Malaysia, Breast Cancer Research Programme, Subang Jaya, Selangor, Malaysia
- University of Malaya, Department of Surgery, Faculty of Medicine, Kuala Lumpur, Malaysia
| | - Kathryn L Terry
- Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- Brigham and Women's Hospital and Harvard Medical School, Obstetrics and Gynecology Epidemiology Center, Boston, MA, USA
| | - Mary Beth Terry
- Columbia University, Department of Epidemiology, Mailman School of Public Health, New York, NY, USA
| | - Mads Thomassen
- Odense University Hospital, Department of Clinical Genetics, Odense C, Denmark
| | - Pamela J Thompson
- Cedars-Sinai Medical Center, Samuel Oschin Comprehensive Cancer Institute, Cancer Prevention and Genetics Program, Los Angeles, CA, USA
| | - Liv Cecilie Vestrheim Thomsen
- Haukeland University Hospital, Department of Obstetrics and Gynecology, Bergen, Norway
- University of Bergen, Centre for Cancer Biomarkers CCBIO, Department of Clinical Science, Bergen, Norway
| | - Darcy L Thull
- Magee-Womens Hospital, University of Pittsburgh School of Medicine, Department of Medicine, Pittsburgh, PA, USA
| | - Marc Tischkowitz
- McGill University, Program in Cancer Genetics, Departments of Human Genetics and Oncology, Montréal, QC, Canada
- University of Cambridge, Department of Medical Genetics, Cambridge, UK
| | - Linda Titus
- Dartmouth College, Geisel School of Medicine, Hanover, NH, USA
| | - Amanda E Toland
- The Ohio State University, Department of Cancer Biology and Genetics, Columbus, OH, USA
| | - Diana Torres
- German Cancer Research Center (DKFZ), Molecular Genetics of Breast Cancer, Heidelberg, Germany
- Pontificia Universidad Javeriana, Institute of Human Genetics, Bogota, Colombia
| | - Britton Trabert
- National Cancer Institute, Division of Cancer Epidemiology and Genetics, Bethesda, MD, USA
| | - Ruth Travis
- University of Oxford, Cancer Epidemiology Unit, Oxford, UK
| | - Nadine Tung
- Beth Israel Deaconess Medical Center, Department of Medical Oncology, Boston, MA, USA
| | - Shelley S Tworoger
- Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- Moffitt Cancer Center, Department of Cancer Epidemiology, Tampa, FL, USA
| | - Ellen Valen
- Haukeland University Hospital, Department of Obstetrics and Gynecology, Bergen, Norway
- University of Bergen, Centre for Cancer Biomarkers CCBIO, Department of Clinical Science, Bergen, Norway
| | - Anne M van Altena
- Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
| | - Annemieke H van der Hout
- University Medical Center Groningen, University Groningen, Department of Genetics, Groningen, The Netherlands
| | - Els Van Nieuwenhuysen
- University Hospitals Leuven, Division of Gynecologic Oncology, Department of Obstetrics and Gynaecology and Leuven Cancer Institute, Leuven, Belgium
| | - Ana Vega
- Centro de Investigación en Red de Enfermedades Raras (CIBERER), Madrid, Spain
- Fundación Pública Galega de Medicina Xenómica, Santiago de Compostela, Spain
- Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Complejo Hospitalario Universitario de Santiago, SERGAS, Santiago de Compostela, Spain
| | - Digna Velez Edwards
- Vanderbilt University Medical Center, Division of Quantitative Sciences, Department of Obstetrics and Gynecology, Department of Biomedical Sciences, Women's Health Research, Nashville, TN, USA
| | - Robert A Vierkant
- Mayo Clinic, Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Rochester, MN, USA
| | - Frances Wang
- Duke Cancer Institute, Cancer Control and Population Sciences, Durham, NC, USA
- Duke University Hospital, Department of Community and Family Medicine, Durham, NC, USA
| | - Barbara Wappenschmidt
- Faculty of Medicine and University Hospital Cologne, University of Cologne, Center for Familial Breast and Ovarian Cancer, Cologne, Germany
- Faculty of Medicine and University Hospital Cologne, University of Cologne, Center for Integrated Oncology (CIO), Cologne, Germany
| | - Penelope M Webb
- QIMR Berghofer Medical Research Institute, Population Health Department, Brisbane, QLD, Australia
| | - Clarice R Weinberg
- National Institute of Environmental Health Sciences, NIH, Biostatistics and Computational Biology Branch, Research Triangle Park, NC, USA
| | - Nicolas Wentzensen
- National Cancer Institute, Division of Cancer Epidemiology and Genetics, Bethesda, MD, USA
| | - Emily White
- University of Washington, Department of Epidemiology, Seattle, WA, USA
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Alice S Whittemore
- Stanford University School of Medicine, Department of Epidemiology & Population Health, Stanford, CA, USA
- Stanford University School of Medicine, Department of Biomedical Data Science, Stanford, CA, USA
| | - Stacey J Winham
- Mayo Clinic, Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Rochester, MN, USA
| | - Alicja Wolk
- Karolinska Institutet, Institute of Environmental Medicine, Stockholm, Sweden
- Uppsala University, Department of Surgical Sciences, Uppsala, Sweden
| | - Yin-Ling Woo
- University of Malaya, Department of Obstetrics and Gynaecology, University of Malaya Medical Centre, Kuala Lumpur, Malaysia
| | - Anna H Wu
- University of Southern California, Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA, USA
| | - Li Yan
- Hebei Medical University, Fourth Hospital, Department of Molecular Biology, Shijiazhuang, China
| | - Drakoulis Yannoukakos
- National Centre for Scientific Research 'Demokritos', Molecular Diagnostics Laboratory, INRASTES, Athens, Greece
| | - Wei Zheng
- Vanderbilt University School of Medicine, Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Nashville, TN, USA
| | - Argyrios Ziogas
- University of California Irvine, Department of Epidemiology, Genetic Epidemiology Research Institute, Irvine, CA, USA
| | - Kristin K Zorn
- Magee-Womens Hospital, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Zdenek Kleibl
- Institute of Biochemistry and Experimental Oncology, First Faculty of Medicine, Charles University, Prague, Czech Republic
| | - Douglas Easton
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Oncology, Cambridge, UK
| | - Kate Lawrenson
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Women's Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Centre, Department of Obstetrics and Gynecology, Los Angeles, CA, USA
| | - Anna DeFazio
- The University of Sydney, Centre for Cancer Research, The Westmead Institute for Medical Research, Sydney, NSW, Australia
- Westmead Hospital, Department of Gynaecological Oncology, Sydney, NSW, Australia
| | - Susan J Ramus
- University of NSW Sydney, School of Women's and Children's Health, Faculty of Medicine, Sydney, NSW, Australia
- University of NSW Sydney, Adult Cancer Program, Lowy Cancer Research Centre, Sydney, NSW, Australia
| | - Celeste L Pearce
- University of Michigan School of Public Health, Department of Epidemiology, Ann Arbor, MI, USA
- University of Southern California Norris Comprehensive Cancer Center, Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA, USA
| | - Alvaro N Monteiro
- Moffitt Cancer Center, Department of Cancer Epidemiology, Tampa, FL, USA
| | - Julie Cunningham
- Mayo Clinic, Department of Health Science Research, Division of Epidemiology, Rochester, MN, USA
| | - Ellen L Goode
- Mayo Clinic, Department of Health Science Research, Division of Epidemiology, Rochester, MN, USA
| | - Joellen M Schildkraut
- Emory University, Department of Epidemiology, Rollins School of Public Health, Atlanta, GA, USA
| | - Andrew Berchuck
- Duke University Hospital, Department of Gynecologic Oncology, Durham, NC, USA
| | - Georgia Chenevix-Trench
- QIMR Berghofer Medical Research Institute, Department of Genetics and Computational Biology, Brisbane, QLD, Australia
| | - Simon A Gayther
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Antonis C Antoniou
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK
| | - Paul D P Pharoah
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Cambridge, UK.
- University of Cambridge, Centre for Cancer Genetic Epidemiology, Department of Oncology, Cambridge, UK.
| |
|
53
|
Yang S, Zhou X. PGS-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Brief Bioinform 2022; 23:6534383. [PMID: 35193147 DOI: 10.1093/bib/bbac039] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/02/2021] [Revised: 12/29/2021] [Accepted: 01/26/2022] [Indexed: 01/02/2023] Open
Abstract
Polygenic scores (PGS) are important tools for carrying out genetic prediction of common diseases and disease related complex traits, facilitating the development of precision medicine. Unfortunately, despite the critical importance of PGS and the vast number of PGS methods recently developed, few comprehensive comparison studies have been performed to evaluate the effectiveness of PGS methods. To fill this critical knowledge gap, we performed a comprehensive comparison study on 12 different PGS methods through internal evaluations on 25 quantitative and 25 binary traits within the UK Biobank with sample sizes ranging from 147 408 to 336 573, and through external evaluations via 25 cross-study and 112 cross-ancestry analyses on summary statistics from multiple genome-wide association studies with sample sizes ranging from 1415 to 329 345. We evaluate the prediction accuracy, computational scalability, as well as robustness and transferability of different PGS methods across datasets and/or genetic ancestries, providing important guidelines for practitioners in choosing PGS methods. Besides method comparison, we present a simple aggregation strategy that combines multiple PGS from different methods to take advantage of their distinct benefits to achieve stable and superior prediction performance. To facilitate future applications of PGS, we also develop a PGS webserver (http://www.pgs-server.com/) that allows users to upload summary statistics and choose different PGS methods to fit the data directly. We hope that our results, method and webserver will facilitate the routine application of PGS across different research areas.
Affiliation(s)
- Sheng Yang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Xiang Zhou
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
54
|
Low depression frequency is associated with decreased risk of cardiometabolic disease. NATURE CARDIOVASCULAR RESEARCH 2022; 1:125-131. [PMID: 35991864 PMCID: PMC9389944 DOI: 10.1038/s44161-021-00011-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
|
55
|
Wang YC, Wu Y, Choi J, Allington G, Zhao S, Khanfar M, Yang K, Fu PY, Wrubel M, Yu X, Mekbib KY, Ocken J, Smith H, Shohfi J, Kahle KT, Lu Q, Jin SC. Computational Genomics in the Era of Precision Medicine: Applications to Variant Analysis and Gene Therapy. J Pers Med 2022; 12:175. [PMID: 35207663 PMCID: PMC8878256 DOI: 10.3390/jpm12020175] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Revised: 01/18/2022] [Accepted: 01/24/2022] [Indexed: 02/04/2023] Open
Abstract
Rapid methodological advances in statistical and computational genomics have enabled researchers to better identify and interpret both rare and common variants responsible for complex human diseases. As we continue to see an expansion of these advances in the field, it is now imperative for researchers to understand the resources and methodologies available for various data types and study designs. In this review, we provide an overview of recent methods for identifying rare and common variants and understanding their roles in disease etiology. Additionally, we discuss the strategy, challenge, and promise of gene therapy. As computational and statistical approaches continue to improve, we will have an opportunity to translate human genetic findings into personalized health care.
Affiliation(s)
- Yung-Chun Wang
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| | - Yuchang Wu
- Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Julie Choi
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| | - Garrett Allington
- Department of Pathology, Yale School of Medicine, New Haven, CT 06510, USA
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Shujuan Zhao
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| | - Mariam Khanfar
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| | - Kuangying Yang
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| | - Po-Ying Fu
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| | - Max Wrubel
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| | - Xiaobing Yu
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Department of Computer Science & Engineering, Washington University, St. Louis, MO 63130, USA
| | - Kedous Y. Mekbib
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Jack Ocken
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Hannah Smith
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
| | - John Shohfi
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Kristopher T. Kahle
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA 02114, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
- Departments of Pediatrics and Neurology, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Qiongshi Lu
- Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Sheng Chih Jin
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Department of Pediatrics, School of Medicine, Washington University, St. Louis, MO 63110, USA
| |
|
56
|
Song S, Hou L, Liu JS. A data-adaptive Bayesian regression approach for polygenic risk prediction. Bioinformatics 2022; 38:1938-1946. [PMID: 35020805 PMCID: PMC8963326 DOI: 10.1093/bioinformatics/btac024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 12/21/2021] [Accepted: 01/09/2022] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION: Polygenic risk score (PRS) has been widely exploited for genetic risk prediction due to its accuracy and conceptual simplicity. We introduce a unified Bayesian regression framework, NeuPred, for PRS construction, which accommodates varying genetic architectures and improves overall prediction accuracy for complex diseases by allowing for a wide class of prior choices. To take full advantage of the framework, we propose a summary-statistics-based cross-validation strategy to automatically select suitable chromosome-level priors, which demonstrates a striking variability of the prior preference of each chromosome, for the same complex disease, and further significantly improves the prediction accuracy. RESULTS: Simulation studies and real data applications with seven disease datasets from the Wellcome Trust Case Control Consortium cohort and eight groups of large-scale genome-wide association studies demonstrate that NeuPred achieves substantial and consistent improvements in terms of predictive r² over existing methods. In addition, NeuPred has similar or advantageous computational efficiency compared with the state-of-the-art Bayesian methods. AVAILABILITY AND IMPLEMENTATION: The R package implementing NeuPred is available at https://github.com/shuangsong0110/NeuPred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuang Song
- Center for Statistical Science, Tsinghua University, Beijing 100084, China
- School of Life Sciences, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
| | - Lin Hou
- To whom correspondence should be addressed.
| | - Jun S Liu
- To whom correspondence should be addressed.
| |
|
57
|
Ahmadi N. Genetic Bases of Complex Traits: From Quantitative Trait Loci to Prediction. Methods Mol Biol 2022; 2467:1-44. [PMID: 35451771 DOI: 10.1007/978-1-0716-2205-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Conceived as a general introduction to the book, this chapter is a reminder of the core concepts of genetic mapping and molecular marker-based prediction. It provides an overview of the principles and the evolution of methods for mapping the variation of complex traits, and methods for QTL-based prediction of human disease risk and animal and plant breeding value. The principles of linkage-based and linkage disequilibrium-based QTL mapping methods are described in the context of the simplest, single-marker, methods. Methodological evolutions are analysed in relation with their ability to account for the complexity of the genotype-phenotype relations. Main characteristics of the genetic architecture of complex traits, drawn from QTL mapping works using large populations of unrelated individuals, are presented. Methods combining marker-QTL association data into polygenic risk score that captures part of an individual's susceptibility to complex diseases are reviewed. Principles of best linear mixed model-based prediction of breeding value in animal- and plant-breeding programs using phenotypic and pedigree data, are summarized and methods for moving from BLUP to marker-QTL BLUP are presented. Factors influencing the additional genetic progress achieved by using molecular data and rules for their optimization are discussed.
Affiliation(s)
- Nourollah Ahmadi
- CIRAD, UMR AGAP Institut, Montpellier, France.
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France.
| |
|
58
|
Ding Y, Hou K, Burch KS, Lapinska S, Privé F, Vilhjálmsson B, Sankararaman S, Pasaniuc B. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat Genet 2022; 54:30-39. [PMID: 34931067 PMCID: PMC8758557 DOI: 10.1038/s41588-021-00961-5] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 11/28/2020] [Accepted: 09/29/2021] [Indexed: 01/05/2023]
Abstract
Although the cohort-level accuracy of polygenic risk scores (PRSs), estimates of genetic value at the individual level, has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British'), we observe large variances in individual PRS estimates which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
Affiliation(s)
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA.
| | - Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA.
| | - Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
| | - Sandra Lapinska
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
| | - Florian Privé
- Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
| | - Bjarni Vilhjálmsson
- Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
| | - Sriram Sankararaman
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
| |
|
59
|
Wang Y, Zhu M, Ma H, Shen H. Polygenic risk scores: the future of cancer risk prediction, screening, and precision prevention. Medical Review (Berlin, Germany) 2021; 1:129-149. [PMID: 37724297 PMCID: PMC10471106 DOI: 10.1515/mr-2021-0025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2021] [Accepted: 12/13/2021] [Indexed: 09/20/2023]
Abstract
Genome-wide association studies (GWASs) have shown that the genetic architecture of cancers is highly polygenic and have enabled researchers to identify genetic risk loci for cancers. The genetic variants associated with a cancer can be combined into a polygenic risk score (PRS), which captures part of an individual's genetic susceptibility to cancer. Recently, PRSs have been widely used in cancer risk prediction and have been shown to be capable of identifying groups of individuals who could benefit from knowledge of their probabilistic susceptibility to cancer. This has increased interest in understanding the potential utility of PRSs for further refining the assessment and management of cancer risk. In this context, we provide an overview of the major discoveries from cancer GWASs. We then review the methodologies used for PRS construction, and describe steps for the development and evaluation of risk prediction models that include PRS and/or conventional risk factors. The potential utility of PRSs in cancer risk prediction, screening, and precision prevention is illustrated. Challenges and practical considerations relevant to the implementation of PRSs in health care settings are discussed.
Affiliation(s)
- Yuzhuo Wang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Meng Zhu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Hongxia Ma
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
- Research Units of Cohort Study on Cardiovascular Diseases and Cancers, Chinese Academy of Medical Sciences, Beijing, China
| | - Hongbing Shen
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
- Research Units of Cohort Study on Cardiovascular Diseases and Cancers, Chinese Academy of Medical Sciences, Beijing, China
| |
|
60
|
Abstract
Over the past decade, substantial progress has been made in the discovery of alleles contributing to the risk of coronary artery disease. In addition to providing causal insights into disease, these endeavours have yielded and enabled the refinement of polygenic risk scores. These scores can be used to predict incident coronary artery disease in multiple cohorts and indicate the clinical response to some preventive therapies in post hoc analyses of clinical trials. These observations and the widespread ability to calculate polygenic risk scores from direct-to-consumer and health-care-associated biobanks have raised many questions about responsible clinical adoption. In this Review, we describe technical and downstream considerations for the derivation and validation of polygenic risk scores and current evidence for their efficacy and safety. We discuss the implementation of these scores in clinical medicine for uses including risk prediction and screening algorithms for coronary artery disease, prioritization of patient subgroups that are likely to derive benefit from treatment, and efficient prospective clinical trial designs.
|
61
|
Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet 2021; 37:995-1011. [PMID: 34243982 PMCID: PMC8511058 DOI: 10.1016/j.tig.2021.06.004] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/04/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 01/03/2023]
Abstract
Accurate genetic prediction of complex traits can facilitate disease screening, improve early intervention, and aid in the development of personalized medicine. Genetic prediction of complex traits requires the development of statistical methods that can properly model polygenic architecture and construct a polygenic score (PGS). We present a comprehensive review of 46 methods for PGS construction. We connect the majority of these methods through a multiple linear regression framework which can be instrumental for understanding their prediction performance for traits with distinct genetic architectures. We discuss the practical considerations of PGS analysis as well as challenges and future directions of PGS method development. We hope our review serves as a useful reference both for statistical geneticists who develop PGS methods and for data analysts who perform PGS analysis.
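As a rough illustration of the multiple linear regression framing mentioned in this abstract, a PGS can be written in the following standard form. The notation is an assumption chosen for this sketch and is not taken from the review itself.

```latex
% y: n-vector of phenotypes, X: n x p genotype matrix,
% beta: p-vector of SNP effects, epsilon: residual error.
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
  \qquad
  \mathrm{PGS}_i = \sum_{j=1}^{p} x_{ij}\,\hat{\beta}_j .
\]
```

Under this framing, PGS methods differ mainly in how they obtain the estimates \(\hat{\beta}_j\), for example through different priors or penalties on the effects.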
Affiliation(s)
- Ying Ma
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
| |
|
62
|
Johnson R, Burch KS, Hou K, Paciuc M, Pasaniuc B, Sankararaman S. Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits. PLoS Comput Biol 2021; 17:e1009483. [PMID: 34673766 PMCID: PMC8562817 DOI: 10.1371/journal.pcbi.1009483] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/24/2021] [Revised: 11/02/2021] [Accepted: 09/27/2021] [Indexed: 11/18/2022]
Abstract
The number of variants that have a non-zero effect on a trait (i.e. polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs, and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.
The proportion of SNPs with nonzero effects on a trait, or polygenicity, is a key quantity used to describe the genetic architecture of a complex trait. Furthermore, identifying the specific genomic regions that contribute to trait variation requires an understanding of how the number of causal SNPs varies across regions of the genome (regional polygenicity). In this work, we propose a statistical framework to estimate regional polygenicity for a complex trait using marginal effect sizes from GWAS and LD information. We demonstrate in simulation and empirical data that our approach accurately and efficiently estimates regional polygenicity. We find that SNP-heritability is proportional to polygenicity on both the genome-wide and regional scales, suggesting that the observed differences in heritability across traits stem from differences in the underlying number of causal SNPs.
Affiliation(s)
- Ruth Johnson
- Department of Computer Science, University of California, Los Angeles, California, United States of America
| | - Kathryn S. Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States of America
| | - Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States of America
| | - Mario Paciuc
- Department of Statistics, Rice University, Houston, Texas, United States of America
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States of America
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, California, United States of America
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
| |
|
63
|
Márquez-Luna C, Gazal S, Loh PR, Kim SS, Furlotte N, Auton A, Price AL. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat Commun 2021; 12:6052. [PMID: 34663819 PMCID: PMC8523709 DOI: 10.1038/s41467-021-25171-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 07/26/2019] [Accepted: 07/16/2021] [Indexed: 12/23/2022]
Abstract
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107K) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
Affiliation(s)
- Carla Márquez-Luna
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Steven Gazal
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Samuel S Kim
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | | | - Alkes L Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
64
|
Wu C, Zhu J, King A, Tong X, Lu Q, Park JY, Wang L, Gao G, Deng HW, Yang Y, Knudsen KE, Rebbeck TR, Long J, Zheng W, Pan W, Conti DV, Haiman CA, Wu L. Novel strategy for disease risk prediction incorporating predicted gene expression and DNA methylation data: a multi-phased study of prostate cancer. Cancer Commun (Lond) 2021; 41:1387-1397. [PMID: 34520132 PMCID: PMC8696216 DOI: 10.1002/cac2.12205] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/03/2021] [Revised: 06/10/2021] [Accepted: 07/26/2021] [Indexed: 12/15/2022]
Abstract
Background: DNA methylation and gene expression are known to play important roles in the etiology of human diseases such as prostate cancer (PCa). However, it has not yet been possible to incorporate information on DNA methylation and gene expression into polygenic risk scores (PRSs). Here, we aimed to develop and validate an improved PRS for PCa risk by incorporating genetically predicted gene expression and DNA methylation, and other genomic information, using an integrative method. Methods: Using data from the PRACTICAL consortium, we derived multiple sets of genetic scores, including those based on available single-nucleotide polymorphisms through widely used methods of pruning and thresholding, LDpred, LDpred-funct, AnnoPred, and EBPRS, as well as PRSs constructed using the genetically predicted gene expression and DNA methylation through a revised pruning and thresholding strategy. In the tuning step, using the UK Biobank data (1458 prevalent cases and 1467 controls), we selected PRSs with the best performance. Using an independent set of data from the UK Biobank, we developed an integrative PRS combining information from individual scores. Furthermore, in the testing step, we tested the performance of the integrative PRS in another independent set of UK Biobank data of incident cases and controls. Results: Our constructed PRS had improved performance (C statistic: 76.1%) over PRSs constructed by individual benchmark methods (from 69.6% to 74.7%). Furthermore, our new PRS had much higher risk assessment power than family history. The overall net reclassification improvement was 69.0% by adding PRS to the baseline model compared with 12.5% by adding family history. Conclusions: We developed and validated a new PRS which may improve the utility in predicting the risk of developing PCa. Our innovative method can also be applied to other human diseases to improve risk prediction across multiple outcomes.
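The C statistic reported in the Results can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. A minimal sketch of that definition follows, using made-up scores and a hypothetical function name; it is not the authors' evaluation code.

```python
def c_statistic(case_scores, control_scores):
    """Fraction of case/control pairs in which the case scores higher.

    Ties contribute one half, which makes this equal to the area under
    the ROC curve for a single continuous score.
    """
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

if __name__ == "__main__":
    # Illustrative PRS values only; not data from the study.
    cases = [0.9, 0.8]
    controls = [0.1, 0.85]
    print(c_statistic(cases, controls))
```

The pairwise loop is quadratic in sample size; it is written this way to mirror the definition rather than for efficiency.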
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, 32304, USA
| | - Jingjing Zhu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, 96813, USA
| | - Austin King
- Department of Statistics, Florida State University, Tallahassee, FL, 32304, USA
| | - Xiaoran Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
| | - Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
| | - Jong Y Park
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, 33612, USA
| | - Liang Wang
- Department of Tumor Biology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, 33612, USA
| | - Guimin Gao
- Department of Public Health Sciences, University of Chicago, Chicago, IL, 60637, USA
| | - Hong-Wen Deng
- Center of Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA, 70112, USA
| | - Yaohua Yang
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
| | - Karen E Knudsen
- Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, 19107, USA
| | - Timothy R Rebbeck
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
| | - Jirong Long
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
| | - Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, 55455, USA
| | - David V Conti
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, 90033, USA
| | - Christopher A Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, 90033, USA
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, 96813, USA
| |
|
65
|
Zhao Z, Yi Y, Song J, Wu Y, Zhong X, Lin Y, Hohman TJ, Fletcher J, Lu Q. PUMAS: fine-tuning polygenic risk scores with GWAS summary statistics. Genome Biol 2021; 22:257. [PMID: 34488838 PMCID: PMC8419981 DOI: 10.1186/s13059-021-02479-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/01/2021] [Accepted: 08/25/2021] [Indexed: 12/20/2022]
Abstract
Polygenic risk scores (PRSs) have wide applications in human genetics research, but often include tuning parameters which are difficult to optimize in practice due to limited access to individual-level data. Here, we introduce PUMAS, a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform various model-tuning procedures using GWAS summary statistics and effectively benchmark and optimize PRS models under diverse genetic architecture. Furthermore, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis.
Affiliation(s)
- Zijie Zhao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53703 USA
| | - Yanyao Yi
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
| | - Jie Song
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
| | - Yuchang Wu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53703 USA
| | | | - Yupei Lin
- University of Wisconsin-Madison, Madison, WI USA
| | - Timothy J. Hohman
- Vanderbilt Memory and Alzheimer’s Center, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN USA
| | - Jason Fletcher
- La Follette School of Public Affairs, University of Wisconsin-Madison, Madison, WI USA
- Department of Sociology, University of Wisconsin-Madison, Madison, WI USA
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI USA
| | - Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53703 USA
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI USA
| |
|
66
|
Zhou G, Zhao H. A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics. PLoS Genet 2021; 17:e1009697. [PMID: 34310601 PMCID: PMC8341714 DOI: 10.1371/journal.pgen.1009697] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/17/2020] [Revised: 08/05/2021] [Accepted: 07/05/2021] [Indexed: 12/27/2022]
Abstract
Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across different traits, limited access to individual-level data for training and parameter tuning, and the demand for computational resources. To overcome the limitations of most existing methods, which make explicit assumptions about the underlying genetic architecture and need a separate validation data set for parameter tuning, we develop a summary statistics-based nonparametric method that does not rely on validation data sets to tune parameters. In our implementation, we refine the commonly used likelihood assumption to deal with the discrepancy between summary statistics and the external reference panel. We also leverage the block structure of the reference linkage disequilibrium matrix to implement a parallel algorithm. Through simulations and applications to twelve traits, we show that our method is adaptive to different genetic architectures, statistically robust, and computationally efficient. Our method is available at https://github.com/eldronzhou/SDPR.
Affiliation(s)
- Geyu Zhou
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| |
|
67
|
Shan N, Xie Y, Song S, Jiang W, Wang Z, Hou L. A novel transcriptional risk score for risk prediction of complex human diseases. Genet Epidemiol 2021; 45:811-820. [PMID: 34245595 DOI: 10.1002/gepi.22424] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2020] [Revised: 06/08/2021] [Accepted: 06/24/2021] [Indexed: 11/06/2022]
Abstract
Recently, the polygenic risk score (PRS) has been successfully used in the risk prediction of complex human diseases. Many studies have incorporated internal information, such as effect size distribution, or external information, such as linkage disequilibrium, functional annotation, and pleiotropy among multiple diseases, to optimize the performance of PRS. To leverage multiomics datasets, we developed a novel flexible transcriptional risk score (TRS), in which messenger RNA expression levels were imputed and weighted for risk prediction. In simulation studies, we demonstrated that single-tissue TRS has greater prediction power than LDpred, especially when there is a large effect of gene expression on the phenotype. Multitissue TRS improves prediction accuracy when there are multiple tissues with independent contributions to disease risk. We applied our method to complex traits, including Crohn's disease and type 2 diabetes. The single-tissue TRS method outperformed LDpred and AnnoPred across the tested traits. The performance of multitissue TRS is trait-dependent. Moreover, our method can easily incorporate information from epigenomic and proteomic data upon the availability of reference datasets.
Affiliation(s)
- Nayang Shan
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Yuhan Xie
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Shuang Song
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Wei Jiang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Lin Hou
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
| |
|
68
|
Zhang Q, Privé F, Vilhjálmsson B, Speed D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat Commun 2021; 12:4192. [PMID: 34234142 PMCID: PMC8263809 DOI: 10.1038/s41467-021-24485-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 03/24/2021] [Accepted: 06/17/2021] [Indexed: 02/06/2023]
Abstract
Most existing tools for constructing genetic prediction models begin with the assumption that all genetic variants contribute equally towards the phenotype. However, this represents a suboptimal model for how heritability is distributed across the genome. Therefore, we develop prediction tools that allow the user to specify the heritability model. We compare individual-level data prediction tools using 14 UK Biobank phenotypes; our new tool LDAK-Bolt-Predict outperforms the existing tools Lasso, BLUP, Bolt-LMM and BayesR for all 14 phenotypes. We compare summary statistic prediction tools using 225 UK Biobank phenotypes; our new tool LDAK-BayesR-SS outperforms the existing tools lassosum, sBLUP, LDpred and SBayesR for 223 of the 225 phenotypes. When we improve the heritability model, the proportion of phenotypic variance explained increases by on average 14%, which is equivalent to increasing the sample size by a quarter.
Affiliation(s)
- Qianqian Zhang
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
| | - Florian Privé
- National Center for Register-Based Research (NCRR), Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
| | - Bjarni Vilhjálmsson
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
- National Center for Register-Based Research (NCRR), Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
| | - Doug Speed
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark.
- Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark.
- Aarhus Institute of Advanced Studies (AIAS), Aarhus University, Aarhus, Denmark.
| |
|
69
|
Ji Y, Long J, Kweon SS, Kang D, Kubo M, Park B, Shu XO, Zheng W, Tao R, Li B. Incorporating European GWAS findings improve polygenic risk prediction accuracy of breast cancer among East Asians. Genet Epidemiol 2021; 45:471-484. [PMID: 33739539 PMCID: PMC8372543 DOI: 10.1002/gepi.22382] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2020] [Revised: 01/15/2021] [Accepted: 02/08/2021] [Indexed: 12/23/2022]
Abstract
Previous genome-wide association studies (GWASs) have largely focused on European (EUR) populations. However, polygenic risk scores (PRSs) derived from EUR data have been shown to perform worse in non-EURs than in EURs. In this study, we aim to improve PRS prediction in East Asians (EASs). We introduce a rescaled meta-analysis framework to combine both EUR (N = 122,175) and EAS (N = 30,801) GWAS summary statistics. To improve PRS prediction in EASs, we use a scaling factor to up-weight the EAS data, such that the resulting effect size estimates are more relevant to EASs. We then derive PRSs for EAS from the rescaled meta-analysis results of EAS and EUR data. Evaluated in an independent EAS validation data set, this approach increases the prediction liability-adjusted Nagelkerke's pseudo R² by 40%, 41%, and 5%, respectively, compared with PRSs derived from an EAS GWAS only, EUR GWAS only, and conventional fixed-effects meta-analysis of EAS and EUR data. The PRS derived from the rescaled meta-analysis approach achieved an area under the receiver operating characteristic curve (AUC) of 0.6059, higher than AUC = 0.5782, 0.5809, and 0.6008 for EAS, EUR, and conventional meta-analysis of EAS and EUR, respectively. We further compare PRSs constructed from single-nucleotide polymorphisms that have different linkage disequilibrium (LD) scores and minor allele frequencies (MAFs) between EUR and EAS, and observe that lower LD scores or MAF in EAS correspond to poorer PRS performance (AUC = 0.5677, 0.5530, respectively) than higher LD scores or MAF (AUC = 0.589, 0.5993, respectively). We finally build a PRS stratified by LD score differences in EUR and EAS using rescaled meta-analysis, and obtain an AUC of 0.6096, an improvement over the other strategies investigated.
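The up-weighting idea described in this abstract can be sketched for a single SNP as follows. This is a minimal illustration under an assumed form, namely inverse-variance weights with the EAS weight multiplied by a scaling factor. The abstract does not give the exact formula, so the function name `rescaled_meta`, the parameter `scale` and the input numbers are all hypothetical.

```python
def rescaled_meta(beta_eur, se_eur, beta_eas, se_eas, scale=1.0):
    """Combine EUR and EAS effect estimates for one SNP.

    With scale == 1.0 this is the conventional fixed-effects
    (inverse-variance weighted) estimate. A scale above 1.0 up-weights
    the EAS estimate; this weighting scheme is an assumption made for
    illustration, not the published method.
    """
    w_eur = 1.0 / se_eur ** 2
    w_eas = scale / se_eas ** 2
    return (w_eur * beta_eur + w_eas * beta_eas) / (w_eur + w_eas)

if __name__ == "__main__":
    # Made-up summary statistics for a single SNP.
    conventional = rescaled_meta(0.10, 0.10, 0.30, 0.10, scale=1.0)
    upweighted = rescaled_meta(0.10, 0.10, 0.30, 0.10, scale=3.0)
    print(conventional, upweighted)
```

With equal standard errors, raising `scale` moves the combined estimate toward the EAS effect, which is the behaviour the abstract describes as making the estimates more relevant to EASs.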
Affiliation(s)
- Ying Ji
- Vanderbilt Genetics Institute, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA
| | - Jirong Long
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Sun-Seog Kweon
- Department of Preventive Medicine, Chonnam National University Medical School, Hwasun, Korea
- Jeonnam Regional Cancer Center, Chonnam National University Hwasun Hospital, Hwasun, Korea
| | - Daehee Kang
- Department of Preventive Medicine, Seoul National University College of Medicine, Seoul, Korea
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
- Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, Korea
- Institute of Environmental Medicine, Seoul National University Medical Research Center, Seoul, Korea
| | - Michiaki Kubo
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Boyoung Park
- Department of Medicine, Hanyang University College of Medicine, Seoul, Korea
| | - Xiao-Ou Shu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Ran Tao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Bingshan Li
- Vanderbilt Genetics Institute, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA
| |
|
70
|
Tang H, He Z. Advances and challenges in quantitative delineation of the genetic architecture of complex traits. Quantitative Biology 2021; 9:168-184. [PMID: 35492964 PMCID: PMC9053444 DOI: 10.15302/j-qb-021-0249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background Genome-wide association studies (GWAS) have been widely adopted in studies of human complex traits and diseases. Results This review surveys areas of active research: quantifying and partitioning trait heritability, fine mapping functional variants and integrative analysis, genetic risk prediction of phenotypes, and the analysis of sequencing studies that have identified millions of rare variants. Current challenges and opportunities are highlighted. Conclusion GWAS have fundamentally transformed the field of human complex trait genetics. Novel statistical and computational methods have expanded the scope of GWAS and have provided valuable insights on the genetic architecture underlying complex phenotypes.
Affiliation(s)
- Hua Tang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA
| |
|
71
|
Pain O, Glanville KP, Hagenaars S, Selzam S, Fürtjes A, Coleman JRI, Rimfeld K, Breen G, Folkersen L, Lewis CM. Imputed gene expression risk scores: a functionally informed component of polygenic risk. Hum Mol Genet 2021; 30:727-738. [PMID: 33611520 PMCID: PMC8127405 DOI: 10.1093/hmg/ddab053] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/01/2020] [Revised: 02/08/2021] [Accepted: 02/15/2021] [Indexed: 11/12/2022] Open
Abstract
Integration of functional genomic annotations when estimating polygenic risk scores (PRS) can provide insight into aetiology and improve risk prediction. This study explores the predictive utility of gene expression risk scores (GeRS), calculated using imputed gene expression and transcriptome-wide association study (TWAS) results. The predictive utility of GeRS was evaluated using 12 neuropsychiatric and anthropometric outcomes measured in two target samples: UK Biobank and the Twins Early Development Study. GeRS were calculated based on imputed gene expression levels and TWAS results, using 53 gene expression-genotype panels, termed single nucleotide polymorphism (SNP)-weight sets, capturing expression across a range of tissues. We compare the predictive utility of elastic net models containing GeRS within and across SNP-weight sets, and models containing both GeRS and PRS. We estimate the proportion of SNP-based heritability attributable to cis-regulated gene expression. GeRS significantly predicted a range of outcomes, with elastic net models combining GeRS across SNP-weight sets improving prediction. GeRS were less predictive than PRS, but models combining GeRS and PRS improved prediction for several outcomes, with relative improvements ranging from 0.3% for height (P = 0.023) to 4% for rheumatoid arthritis (P = 5.9 × 10⁻⁸). The proportion of SNP-based heritability attributable to cis-regulated expression was modest for most outcomes, even when restricting GeRS to colocalized genes. GeRS represent a component of PRS and could be useful for functional stratification of genetic risk. Only in specific circumstances can GeRS substantially improve prediction over PRS alone. Future research considering functional genomic annotations when estimating genetic risk is warranted.
Affiliation(s)
- Oliver Pain
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
- NIHR Maudsley Biomedical Research Centre, South London and Maudsley NHS Trust, London SE5 8AF, UK
| | - Kylie P Glanville
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
| | - Saskia Hagenaars
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
| | - Saskia Selzam
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
| | - Anna Fürtjes
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
| | - Jonathan R I Coleman
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
| | - Kaili Rimfeld
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
| | - Gerome Breen
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
- NIHR Maudsley Biomedical Research Centre, South London and Maudsley NHS Trust, London SE5 8AF, UK
| | - Lasse Folkersen
- Institute of Biological Psychiatry, Sankt Hans Hospital, Copenhagen 4000 Roskilde, Denmark
| | - Cathryn M Lewis
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AF, UK
- NIHR Maudsley Biomedical Research Centre, South London and Maudsley NHS Trust, London SE5 8AF, UK
- Department of Medical and Molecular Genetics, Faculty of Life Sciences and Medicine, King’s College London, London WC2R 2LS, UK
| |
|
72
|
Majumdar A, Giambartolomei C, Cai N, Haldar T, Schwarz T, Gandal M, Flint J, Pasaniuc B. Leveraging eQTLs to identify individual-level tissue of interest for a complex trait. PLoS Comput Biol 2021; 17:e1008915. [PMID: 34019542 PMCID: PMC8174686 DOI: 10.1371/journal.pcbi.1008915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/12/2020] [Revised: 06/03/2021] [Accepted: 03/26/2021] [Indexed: 12/26/2022] Open
Abstract
Genetic predisposition for complex traits often acts through multiple tissues at different time points during development. As a simple example, the genetic predisposition for obesity could be manifested either through inherited variants that control metabolism through regulation of genes expressed in the brain, or that control fat storage through dysregulation of genes expressed in adipose tissue, or both. Here we describe a statistical approach that leverages tissue-specific expression quantitative trait loci (eQTLs) corresponding to tissue-specific genes to prioritize a relevant tissue underlying the genetic predisposition of a given individual for a complex trait. Unlike existing approaches that prioritize relevant tissues for the trait in the population, our approach probabilistically quantifies the tissue-wise genetic contribution to the trait for a given individual. We hypothesize that for a subgroup of individuals the genetic contribution to the trait can be mediated primarily through a specific tissue. Through simulations using the UK Biobank, we show that our approach can predict the relevant tissue accurately and can cluster individuals according to their tissue-specific genetic architecture. We analyze body mass index (BMI) and waist to hip ratio adjusted for BMI (WHRadjBMI) in the UK Biobank to identify subgroups of individuals whose genetic predisposition act primarily through brain versus adipose tissue, and adipose versus muscle tissue, respectively. Notably, we find that these individuals have specific phenotypic features beyond BMI and WHRadjBMI that distinguish them from random individuals in the data, suggesting biological effects of tissue-specific genetic contribution for these traits.
Affiliation(s)
- Arunabha Majumdar
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
- Department of Mathematics, Indian Institute of Technology Hyderabad, Kandi, Telangana, India
| | - Claudia Giambartolomei
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
| | - Na Cai
- Wellcome Sanger Institute, Wellcome genome campus, Hinxton, United Kingdom
- European Bioinformatics Institute (EMBL-EBI), Wellcome genome campus, Hinxton, United Kingdom
| | - Tanushree Haldar
- Institute for Human Genetics, University of California, San Francisco, California, United States of America
| | - Tommer Schwarz
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States of America
| | - Michael Gandal
- Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
| | - Jonathan Flint
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
| | - Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States of America
| |
|
73
|
Cai M, Xiao J, Zhang S, Wan X, Zhao H, Chen G, Yang C. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am J Hum Genet 2021; 108:632-655. [PMID: 33770506 PMCID: PMC8059341 DOI: 10.1016/j.ajhg.2021.03.002] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 10/25/2020] [Accepted: 03/01/2021] [Indexed: 12/29/2022] Open
Abstract
The development of polygenic risk scores (PRSs) has proved useful to stratify the general European population into different risk groups. However, PRSs are less accurate in non-European populations due to genetic differences across different populations. To improve the prediction accuracy in non-European populations, we propose a cross-population analysis framework for PRS construction with both individual-level (XPA) and summary-level (XPASS) GWAS data. By leveraging trans-ancestry genetic correlation, our methods can borrow information from the Biobank-scale European population data to improve risk prediction in the non-European populations. Our framework can also incorporate population-specific effects to further improve construction of PRS. With innovations in data structure and algorithm design, our methods provide a substantial saving in computational time and memory usage. Through comprehensive simulation studies, we show that our framework provides accurate, efficient, and robust PRS construction across a range of genetic architectures. In a Chinese cohort, our methods achieved 7.3%-198.0% accuracy gain for height and 19.5%-313.3% accuracy gain for body mass index (BMI) in terms of predictive R² compared to existing PRS approaches. We also show that XPA and XPASS can achieve substantial improvement for construction of height PRSs in the African population, suggesting the generality of our framework across global populations.
Affiliation(s)
- Mingxuan Cai
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Jiashun Xiao
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Shunkang Zhang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
| | - Hongyu Zhao
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai 201111, China; Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
| | - Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China.
| | - Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
| |
|
74
|
Shi H, Gazal S, Kanai M, Koch EM, Schoech AP, Siewert KM, Kim SS, Luo Y, Amariuta T, Huang H, Okada Y, Raychaudhuri S, Sunyaev SR, Price AL. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat Commun 2021; 12:1098. [PMID: 33597505 PMCID: PMC7889654 DOI: 10.1038/s41467-021-21286-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 11/15/2019] [Accepted: 01/15/2021] [Indexed: 01/31/2023] Open
Abstract
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
Affiliation(s)
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Evan M Koch
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Katherine M Siewert
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Samuel S Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yang Luo
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Tiffany Amariuta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
| | - Soumya Raychaudhuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
75
|
Abstract
Attention-deficit/hyperactivity disorder (ADHD) is a highly heritable neurodevelopmental disorder that is known to have a polygenic (i.e., many genes of individually small effects) architecture. Polygenic scores (PGS), which characterize this polygenicity as a single score for a given individual, are considered the state-of-the-art in psychiatric genetics research. Despite the proliferation of ADHD studies adopting this approach and its clinical implications, remarkably little is known about the predictive utility of PGS in ADHD research to date, given that there have not yet been any systematic or meta-analytic reviews of this rapidly developing literature. We meta-analyzed 12 unique effect sizes from ADHD PGS studies, yielding an N = 40,088. These studies, which included a mixture of large population-based cohorts and case-control samples of predominantly European ancestry, yielded a pooled ADHD PGS effect size of r_random = 0.201 (95% CI = [0.144, 0.288]) and an r_fixed = 0.190 (95% CI = [0.180, 0.199]) in predicting ADHD. In other words, ADHD PGS reliably account for between 3.6% (in the fixed effects model) and 4.0% (in the random effects model) of the variance in broadly defined phenotypic ADHD. Findings provide important insights into the genetics of psychiatric outcomes and raise several key questions about the impact of PGS on psychiatric research moving forward. Our review concludes by providing recommendations for future research directions in the use of PGS, including new methods to account for comorbidities, integrating bioinformatics to elucidate biological pathways, and leveraging PGS to test mechanistic models of ADHD.
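The step from the pooled correlations to the percentages of variance is the usual squaring of a correlation coefficient. The short check below reproduces that arithmetic from the two values quoted in the abstract; the helper name `variance_explained_pct` is illustrative and not taken from the paper.

```python
def variance_explained_pct(r):
    """Percentage of variance accounted for by a correlation of size r."""
    return 100.0 * r ** 2


# Pooled effect sizes quoted in the abstract.
pooled = {"fixed": 0.190, "random": 0.201}

for model, r in pooled.items():
    print(f"{model} effects: r = {r:.3f}, variance explained = {variance_explained_pct(r):.1f}%")
```

Rounded to one decimal place, this gives 3.6% for the fixed effects model and 4.0% for the random effects model, matching the figures in the abstract.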
Affiliation(s)
- James J Li
- Department of Psychology, University of Wisconsin, Madison, WI, USA.
- Waisman Center, University of Wisconsin, Madison, WI, USA.
- Center for Demography of Health and Aging, University of Wisconsin, Madison, WI, USA.
| | - Quanfa He
- Department of Psychology, University of Wisconsin, Madison, WI, USA
- Waisman Center, University of Wisconsin, Madison, WI, USA
| |
|
76
|
Ye Y, Chen X, Han J, Jiang W, Natarajan P, Zhao H. Interactions Between Enhanced Polygenic Risk Scores and Lifestyle for Cardiovascular Disease, Diabetes, and Lipid Levels. Circulation: Genomic and Precision Medicine 2021; 14:e003128. [PMID: 33433237 DOI: 10.1161/circgen.120.003128] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Indexed: 12/31/2022]
Abstract
BACKGROUND Both lifestyle and genetic factors confer risk for cardiovascular diseases, type 2 diabetes, and dyslipidemia. However, the interactions between these 2 groups of risk factors were not comprehensively understood due to previous poor estimation of genetic risk. Here we set out to develop enhanced polygenic risk scores (PRS) and systematically investigate multiplicative and additive interactions between PRS and lifestyle for coronary artery disease, atrial fibrillation, type 2 diabetes, total cholesterol, triglyceride, and LDL-cholesterol. METHODS Our study included 276 096 unrelated White British participants from the UK Biobank. We investigated several PRS methods (P+T, LDpred, PRS continuous shrinkage, and AnnoPred) and showed that AnnoPred achieved consistently improved prediction accuracy for all 6 diseases/traits. With enhanced PRS and combined lifestyle status categorized by smoking, body mass index, physical activity, and diet, we investigated both multiplicative and additive interactions between PRS and lifestyle using regression models. RESULTS We observed that healthy lifestyle reduced disease incidence by similar multiplicative magnitude across different PRS groups. The absolute risk reduction from lifestyle adherence was, however, significantly greater in individuals with higher PRS. Specifically, for type 2 diabetes, the absolute risk reduction from lifestyle adherence was 12.4% (95% CI, 10.0%-14.9%) in the top 1% PRS versus 2.8% (95% CI, 2.3%-3.3%) in the bottom PRS decile, leading to a ratio of >4.4. We also observed a significant interaction effect between PRS and lifestyle on triglyceride level. CONCLUSIONS By leveraging functional annotations, AnnoPred outperforms state-of-the-art methods on quantifying genetic risk through PRS. Our analyses based on enhanced PRS suggest that individuals with high genetic risk may derive similar relative but greater absolute benefit from lifestyle adherence.
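The contrast between similar relative benefit and greater absolute benefit follows from baseline risk alone. The sketch below illustrates this with hypothetical baseline risks and a hypothetical shared relative risk (these inputs are assumptions for illustration, not values from the study), then recomputes the ratio of the two absolute risk reductions that the abstract reports for type 2 diabetes.

```python
def absolute_risk_reduction(risk_unhealthy, relative_risk):
    """Absolute drop in risk when lifestyle multiplies risk by `relative_risk`."""
    return risk_unhealthy * (1.0 - relative_risk)


# Hypothetical inputs: the same relative risk applied to two baseline risks.
RELATIVE_RISK = 0.5
arr_high_prs = absolute_risk_reduction(0.20, RELATIVE_RISK)
arr_low_prs = absolute_risk_reduction(0.05, RELATIVE_RISK)
print(f"hypothetical ARR: high PRS {arr_high_prs:.3f}, low PRS {arr_low_prs:.3f}")

# Values reported in the abstract: 12.4% (top 1% PRS) vs 2.8% (bottom decile).
reported_ratio = 12.4 / 2.8
print(f"reported ARR ratio: {reported_ratio:.2f}")
```

In the hypothetical case the multiplicative effect is identical in both groups, yet the absolute reduction is four times larger where baseline risk is four times higher. The reported values give a ratio of about 4.43, consistent with the ">4.4" in the abstract.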
Affiliation(s)
- Yixuan Ye
- Program of Computational Biology and Bioinformatics (Y.Y., H.Z.), Yale University
| | - Xi Chen
- Department of Statistics and Data Science (X.C., J.H.), Yale University; Department of Molecular Biophysics and Biochemistry (X.C., J.H.), Yale University
| | - James Han
- Department of Statistics and Data Science (X.C., J.H.), Yale University; Department of Molecular Biophysics and Biochemistry (X.C., J.H.), Yale University
| | - Wei Jiang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT (W.J., H.Z.)
| | - Pradeep Natarajan
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston (P.N.); Program in Medical and Population Genetics, Broad Institute of Harvard & Massachusetts Institute of Technology, Cambridge, MA (P.N.)
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics (Y.Y., H.Z.), Yale University; Department of Biostatistics, Yale School of Public Health, New Haven, CT (W.J., H.Z.)
| |
|
77
|
Kim SS, Dey KK, Weissbrod O, Márquez-Luna C, Gazal S, Price AL. Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease. Nat Commun 2020; 11:6258. [PMID: 33288751 PMCID: PMC7721881 DOI: 10.1038/s41467-020-20087-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/09/2020] [Accepted: 11/09/2020] [Indexed: 02/08/2023] Open
Abstract
Despite considerable progress on pathogenicity scores prioritizing variants for Mendelian disease, little is known about the utility of these scores for common disease. Here, we assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease and improve upon existing scores. We first apply stratified linkage disequilibrium (LD) score regression to evaluate published pathogenicity scores across 41 common diseases and complex traits (average N = 320K). Several of the resulting annotations are informative for common disease, even after conditioning on a broad set of functional annotations. We then improve upon published pathogenicity scores by developing AnnotBoost, a machine learning framework to impute and denoise pathogenicity scores using a broad set of functional annotations. AnnotBoost substantially increases the informativeness for common disease of both previously uninformative and previously informative pathogenicity scores, implying that Mendelian and common disease variants share similar properties. The boosted scores also produce improvements in heritability model fit and in classifying disease-associated, fine-mapped SNPs. Our boosted scores may improve fine-mapping and candidate gene discovery for common disease.
Affiliation(s)
- Samuel S Kim
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
| | - Kushal K Dey
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Carla Márquez-Luna
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
| |
|
78
|
Amariuta T, Ishigaki K, Sugishita H, Ohta T, Koido M, Dey KK, Matsuda K, Murakami Y, Price AL, Kawakami E, Terao C, Raychaudhuri S. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat Genet 2020; 52:1346-1354. [PMID: 33257898 PMCID: PMC8049522 DOI: 10.1038/s41588-020-00740-8] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 02/21/2020] [Accepted: 10/19/2020] [Indexed: 12/15/2022]
Abstract
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.
Affiliation(s)
- Tiffany Amariuta
- Center for Data Sciences, Harvard Medical School, Boston, MA, USA
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
| | - Kazuyoshi Ishigaki
- Center for Data Sciences, Harvard Medical School, Boston, MA, USA
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
| | - Hiroki Sugishita
- Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences (IMS), Kanagawa, Japan
| | - Tazro Ohta
- Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, Japan
| | - Masaru Koido
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Kushal K Dey
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Koichi Matsuda
- Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Yoshinori Murakami
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Alkes L Price
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Eiryo Kawakami
- Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan
- Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Chiba, Japan
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
| | - Soumya Raychaudhuri
- Center for Data Sciences, Harvard Medical School, Boston, MA, USA.
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK.
| |
|
79
|
Chen TH, Chatterjee N, Landi MT, Shi J. A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information. J Am Stat Assoc 2020; 116:133-143. [PMID: 34483403 DOI: 10.1080/01621459.2020.1764849] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]
Abstract
Large-scale genome-wide association (GWAS) studies provide opportunities for developing genetic risk prediction models that have the potential to improve disease prevention, intervention or treatment. The key step is to develop polygenic risk score (PRS) models with high predictive performance for a given disease, which typically requires a large training data set for selecting truly associated single nucleotide polymorphisms (SNPs) and estimating effect sizes accurately. Here, we develop a comprehensive penalized regression for fitting ℓ1-regularized regression models to GWAS summary statistics. We propose incorporating Pleiotropy and ANnotation information into PRS (PANPRS) development through suitable formulation of penalty functions and associated tuning parameters. Extensive simulations show that PANPRS performs equally well or better than existing PRS methods when no functional annotation or pleiotropy is incorporated. When functional annotation data and pleiotropy are informative, PANPRS substantially outperforms existing PRS methods in simulations. Finally, we applied our methods to build PRS for type 2 diabetes and melanoma and found that incorporating relevant functional annotations and GWAS of genetically related traits improved prediction of these two complex diseases.
Affiliation(s)
- Ting-Huei Chen
- Department of Mathematics and Statistics, Regular member, Cervo Brain Research Centre, University of Laval, 1045, av. of Medicine, Suite 1056, Quebec G1V 0A6, Canada
| | - Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University Baltimore, Maryland, United States of America, 615 N Wolfe Street Baltimore, MD 21205
| | - Maria Teresa Landi
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Maryland, United States of America, 9609 Medical Center Drive, RM 7E106, Bethesda, MD, 20892
| | - Jianxin Shi
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Maryland, United States of America, 9609 Medical Center Drive, RM 7E122, Bethesda, MD, 20892
| |
|
80
|
Holmans PA. Using Genetics to Increase Specificity of Outcome Prediction in Psychiatric Disorders: Prospects for Progression. Am J Psychiatry 2020; 177:884-887. [PMID: 32998554 DOI: 10.1176/appi.ajp.2020.20081181] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/01/2022]
Affiliation(s)
- Peter A Holmans
- Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, United Kingdom
| |
|
81
|
Zhao B, Ibrahim JG, Li Y, Li T, Wang Y, Shan Y, Zhu Z, Zhou F, Zhang J, Huang C, Liao H, Yang L, Thompson PM, Zhu H. Heritability of Regional Brain Volumes in Large-Scale Neuroimaging and Genetic Studies. Cereb Cortex 2020; 29:2904-2914. [PMID: 30010813 DOI: 10.1093/cercor/bhy157] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 02/03/2018] [Revised: 06/11/2018] [Indexed: 12/20/2022] Open
Abstract
Brain genetics is an active research area. The degree to which genetic variants impact variations in brain structure and function remains largely unknown. We examined the heritability of regional brain volumes (P ~ 100) captured by single-nucleotide polymorphisms (SNPs) in UK Biobank (n ~ 9000). We found that regional brain volumes are highly heritable in this study population and common genetic variants can explain up to 80% of their variabilities (median heritability 34.8%). We observed omnigenic impact across the genome and examined the enrichment of SNPs in active chromatin regions. Principal components derived from regional volume data are also highly heritable, but the amount of variance in brain volume explained by the component did not seem to be related to its heritability. Heritability estimates vary substantially across large-scale functional networks, exhibit a symmetric pattern across left and right hemispheres, and are consistent in females and males (correlation = 0.638). We repeated the main analysis in Alzheimer's Disease Neuroimaging Initiative (n ~ 1100), Philadelphia Neurodevelopmental Cohort (n ~ 600), and Pediatric Imaging, Neurocognition, and Genetics (n ~ 500) datasets, which demonstrated that more stable estimates can be obtained from the UK Biobank.
Affiliation(s)
- Bingxin Zhao
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Joseph G Ibrahim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Tengfei Li
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Yue Wang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yue Shan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Ziliang Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Fan Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jingwen Zhang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Chao Huang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Huiling Liao
- Department of Statistics, Texas A&M University, College Station, TX, USA
| | - Liuqing Yang
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Paul M Thompson
- Imaging Genetics Center, Mark and Mary Stevens Institute for Neuroimaging & Informatics, University of Southern California, Los Angeles, CA, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
82
|
Thomas M, Sakoda LC, Hoffmeister M, Rosenthal EA, Lee JK, van Duijnhoven FJB, Platz EA, Wu AH, Dampier CH, de la Chapelle A, Wolk A, Joshi AD, Burnett-Hartman A, Gsur A, Lindblom A, Castells A, Win AK, Namjou B, Van Guelpen B, Tangen CM, He Q, Li CI, Schafmayer C, Joshu CE, Ulrich CM, Bishop DT, Buchanan DD, Schaid D, Drew DA, Muller DC, Duggan D, Crosslin DR, Albanes D, Giovannucci EL, Larson E, Qu F, Mentch F, Giles GG, Hakonarson H, Hampel H, Stanaway IB, Figueiredo JC, Huyghe JR, Minnier J, Chang-Claude J, Hampe J, Harley JB, Visvanathan K, Curtis KR, Offit K, Li L, Le Marchand L, Vodickova L, Gunter MJ, Jenkins MA, Slattery ML, Lemire M, Woods MO, Song M, Murphy N, Lindor NM, Dikilitas O, Pharoah PDP, Campbell PT, Newcomb PA, Milne RL, MacInnis RJ, Castellví-Bel S, Ogino S, Berndt SI, Bézieau S, Thibodeau SN, Gallinger SJ, Zaidi SH, Harrison TA, Keku TO, Hudson TJ, Vymetalkova V, Moreno V, Martín V, Arndt V, Wei WQ, Chung W, Su YR, Hayes RB, White E, Vodicka P, Casey G, Gruber SB, Schoen RE, Chan AT, Potter JD, Brenner H, Jarvik GP, Corley DA, Peters U, Hsu L. Genome-wide Modeling of Polygenic Risk Score in Colorectal Cancer Risk. Am J Hum Genet 2020; 107:432-444. [PMID: 32758450 PMCID: PMC7477007 DOI: 10.1016/j.ajhg.2020.07.006] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Received: 11/27/2019] [Accepted: 07/13/2020] [Indexed: 02/08/2023] Open
Abstract
Accurate colorectal cancer (CRC) risk prediction models are critical for identifying individuals at low and high risk of developing CRC, as they can then be offered targeted screening and interventions to address their risks of developing disease (if they are in a high-risk group) and avoid unnecessary screening and interventions (if they are in a low-risk group). As it is likely that thousands of genetic variants contribute to CRC risk, it is clinically important to investigate whether these genetic variants can be used jointly for CRC risk prediction. In this paper, we derived and compared different approaches to generating predictive polygenic risk scores (PRS) from genome-wide association studies (GWASs) including 55,105 CRC-affected case subjects and 65,079 control subjects of European ancestry. We built the PRS in three ways, using (1) 140 previously identified and validated CRC loci; (2) SNP selection based on linkage disequilibrium (LD) clumping followed by machine-learning approaches; and (3) LDpred, a Bayesian approach for genome-wide risk prediction. We tested the PRS in an independent cohort of 101,987 individuals with 1,699 CRC-affected case subjects. The discriminatory accuracy, calculated by the age- and sex-adjusted area under the receiver operating characteristics curve (AUC), was highest for the LDpred-derived PRS (AUC = 0.654) including nearly 1.2 M genetic variants (the proportion of causal genetic variants for CRC assumed to be 0.003), whereas the PRS of the 140 known variants identified from GWASs had the lowest AUC (AUC = 0.629). Based on the LDpred-derived PRS, we are able to identify 30% of individuals without a family history as having risk for CRC similar to those with a family history of CRC, whereas the PRS based on known GWAS variants identified only top 10% as having a similar relative risk. About 90% of these individuals have no family history and would have been considered average risk under current screening guidelines, but might benefit from earlier screening. The developed PRS offers a way for risk-stratified CRC screening and other targeted interventions.
Affiliation(s)
- Minta Thomas
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Lori C Sakoda
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
| | - Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Elisabeth A Rosenthal
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA 98195, USA
| | - Jeffrey K Lee
- Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
| | - Franzel J B van Duijnhoven
- Division of Human Nutrition and Health, Wageningen University & Research, Wageningen 176700, the Netherlands
| | - Elizabeth A Platz
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, and the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287, USA
| | - Anna H Wu
- University of Southern California, Preventative Medicine, Los Angeles, CA 90089, USA
| | - Christopher H Dampier
- Department of Surgery, University of Virginia Health System, Charlottesville, VA 22903, USA
| | - Albert de la Chapelle
- Department of Cancer Biology and Genetics and the Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Alicja Wolk
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm 17177, Sweden
| | - Amit D Joshi
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | | | - Andrea Gsur
- Institute of Cancer Research, Department of Medicine I, Medical University Vienna, Vienna 1090, Austria
| | - Annika Lindblom
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm 17177, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm 17177, Sweden
| | - Antoni Castells
- Gastroenterology Department, Hospital Clínic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), University of Barcelona, Barcelona 08007, Spain
| | - Aung Ko Win
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Bahram Namjou
- Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Cincinnati VA Medical Center, Cincinnati, OH 45229, USA
| | - Bethany Van Guelpen
- Department of Radiation Sciences, Oncology Unit, Umeå University, Umeå 90187, Sweden; Wallenberg Centre for Molecular Medicine, Umeå University, Umeå 90187, Sweden
| | - Catherine M Tangen
- SWOG Statistical Center, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Christopher I Li
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Clemens Schafmayer
- Department of General Surgery, University Hospital Rostock, Rostock 18051, Germany
| | - Corinne E Joshu
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, and the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287, USA
| | - Cornelia M Ulrich
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84112, USA
| | - D Timothy Bishop
- Leeds Institute of Cancer and Pathology, University of Leeds, Leeds LS2 9JT, UK
| | - Daniel D Buchanan
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Parkville, VIC 3010, Australia; Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, VIC 3010, Australia; Genomic Medicine and Family Cancer Clinic, Royal Melbourne Hospital, Parkville, VIC 3010, Australia
| | - Daniel Schaid
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - David A Drew
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - David C Muller
- School of Public Health, Imperial College London, London SW7 2AZ, UK
| | - David Duggan
- Translational Genomics Research Institute - An Affiliate of City of Hope, Phoenix, AZ 85003, USA
| | - David R Crosslin
- Department of Bioinformatics and Medical Education, University of Washington Medical Center, Seattle, WA 98195, USA
| | - Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Edward L Giovannucci
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Nutrition, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02108, USA
| | - Eric Larson
- Kaiser Permanente Washington Research Institute, Seattle, WA 98101, USA
| | - Flora Qu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Frank Mentch
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Graham G Giles
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia; Cancer Epidemiology Division, Cancer Council Victoria, 615 St Kilda Road, Melbourne, VIC 3004, Australia; Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC 3168, Australia
| | - Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Heather Hampel
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
| | - Ian B Stanaway
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA 98195, USA
| | - Jane C Figueiredo
- Department of Medicine, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA; Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Jeroen R Huyghe
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Jessica Minnier
- School of Public Health, Oregon Health & Science University, Portland, OR 97239, USA
| | - Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, 69120 Germany; University Medical Centre Hamburg-Eppendorf, University Cancer Centre Hamburg (UCCH), Hamburg 20246, Germany
| | - Jochen Hampe
- Department of Medicine I, University Hospital Dresden, Technische Universität Dresden (TU Dresden), Dresden 01062, Germany
| | - John B Harley
- Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Cincinnati VA Medical Center, Cincinnati, OH 45229, USA
| | - Kala Visvanathan
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, and the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287, USA
| | - Keith R Curtis
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Kenneth Offit
- Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medical College, NY 10065, USA
| | - Li Li
- Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | | | - Ludmila Vodickova
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, 142 20 Prague 4, Czech Republic; Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, 128 00 Prague, Czech Republic; Faculty of Medicine and Biomedical Center in Pilsen, Charles University, 323 00 Pilsen, Czech Republic
| | - Marc J Gunter
- Nutrition and Metabolism Section, International Agency for Research on Cancer, World Health Organization, Lyon 69372, France
| | - Mark A Jenkins
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Martha L Slattery
- Department of Internal Medicine, University of Utah, Salt Lake City, UT 84132, USA
| | - Mathieu Lemire
- PanCuRx Translational Research Initiative, Ontario, Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Michael O Woods
- Memorial University of Newfoundland, Discipline of Genetics, St. John's, NL A1B 3R7, Canada
| | - Mingyang Song
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Department of Nutrition, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
| | - Neil Murphy
- Nutrition and Metabolism Section, International Agency for Research on Cancer, World Health Organization, Lyon 69372, France
| | - Noralane M Lindor
- Department of Health Science Research, Mayo Clinic, Scottsdale, AZ 85260, USA
| | - Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55905, USA
| | - Paul D P Pharoah
- Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, UK
| | - Peter T Campbell
- Behavioral and Epidemiology Research Group, American Cancer Society, Atlanta, GA 30303, USA
| | - Polly A Newcomb
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; School of Public Health, University of Washington, Seattle, WA 98195, USA
| | - Roger L Milne
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia; Cancer Epidemiology Division, Cancer Council Victoria, 615 St Kilda Road, Melbourne, VIC 3004, Australia; Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC 3168, Australia
| | - Robert J MacInnis
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia; Cancer Epidemiology Division, Cancer Council Victoria, 615 St Kilda Road, Melbourne, VIC 3004, Australia
| | - Sergi Castellví-Bel
- Gastroenterology Department, Hospital Clínic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), University of Barcelona, Barcelona 08007, Spain
| | - Shuji Ogino
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Program in MPE Molecular Pathological Epidemiology, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Stéphane Bézieau
- Service de Génétique Médicale, Centre Hospitalier Universitaire (CHU) Nantes, Nantes 44093, France
| | - Stephen N Thibodeau
- Division of Laboratory Genetics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN 85054, USA
| | - Steven J Gallinger
- Lunenfeld Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, ON M5G1X5, Canada
| | - Syed H Zaidi
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Tabitha A Harrison
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Temitope O Keku
- Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Thomas J Hudson
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Veronika Vymetalkova
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, 142 20 Prague 4, Czech Republic; Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, 128 00 Prague, Czech Republic; Faculty of Medicine and Biomedical Center in Pilsen, Charles University, 323 00 Pilsen, Czech Republic
| | - Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology, L'Hospitalet de Llobregat, Barcelona 08908, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid 28029, Spain; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona 08907, Spain; ONCOBEL Program, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona 08908, Spain
| | - Vicente Martín
- CIBER Epidemiología y Salud Pública (CIBERESP), Madrid 28029, Spain; Biomedicine Institute (IBIOMED), University of León, León 24071, Spain
| | - Volker Arndt
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Wendy Chung
- Office of Research & Development, Department of Veterans Affairs, Washington, DC 20420, USA; Departments of Pediatrics and Medicine, Columbia University Medical Center, New York, NY 10032, USA
| | - Yu-Ru Su
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Richard B Hayes
- Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| | - Emily White
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
| | - Pavel Vodicka
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, 142 20 Prague 4, Czech Republic; Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, 128 00 Prague, Czech Republic; Faculty of Medicine and Biomedical Center in Pilsen, Charles University, 323 00 Pilsen, Czech Republic
| | - Graham Casey
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
| | - Stephen B Gruber
- Department of Preventive Medicine, USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - Robert E Schoen
- Department of Medicine and Epidemiology, University of Pittsburgh Medical Center, Pittsburgh, PA 15219, USA
| | - Andrew T Chan
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
| | - John D Potter
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Centre for Public Health Research, Massey University, Wellington 6140, New Zealand
| | - Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany; Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg 69120, Germany; German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Gail P Jarvik
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA 98195, USA; Genome Sciences, University of Washington Medical Center, Seattle, WA 98195, USA
| | - Douglas A Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA.
| | - Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
| |
|
83
|
Li X, Li Z, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Aslibekyan S, Ballantyne CM, Bielak LF, Blangero J, Boerwinkle E, Bowden DW, Broome JG, Conomos MP, Correa A, Cupples LA, Curran JE, Freedman BI, Guo X, Hindy G, Irvin MR, Kardia SLR, Kathiresan S, Khan AT, Kooperberg CL, Laurie CC, Liu XS, Mahaney MC, Manichaikul AW, Martin LW, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Moore JE, Morrison AC, O'Connell JR, Palmer ND, Pampana A, Peralta JM, Peyser PA, Psaty BM, Redline S, Rice KM, Rich SS, Smith JA, Tiwari HK, Tsai MY, Vasan RS, Wang FF, Weeks DE, Weng Z, Wilson JG, Yanek LR, Neale BM, Sunyaev SR, Abecasis GR, Rotter JI, Willer CJ, Peloso GM, Natarajan P, Lin X. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet 2020; 52:969-983. [PMID: 32839606 PMCID: PMC7483769 DOI: 10.1038/s41588-020-0676-4] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Received: 07/17/2019] [Accepted: 07/02/2020] [Indexed: 12/13/2022]
Abstract
Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
Affiliation(s)
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Sheila M Gaynor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Yaowu Liu
- School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Rounak Dey
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Donna K Arnett
- College of Public Health, University of Kentucky, Lexington, KY, USA
| | - Stella Aslibekyan
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | | | - Lawrence F Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Donald W Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Jai G Broome
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Adolfo Correa
- Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
| | - Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Barry I Freedman
- Department of Internal Medicine, Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - George Hindy
- Department of Population Medicine, Qatar University College of Medicine, QU Health, Doha, Qatar
| | - Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Sekar Kathiresan
- Verve Therapeutics, Cambridge, MA, USA
- Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Alyna T Khan
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Charles L Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - X Shirley Liu
- Department of Data Sciences, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
| | - Michael C Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Ani W Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Lisa W Martin
- Division of Cardiology, George Washington School of Medicine and Health Sciences, Washington, DC, USA
| | - Rasika A Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Stephen T McGarvey
- Department of Epidemiology, International Health Institute, Department of Anthropology, Brown University, Providence, RI, USA
| | - Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore VA Medical Center, Baltimore, MD, USA
| | - May E Montasser
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Jeffrey R O'Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Nicholette D Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Akhil Pampana
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Juan M Peralta
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Hemant K Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Michael Y Tsai
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Ramachandran S Vasan
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Fei Fei Wang
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Daniel E Weeks
- Department of Human Genetics and Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
- Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Shamil R Sunyaev
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Gonçalo R Abecasis
- Regeneron Pharmaceuticals, Tarrytown, NY, USA
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Cristen J Willer
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Pradeep Natarajan
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Department of Statistics, Harvard University, Cambridge, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
| |
|
84
|
Abstract
Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.
Affiliation(s)
- Ning Sun
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
| |
|
85
|
Zhang YD, Hurson AN, Zhang H, Choudhury PP, Easton DF, Milne RL, Simard J, Hall P, Michailidou K, Dennis J, Schmidt MK, Chang-Claude J, Gharahkhani P, Whiteman D, Campbell PT, Hoffmeister M, Jenkins M, Peters U, Hsu L, Gruber SB, Casey G, Schmit SL, O'Mara TA, Spurdle AB, Thompson DJ, Tomlinson I, De Vivo I, Landi MT, Law MH, Iles MM, Demenais F, Kumar R, MacGregor S, Bishop DT, Ward SV, Bondy ML, Houlston R, Wiencke JK, Melin B, Barnholtz-Sloan J, Kinnersley B, Wrensch MR, Amos CI, Hung RJ, Brennan P, McKay J, Caporaso NE, Berndt SI, Birmann BM, Camp NJ, Kraft P, Rothman N, Slager SL, Berchuck A, Pharoah PDP, Sellers TA, Gayther SA, Pearce CL, Goode EL, Schildkraut JM, Moysich KB, Amundadottir LT, Jacobs EJ, Klein AP, Petersen GM, Risch HA, Stolzenberg-Solomon RZ, Wolpin BM, Li D, Eeles RA, Haiman CA, Kote-Jarai Z, Schumacher FR, Al Olama AA, Purdue MP, Scelo G, Dalgaard MD, Greene MH, Grotmol T, Kanetsky PA, McGlynn KA, Nathanson KL, Turnbull C, Wiklund F, Chanock SJ, Chatterjee N, Garcia-Closas M. Assessment of polygenic architecture and risk prediction based on common variants across fourteen cancers. Nat Commun 2020; 11:3353. [PMID: 32620889 PMCID: PMC7335068 DOI: 10.1038/s41467-020-16483-3] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 09/14/2019] [Accepted: 05/04/2020] [Indexed: 02/08/2023] Open
Abstract
Genome-wide association studies (GWAS) have led to the identification of hundreds of susceptibility loci across cancers, but the impact of further studies remains uncertain. Here we analyse summary-level data from GWAS of European ancestry across fourteen cancer sites to estimate the number of common susceptibility variants (polygenicity) and underlying effect-size distribution. All cancers show a high degree of polygenicity, involving, at a minimum, thousands of loci. We project that sample sizes required to explain 80% of GWAS heritability vary from 60,000 cases for testicular to over 1,000,000 cases for lung cancer. The maximum relative risk achievable for subjects at the 99th risk percentile of underlying polygenic risk scores (PRS), compared to average risk, ranges from 12 for testicular to 2.5 for ovarian cancer. We show that PRS have potential for risk stratification for cancers of breast, colon and prostate, but less so for others because of modest heritability and lower incidence.
Affiliation(s)
- Yan Dora Zhang
- Department of Statistics and Actuarial Science, Faculty of Science, The University of Hong Kong, Hong Kong SAR, China
- Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Amber N Hurson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
| | - Parichoy Pal Choudhury
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Douglas F Easton
- Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
| | - Roger L Milne
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, VIC, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, Australia
- Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
| | - Jacques Simard
- Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, QC, Canada
| | - Per Hall
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Oncology, Södersjukhuset, Stockholm, Sweden
| | - Kyriaki Michailidou
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
- Department of Electron Microscopy/Molecular Pathology and The Cyprus School of Molecular Medicine, The Cyprus Institute of Neurology & Genetics, Nicosia, Cyprus
| | - Joe Dennis
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
| | - Marjanka K Schmidt
- Division of Molecular Pathology, The Netherlands Cancer Institute - Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
- Division of Psychosocial Research and Epidemiology, The Netherlands Cancer Institute - Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
| | - Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Cancer Epidemiology Group, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Puya Gharahkhani
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - David Whiteman
- Cancer Control, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Peter T Campbell
- Behavioral and Epidemiology Research Group, American Cancer Society, Atlanta, GA, USA
| | - Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Mark Jenkins
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, Australia
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Stephen B Gruber
- Department of Preventive Medicine, USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Graham Casey
- Department of Public Health Sciences, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Stephanie L Schmit
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institution, Tampa, FL, USA
| | - Tracy A O'Mara
- Genetics and Computational Biology Division, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Amanda B Spurdle
- Genetics and Computational Biology Division, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Deborah J Thompson
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
| | - Ian Tomlinson
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Wellcome Trust Centre for Human Genetics and Oxford NIHR Biomedical Research Centre, University of Oxford, Oxford, UK
| | - Immaculata De Vivo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Maria Teresa Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Matthew H Law
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Mark M Iles
- Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
| | - Florence Demenais
- Université de Paris, UMRS-1124, Institut National de la Santé et de la Recherche Médicale (INSERM), 75006, Paris, France
| | - Rajiv Kumar
- Division of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Stuart MacGregor
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - D Timothy Bishop
- Division of Haematology and Immunology, Leeds Institute of Medical Research, University of Leeds, Leeds, UK
| | - Sarah V Ward
- Centre for Genetic Origins of Health and Disease, School of Biomedical Sciences, The University of Western Australia, Perth, WA, Australia
| | - Melissa L Bondy
- Department of Medicine, Section of Epidemiology and Population Sciences, Baylor College of Medicine, Houston, TX, USA
| | - Richard Houlston
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - John K Wiencke
- Department of Neurological Surgery, School of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Beatrice Melin
- Department of Radiation Sciences Oncology, Umeå University, Umeå, Sweden
| | - Jill Barnholtz-Sloan
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
| | - Ben Kinnersley
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Margaret R Wrensch
- Department of Neurological Surgery, School of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Christopher I Amos
- Institute for Clinical and Translational Research, Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Rayjean J Hung
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Paul Brennan
- International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - James McKay
- International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - Neil E Caporaso
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Brenda M Birmann
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Nicola J Camp
- Division of Hematology and Hematological Malignancies, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Peter Kraft
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Nathaniel Rothman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Susan L Slager
- Division of Biomedical Statistics & Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Andrew Berchuck
- Department of Gynecologic Oncology, Duke University Medical Center, Durham, NC, USA
| | - Paul D P Pharoah
- Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
| | - Thomas A Sellers
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institution, Tampa, FL, USA
| | - Simon A Gayther
- Center for Bioinformatics and Functional Genomics and the Cedars Sinai Genomics Core, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Celeste L Pearce
- Department of Preventive Medicine, USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Ellen L Goode
- Division of Epidemiology, Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
| | | | - Kirsten B Moysich
- Division of Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, NY, USA
| | - Laufey T Amundadottir
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Eric J Jacobs
- Behavioral and Epidemiology Research Group, American Cancer Society, Atlanta, GA, USA
| | - Alison P Klein
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Gloria M Petersen
- Division of Epidemiology, Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
| | - Harvey A Risch
- Chronic Disease Epidemiology, Yale School of Medicine, New Haven, CT, USA
| | | | - Brian M Wolpin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Donghui Li
- Division of Cancer Medicine, GI Medical Oncology Department, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Rosalind A Eeles
- Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton, Surrey, UK
| | - Christopher A Haiman
- Department of Preventive Medicine, USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Zsofia Kote-Jarai
- Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton, Surrey, UK
| | - Fredrick R Schumacher
- Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
| | - Ali Amin Al Olama
- Strangeways Research Laboratory, Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
| | - Mark P Purdue
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Ghislaine Scelo
- International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - Marlene D Dalgaard
- Department of Growth and Reproduction, Copenhagen University Hospital (Rigshospitalet), Copenhagen, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
| | - Mark H Greene
- Clinical Genetics Branch, Division of Cancer Genetics and Epidemiology, National Cancer Institute, Rockville, MD, USA
| | | | - Peter A Kanetsky
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institution, Tampa, FL, USA
| | - Katherine A McGlynn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Katherine L Nathanson
- Division of Translational Health and Human Genetics, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Clare Turnbull
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Fredrik Wiklund
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| | | |
|
86
|
Chun S, Imakaev M, Hui D, Patsopoulos NA, Neale BM, Kathiresan S, Stitziel NO, Sunyaev SR. Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics. Am J Hum Genet 2020; 107:46-59. [PMID: 32470373 PMCID: PMC7332650 DOI: 10.1016/j.ajhg.2020.05.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/15/2019] [Accepted: 05/01/2020] [Indexed: 02/07/2023] Open
Abstract
In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.
Affiliation(s)
- Sung Chun
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Maxim Imakaev
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Daniel Hui
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women's Hospital, Boston, MA 02115, USA
| | - Nikolaos A Patsopoulos
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women's Hospital, Boston, MA 02115, USA
| | - Benjamin M Neale
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Sekar Kathiresan
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Nathan O Stitziel
- Cardiovascular Division, Department of Medicine, Washington University School of Medicine, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA; McDonnell Genome Institute, Washington University School of Medicine, Saint Louis, MO 63110, USA.
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA.
87
Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med 2020; 12:44. [PMID: 32423490 PMCID: PMC7236300 DOI: 10.1186/s13073-020-00742-5] [Citation(s) in RCA: 551] [Impact Index Per Article: 137.8] [Received: 10/23/2019] [Accepted: 05/01/2020] [Indexed: 12/19/2022]
Abstract
Genome-wide association studies have shown unequivocally that common complex disorders have a polygenic genetic architecture and have enabled researchers to identify genetic variants associated with diseases. These variants can be combined into a polygenic risk score that captures part of an individual's susceptibility to diseases. Polygenic risk scores have been widely applied in research studies, confirming the association between the scores and disease status, but their clinical utility has yet to be established. Polygenic risk scores may be used to estimate an individual's lifetime genetic risk of disease, but the current discriminative ability is low in the general population. Clinical implementation of polygenic risk score (PRS) may be useful in cohorts where there is a higher prior probability of disease, for example, in early stages of diseases to assist in diagnosis or to inform treatment choices. Important considerations are the weaker evidence base in application to non-European ancestry and the challenges in translating an individual's PRS from a percentile of a normal distribution to a lifetime disease risk. In this review, we consider how PRS may be informative at different points in the disease trajectory giving examples of progress in the field and discussing obstacles that need to be addressed before clinical implementation.
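As a rough illustration of how variants "can be combined into a polygenic risk score", the sketch below computes a weighted sum of risk-allele counts and places one individual on a percentile scale under a normal approximation. The effect sizes, genotypes, and function names are hypothetical and do not come from the review; converting such a percentile into a lifetime disease risk is the harder step the abstract mentions and is not attempted here.

```python
import math
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(w * g for w, g in zip(weights, dosages))


def normal_percentile(score, reference_scores):
    """Percentile of `score`, assuming reference scores are roughly normal."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)
    z = (score - mu) / sd
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical per-allele effect sizes (e.g. log odds ratios) for three variants.
weights = [0.10, -0.05, 0.20]

# Hypothetical reference sample: one genotype vector per person.
reference_genotypes = [[0, 0, 0], [1, 1, 0], [2, 0, 1], [1, 2, 2], [0, 1, 1]]
reference_scores = [polygenic_score(g, weights) for g in reference_genotypes]

individual_score = polygenic_score([2, 0, 1], weights)
individual_percentile = normal_percentile(individual_score, reference_scores)
print(round(individual_score, 3), round(individual_percentile, 1))
```

In practice the weights would come from GWAS summary statistics and the reference distribution from a population of matching ancestry, which is one reason the evidence base for non-European ancestry matters.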
Affiliation(s)
- Cathryn M Lewis
  - Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, de Crespigny Park, London, SE5 8AF, UK
  - Department of Medical and Molecular Genetics, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Evangelos Vassos
  - Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, de Crespigny Park, London, SE5 8AF, UK
88
Cano-Gamez E, Trynka G. From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases. Front Genet 2020; 11:424. [PMID: 32477401 PMCID: PMC7237642 DOI: 10.3389/fgene.2020.00424] [Citation(s) in RCA: 248] [Impact Index Per Article: 62.0] [Received: 10/16/2019] [Accepted: 04/06/2020] [Indexed: 12/19/2022]
Abstract
Genome-wide association studies (GWAS) have successfully mapped thousands of loci associated with complex traits. These associations could reveal the molecular mechanisms altered in common complex diseases and result in the identification of novel drug targets. However, GWAS have also left a number of outstanding questions. In particular, the majority of disease-associated loci lie in non-coding regions of the genome and, even though they are thought to play a role in gene expression regulation, it is unclear which genes they regulate and in which cell types or physiological contexts this regulation occurs. This has hindered the translation of GWAS findings into clinical interventions. In this review we summarize how these challenges have been addressed over the last decade, with a particular focus on the integration of GWAS results with functional genomics datasets. Firstly, we investigate how the tissues and cell types involved in diseases can be identified using methods that test for enrichment of GWAS variants in genomic annotations. Secondly, we explore how to find the genes regulated by GWAS loci using methods that test for colocalization of GWAS signals with molecular phenotypes such as quantitative trait loci (QTLs). Finally, we highlight potential future research avenues such as integrating GWAS results with single-cell sequencing read-outs, designing functionally informed polygenic risk scores (PRS), and validating disease associated genes using genetic engineering. These tools will be crucial to identify new drug targets for common complex diseases.
Affiliation(s)
- Eddie Cano-Gamez
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Gosia Trynka
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
  - Open Targets, Wellcome Genome Campus, Cambridge, United Kingdom
89
van de Geijn B, Finucane H, Gazal S, Hormozdiari F, Amariuta T, Liu X, Gusev A, Loh PR, Reshef Y, Kichaev G, Raychaudhuri S, Price AL. Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability. Hum Mol Genet 2020; 29:1057-1067. [PMID: 31595288 PMCID: PMC7206853 DOI: 10.1093/hmg/ddz226] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/31/2019] [Revised: 08/12/2019] [Accepted: 09/10/2019] [Indexed: 12/21/2022]
Abstract
Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TF) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.
Affiliation(s)
- Bryce van de Geijn
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
- Hilary Finucane
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Steven Gazal
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
- Farhad Hormozdiari
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
- Tiffany Amariuta
  - Center for Data Sciences, Harvard Medical School, Boston, MA 02215, USA
  - Divisions of Genetics, Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
  - Graduate School of Arts and Sciences, Harvard University, Boston, MA 02215, USA
- Xuanyao Liu
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
- Po-Ru Loh
  - Brigham and Women’s Hospital, Boston, MA 02215, USA
- Yakir Reshef
  - Department of Computer Science, Harvard University, Cambridge, MA 02138, USA
  - Harvard/MIT MD/PhD Program, Boston, MA 02215, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
- Gleb Kichaev
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Soumya Raychaudhuri
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
  - Center for Data Sciences, Harvard Medical School, Boston, MA 02215, USA
  - Divisions of Genetics, Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
  - Graduate School of Arts and Sciences, Harvard University, Boston, MA 02215, USA
- Alkes L Price
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
90
Yang S, Zhou X. Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. Am J Hum Genet 2020; 106:679-693. [PMID: 32330416 PMCID: PMC7212266 DOI: 10.1016/j.ajhg.2020.03.013] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/12/2019] [Accepted: 03/30/2020] [Indexed: 01/24/2023]
Abstract
Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average of 2.03%-101.09% accuracy gain in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average of 14.74%-522.74% accuracy gain. In these real data applications, DBSLMM is 1.03-28.11 times faster and uses only 7.4%-24.8% of physical memory as compared to other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank scale datasets.
Affiliation(s)
- Sheng Yang
  - Department of Biostatistics, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
91
Borges MG, Rocha CS, Carvalho BS, Lopes-Cendes I. Methodological differences can affect sequencing depth with a possible impact on the accuracy of genetic diagnosis. Genet Mol Biol 2020; 43:e20190270. [PMID: 32343762 PMCID: PMC7198014 DOI: 10.1590/1678-4685-gmb-2019-0270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/28/2019] [Accepted: 02/16/2020] [Indexed: 11/24/2022]
Abstract
For a better interpretation of variants, evidence-based databases, such as ClinVar, compile data on the presumed relationships between variants and phenotypes. In this study, we aimed to analyze the pattern of sequencing depth in variants from whole-exome sequencing data in the 1000 Genomes project phase 3, focusing on the variants present in the ClinVar database that were predicted to affect protein-coding regions. We demonstrate that the distribution of the sequencing depth varies across different sequencing centers (pair-wise comparison, p < 0.001). Most importantly, we found that the distribution pattern of sequencing depth is specific to each facility, making it possible to correctly assign 96.9% of the samples to their sequencing center. This indicates the presence of a systematic bias, related to the methods used in the different facilities, which generates significant variations in breadth and depth in whole-exome sequencing data in clinically relevant regions. Our results show that methodological differences, leading to significant heterogeneity in sequencing depth, may potentially influence the accuracy of genetic diagnosis. Furthermore, our findings highlight how it is still challenging to integrate results from different sequencing centers, which may also have an impact on genomic research.
Affiliation(s)
- Murilo G Borges
  - Universidade Estadual de Campinas (UNICAMP), Faculdade de Ciências Médicas, Departamento de Genética Médica e Medicina Genômica, Campinas, SP, Brazil; Instituto Brasileiro de Neurociência e Neurotecnologia (BRAINN), Campinas, SP, Brazil; Universidade Estadual de Campinas (UNICAMP), Instituto de Física "Gleb Wataghin", Campinas, SP, Brazil
- Cristiane S Rocha
  - Universidade Estadual de Campinas (UNICAMP), Faculdade de Ciências Médicas, Departamento de Genética Médica e Medicina Genômica, Campinas, SP, Brazil; Instituto Brasileiro de Neurociência e Neurotecnologia (BRAINN), Campinas, SP, Brazil
- Benilton S Carvalho
  - Instituto Brasileiro de Neurociência e Neurotecnologia (BRAINN), Campinas, SP, Brazil; Universidade Estadual de Campinas (UNICAMP), Instituto de Matemática, Estatística e Computação Científica, Departamento de Estatística, Campinas, SP, Brazil
- Iscia Lopes-Cendes
  - Universidade Estadual de Campinas (UNICAMP), Faculdade de Ciências Médicas, Departamento de Genética Médica e Medicina Genômica, Campinas, SP, Brazil; Instituto Brasileiro de Neurociência e Neurotecnologia (BRAINN), Campinas, SP, Brazil
92
Li B, Lu Q, Zhao H. An evaluation of noncoding genome annotation tools through enrichment analysis of 15 genome-wide association studies. Brief Bioinform 2020; 20:995-1003. [PMID: 29106447 DOI: 10.1093/bib/bbx131] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/25/2017] [Revised: 09/02/2017] [Indexed: 01/08/2023]
Abstract
Functionally annotating genetic variations is an essential yet challenging topic in human genetics research. As large consortia including ENCODE and Roadmap Epigenomics Project continue to generate high-throughput transcriptomic and epigenomic data, many computational frameworks have been developed to integrate these experimental data to predict functionality of genetic variations in both protein-coding and noncoding regions. Here, we compare a number of recently developed annotation frameworks for noncoding regions through enrichment analysis on genome-wide association studies (GWASs). We also compare several different strategies to quantify enrichment using GWAS summary statistics. Our analyses highlight the importance of jointly modeling context-specific annotations with genome-wide data in providing statistically powerful and biologically interpretable enrichment for complex disease associations. Our findings provide insights into when and how computational genome annotations may benefit future complex disease studies on the genome-wide scale.
Affiliation(s)
- Boyang Li
  - Department of Biostatistics, Yale School of Public Health
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
93
Song S, Jiang W, Hou L, Zhao H. Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies. PLoS Comput Biol 2020; 16:e1007565. [PMID: 32045423 PMCID: PMC7039528 DOI: 10.1371/journal.pcbi.1007565] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/17/2019] [Revised: 02/24/2020] [Accepted: 11/25/2019] [Indexed: 12/29/2022]
Abstract
Genetic risk prediction is an important problem in human genetics, and accurate prediction can facilitate disease prevention and treatment. The polygenic risk score (PRS) has become widely used due to its simplicity and effectiveness, since the standard method needs only summary statistics from genome-wide association studies. Recently, several methods have been proposed to improve the standard PRS by utilizing external information, such as linkage disequilibrium and functional annotations. In this paper, we introduce EB-PRS, a novel method that leverages information on effect sizes across all the markers to improve prediction accuracy. Unlike most existing genetic risk prediction methods, our method requires neither parameter tuning nor external information. Real data applications on six diseases, including asthma, breast cancer, celiac disease, Crohn's disease, Parkinson's disease and type 2 diabetes, show that EB-PRS achieved 307.1%, 42.8%, 25.5%, 3.1%, 74.3% and 49.6% relative improvements in terms of predictive r² over the standard PRS method with optimally tuned parameters. In addition, compared to LDpred, which makes use of LD information, EB-PRS achieved 37.9%, 33.6%, 8.6%, 36.2%, 40.6% and 10.8% relative improvements. We note that our method is not the first method leveraging effect size distributions. Here we first justify our method by presenting its theoretical optimality property relative to existing methods in this class, and substantiate our theoretical result with extensive simulation results. The R package EBPRS that implements our method is available on CRAN.
Affiliation(s)
- Shuang Song
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
- Wei Jiang
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Lin Hou
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
- Hongyu Zhao
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
94
Hormozdiari F, van de Geijn B, Nasser J, Weissbrod O, Gazal S, Ju CJT, O'Connor L, Hujoel MLA, Engreitz J, Hormozdiari F, Price AL. Functional disease architectures reveal unique biological role of transposable elements. Nat Commun 2019; 10:4054. [PMID: 31492842 PMCID: PMC6731302 DOI: 10.1038/s41467-019-11957-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/04/2018] [Accepted: 08/08/2019] [Indexed: 12/19/2022]
Abstract
Transposable elements (TE) comprise roughly half of the human genome. Though initially derided as junk DNA, they have been widely hypothesized to contribute to the evolution of gene regulation. However, the contribution of TE to the genetic architecture of diseases remains unknown. Here, we analyze data from 41 independent diseases and complex traits to draw three conclusions. First, TE are uniquely informative for disease heritability. Despite overall depletion for heritability (54% of SNPs, 39 ± 2% of heritability), TE explain substantially more heritability than expected based on their depletion for known functional annotations. This implies that TE acquire function in ways that differ from known functional annotations. Second, older TE contribute more to disease heritability, consistent with acquiring biological function. Third, Short Interspersed Nuclear Elements (SINE) are far more enriched for blood traits than for other traits. Our results can help elucidate the biological roles that TE play in the genetic architecture of diseases.
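The "overall depletion" quoted above can be checked with a line of arithmetic. The sketch below assumes the standard definition of heritability enrichment (proportion of heritability divided by proportion of SNPs) and, as a further simplifying assumption, treats the ± 2% as a standard error on the heritability proportion with the SNP proportion fixed; the function name is illustrative.

```python
def heritability_enrichment(prop_h2, prop_snps):
    """Enrichment = share of heritability / share of SNPs in the annotation."""
    if not 0.0 < prop_snps <= 1.0:
        raise ValueError("prop_snps must lie in (0, 1]")
    return prop_h2 / prop_snps


# Figures quoted in the abstract: TE cover 54% of SNPs and 39 ± 2% of heritability.
enrichment = heritability_enrichment(0.39, 0.54)

# Assumption: the SNP share is fixed, so the error scales by the same divisor.
enrichment_se = 0.02 / 0.54

print(f"enrichment = {enrichment:.2f} ± {enrichment_se:.2f}")
# prints "enrichment = 0.72 ± 0.04"
```

A value below 1 indicates depletion: under these assumptions, TE carry roughly 0.72 times the heritability expected from their share of SNPs.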
Affiliation(s)
- Farhad Hormozdiari
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bryce van de Geijn
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Joseph Nasser
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Omer Weissbrod
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven Gazal
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Chelsea J-T Ju
  - Department of Computer Science, University of California, Los Angeles, CA, 90095, USA
- Luke O'Connor
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Boston, MA, USA
- Margaux L A Hujoel
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Jesse Engreitz
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Fereydoun Hormozdiari
  - Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, 95616, USA; MIND Institute and UC-Davis Genome Center, Davis, CA, 95616, USA
- Alkes L Price
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
95
Johnson R, Shi H, Pasaniuc B, Sankararaman S. A unifying framework for joint trait analysis under a non-infinitesimal model. Bioinformatics 2019; 34:i195-i201. [PMID: 29949958 PMCID: PMC6022541 DOI: 10.1093/bioinformatics/bty254] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]
Abstract
Motivation: A large proportion of risk regions identified by genome-wide association studies (GWAS) are shared across multiple diseases and traits. Understanding whether this clustering is due to sharing of causal variants or chance colocalization can provide insights into shared etiology of complex traits and diseases. Results: In this work, we propose a flexible, unifying framework to quantify the overlap between a pair of traits called UNITY (Unifying Non-Infinitesimal Trait analYsis). We formulate a Bayesian generative model that relates the overlap between pairs of traits to GWAS summary statistic data under a non-infinitesimal genetic architecture underlying each trait. We propose a Metropolis-Hastings sampler to compute the posterior density of the genetic overlap parameters in this model. We validate our method through comprehensive simulations and analyze summary statistics from height and body mass index GWAS to show that it produces estimates consistent with the known genetic makeup of both traits. Availability and implementation: The UNITY software is made freely available to the research community at: https://github.com/bogdanlab/UNITY. Supplementary information: Supplementary data are available at Bioinformatics online.
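For readers unfamiliar with the sampling step, the following is a generic random-walk Metropolis-Hastings sketch on a deliberately simple toy target: the posterior of a single proportion given k "shared" regions out of n, with a uniform prior. It is not the UNITY model or its sampler; the target, step size, and all names are assumptions chosen only to show the accept/reject mechanics.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalised log posterior of a proportion p (uniform prior, binomial data)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_hastings(k, n, steps=20000, step_sd=0.05, seed=0):
    """Random-walk Metropolis-Hastings; returns the full chain of sampled values."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    chain = []
    for _ in range(steps):
        proposal = p + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, k, n)
        # The Gaussian proposal is symmetric, so the acceptance probability
        # reduces to min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            p, current = proposal, candidate
        chain.append(p)
    return chain


chain = metropolis_hastings(k=30, n=100)
kept = chain[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

For this toy target the exact posterior is Beta(31, 71), with mean 31/102 ≈ 0.30, which gives a simple check that the chain has converged.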
Affiliation(s)
- Ruth Johnson
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Huwenbo Shi
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
96
Boulesteix AL, Wright MN, Hoffmann S, König IR. Statistical learning approaches in the genetic epidemiology of complex diseases. Hum Genet 2019; 139:73-84. [PMID: 31049651 DOI: 10.1007/s00439-019-01996-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/30/2018] [Accepted: 03/04/2019] [Indexed: 02/07/2023]
Abstract
In this paper, we give an overview of methodological issues related to the use of statistical learning approaches when analyzing high-dimensional genetic data. The focus is set on regression models and machine learning algorithms taking genetic variables as input and returning a classification or a prediction for the target variable of interest; for example, the present or future disease status, or the future course of a disease. After briefly explaining the basic motivation and principle of these methods, we review different procedures that can be used to evaluate the accuracy of the obtained models and discuss common flaws that may lead to over-optimistic conclusions with respect to their prediction performance and usefulness.
Affiliation(s)
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany
- Marvin N Wright
  - Leibniz Institute for Prevention Research and Epidemiology-BIPS, Bremen, Germany; Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Sabine Hoffmann
  - Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany
- Inke R König
  - Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
97
Jackknife Model Averaging Prediction Methods for Complex Phenotypes with Gene Expression Levels by Integrating External Pathway Information. Comput Math Methods Med 2019; 2019:2807470. [PMID: 31089389 PMCID: PMC6476151 DOI: 10.1155/2019/2807470] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/13/2019] [Accepted: 03/20/2019] [Indexed: 01/03/2023]
Abstract
Motivation: In the past few years many prediction approaches have been proposed and widely employed in high dimensional genetic data for disease risk evaluation. However, in model fitting those approaches typically ignore the important group structures that naturally exist in genetic data. Methods: In the present study, we applied a novel model-averaging approach, called jackknife model averaging prediction (JMAP), for high dimensional genetic risk prediction while incorporating pathway information into the model specification. JMAP selects the optimal weights across candidate models by minimizing a cross validation criterion in a jackknife manner. Compared with previous approaches, one of the primary features of JMAP is that model weights may vary from 0 to 1 without the constraint that they sum to one. We evaluated the performance of JMAP using extensive simulation studies and compared it with existing methods. We finally applied JMAP to four real cancer datasets that are publicly available from TCGA. Results: The simulations showed that, compared with other existing approaches (e.g., gsslasso), JMAP performed best or was among the best methods across a range of scenarios. For example, in 14 of 16 simulation settings with PVE = 0.3, JMAP achieved on average 0.075 higher prediction accuracy than gsslasso. We further found that in the simulation, the model weights for the true candidate models are much less likely to be zero than those for the null candidate models and are substantially greater in magnitude. In the real data application, JMAP also performs comparably to or better than the other methods for continuous phenotypes. For example, for the COAD, CRC, and PAAD datasets, the average gains in predictive accuracy of JMAP are 0.019, 0.064, and 0.052 compared with gsslasso. Conclusion: The proposed method JMAP is a novel model-averaging approach for high dimensional genetic risk prediction while incorporating external useful group structures into the model specification.
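The weight-selection idea (weights between 0 and 1 that need not sum to one, chosen to minimize a jackknife cross-validation criterion) can be sketched as a box-constrained least-squares problem. The code below is only an illustration under stated assumptions: the criterion is taken to be squared error, the leave-one-out predictions of each candidate model are assumed to be precomputed, and cyclic coordinate descent is a solver chosen here for brevity rather than the one used by the authors. All names are hypothetical.

```python
def jackknife_weights(y, loo_preds, sweeps=200):
    """Box-constrained least squares solved by cyclic coordinate descent.

    y         : observed phenotypes, length n
    loo_preds : one list per candidate model holding that model's
                leave-one-out predictions for the n samples
    Returns one weight per model, each clipped to [0, 1]; the weights are
    not forced to sum to one.
    """
    n_models = len(loo_preds)
    weights = [0.0] * n_models
    for _ in range(sweeps):
        for m, pred_m in enumerate(loo_preds):
            norm = sum(p * p for p in pred_m)
            if norm == 0.0:
                continue  # an all-zero predictor carries no information
            # Residual after removing every model's contribution except model m's.
            partial = [
                yi - sum(weights[k] * loo_preds[k][i]
                         for k in range(n_models) if k != m)
                for i, yi in enumerate(y)
            ]
            best = sum(r * p for r, p in zip(partial, pred_m)) / norm
            weights[m] = min(1.0, max(0.0, best))  # clip to the [0, 1] box
    return weights


# Toy data: the phenotype is exactly half of model 0's prediction, so the
# second (uninformative) model should receive weight zero.
y = [0.5, 0.0, 0.5, 0.0]
loo_preds = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
print(jackknife_weights(y, loo_preds))  # prints [0.5, 0.0]
```

In this toy case the weights sum to 0.5 rather than 1, which is the kind of solution a sum-to-one constraint would exclude.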
98
Hujoel MLA, Gazal S, Hormozdiari F, van de Geijn B, Price AL. Disease Heritability Enrichment of Regulatory Elements Is Concentrated in Elements with Ancient Sequence Age and Conserved Function across Species. Am J Hum Genet 2019; 104:611-624. [PMID: 30905396 PMCID: PMC6451699 DOI: 10.1016/j.ajhg.2019.02.008] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 09/17/2018] [Accepted: 02/05/2019] [Indexed: 02/06/2023]
Abstract
Regulatory elements, e.g., enhancers and promoters, have been widely reported to be enriched for disease and complex trait heritability. We investigated how this enrichment varies with the age of the underlying genome sequence, the conservation of regulatory function across species, and the target gene of the regulatory element. We estimated heritability enrichment by applying stratified LD score regression to summary statistics from 41 independent diseases and complex traits (average N = 320K) and meta-analyzing results across traits. Enrichment of human putative enhancers and promoters was larger in elements with older sequence age, assessed via alignment with other species irrespective of conserved functionality: putative enhancer elements with ancient sequence age (older than the split between marsupial and placental mammals) were 8.8× enriched (versus 2.5× for all putative enhancers; p = 3e-14), and promoter elements with ancient sequence age were 13.5× enriched (versus 5.1× for all promoters; p = 5e-16). Enrichment of human putative enhancers and promoters was also larger in elements whose regulatory function was conserved across species, e.g., human putative enhancers that were enhancers in ≥5 of 9 other mammals were 4.6× enriched (p = 5e-12 versus all putative enhancers). Enrichment of human promoters was larger in promoters of loss-of-function intolerant genes: 12.0× enrichment (p = 8e-15 versus all promoters). The mean value of several measures of negative selection within these genomic annotations mirrored all of these findings. Notably, the annotations with these excess heritability enrichments were jointly significant conditional on each other and on our baseline-LD model, which includes a broad set of coding, conserved, regulatory, and LD-related annotations.
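For readers unfamiliar with the "×" figures above, heritability enrichment in stratified LD score regression is conventionally defined as the proportion of SNP heritability attributed to an annotation divided by the proportion of SNPs in that annotation. The symbols below are notation assumed here, not taken from the abstract: $C$ is the annotation, $h^2_g(C)$ the heritability attributed to SNPs in $C$, $h^2_g$ the total SNP heritability, and $M$ the total number of SNPs.

```latex
\[
  \mathrm{Enrichment}(C)
  \;=\;
  \frac{h^2_g(C) \,/\, h^2_g}{|C| \,/\, M}
\]
```

As a purely illustrative case, an annotation covering 1% of SNPs that accounted for 8.8% of SNP heritability would have an enrichment of 8.8×; the actual SNP proportions of the annotations in this study are not given in the abstract.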
Affiliation(s)
- Margaux L A Hujoel
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Division of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
- Steven Gazal
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Farhad Hormozdiari
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Bryce van de Geijn
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Alkes L Price
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
99
|
Chasioti D, Yan J, Nho K, Saykin AJ. Progress in Polygenic Composite Scores in Alzheimer's and Other Complex Diseases. Trends Genet 2019; 35:371-382. [PMID: 30922659 DOI: 10.1016/j.tig.2019.02.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 11/26/2018] [Revised: 02/12/2019] [Accepted: 02/22/2019] [Indexed: 11/25/2022]
Abstract
Advances in high-throughput genotyping and next-generation sequencing (NGS), coupled with larger sample sizes, bring the realization of precision medicine closer than ever. Polygenic approaches incorporating the aggregate influence of multiple genetic variants can contribute to a better understanding of the genetic architecture of many complex diseases and facilitate patient stratification. This review addresses polygenic concepts, methodological developments, hypotheses, and key issues in study design. Polygenic risk scores (PRSs) have been applied to many complex diseases, and here we focus on Alzheimer's disease (AD) as a primary exemplar. This review was designed to serve as a starting point for investigators wishing to use PRSs in their research and those interested in enhancing clinical study designs through enrichment strategies.
Affiliation(s)
- Danai Chasioti
  - Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
- Jingwen Yan
  - Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
- Kwangsik Nho
  - Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
- Andrew J Saykin
  - Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
|
100
|
Abstract
Lung cancer is the leading cause of cancer deaths in both men and women in the US. While most sporadic lung cancer cases are related to environmental factors such as smoking, genetic susceptibility may also play an important role, and a number of lung cancer-associated single-nucleotide polymorphisms (SNPs) have been identified, although many remain to be found. The collective effects of genome-wide minor alleles of common SNPs, or the minor allele content (MAC) in an individual, have been linked with quantitative variation in complex traits and diseases. Here we studied MAC in lung cancer using previously published SNP data sets (US and Finland samples) and found higher MAC in cases relative to matched controls. A set of 5400 SNPs with minor alleles (MAF < 0.5) more common in cases (P < 0.08) and linkage disequilibrium (LD) r² = 0.3 was found to have the best predictive accuracy. These results link higher MAC to lung cancer susceptibility and provide a meaningful genetic method to identify those at risk of lung cancer.
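As a rough illustration of the quantity discussed in this abstract, the sketch below computes a per-individual minor allele content from a genotype matrix. It is an assumption-laden reading, not the study's pipeline: genotypes are taken to be coded as 0/1/2 counts of an arbitrary coded allele, the minor allele at each SNP is determined from a reference sample (the same sample by default), and MAC is taken to be the fraction of an individual's non-missing alleles that are minor. The function name `minor_allele_content` and the toy data are hypothetical.

```python
import numpy as np


def minor_allele_content(genotypes, reference=None):
    """Fraction of minor alleles carried by each individual.

    genotypes: array of shape (n_individuals, n_snps) with 0/1/2 counts of
               the coded allele; missing calls may be given as NaN.
    reference: optional array with the same SNP columns, used to decide
               which allele is minor (defaults to `genotypes` itself).
    """
    g = np.asarray(genotypes, dtype=float)
    ref = g if reference is None else np.asarray(reference, dtype=float)
    coded_freq = np.nanmean(ref, axis=0) / 2.0   # frequency of the coded allele
    coded_is_major = coded_freq > 0.5
    # Where the coded allele is the major one, count the other allele instead.
    minor_counts = np.where(coded_is_major, 2.0 - g, g)
    # Each genotype holds two alleles, hence the division by 2.
    return np.nanmean(minor_counts, axis=1) / 2.0


if __name__ == "__main__":
    toy = np.array([[0, 2],
                    [0, 2],
                    [1, 1],
                    [0, 2]])
    print(minor_allele_content(toy))
```

Comparing the resulting values between cases and matched controls would correspond to the case-control contrast described above; the SNP filtering by P value and LD reported in the abstract is not reproduced here.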
|