1
Gallagher CS, Ginsburg GS, Musick A. Biobanking with genetics shapes precision medicine and global health. Nat Rev Genet 2024:10.1038/s41576-024-00794-y. [PMID: 39567741 DOI: 10.1038/s41576-024-00794-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/14/2024] [Indexed: 11/22/2024]
Abstract
Precision medicine provides patients with access to personally tailored treatments based on individual-level data. However, developing personalized therapies requires analyses with substantial statistical power to map genetic and epidemiologic associations that ultimately create models informing clinical decisions. As one solution, biobanks have emerged as large-scale, longitudinal cohort studies with long-term storage of biological specimens and health information, including electronic health records and participant survey responses. By providing access to individual-level data for genotype-phenotype mapping efforts, pharmacogenomic studies, polygenic risk score assessments and rare variant analyses, biobanks support ongoing and future precision medicine research. Notably, due in part to the geographical enrichment of biobanks in Western Europe and North America, European ancestries have become disproportionately over-represented in precision medicine research. Herein, we provide a genetics-focused review of biobanks from around the world that are in pursuit of supporting precision medicine. We discuss the limitations of their designs, ongoing efforts to diversify genomics research and strategies to maximize the benefits of research leveraging biobanks for all.
Affiliation(s)
- C Scott Gallagher
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Geoffrey S Ginsburg
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Anjené Musick
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA.
2
Pathan N, Deng WQ, Di Scipio M, Khan M, Mao S, Morton RW, Lali R, Pigeyre M, Chong MR, Paré G. A method to estimate the contribution of rare coding variants to complex trait heritability. Nat Commun 2024; 15:1245. [PMID: 38336875 PMCID: PMC10858280 DOI: 10.1038/s41467-024-45407-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 01/22/2024] [Indexed: 02/12/2024] Open
Abstract
It has been postulated that rare coding variants (RVs; MAF < 0.01) contribute to the "missing" heritability of complex traits. We developed a framework, the Rare variant heritability (RARity) estimator, to assess RV heritability (h²RV) without assuming a particular genetic architecture. We applied RARity to 31 complex traits in the UK Biobank (n = 167,348) and showed that gene-level RV aggregation suffers from 79% (95% CI: 68-93%) loss of h²RV. Using unaggregated variants, 27 traits had h²RV > 5%, with height having the highest h²RV at 21.9% (95% CI: 19.0-24.8%). The total heritability, including common and rare variants, recovered pedigree-based estimates for 11 traits. RARity can estimate gene-level h²RV, enabling the assessment of gene-level characteristics and revealing 11, previously unreported, gene-phenotype relationships. Finally, we demonstrated that in silico pathogenicity prediction (variant-level) and gene-level annotations do not generally enrich for RVs that over-contribute to complex trait variance, and thus, innovative methods are needed to predict RV functionality.
Affiliation(s)
- Nazia Pathan
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Canada
- Wei Q Deng
- Peter Boris Centre for Addictions Research, St. Joseph's Healthcare Hamilton, Hamilton, Canada
- Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, Canada
- Matteo Di Scipio
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Mohammad Khan
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Shihong Mao
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Robert W Morton
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Canada
- Ricky Lali
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Marie Pigeyre
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Michael R Chong
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Canada
- Guillaume Paré
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada.
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Canada.
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada.
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Canada.
3
Aw AJ, McRae J, Rahmani E, Song YS. Highly parameterized polygenic scores tend to overfit to population stratification via random effects. bioRxiv 2024:2024.01.27.577589. [PMID: 38352303 PMCID: PMC10862757 DOI: 10.1101/2024.01.27.577589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Polygenic scores (PGSs), increasingly used in clinical settings, frequently include many genetic variants, with performance typically peaking at thousands of variants. Such highly parameterized PGSs often include variants that do not pass a genome-wide significance threshold. We propose a mathematical perspective that renders the effects of many of these non-significant variants random rather than causal, with the randomness capturing population structure. We devise methods to assess variant effect randomness and population stratification bias. Applying these methods to 141 traits from the UK Biobank, we find that, for many PGSs, the effects of non-significant variants are considerably random, with the extent of randomness associated with the degree of overfitting to population structure of the discovery cohort. Our findings explain why highly parameterized PGSs simultaneously have superior cohort-specific performance and limited generalizability, suggesting the critical need for variant randomness tests in PGS evaluation. Supporting code and a dashboard are available at https://github.com/songlab-cal/StratPGS.
Affiliation(s)
- Alan J. Aw
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
- Artificial Intelligence Laboratory, Illumina Inc
- Jeremy McRae
- Artificial Intelligence Laboratory, Illumina Inc
- Elior Rahmani
- Department of Computational Medicine, University of California, Los Angeles
- Yun S. Song
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
- Computer Science Division, University of California, Berkeley
4
Verma SS, Gudiseva HV, Chavali VRM, Salowe RJ, Bradford Y, Guare L, Lucas A, Collins DW, Vrathasha V, Nair RM, Rathi S, Zhao B, He J, Lee R, Zenebe-Gete S, Bowman AS, McHugh CP, Zody MC, Pistilli M, Khachatryan N, Daniel E, Murphy W, Henderer J, Kinzy TG, Iyengar SK, Peachey NS, Taylor KD, Guo X, Chen YDI, Zangwill L, Girkin C, Ayyagari R, Liebmann J, Chuka-Okosa CM, Williams SE, Akafo S, Budenz DL, Olawoye OO, Ramsay M, Ashaye A, Akpa OM, Aung T, Wiggs JL, Ross AG, Cui QN, Addis V, Lehman A, Miller-Ellis E, Sankar PS, Williams SM, Ying GS, Cooke Bailey J, Rotter JI, Weinreb R, Khor CC, Hauser MA, Ritchie MD, O'Brien JM. A multi-cohort genome-wide association study in African ancestry individuals reveals risk loci for primary open-angle glaucoma. Cell 2024; 187:464-480.e10. [PMID: 38242088 DOI: 10.1016/j.cell.2023.12.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/05/2023] [Revised: 07/24/2023] [Accepted: 12/04/2023] [Indexed: 01/21/2024]
Abstract
Primary open-angle glaucoma (POAG), the leading cause of irreversible blindness worldwide, disproportionately affects individuals of African ancestry. We conducted a genome-wide association study (GWAS) for POAG in 11,275 individuals of African ancestry (6,003 cases; 5,272 controls). We detected 46 risk loci associated with POAG at genome-wide significance. Replication and post-GWAS analyses, including functionally informed fine-mapping, multiple trait co-localization, and in silico validation, implicated two previously undescribed variants (rs1666698 mapping to DBF4P2; rs34957764 mapping to ROCK1P1) and one previously associated variant (rs11824032 mapping to ARHGEF12) as likely causal. For individuals of African ancestry, a polygenic risk score (PRS) for POAG from our mega-analysis (African ancestry individuals) outperformed a PRS from summary statistics of a much larger GWAS derived from European ancestry individuals. This study quantifies the genetic architecture similarities and differences between African and non-African ancestry populations for this blinding disease.
Affiliation(s)
- Shefali S Verma
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Harini V Gudiseva
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Venkata R M Chavali
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Rebecca J Salowe
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yuki Bradford
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lindsay Guare
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anastasia Lucas
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- David W Collins
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Vrathasha Vrathasha
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Rohini M Nair
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sonika Rathi
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Bingxin Zhao
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Jie He
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Roy Lee
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Selam Zenebe-Gete
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anita S Bowman
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Maxwell Pistilli
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Naira Khachatryan
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ebenezer Daniel
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jeffrey Henderer
- Department of Ophthalmology, Lewis Katz School of Medicine, Temple University, Philadelphia, PA, USA
- Tyler G Kinzy
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA
- Sudha K Iyengar
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA
- Neal S Peachey
- Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA; Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA
- Kent D Taylor
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Xiuqing Guo
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Yii-Der Ida Chen
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Linda Zangwill
- Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, CA, USA
- Christopher Girkin
- Department of Ophthalmology and Visual Sciences, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Radha Ayyagari
- Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, CA, USA
- Jeffrey Liebmann
- Department of Ophthalmology, Columbia University Medical Center, Columbia University, New York, NY, USA
- Susan E Williams
- Division of Ophthalmology, Department of Neurosciences, University of the Witwatersrand, Johannesburg, South Africa
- Stephen Akafo
- Unit of Ophthalmology, Department of Surgery, University of Ghana Medical School, Accra, Ghana
- Donald L Budenz
- Department of Ophthalmology, University of North Carolina, Chapel Hill, NC, USA
- Michele Ramsay
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Adeyinka Ashaye
- Department of Ophthalmology, University of Ibadan, Ibadan, Nigeria
- Onoja M Akpa
- Department of Epidemiology and Medical Statistics, College of Medicine, University of Ibadan, Ibadan, Nigeria
- Tin Aung
- Singapore Eye Research Institute, Singapore, Singapore
- Janey L Wiggs
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
- Ahmara G Ross
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Qi N Cui
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Victoria Addis
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Amanda Lehman
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Eydie Miller-Ellis
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Prithvi S Sankar
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Scott M Williams
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Gui-Shuang Ying
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jessica Cooke Bailey
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland VA Medical Center, Cleveland, OH, USA; Department of Pharmacology and Toxicology, Center for Health Disparities, Brody School of Medicine, East Carolina University, Greenville, NC, 27834, USA
- Jerome I Rotter
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Robert Weinreb
- Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, CA, USA
- Marylyn D Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Joan M O'Brien
- Scheie Eye Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. joan.o'
5
Irving-Pease EK, Refoyo-Martínez A, Barrie W, Ingason A, Pearson A, Fischer A, Sjögren KG, Halgren AS, Macleod R, Demeter F, Henriksen RA, Vimala T, McColl H, Vaughn AH, Speidel L, Stern AJ, Scorrano G, Ramsøe A, Schork AJ, Rosengren A, Zhao L, Kristiansen K, Iversen AKN, Fugger L, Sudmant PH, Lawson DJ, Durbin R, Korneliussen T, Werge T, Allentoft ME, Sikora M, Nielsen R, Racimo F, Willerslev E. The selection landscape and genetic legacy of ancient Eurasians. Nature 2024; 625:312-320. [PMID: 38200293 PMCID: PMC10781624 DOI: 10.1038/s41586-023-06705-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 09/17/2022] [Accepted: 10/03/2023] [Indexed: 01/12/2024]
Abstract
The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes [1], we modelled the selection landscape during the transition from hunting and gathering, to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years. We also find strong selection in the HLA region, possibly due to increased exposure to pathogens during the Bronze Age. Using ancient individuals to infer local ancestry tracts in over 400,000 samples from the UK Biobank, we identify widespread differences in the distribution of Mesolithic, Neolithic and Bronze Age ancestries across Eurasia. By calculating ancestry-specific polygenic risk scores, we show that height differences between Northern and Southern Europe are associated with differential Steppe ancestry, rather than selection, and that risk alleles for mood-related phenotypes are enriched for Neolithic farmer ancestry, whereas risk alleles for diabetes and Alzheimer's disease are enriched for Western hunter-gatherer ancestry. Our results indicate that ancient selection and migration were large contributors to the distribution of phenotypic diversity in present-day Europeans.
Affiliation(s)
- Evan K Irving-Pease
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
- Alba Refoyo-Martínez
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- William Barrie
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- Andrés Ingason
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Roskilde, Denmark
- Alice Pearson
- Department of Genetics, University of Cambridge, Cambridge, UK
- Department of Zoology, University of Cambridge, Cambridge, UK
- Anders Fischer
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden
- Sealand Archaeology, Kalundborg, Denmark
- Karl-Göran Sjögren
- Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden
- Alma S Halgren
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Ruairidh Macleod
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- UCL Genetics Institute, University College London, London, UK
- Fabrice Demeter
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Eco-anthropologie, Muséum national d'Histoire naturelle, CNRS, Université Paris Cité, Musée de l'Homme, Paris, France
- Rasmus A Henriksen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Tharsika Vimala
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Hugh McColl
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Andrew H Vaughn
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Leo Speidel
- UCL Genetics Institute, University College London, London, UK
- Ancient Genomics Laboratory, The Francis Crick Institute, London, UK
- Aaron J Stern
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Gabriele Scorrano
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Abigail Ramsøe
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Andrew J Schork
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Roskilde, Denmark
- Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA
- Anders Rosengren
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Roskilde, Denmark
- Lei Zhao
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Kristian Kristiansen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden
- Astrid K N Iversen
- Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Lars Fugger
- Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Department of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
- MRC Human Immunology Unit, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Peter H Sudmant
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Daniel J Lawson
- Institute of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
- Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
- Thorfinn Korneliussen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Thomas Werge
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Center Sct Hans, Copenhagen University Hospital, Copenhagen, Denmark
- Morten E Allentoft
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Science, Curtin University, Perth, Western Australia, Australia
- Martin Sikora
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Rasmus Nielsen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
- Departments of Integrative Biology and Statistics, UC Berkeley, Berkeley, CA, USA.
- Fernando Racimo
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
- Eske Willerslev
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK.
- MARUM Center for Marine Environmental Sciences and Faculty of Geosciences, University of Bremen, Bremen, Germany.
6
Ranglani S, Ward J, Sattar N, Strawbridge RJ, Lyall DM. Testing for associations between HbA1c levels, polygenic risk and brain health in UK Biobank (N = 39 283). Diabetes Obes Metab 2023; 25:3136-3143. [PMID: 37435691 DOI: 10.1111/dom.15207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 06/09/2023] [Accepted: 06/18/2023] [Indexed: 07/13/2023]
Abstract
AIM: To investigate whether continuous HbA1c levels and HbA1c-polygenic risk scores (HbA1c-PRS) are significantly associated with worse brain health independent of type 2 diabetes (T2D) diagnosis (vs. not), by examining brain structure and cognitive test score phenotypes. METHODS: Using UK Biobank data (n = 39 283), we tested whether HbA1c levels and/or HbA1c-PRS were associated with cognitive test scores and brain imaging phenotypes. We adjusted for confounders of age, sex, Townsend deprivation score, level of education, genotyping chip, eight genetic principal components, smoking, alcohol intake frequency, cholesterol medication, body mass index, T2D and apolipoprotein (APOE) e4 dosage. RESULTS: We found an association between higher HbA1c levels and poorer performance on symbol digit substitution scores (standardized beta [β] = -0.022, P = .001) in the fully adjusted model. We also found an association between higher HbA1c levels and worse brain MRI phenotypes of grey matter (GM; fully-adjusted β = -0.026, P < .001), whole brain volume (β = -0.072, P = .0113) and a general factor of frontal lobe GM (β = -0.022, P < .001) in partially and fully adjusted models. HbA1c-PRS were significantly associated with GM volume in the fully adjusted model (β = -0.010, P = .0113); however, when adjusted for HbA1c levels, the association was not significant. CONCLUSIONS: Our findings suggest that measured HbA1c is associated with poorer cognitive health, and that HbA1c-PRS do not add significant information to this.
Affiliation(s)
- Sanskar Ranglani
- School of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Joey Ward
- School of Health & Wellbeing, University of Glasgow, Glasgow, UK
- Naveed Sattar
- School of Cardiovascular and Metabolic Sciences, University of Glasgow, Glasgow, UK
- Rona J Strawbridge
- School of Health & Wellbeing, University of Glasgow, Glasgow, UK
- Division of Cardiovascular Medicine, Department of Medicine Solna, Karolinska Institutet, Solna, Sweden
- HDR-UK, London, UK
- Donald M Lyall
- School of Health & Wellbeing, University of Glasgow, Glasgow, UK
7
Gao W, Liu L, Huh E, Gbahou F, Cecon E, Oshima M, Houzé L, Katsonis P, Hegron A, Fan Z, Hou G, Charpentier G, Boissel M, Derhourhi M, Marre M, Balkau B, Froguel P, Scharfmann R, Lichtarge O, Dam J, Bonnefond A, Liu J, Jockers R. Human GLP1R variants affecting GLP1R cell surface expression are associated with impaired glucose control and increased adiposity. Nat Metab 2023; 5:1673-1684. [PMID: 37709961 PMCID: PMC11610247 DOI: 10.1038/s42255-023-00889-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/28/2022] [Accepted: 08/09/2023] [Indexed: 09/16/2023]
Abstract
The glucagon-like peptide 1 receptor (GLP1R) is a major drug target with several agonists being prescribed in individuals with type 2 diabetes and obesity [1,2]. The impact of genetic variability of GLP1R on receptor function and its association with metabolic traits are unclear with conflicting reports. Here, we show an unexpected diversity of phenotypes ranging from defective cell surface expression to complete or pathway-specific gain of function (GoF) and loss of function (LoF), after performing a functional profiling of 60 GLP1R variants across four signalling pathways. The defective insulin secretion of GLP1R LoF variants is rescued by allosteric GLP1R ligands or high concentrations of exendin-4/semaglutide in INS-1 832/3 cells. Genetic association studies in 200,000 participants from the UK Biobank show that impaired GLP1R cell surface expression contributes to poor glucose control and increased adiposity with increased glycated haemoglobin A1c and body mass index. This study defines impaired GLP1R cell surface expression as a risk factor for traits associated with type 2 diabetes and obesity and provides potential treatment options for GLP1R LoF variant carriers.
Affiliation(s)
- Wenwen Gao
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, Institute of Zoonosis, and College of Veterinary Medicine, Jilin University, Changchun, China
- Lei Liu
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Eunna Huh
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, TX, USA
- Florence Gbahou
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Erika Cecon
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Masaya Oshima
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Ludivine Houzé
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Alan Hegron
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Institute for Research in Immunology and Cancer, University of Montreal, Montreal, Quebec, Canada
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
- Zhiran Fan
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Guofei Hou
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Guillaume Charpentier
- CERITD (Centre d'Étude et de Recherche pour l'Intensification du Traitement du Diabète), Evry, France
| | - Mathilde Boissel
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
| | - Mehdi Derhourhi
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
| | - Michel Marre
- Institut Necker-Enfants Malades, INSERM, Université Paris Cité, Paris, France
- Clinique Ambroise Paré, Neuilly-sur-Seine, France
| | - Beverley Balkau
- Inserm U1018, Center for Research in Epidemiology and Population Health, Villejuif, France
- University Paris-Saclay, University Paris-Sud, Villejuif, France
| | - Philippe Froguel
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Department of Metabolism, Imperial College London, London, UK
| | | | - Olivier Lichtarge
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Julie Dam
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
| | - Amélie Bonnefond
- University of Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Department of Metabolism, Imperial College London, London, UK
| | - Jianfeng Liu
- Cellular Signaling Laboratory, International Research Center for Sensory Biology and Technology of MOST, Key Laboratory of Molecular Biophysics of Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China.
| | - Ralf Jockers
- Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France.
| |
|
8
|
Knutson KA, Pan W. MATS: a novel multi-ancestry transcriptome-wide association study to account for heterogeneity in the effects of cis-regulated gene expression on complex traits. Hum Mol Genet 2023; 32:1237-1251. [PMID: 36179104 PMCID: PMC10077507 DOI: 10.1093/hmg/ddac247] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2022] [Revised: 09/16/2022] [Accepted: 09/28/2022] [Indexed: 01/16/2023] Open
Abstract
The Transcriptome-Wide Association Study (TWAS) is a widely used approach which integrates gene expression and Genome Wide Association Study (GWAS) data to study the role of cis-regulated gene expression (GEx) in complex traits. However, the genetic architecture of GEx varies across populations, and recent findings point to possible ancestral heterogeneity in the effects of GEx on complex traits, which may be amplified in TWAS by modeling GEx as a function of cis-eQTLs. Here, we present a novel extension to TWAS to account for heterogeneity in the effects of cis-regulated GEx which are correlated with ancestry. Our proposed Multi-Ancestry TwaS (MATS) framework jointly analyzes samples from multiple populations and distinguishes between shared, ancestry-specific and/or subject-specific expression-trait associations. As such, MATS amplifies power to detect shared GEx associations over ancestry-stratified TWAS through increased sample sizes, and facilitates the detection of genes with subgroup-specific associations which may be masked by standard TWAS. Our simulations highlight the improved Type-I error conservation and power of MATS compared with competing approaches. Our real data applications to Alzheimer's disease (AD) case-control genotypes from the Alzheimer's Disease Sequencing Project (ADSP) and continuous phenotypes from the UK Biobank (UKBB) identify a number of unique gene-trait associations which were not discovered through standard and/or ancestry-stratified TWAS. Ultimately, these findings promote MATS as a powerful method for detecting and estimating significant gene expression effects on complex traits within multi-ancestry cohorts and corroborate the mounting evidence for inter-population heterogeneity in gene-trait associations.
Affiliation(s)
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
| |
|
9
|
Duchen D, Vergara C, Thio CL, Kundu P, Chatterjee N, Thomas DL, Wojcik GL, Duggal P. Pathogen exposure misclassification can bias association signals in GWAS of infectious diseases when using population-based common control subjects. Am J Hum Genet 2023; 110:336-348. [PMID: 36649706 PMCID: PMC9943744 DOI: 10.1016/j.ajhg.2022.12.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 12/20/2022] [Indexed: 01/18/2023] Open
Abstract
Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including for infectious diseases. The use of population-based common control subjects from biobanks and extensive consortia is a valuable resource to increase sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.
Affiliation(s)
- Dylan Duchen
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Candelaria Vergara
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Chloe L Thio
- Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
| | - Prosenjit Kundu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - David L Thomas
- Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
| | - Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Priya Duggal
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
| |
|
10
|
Connally NJ, Nazeen S, Lee D, Shi H, Stamatoyannopoulos J, Chun S, Cotsapas C, Cassa CA, Sunyaev SR. The missing link between genetic association and regulatory function. eLife 2022; 11:e74970. [PMID: 36515579 PMCID: PMC9842386 DOI: 10.7554/elife.74970] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 10/25/2021] [Accepted: 12/02/2022] [Indexed: 12/15/2022] Open
Abstract
The genetic basis of most traits is highly polygenic and dominated by non-coding alleles. It is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite the availability of gene expression and epigenomic datasets, few variant-to-gene links have emerged. It is unclear whether these sparse results are due to limitations in available data and methods, or to deficiencies in the underlying assumed model. To better distinguish between these possibilities, we identified 220 gene-trait pairs in which protein-coding variants influence a complex trait or its Mendelian cognate. Despite the presence of expression quantitative trait loci near most GWAS associations, by applying a gene-based approach we found limited evidence that the baseline expression of trait-related genes explains GWAS associations, whether using colocalization methods (8% of genes implicated), transcription-wide association (2% of genes implicated), or a combination of regulatory annotations and distance (4% of genes implicated). These results contradict the hypothesis that most complex trait-associated variants coincide with homeostatic expression QTLs, suggesting that better models are needed. The field must confront this deficit and pursue this 'missing regulation.'
Affiliation(s)
- Noah J Connally
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
| | - Sumaiya Nazeen
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Department of Neurology, Harvard Medical School, Boston, United States
| | - Daniel Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
| | - Huwenbo Shi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States
| | | | - Sung Chun
- Division of Pulmonary Medicine, Boston Children’s Hospital, Boston, United States
| | - Chris Cotsapas
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Neurology, Yale Medical School, New Haven, United States
- Department of Genetics, Yale Medical School, New Haven, United States
| | - Christopher A Cassa
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
| | - Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
| |
|
11
|
Qin X, Chiang CWK, Gaggiotti OE. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis. Brief Bioinform 2022; 23:bbac202. [PMID: 35649387 PMCID: PMC9294434 DOI: 10.1093/bib/bbac202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2022] [Revised: 04/05/2022] [Accepted: 04/29/2022] [Indexed: 12/30/2022] Open
Abstract
Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
Affiliation(s)
- Xinghu Qin
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
| | - Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA
| | - Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
| |
|
12
|
Blood Lines of the British People. Blood 2022. [DOI: 10.1017/9781009205528.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022] Open
|
13
|
Seal S, Datta A, Basu S. Efficient estimation of SNP heritability using Gaussian predictive process in large scale cohort studies. PLoS Genet 2022; 18:e1010151. [PMID: 35442943 PMCID: PMC9060362 DOI: 10.1371/journal.pgen.1010151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/03/2021] [Revised: 05/02/2022] [Accepted: 03/16/2022] [Indexed: 12/15/2022] Open
Abstract
With the advent of high throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals using linear mixed model (LMM). Fitting such an LMM in a large scale cohort study, however, is tremendously challenging due to its high dimensional linear algebraic operations. In this paper, we propose a new method named PredLMM approximating the aforementioned LMM motivated by the concepts of genetic coalescence and Gaussian predictive process. PredLMM has substantially better computational complexity than most of the existing LMM based methods and thus, provides a fast alternative for estimating heritability in large scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods that has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.
Affiliation(s)
- Souvik Seal
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
| | - Abhirup Datta
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
| | - Saonli Basu
- Department of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
14
|
Tank R, Ward J, Flegal KE, Smith DJ, Bailey MES, Cavanagh J, Lyall DM. Association between polygenic risk for Alzheimer's disease, brain structure and cognitive abilities in UK Biobank. Neuropsychopharmacology 2022; 47:564-569. [PMID: 34621014 PMCID: PMC8674313 DOI: 10.1038/s41386-021-01190-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/19/2021] [Revised: 08/05/2021] [Accepted: 09/14/2021] [Indexed: 02/07/2023]
Abstract
Previous studies testing associations between polygenic risk for late-onset Alzheimer's disease (LOAD-PGR) and brain magnetic resonance imaging (MRI) measures have been limited by small samples and inconsistent consideration of potential confounders. This study investigates whether higher LOAD-PGR is associated with differences in structural brain imaging and cognitive values in a relatively large sample of non-demented, generally healthy adults (UK Biobank). Summary statistics were used to create PGR scores for n = 32,790 participants using LDpred. Outcomes included 12 structural MRI volumes and 6 concurrent cognitive measures. Models were adjusted for age, sex, body mass index, genotyping chip, 8 genetic principal components, lifetime smoking, apolipoprotein (APOE) e4 genotype and socioeconomic deprivation. We tested for statistical interactions between APOE e4 allele dose and LOAD-PGR vs. all outcomes. In fully adjusted models, LOAD-PGR was associated with worse fluid intelligence (standardised beta [β] = -0.080 per LOAD-PGR standard deviation, p = 0.002), matrix completion (β = -0.102, p = 0.003), smaller left hippocampal total (β = -0.118, p = 0.002) and body (β = -0.069, p = 0.002) volumes, but not other hippocampal subdivisions. There were no significant APOE x LOAD-PGR score interactions for any outcomes in fully adjusted models. This is the largest study to date investigating LOAD-PGR and non-demented structural brain MRI and cognition phenotypes. LOAD-PGR was associated with smaller hippocampal volumes and aspects of cognitive ability in healthy adults and could supplement APOE status in risk stratification of cognitive impairment/LOAD.
Affiliation(s)
- Rachana Tank
- Institute of Health and Wellbeing, University of Glasgow, Glasgow, UK
| | - Joey Ward
- Institute of Health and Wellbeing, University of Glasgow, Glasgow, UK
| | - Kristin E Flegal
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
| | - Daniel J Smith
- Centre for Clinical Brain Sciences, Division of Psychiatry, University of Edinburgh, Edinburgh, UK
| | - Mark E S Bailey
- School of Life Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
| | - Jonathan Cavanagh
- Institute of Infection, Immunity & Inflammation, University of Glasgow, Glasgow, UK
| | - Donald M Lyall
- Institute of Health and Wellbeing, University of Glasgow, Glasgow, UK.
| |
|
15
|
Large-scale migration into Britain during the Middle to Late Bronze Age. Nature 2021; 601:588-594. [PMID: 34937049 DOI: 10.1038/s41586-021-04287-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 12/31/2020] [Accepted: 11/29/2021] [Indexed: 11/08/2022]
Abstract
Present-day people from England and Wales harbour more ancestry derived from Early European Farmers (EEF) than people of the Early Bronze Age [1]. To understand this, we generated genome-wide data from 793 individuals, increasing data from the Middle to Late Bronze and Iron Age in Britain by 12-fold, and Western and Central Europe by 3.5-fold. Between 1000 and 875 BC, EEF ancestry increased in southern Britain (England and Wales) but not northern Britain (Scotland) due to incorporation of migrants who arrived at this time and over previous centuries, and who were genetically most similar to ancient individuals from France. These migrants contributed about half the ancestry of Iron Age people of England and Wales, thereby creating a plausible vector for the spread of early Celtic languages into Britain. These patterns are part of a broader trend of EEF ancestry becoming more similar across central and western Europe in the Middle to Late Bronze Age, coincident with archaeological evidence of intensified cultural exchange [2-6]. There was comparatively less gene flow from continental Europe during the Iron Age, and Britain's independent genetic trajectory is also reflected in the rise of the allele conferring lactase persistence to ~50% by this time compared to ~7% in central Europe where it rose rapidly in frequency only a millennium later. This suggests that dairy products were used in qualitatively different ways in Britain and in central Europe over this period.
|
16
|
Márquez-Luna C, Gazal S, Loh PR, Kim SS, Furlotte N, Auton A, Price AL. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat Commun 2021; 12:6052. [PMID: 34663819 PMCID: PMC8523709 DOI: 10.1038/s41467-021-25171-9] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 07/26/2019] [Accepted: 07/16/2021] [Indexed: 12/23/2022] Open
Abstract
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373 K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107 K) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
Affiliation(s)
- Carla Márquez-Luna
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Steven Gazal
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Samuel S Kim
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | | | - Alkes L Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
17
|
Demanelis K, Tong L, Pierce BL. Genetically Increased Telomere Length and Aging-Related Traits in the U.K. Biobank. J Gerontol A Biol Sci Med Sci 2021; 76:15-22. [PMID: 31603979 DOI: 10.1093/gerona/glz240] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/03/2019] [Indexed: 12/28/2022] Open
Abstract
Telomere length (TL) shortens over time in most human cell types and is a potential biomarker of aging. However, the causal association of TL on physical and cognitive traits that decline with age has not been extensively examined in middle-aged adults. Using a Mendelian randomization (MR) approach, we utilized genetically increased TL (GI-TL) to estimate the impact of TL on aging-related traits among U.K. Biobank (UKB) participants (age 40-69 years). We manually curated 53 aging-related traits from the UKB and restricted to unrelated participants of British ancestry (n = 337,522). We estimated GI-TL as a linear combination of nine TL-associated single nucleotide polymorphisms (SNPs), each weighted by its previously-reported association with leukocyte TL. Regression models were used to assess the associations between GI-TL and each trait. We obtained MR estimates using the two-sample inverse variance weighted (IVW) approach. We identified six age-related traits associated with GI-TL (Bonferroni-corrected threshold p < .001): pulse pressure (PP) (p = 5.2 × 10⁻¹⁴), systolic blood pressure (SBP) (p = 2.9 × 10⁻¹⁵), diastolic blood pressure (DBP) (p = 5.5 × 10⁻⁶), hypertension (p = 5.5 × 10⁻¹¹), forced expiratory volume (FEV1) (p = .0001), and forced vital capacity (FVC) (p = 3.8 × 10⁻⁶). Under MR assumptions, one standard deviation increase in TL (~1,200 base pairs) increased PP, SBP, and DBP by 1.5, 2.3, and 0.8 mmHg, respectively, while FEV1 and FVC increased by 34.7 and 52.2 mL, respectively. The observed associations appear unlikely to be due to selection bias based on analyses including inverse probability weights and analyses of simulated data. These findings suggest that longer TL increases pulmonary function and blood pressure traits among middle-aged UKB participants.
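The two calculations named in this abstract, a weighted genetic score and a two-sample IVW estimate, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `weighted_score` and `ivw_estimate` are hypothetical, the numbers in the usage lines are made up, and the IVW formula is the standard fixed-effect form with first-order weights, which may differ from the variant used in the study.

```python
import math

def weighted_score(dosages, weights):
    """Genetic score as a linear combination of allele dosages (0, 1 or 2),
    each weighted by a previously reported per-allele effect on the exposure."""
    return sum(d * w for d, w in zip(dosages, weights))

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted (IVW) MR estimate.

    Each SNP contributes a ratio estimate beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2 (first-order weights).
    Returns (estimate, standard_error).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)

# Hypothetical inputs for illustration only (not values from the study).
score = weighted_score([0, 1, 2], [0.1, 0.2, 0.3])
estimate, se = ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
```

In this toy input both SNPs imply the same ratio, so the pooled estimate equals that ratio; with real summary statistics the per-SNP ratios differ and the weights decide how much each SNP counts.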
Affiliation(s)
| | - Lin Tong
- Department of Public Health Sciences
| | - Brandon L Pierce
- Department of Public Health Sciences; Department of Human Genetics, University of Chicago, Illinois; University of Chicago Comprehensive Cancer Center, University of Chicago, Illinois
| |
|
18
|
Xu H, Zhen Q, Bai M, Fang L, Zhang Y, Li B, Ge H, Moon S, Chen W, Fu W, Xu Q, Zhou Y, Yu Y, Lin L, Yong L, Zhang T, Chen S, Liu S, Zhang H, Chen R, Cao L, Zhang Y, Zhang R, Yang H, Hu X, Akey JM, Jin X, Sun L. Deep sequencing of 1320 genes reveals the landscape of protein-truncating variants and their contribution to psoriasis in 19,973 Chinese individuals. Genome Res 2021; 31:1150-1158. [PMID: 34155038 PMCID: PMC8256863 DOI: 10.1101/gr.267963.120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/27/2020] [Accepted: 05/10/2021] [Indexed: 12/30/2022]
Abstract
Protein-truncating variants (PTVs) have important impacts on phenotype diversity and disease. However, their population genetics characteristics in more globally diverse populations are not well defined. Here, we describe patterns of PTVs in 1320 genes sequenced in 10,539 healthy controls and 9434 patients with psoriasis, all of Han Chinese ancestry. We identify 8720 PTVs, of which 77% are novel, and estimate 88% of all PTVs are deleterious and subject to purifying selection. Furthermore, we show that individuals with psoriasis have a significantly higher burden of PTVs compared to controls (P = 0.02). Finally, we identified 18 PTVs in 14 genes with unusually high levels of population differentiation, consistent with the action of local adaptation. Our study provides insights into patterns and consequences of PTVs.
Affiliation(s)
- Huixin Xu
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- School of Medicine, South China University of Technology, Guangzhou 510006, China
| | - Qi Zhen
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Mingzhou Bai
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
| | - Lin Fang
- Guangdong Engineering Research Center of Life Sciences Bigdata, Shenzhen 518083, China
- Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark
| | - Yong Zhang
- Guangdong Engineering Research Center of Life Sciences Bigdata, Shenzhen 518083, China
- Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark
| | - Bao Li
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
| | - Huiyao Ge
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Sunjin Moon
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA
| | - Weiwei Chen
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Wenqing Fu
- Microsoft Corporation, Redmond, Washington 98052, USA
| | - Qiongqiong Xu
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Yuwen Zhou
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yafeng Yu
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Long Lin
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Liang Yong
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Tao Zhang
- Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark
| | - Shirui Chen
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Siyang Liu
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen 510006, Guangdong, China
| | - Hui Zhang
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Ruoyan Chen
- The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen 518035, China
| | - Lu Cao
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Yuanwei Zhang
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
| | - Ruixue Zhang
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Huanjie Yang
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
| | - Xia Hu
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
| | - Joshua M Akey
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA
| | - Xin Jin
- School of Medicine, South China University of Technology, Guangzhou 510006, China
| | - Liangdan Sun
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
- Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China
- Anhui Provincial Institute of Translational Medicine, Hefei 230032, China
|
19
|
Carress H, Lawson DJ, Elhaik E. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics 2021; 22:351. [PMID: 34001009 PMCID: PMC8127217 DOI: 10.1186/s12864-021-07618-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/16/2020] [Accepted: 04/14/2021] [Indexed: 12/11/2022]
Abstract
The past years have seen the rise of genomic biobanks and mega-scale meta-analysis of genomic data, which promises to reveal the genetic underpinnings of health and disease. However, the over-representation of Europeans in genomic studies not only limits the global understanding of disease risk but also inhibits viable research into the genomic differences between carriers and patients. Whilst the community has agreed that more diverse samples are required, it is not enough to blindly increase diversity; the diversity must be quantified, compared and annotated to lead to insight. Genetic annotations from separate biobanks need to be comparable and computable and to operate without access to raw data due to privacy concerns. Comparability is key both for regular research and to allow international comparison in response to pandemics. Here, we evaluate the appropriateness of the most common genomic tools used to depict population structure in a standardized and comparable manner. The end goal is to reduce the effects of confounding and learn from genuine variation in genetic effects on phenotypes across populations, which will improve the value of biobanks (locally and internationally), increase the accuracy of association analyses and inform developmental efforts.
Affiliation(s)
- Hannah Carress
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
| | - Daniel John Lawson
- School of Mathematics and Integrative Epidemiology Unit, University of Bristol, Bristol, UK
| | - Eran Elhaik
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK; Department of Biology, Lund University, Lund, Sweden.
|
20
|
Juan L, Wang Y, Jiang J, Yang Q, Wang G, Wang Y. Evaluating individual genome similarity with a topic model. Bioinformatics 2020; 36:4757-4764. [PMID: 32573702 DOI: 10.1093/bioinformatics/btaa583] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/15/2019] [Revised: 04/30/2020] [Accepted: 06/15/2020] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Evaluating genome similarity among individuals is an essential step in data analysis. Advanced sequencing technology detects more and rarer variants for massive individual genomes, thus enabling individual-level genome similarity evaluation. However, the current methodologies, such as the principal component analysis (PCA), lack the capability to fully leverage rare variants and are also difficult to interpret in terms of population genetics. RESULTS: Here, we introduce a probabilistic topic model, latent Dirichlet allocation, to evaluate individual genome similarity. A total of 2535 individuals from the 1000 Genomes Project (KGP) were used to demonstrate our method. Various aspects of variant choice and model parameter selection were studied. We found that relatively rare (0.001 < allele frequency < 0.175) and sparse (average interval > 20 000 bp) variants are more efficient for genome similarity evaluation. At least 100 000 such variants are necessary. In our results, the populations show significantly less mixed and more cohesive visualization than the PCA results. The global similarities among the KGP genomes are consistent with known geographical, historical and cultural factors. AVAILABILITY AND IMPLEMENTATION: The source code and data access are available at: https://github.com/lrjuan/LDA_genome. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
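The variant-choice criteria reported in this abstract (allele frequency between 0.001 and 0.175, average spacing above 20 000 bp) can be illustrated with a small filter. This is a minimal sketch, not the authors' implementation: the function name `select_variants`, the `(position, allele_frequency)` tuple layout, and the single-chromosome input are all assumptions made for illustration. It enforces a minimum gap between kept variants, which is a stricter condition than the abstract's average interval but guarantees it.

```python
from typing import List, Tuple

# Hypothetical layout: (position in bp, allele frequency) on one chromosome.
Variant = Tuple[int, float]


def select_variants(
    variants: List[Variant],
    min_af: float = 0.001,
    max_af: float = 0.175,
    min_gap: int = 20_000,
) -> List[Variant]:
    """Keep relatively rare variants that are spaced more than min_gap bp apart."""
    kept: List[Variant] = []
    last_pos = None
    for pos, af in sorted(variants):
        # Frequency window quoted in the abstract (exclusive on both ends).
        if not (min_af < af < max_af):
            continue
        # Thinning step: requiring every gap to exceed min_gap also keeps
        # the average interval above min_gap.
        if last_pos is not None and pos - last_pos <= min_gap:
            continue
        kept.append((pos, af))
        last_pos = pos
    return kept


example = [
    (100, 0.05),
    (5_000, 0.10),     # too close to the previous kept variant
    (30_000, 0.50),    # too common
    (40_000, 0.01),
    (70_000, 0.0005),  # too rare
    (90_000, 0.17),
]
selected = select_variants(example)
```

Greedy left-to-right thinning is only one way to obtain sparse variants; the abstract does not say how the authors sampled them.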
Affiliation(s)
| | - Yongtian Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | | | - Qi Yang
- School of Life Science and Technology
| | - Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
21
|
Mathieson I. Human adaptation over the past 40,000 years. Curr Opin Genet Dev 2020; 62:97-104. [PMID: 32745952 PMCID: PMC7484260 DOI: 10.1016/j.gde.2020.06.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/16/2020] [Revised: 05/10/2020] [Accepted: 06/01/2020] [Indexed: 02/07/2023]
Abstract
Over the past few years several methodological and data-driven advances have greatly improved our ability to robustly detect genomic signatures of selection in humans. New methods applied to large samples of present-day genomes provide increased power, while ancient DNA allows precise estimation of timing and tempo. However, despite these advances, we are still limited in our ability to translate these signatures into understanding about which traits were actually under selection, and why. Combining information from different populations and timescales may allow interpretation of selective sweeps. Other modes of selection have proved more difficult to detect. In particular, despite strong evidence of the polygenicity of most human traits, evidence for polygenic selection is weak, and its importance in recent human evolution remains unclear. Balancing selection and archaic introgression seem important for the maintenance of potentially adaptive immune diversity, but perhaps less so for other traits.
Affiliation(s)
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, United States.
|
22
|
Genotyping Array Design and Data Quality Control in the Million Veteran Program. Am J Hum Genet 2020; 106:535-548. [PMID: 32243820 DOI: 10.1016/j.ajhg.2020.03.004] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 12/22/2019] [Accepted: 03/06/2020] [Indexed: 02/07/2023]
Abstract
The Million Veteran Program (MVP), initiated by the Department of Veterans Affairs (VA), aims to collect biosamples with consent from at least one million veterans. Presently, blood samples have been collected from over 800,000 enrolled participants. The size and diversity of the MVP cohort, as well as the availability of extensive VA electronic health records, make it a promising resource for precision medicine. MVP is conducting array-based genotyping to provide a genome-wide scan of the entire cohort, in parallel with whole-genome sequencing, methylation, and other 'omics assays. Here, we present the design and performance of the MVP 1.0 custom Axiom array, which was designed and developed as a single assay to be used across the multi-ethnic MVP cohort. A unified genetic quality-control analysis was developed and conducted on an initial tranche of 485,856 individuals, leading to a high-quality dataset of 459,777 unique individuals. A total of 668,418 genetic markers passed quality control and showed high-quality genotypes not only on common variants but also on rare variants. We confirmed that, with non-European individuals making up nearly 30%, MVP's substantial ancestral diversity surpasses that of other large biobanks. We also demonstrated the quality of the MVP dataset by replicating established genetic associations with height in individuals of European American and African American ancestries. This current dataset has been made available to approved MVP researchers for genome-wide association studies and other downstream analyses. Further data releases will be available for analysis as recruitment at the VA continues and the cohort expands both in size and diversity.
|
23
|
Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, Chiang CWK, Hirschhorn J, Daly MJ, Patterson N, Neale B, Mathieson I, Reich D, Sunyaev SR. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 2019; 8:e39702. [PMID: 30895926 PMCID: PMC6428571 DOI: 10.7554/elife.39702] [Citation(s) in RCA: 222] [Impact Index Per Article: 37.0] [Received: 07/02/2018] [Accepted: 01/15/2019] [Indexed: 01/03/2023]
Abstract
Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
Affiliation(s)
- Mashaal Sohail
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
| | - Robert M Maier
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
| | - Andrea Ganna
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
| | - Alex Bloemendal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
| | - Alicia R Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
| | - Michael C Turchin
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
| | - Charleston WK Chiang
- Department of Preventive Medicine, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, United States
| | - Joel Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Departments of Pediatrics and Genetics, Harvard Medical School, Boston, United States
- Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, United States
| | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
| | - Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Genetics, Harvard Medical School, Boston, United States
| | - Benjamin Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
| | - Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
| | - David Reich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Genetics, Harvard Medical School, Boston, United States
- Howard Hughes Medical Institute, Harvard Medical School, Boston, United States
| | - Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
|
24
|
Schoech AP, Jordan DM, Loh PR, Gazal S, O'Connor LJ, Balick DJ, Palamara PF, Finucane HK, Sunyaev SR, Price AL. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat Commun 2019; 10:790. [PMID: 30770844 PMCID: PMC6377669 DOI: 10.1038/s41467-019-08424-6] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 09/12/2018] [Accepted: 01/09/2019] [Indexed: 02/06/2023]
Abstract
Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1 - p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with best-fit mean of -0.38 (s.e. 0.02) across traits. Despite larger rare variant effect sizes, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability for most traits analyzed. Using evolutionary modeling and forward simulations, we validate the α model of MAF-dependent trait effects and assess plausible values of relevant evolutionary parameters.
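The α model in this abstract can be made concrete with a short numerical sketch. It assumes only what the abstract states (per-allele effect-size variance proportional to [p(1 - p)]^α, with the reported mean α of -0.38) plus the standard result that a SNP's expected contribution to heritability scales with 2p(1 - p) times its per-allele effect variance. The function names are hypothetical and the values are relative (unnormalised), not estimates from the paper.

```python
def per_allele_variance(maf: float, alpha: float = -0.38) -> float:
    """Relative per-allele effect-size variance, proportional to [p(1-p)]^alpha."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    return (maf * (1.0 - maf)) ** alpha


def per_snp_heritability(maf: float, alpha: float = -0.38) -> float:
    """Relative expected heritability contribution: 2p(1-p) x effect variance."""
    return 2.0 * maf * (1.0 - maf) * per_allele_variance(maf, alpha)


# Compare a rare variant (MAF 1%) with a common one (MAF 50%).
rare_var = per_allele_variance(0.01)
common_var = per_allele_variance(0.5)
ratio = rare_var / common_var  # > 1 when alpha is negative

rare_h2 = per_snp_heritability(0.01)
common_h2 = per_snp_heritability(0.5)
```

With α = -0.38 the rare variant has roughly 3.4 times the per-allele effect variance of the common one, yet its expected heritability contribution is smaller because that contribution scales as [p(1 - p)]^(1 + α) with a positive exponent. This is consistent with the abstract's observation that rare variants carry larger effects but explain a small share of SNP-heritability.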
Affiliation(s)
- Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA.
| | - Daniel M Jordan
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, 02115, MA, USA
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
| | - Luke J O'Connor
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
| | - Daniel J Balick
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, 02115, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, 02115, MA, USA
| | - Pier F Palamara
- Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
| | - Hilary K Finucane
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
| | - Shamil R Sunyaev
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, 02115, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, 02115, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA.
|
25
|
Gong J, Wang F, Xiao B, Panjwani N, Lin F, Keenan K, Avolio J, Esmaeili M, Zhang L, He G, Soave D, Mastromatteo S, Baskurt Z, Kim S, O’Neal WK, Polineni D, Blackman SM, Corvol H, Cutting GR, Drumm M, Knowles MR, Rommens JM, Sun L, Strug LJ. Genetic association and transcriptome integration identify contributing genes and tissues at cystic fibrosis modifier loci. PLoS Genet 2019; 15:e1008007. [PMID: 30807572 PMCID: PMC6407791 DOI: 10.1371/journal.pgen.1008007] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 08/27/2018] [Revised: 03/08/2019] [Accepted: 02/06/2019] [Indexed: 01/09/2023]
Abstract
Cystic Fibrosis (CF) exhibits morbidity in several organs, including progressive lung disease in all patients and intestinal obstruction at birth (meconium ileus) in ~15%. Individuals with the same causal CFTR mutations show variable disease presentation which is partly attributed to modifier genes. With >6,500 participants from the International CF Gene Modifier Consortium, genome-wide association investigation identified a new modifier locus for meconium ileus encompassing ATP12A on chromosome 13 (min p = 3.83 × 10⁻¹⁰); replicated loci encompassing SLC6A14 on chromosome X and SLC26A9 on chromosome 1 (min p < 2.2 × 10⁻¹⁶ and 2.81 × 10⁻¹¹, respectively); and replicated a suggestive locus on chromosome 7 near PRSS1 (min p = 2.55 × 10⁻⁷). PRSS1 is exclusively expressed in the exocrine pancreas and was previously associated with non-CF pancreatitis with functional characterization demonstrating impact on PRSS1 gene expression. We thus asked whether the other meconium ileus modifier loci impact gene expression and in which organ. We developed and applied a colocalization framework called the Simple Sum (SS) that integrates regulatory and genetic association information, and also contrasts colocalization evidence across tissues or genes. The associated modifier loci colocalized with expression quantitative trait loci (eQTLs) for ATP12A (p = 3.35 × 10⁻⁸), SLC6A14 (p = 1.12 × 10⁻¹⁰) and SLC26A9 (p = 4.48 × 10⁻⁵) in the pancreas, even though meconium ileus manifests in the intestine. The meconium ileus susceptibility locus on chromosome X appeared shifted in location from a previously identified locus for CF lung disease severity. Using the SS we integrated the lung disease association locus with eQTLs from nasal epithelia of 63 CF participants and demonstrated evidence of colocalization with airway-specific regulation of SLC6A14 (p = 2.3 × 10⁻⁴).
Cystic Fibrosis is realizing the promise of personalized medicine, and identification of the contributing organ and understanding of tissue specificity for a gene modifier is essential for the next phase of personalizing therapeutic strategies.
Affiliation(s)
- Jiafen Gong
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Fan Wang
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
| | - Bowei Xiao
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Naim Panjwani
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Fan Lin
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Katherine Keenan
- Program in Physiology and Experimental Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Julie Avolio
- Program in Translational Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Mohsen Esmaeili
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Lin Zhang
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
| | - Gengming He
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - David Soave
- Wilfrid Laurier University, Department of Mathematics, Waterloo, Ontario, Canada
- Ontario Institute for Cancer Research, Department of Computational Biology, Toronto, Ontario, Canada
| | - Scott Mastromatteo
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Zeynep Baskurt
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Sangook Kim
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Wanda K. O’Neal
- Marsico Lung Institute and Cystic Fibrosis Pulmonary Research and Treatment Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Deepika Polineni
- Marsico Lung Institute and Cystic Fibrosis Pulmonary Research and Treatment Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Internal Medicine, University of Kansas Medical Centre, Kansas City, Kansas, United States of America
| | - Scott M. Blackman
- Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Harriet Corvol
- Assistance Publique-Hôpitaux de Paris (AP-HP), Hôpital Trousseau, Pediatric Pulmonary Department; Institut National de la Santé et la Recherche Médicale (INSERM) U938, Paris, France
- Sorbonne Universités, Université Pierre et Marie Curie (UPMC) Paris, Paris, France
| | - Garry R. Cutting
- Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Mitchell Drumm
- Department of Pediatrics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Department of Genetics, Case Western Reserve University, Cleveland, Ohio, United States of America
| | - Michael R. Knowles
- Marsico Lung Institute and Cystic Fibrosis Pulmonary Research and Treatment Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Johanna M. Rommens
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Lei Sun
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Lisa J. Strug
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
| |
|
26
|
O'Connor LJ, Price AL. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nat Genet 2018; 50:1728-1734. [PMID: 30374074 PMCID: PMC6684375 DOI: 10.1038/s41588-018-0255-0] [Citation(s) in RCA: 248] [Impact Index Per Article: 35.4] [Received: 10/23/2017] [Accepted: 09/13/2018] [Indexed: 12/22/2022]
Abstract
Mendelian randomization, a method to infer causal relationships, is confounded by genetic correlations reflecting shared etiology. We developed a model in which a latent causal variable mediates the genetic correlation; trait 1 is partially genetically causal for trait 2 if it is strongly genetically correlated with the latent causal variable, quantified using the genetic causality proportion. We fit this model using mixed fourth moments $E(\alpha_1^2 \alpha_1 \alpha_2)$ and $E(\alpha_2^2 \alpha_1 \alpha_2)$ of marginal effect sizes for each trait; if trait 1 is causal for trait 2, then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1 \alpha_2$), but not vice versa. In simulations, our method avoided false positives due to genetic correlations, unlike Mendelian randomization. Across 52 traits (average n = 331,000), we identified 30 causal relationships with high genetic causality proportion estimates. Novel findings included a causal effect of low-density lipoprotein on bone mineral density, consistent with clinical trials of statins in osteoporosis.
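As a rough illustration of the mixed fourth moments mentioned in this abstract, the sketch below computes sample versions of E(α1²·α1α2) and E(α2²·α1α2) from two vectors of marginal effect sizes. The function name and the toy inputs are hypothetical, and the exact form of the moments is an assumption inferred from the abstract's wording; the published method presumably includes further steps (such as handling estimation noise) that are not shown.

```python
import numpy as np

def mixed_fourth_moments(alpha1, alpha2):
    """Sample mixed fourth moments of two effect-size vectors.

    Returns (mean(a1^2 * a1*a2), mean(a2^2 * a1*a2)).
    This is an illustrative sketch, not the published estimator.
    """
    a1 = np.asarray(alpha1, dtype=float)
    a2 = np.asarray(alpha2, dtype=float)
    if a1.shape != a2.shape:
        raise ValueError("effect-size vectors must have the same shape")
    cross = a1 * a2                      # per-SNP product of effects
    m1 = float(np.mean(a1 ** 2 * cross)) # weights SNPs with large trait-1 effects
    m2 = float(np.mean(a2 ** 2 * cross)) # weights SNPs with large trait-2 effects
    return m1, m2

# Toy inputs (hypothetical marginal effect sizes for two SNPs).
m1, m2 = mixed_fourth_moments([2.0, 1.0], [1.0, 3.0])
```

An asymmetry between the two moments is what the abstract describes as informative about the direction of partial causality; how that asymmetry is turned into a genetic causality proportion is not specified here.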
Affiliation(s)
- Luke J O'Connor
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Cambridge, MA, USA.
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA.
| |
|
27
|
High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 2018; 50:1311-1317. [PMID: 30104759 PMCID: PMC6145075 DOI: 10.1038/s41588-018-0177-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 08/21/2017] [Accepted: 06/21/2018] [Indexed: 12/19/2022]
Abstract
Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequence data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified LD score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.
|
28
|
Okada Y, Momozawa Y, Sakaue S, Kanai M, Ishigaki K, Akiyama M, Kishikawa T, Arai Y, Sasaki T, Kosaki K, Suematsu M, Matsuda K, Yamamoto K, Kubo M, Hirose N, Kamatani Y. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat Commun 2018; 9:1631. [PMID: 29691385 PMCID: PMC5915442 DOI: 10.1038/s41467-018-03274-0] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Received: 08/09/2017] [Accepted: 02/01/2018] [Indexed: 12/19/2022] Open
Abstract
Understanding natural selection is crucial to unveiling evolution of modern humans. Here, we report natural selection signatures in the Japanese population using 2234 high-depth whole-genome sequence (WGS) data (25.9×). Using rare singletons, we identify signals of very recent selection for the past 2000–3000 years in multiple loci (ADH cluster, MHC region, BRAP-ALDH2, SERHL2). In large-scale genome-wide association study (GWAS) dataset (n = 171,176), variants with selection signatures show enrichment in heterogeneity of derived allele frequency spectra among the geographic regions of Japan, highlighted by two major regional clusters (Hondo and Ryukyu). While the selection signatures do not show enrichment in archaic hominin-derived genome sequences, they overlap with the SNPs associated with the modern human traits. The strongest overlaps are observed for the alcohol or nutrition metabolism-related traits. Our study illustrates the value of high-depth WGS to understand evolution and their relationship with disease risk. Recent natural selection left signals in human genomes. Here, Okada et al. generate high-depth whole-genome sequence (WGS) data (25.9×) from 2,234 Japanese people of the BioBank Japan Project (BBJ), and identify signals of recent natural selection which overlap variants associated with human traits.
Affiliation(s)
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan; Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, 565-0871, Japan.
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
| | - Saori Sakaue
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan; Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Allergy and Rheumatology, Graduate School of Medicine, the University of Tokyo, Tokyo, 113-8655, Japan
| | - Masahiro Kanai
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan; Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Kazuyoshi Ishigaki
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
| | - Masato Akiyama
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
| | - Toshihiro Kishikawa
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan; Department of Otorhinolaryngology-Head and Neck Surgery, Osaka University Graduate School of Medicine, Osaka, 565-0871, Japan
| | - Yasumichi Arai
- Center for Supercentenarian Medical Research, Keio University School of Medicine, Shinanomachi 35, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Takashi Sasaki
- Center for Supercentenarian Medical Research, Keio University School of Medicine, Shinanomachi 35, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, Shinanomachi 35, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Makoto Suematsu
- Department of Biochemistry, Keio University School of Medicine, Shinanomachi 35, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Koichi Matsuda
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, The University of Tokyo, Tokyo, 108-8639, Japan
| | - Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
| | - Michiaki Kubo
- RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
| | - Nobuyoshi Hirose
- Center for Supercentenarian Medical Research, Keio University School of Medicine, Shinanomachi 35, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Sakyo-ku, Kyoto, 606-8507, Japan
| |
|
29
|
Akbari A, Vitti JJ, Iranmehr A, Bakhtiari M, Sabeti PC, Mirarab S, Bafna V. Identifying the favored mutation in a positive selective sweep. Nat Methods 2018; 15:279-282. [PMID: 29457793 PMCID: PMC6231406 DOI: 10.1038/nmeth.4606] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 05/17/2017] [Accepted: 01/08/2018] [Indexed: 01/23/2023]
Abstract
Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations.
Affiliation(s)
- Ali Akbari
- Department of Electrical & Computer Engineering, University of California San Diego, La Jolla, California, USA
| | - Joseph J Vitti
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Arya Iranmehr
- Department of Electrical & Computer Engineering, University of California San Diego, La Jolla, California, USA
| | - Mehrdad Bakhtiari
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
| | - Pardis C Sabeti
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Siavash Mirarab
- Department of Electrical & Computer Engineering, University of California San Diego, La Jolla, California, USA
| | - Vineet Bafna
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
| |
|
30
|
Wang Z, Li L, Glicksberg BS, Israel A, Dudley JT, Ma'ayan A. Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. J Biomed Inform 2017; 76:59-68. [PMID: 29113935 PMCID: PMC5716867 DOI: 10.1016/j.jbi.2017.11.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/02/2017] [Revised: 10/28/2017] [Accepted: 11/04/2017] [Indexed: 02/08/2023]
Abstract
Determining the discrepancy between chronological and physiological age of patients is central to preventative and personalized care. Electronic medical records (EMR) provide rich information about the patient physiological state, but it is unclear whether such information can be predictive of chronological age. Here we present a deep learning model that uses vital signs and lab tests contained within the EMR of Mount Sinai Health System (MSHS) to predict chronological age. The model is trained on 377,686 EMR from patients of ages 18-85 years old. The discrepancy between the predicted and real chronological age is then used as a proxy to estimate physiological age. Overall, the model can predict the chronological age of patients with a standard deviation error of ∼7 years. The ages of the youngest and oldest patients were more accurately predicted, while patients of ages ranging between 40 and 60 years were the least accurately predicted. Patients with the largest discrepancy between their physiological and chronological age were further inspected. The patients predicted to be significantly older than their chronological age have higher systolic blood pressure, higher cholesterol, damaged liver, and anemia. In contrast, patients predicted to be younger than their chronological age have lower blood pressure and shorter stature among other indicators; both groups display lower weight than the population average. Using information from ∼10,000 patients from the entire cohort who have been also profiled with SNP arrays, genome-wide association study (GWAS) uncovers several novel genetic variants associated with aging. In particular, significant variants were mapped to genes known to be associated with inflammation, hypertension, lipid metabolism, height, and increased lifespan in mice. Several genes with missense mutations were identified as novel candidate aging genes. 
In conclusion, we demonstrate how EMR data can be used to assess overall health via a scale that is based on deviation from the patient's predicted chronological age.
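The abstract's use of the gap between predicted and real chronological age can be sketched as simple arithmetic. In the snippet below, the function names and sample values are hypothetical, and the predicted ages are assumed to come from some already-trained model; nothing about the deep learning model itself is reproduced.

```python
import numpy as np

def age_gap(predicted_age, chronological_age):
    """Predicted minus chronological age; positive means 'predicted older'."""
    predicted = np.asarray(predicted_age, dtype=float)
    actual = np.asarray(chronological_age, dtype=float)
    return predicted - actual

def error_std(predicted_age, chronological_age):
    """Standard deviation of the prediction error across patients."""
    return float(np.std(age_gap(predicted_age, chronological_age)))

# Hypothetical values for two patients.
gaps = age_gap([50.0, 30.0], [45.0, 35.0])
spread = error_std([50.0, 30.0], [45.0, 35.0])
```

Patients with the largest positive or negative gaps would be the ones singled out for closer inspection, as the abstract describes.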
Affiliation(s)
- Zichen Wang
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Li Li
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Benjamin S Glicksberg
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Ariel Israel
- Department of Family Medicine, Clalit Health Services, Jerusalem 90258, Israel
| | - Joel T Dudley
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Avi Ma'ayan
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA.
| |
|
31
|
Márquez-Luna C, Loh PR, Price AL. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol 2017; 41:811-823. [PMID: 29110330 PMCID: PMC5726434 DOI: 10.1002/gepi.22083] [Citation(s) in RCA: 194] [Impact Index Per Article: 24.3] [Received: 04/12/2017] [Revised: 08/16/2017] [Accepted: 08/30/2017] [Indexed: 01/04/2023]
Abstract
Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sample size or training data from the target population in small sample size, but not both. Here, we introduce a multiethnic polygenic risk score that combines training data from European samples and training data from the target population. We applied this approach to predict type 2 diabetes (T2D) in a Latino cohort using both publicly available European summary statistics in large sample size (N_eff = 40k) and Latino training data in small sample size (N_eff = 8k). Here, we attained a >70% relative improvement in prediction accuracy (from R² = 0.027 to 0.047) compared to methods that use only one source of training data, consistent with large relative improvements in simulations. We observed a systematically lower load of T2D risk alleles in Latino individuals with more European ancestry, which could be explained by polygenic selection in ancestral European and/or Native American populations. Predicting T2D in a South Asian UK Biobank cohort using European (N_eff = 40k) and South Asian (N_eff = 16k) training data attained a >70% relative improvement in prediction accuracy, and predicting height in an African UK Biobank cohort using European (N = 113k) and African (N = 2k) training data attained a 30% relative improvement. Our work reduces the gap in polygenic risk prediction accuracy between European and non-European target populations.
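To make the reported arithmetic concrete, the sketch below computes the relative improvement between two R² values and shows one plausible way of combining two polygenic scores. The combination as a weighted sum with least-squares mixing weights is an assumption made for illustration, and all function names are hypothetical; the abstract only states that the two sources of training data are combined.

```python
import numpy as np

def relative_improvement(r2_baseline, r2_new):
    """Relative gain in prediction accuracy, e.g. 0.027 -> 0.047."""
    return (r2_new - r2_baseline) / r2_baseline

def fit_mixing_weights(prs_eur, prs_target, phenotype):
    """Least-squares weights for a two-score linear combination.

    Assumed form: combined = w[0] * prs_eur + w[1] * prs_target.
    Inputs are centered so no intercept term is needed.
    """
    X = np.column_stack([prs_eur, prs_target]).astype(float)
    X = X - X.mean(axis=0)
    y = np.asarray(phenotype, dtype=float)
    y = y - y.mean()
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def combine_scores(prs_eur, prs_target, weights):
    """Apply the mixing weights to the two scores."""
    return weights[0] * np.asarray(prs_eur) + weights[1] * np.asarray(prs_target)

def r_squared(score, phenotype):
    """Squared Pearson correlation between a score and the phenotype."""
    return float(np.corrcoef(score, phenotype)[0, 1] ** 2)

# The R² values quoted in the abstract correspond to roughly a 74% gain.
gain = relative_improvement(0.027, 0.047)
```

In practice the weights would be fitted on validation data separate from the samples used to report accuracy; the in-sample fit here only keeps the sketch short.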
Affiliation(s)
- Carla Márquez-Luna
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Po-Ru Loh
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
| | - Alkes L Price
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
| |
|
32
|
Mostafavi H, Berisa T, Day FR, Perry JRB, Przeworski M, Pickrell JK. Identifying genetic variants that affect viability in large cohorts. PLoS Biol 2017; 15:e2002458. [PMID: 28873088 PMCID: PMC5584811 DOI: 10.1371/journal.pbio.2002458] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 03/14/2017] [Accepted: 08/03/2017] [Indexed: 12/20/2022] Open
Abstract
A number of open questions in human evolutionary genetics would become tractable if we were able to directly measure evolutionary fitness. As a step towards this goal, we developed a method to examine whether individual genetic variants, or sets of genetic variants, currently influence viability. The approach consists in testing whether the frequency of an allele varies across ages, accounting for variation in ancestry. We applied it to the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and to the parents of participants in the UK Biobank. Across the genome, we found only a few common variants with large effects on age-specific mortality: tagging the APOE ε4 allele and near CHRNA3. These results suggest that when large, even late-onset effects are kept at low frequency by purifying selection. Testing viability effects of sets of genetic variants that jointly influence 1 of 42 traits, we detected a number of strong signals. In participants of the UK Biobank of British ancestry, we found that variants that delay puberty timing are associated with a longer parental life span (P~6.2 × 10⁻⁶ for fathers and P~2.0 × 10⁻³ for mothers), consistent with epidemiological studies. Similarly, variants associated with later age at first birth are associated with a longer maternal life span (P~1.4 × 10⁻³). Signals are also observed for variants influencing cholesterol levels, risk of coronary artery disease (CAD), body mass index, as well as risk of asthma. These signals exhibit consistent effects in the GERA cohort and among participants of the UK Biobank of non-British ancestry. We also found marked differences between males and females, most notably at the CHRNA3 locus, and variants associated with risk of CAD and cholesterol levels. Beyond our findings, the analysis serves as a proof of principle for how upcoming biomedical data sets can be used to learn about selection effects in contemporary humans.
Our global understanding of adaptation in humans is limited to indirect statistical inferences from patterns of genetic variation, which are sensitive to past selection pressures. We introduced a method that allowed us to directly observe ongoing selection in humans by identifying genetic variants that affect survival to a given age (i.e., viability selection). We applied our approach to the GERA cohort and parents of the UK Biobank participants. We found viability effects of variants near the APOE and CHRNA3 genes, which are associated with the risk of Alzheimer disease and smoking behavior, respectively. We also tested for the joint effect of sets of genetic variants that influence quantitative traits. We uncovered an association between longer life span and genetic variants that delay puberty timing and age at first birth. We also detected detrimental effects of higher genetically predicted cholesterol levels, body mass index, risk of coronary artery disease (CAD), and risk of asthma on survival. Some of the observed effects differ between males and females, most notably those at the CHRNA3 gene and variants associated with risk of CAD and cholesterol levels. Beyond this application, our analysis shows how large biomedical data sets can be used to study natural selection in humans.
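The core idea of checking whether an allele's frequency varies across ages can be sketched as a per-bin frequency calculation. The function name, the bin edges, and the toy genotypes below are hypothetical, and the sketch deliberately omits the ancestry adjustment and the statistical test that the abstract mentions.

```python
import numpy as np

def allele_frequency_by_age_bin(dosages, ages, bin_edges):
    """Allele frequency within each age bin.

    dosages: per-person allele counts (0, 1, or 2) for one biallelic variant.
    bin_edges: increasing edges; bin b covers [edge[b], edge[b + 1]).
    Ages outside the outer edges fall into the first or last bin.
    Empty bins are returned as NaN.
    """
    dosages = np.asarray(dosages, dtype=float)
    ages = np.asarray(ages, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    bin_index = np.digitize(ages, edges[1:-1])  # inner edges only
    n_bins = len(edges) - 1
    freqs = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            # Each person carries two alleles, hence the factor of 2.
            freqs[b] = dosages[in_bin].sum() / (2 * in_bin.sum())
    return freqs

# Toy cohort in which the allele becomes rarer at older ages.
freqs = allele_frequency_by_age_bin(
    dosages=[2, 1, 0, 1, 0, 0],
    ages=[45, 50, 65, 70, 85, 90],
    bin_edges=[40, 60, 80, 100],
)
```

A frequency that declines with age, as in this toy cohort, is the kind of pattern that would be consistent with a variant reducing survival, although a real analysis would need to rule out ancestry and cohort effects first.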
Affiliation(s)
- Hakhamanesh Mostafavi
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
| | - Tomaz Berisa
- New York Genome Center, New York, New York, United States of America
| | - Felix R. Day
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
| | - John R. B. Perry
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
| | - Joseph K. Pickrell
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- New York Genome Center, New York, New York, United States of America
| |
|