1
Choquet H, Yin J, Kim Y, Hoffmann T, Saini S, Shringarpure S, Team, Jorgenson E, Asgari M. 501 Meta-analyses of genome-wide association studies in multiethnic cohorts identify risk loci associated with hidradenitis suppurativa. J Invest Dermatol 2022. [DOI: 10.1016/j.jid.2022.05.510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
2
Ashenhurst JR, Sazonova OV, Svrchek O, Detweiler S, Kita R, Babalola L, McIntyre M, Aslibekyan S, Fontanillas P, Shringarpure S, Pollard JD, Koelsch BL. A Polygenic Score for Type 2 Diabetes Improves Risk Stratification Beyond Current Clinical Screening Factors in an Ancestrally Diverse Sample. Front Genet 2022; 13:871260. [PMID: 35559025 PMCID: PMC9086969 DOI: 10.3389/fgene.2022.871260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022]
Abstract
A substantial proportion of the adult United States population with type 2 diabetes (T2D) is undiagnosed, calling into question the comprehensiveness of current screening practices, which primarily rely on age, family history, and body mass index (BMI). We hypothesized that a polygenic score (PGS) may serve as a complementary tool to identify high-risk individuals. The T2D PGS maintained predictive utility after adjusting for family history, and combining genetics with family history further improved disease risk prediction. We observed that the PGS was meaningfully related to age of onset, with implications for screening practices: there was a linear and statistically significant relationship between the PGS and T2D onset (-1.3 years per standard deviation of the PGS). Evaluation of the U.S. Preventive Services Task Force guidelines and a simplified version of the American Diabetes Association screening guidelines showed that adding a screening criterion for those above the 90th percentile of the PGS provided a small increase in the sensitivity of the screening algorithm. Among T2D-negative individuals, the T2D PGS was associated with prediabetes: each standard deviation increase in the PGS was associated with a 23% increase in the odds of a prediabetes diagnosis. Additionally, each standard deviation increase in the PGS corresponded to a 43% increase in the odds of incident T2D at one-year follow-up. Using complications and forms of clinical intervention (i.e., lifestyle modification, metformin treatment, or insulin treatment) as proxies for advanced illness, we also found statistically significant associations between the T2D PGS and both insulin treatment and diabetic neuropathy. Importantly, we were able to replicate many findings in a Hispanic/Latino cohort from our database, highlighting the value of the T2D PGS as a clinical tool for individuals with ancestry other than European.
In this group, the T2D PGS provided additional disease risk information beyond that offered by traditional screening methodologies. The T2D PGS also had predictive value for the age of onset and for prediabetes among T2D-negative Hispanic/Latino participants. These findings strengthen the notion that a T2D PGS could play a role in the clinical setting across multiple ancestries, potentially improving T2D screening practices, risk stratification, and disease management.
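The per-standard-deviation effect sizes above can be made concrete with a little arithmetic. The sketch below assumes, as a per-SD odds ratio from logistic regression implies, that log-odds are linear in the PGS, so a 23% increase in odds per SD (odds ratio 1.23) compounds multiplicatively over larger shifts; the linear onset estimate of -1.3 years per SD scales additively. The function names are hypothetical helpers for illustration only; the numeric constants are the estimates reported in the abstract.

```python
import math


def odds_ratio_for_sd_shift(or_per_sd: float, sd_shift: float) -> float:
    """Odds ratio implied by a shift of `sd_shift` standard deviations in the PGS,
    assuming log-odds are linear in the score (logistic model)."""
    return math.exp(sd_shift * math.log(or_per_sd))


def onset_shift_years(years_per_sd: float, sd_shift: float) -> float:
    """Change in age of onset, assuming a linear relationship with the PGS."""
    return years_per_sd * sd_shift


# Per-SD estimates reported in the abstract above.
PREDIABETES_OR_PER_SD = 1.23   # 23% higher odds of prediabetes per SD
INCIDENT_T2D_OR_PER_SD = 1.43  # 43% higher odds of incident T2D per SD
ONSET_YEARS_PER_SD = -1.3      # onset about 1.3 years earlier per SD

if __name__ == "__main__":
    for k in (1, 2):
        print(
            f"{k} SD: prediabetes OR = {odds_ratio_for_sd_shift(PREDIABETES_OR_PER_SD, k):.3f}, "
            f"incident T2D OR = {odds_ratio_for_sd_shift(INCIDENT_T2D_OR_PER_SD, k):.3f}, "
            f"onset shift = {onset_shift_years(ONSET_YEARS_PER_SD, k):+.1f} years"
        )
```

Under these assumptions, a 2-SD difference in the PGS implies an odds ratio of about 1.51 for prediabetes (1.23²) and about 2.04 for incident T2D (1.43²), with onset roughly 2.6 years earlier. These are odds ratios, not relative risks.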
3
O'Connell J, Yun T, Moreno M, Li H, Litterman N, Kolesnikov A, Noblin E, Chang PC, Shastri A, Dorfman EH, Shringarpure S, Auton A, Carroll A, McLean CY. A population-specific reference panel for improved genotype imputation in African Americans. Commun Biol 2021; 4:1269. [PMID: 34741098 PMCID: PMC8571350 DOI: 10.1038/s42003-021-02777-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/15/2021] [Accepted: 10/12/2021] [Indexed: 12/17/2022]
Abstract
There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated whole genome sequencing data at intermediate (15×) coverage for 2,294 individuals with large amounts of Sub-Saharan African ancestry, predominantly Atlantic African admixed with varying amounts of European and American ancestry. We performed extensive comparisons of variant callers, phasing algorithms, and variant filtration on these data to construct a high-quality imputation panel containing data from 2,269 unrelated individuals. With the exception of the TOPMed imputation server (which notably cannot be downloaded), our panel substantially outperformed other available panels when imputing African American individuals. The raw sequencing data, variant calls, and imputation panel for this cohort are all freely available via dbGaP and should prove an invaluable resource for further study of admixed African genetics.
Affiliation(s)
- Helen Li
- Google Health, Cambridge, MA, USA
4
Eijsbouts C, Zheng T, Kennedy NA, Bonfiglio F, Anderson CA, Moutsianas L, Holliday J, Shi J, Shringarpure S, Voda AI, Farrugia G, Franke A, Hübenthal M, Abecasis G, Zawistowski M, Skogholt AH, Ness-Jensen E, Hveem K, Esko T, Teder-Laving M, Zhernakova A, Camilleri M, Boeckxstaens G, Whorwell PJ, Spiller R, McVean G, D'Amato M, Jostins L, Parkes M. Genome-wide analysis of 53,400 people with irritable bowel syndrome highlights shared genetic pathways with mood and anxiety disorders. Nat Genet 2021; 53:1543-1552. [PMID: 34741163 PMCID: PMC8571093 DOI: 10.1038/s41588-021-00950-8] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 01/15/2021] [Accepted: 09/08/2021] [Indexed: 12/19/2022]
Abstract
Irritable bowel syndrome (IBS) results from disordered brain-gut interactions. Identifying susceptibility genes could highlight the underlying pathophysiological mechanisms. We designed a digestive health questionnaire for UK Biobank and combined the IBS cases it identified with independent cohorts. We conducted a genome-wide association study with 53,400 cases and 433,201 controls and replicated significant associations in a 23andMe panel (205,252 cases and 1,384,055 controls). Our study identified and confirmed six genetic susceptibility loci for IBS. Implicated genes included NCAM1, CADM2, PHF2/FAM120A, DOCK9, CKAP2/TPTE2P3 and BAG6. The first four are associated with mood and anxiety disorders, expressed in the nervous system, or both. Mirroring this, we also found strong genome-wide correlation between the risk of IBS and anxiety, neuroticism and depression (rg > 0.5). Additional analyses suggested that this arises from shared pathogenic pathways rather than, for example, anxiety causing abdominal symptoms. Implicated mechanisms require further exploration to help understand the altered brain-gut interactions underlying IBS.
Affiliation(s)
- Chris Eijsbouts
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Tenghao Zheng
- Center for Molecular Medicine & Clinical Epidemiology Unit, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
- School of Biological Sciences, Monash University, Clayton, Victoria, Australia
- Nicholas A Kennedy
- IBD Pharmacogenetics, College of Medicine and Health, University of Exeter, Exeter, UK
- Ferdinando Bonfiglio
- Center for Molecular Medicine & Clinical Epidemiology Unit, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
- School of Biological Sciences, Monash University, Clayton, Victoria, Australia
- Carl A Anderson
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Loukas Moutsianas
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, London, UK
- Joanne Holliday
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Alexandru-Ioan Voda
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
- Saint Edmund Hall, University of Oxford, Oxford, UK
- Gianrico Farrugia
- Enteric NeuroScience Program, Division of Gastroenterology and Hepatology, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- Matthias Hübenthal
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- Department of Dermatology, Quincke Research Center, University Hospital Schleswig-Holstein, Kiel, Germany
- Gonçalo Abecasis
- Department of Biostatistics, University of Michigan, School of Public Health, Ann Arbor, MI, USA
- Matthew Zawistowski
- Department of Biostatistics, University of Michigan, School of Public Health, Ann Arbor, MI, USA
- Anne Heidi Skogholt
- Department of Laboratory Medicine, Children's and Women's Health, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
- Eivind Ness-Jensen
- Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Medicine, Levanger Hospital, Nord-Trøndelag Hospital Trust, Levanger, Norway
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Kristian Hveem
- Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
- Tõnu Esko
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Maris Teder-Laving
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Alexandra Zhernakova
- Department of Genetics, University Medical Center Groningen, Groningen, the Netherlands
- Michael Camilleri
- Clinical Enteric Neuroscience Translational and Epidemiological Research and Division of Gastroenterology and Hepatology, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Guy Boeckxstaens
- David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Peter J Whorwell
- Neurogastroenterology Unit, Wythenshawe Hospital, Centre for Gastrointestinal Sciences, University of Manchester, Manchester, UK
- Robin Spiller
- Nottingham Digestive Diseases Centre, National Institute for Health Research Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham, UK
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Mauro D'Amato
- Center for Molecular Medicine & Clinical Epidemiology Unit, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
- School of Biological Sciences, Monash University, Clayton, Victoria, Australia
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- Biodonostia Health Research Institute, San Sebastian, Spain
- Gastrointestinal Genetics Lab, CIC bioGUNE - Basque Research and Technology Alliance, Derio, Spain
- IKERBASQUE, The Basque Science Foundation, Bilbao, Spain
- Luke Jostins
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
- Christ Church, University of Oxford, Oxford, UK
- Miles Parkes
- Division of Gastroenterology and Hepatology, Department of Medicine, University of Cambridge, Cambridge, UK
- Department of Gastroenterology, Cambridge University Hospital, Cambridge, UK
5
Thorp JG, Campos AI, Grotzinger AD, Gerring ZF, An J, Ong JS, Wang W, Shringarpure S, Byrne EM, MacGregor S, Martin NG, Medland SE, Middeldorp CM, Derks EM. Symptom-level modelling unravels the shared genetic architecture of anxiety and depression. Nat Hum Behav 2021; 5:1432-1442. [PMID: 33859377 DOI: 10.1038/s41562-021-01094-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 04/15/2020] [Accepted: 03/01/2021] [Indexed: 02/02/2023]
Abstract
Depression and anxiety are highly prevalent and comorbid psychiatric traits that cause considerable burden worldwide. Here we use factor analysis and genomic structural equation modelling to investigate the genetic factor structure underlying 28 items assessing depression, anxiety and neuroticism, a closely related personality trait. Symptoms of depression and anxiety loaded on two distinct, although highly genetically correlated, factors, and neuroticism items were partitioned between them. We used this factor structure to conduct genome-wide association analyses on latent factors of depressive symptoms (89 independent variants, 61 genomic loci) and anxiety symptoms (102 variants, 73 loci) in the UK Biobank. Of these associated variants, 72% and 78%, respectively, replicated in an independent cohort of approximately 1.9 million individuals with self-reported diagnosis of depression and anxiety. We use these results to characterize shared and trait-specific genetic associations. Our findings provide insight into the genetic architecture of depression and anxiety and the comorbidity between them.
Affiliation(s)
- Jackson G Thorp
- Translational Neurogenomics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Faculty of Medicine, University of Queensland, Brisbane, Queensland, Australia
- Adrian I Campos
- Faculty of Medicine, University of Queensland, Brisbane, Queensland, Australia
- Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Zachary F Gerring
- Translational Neurogenomics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Jiyuan An
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Jue-Sheng Ong
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Enda M Byrne
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Stuart MacGregor
- Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Nicholas G Martin
- Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Sarah E Medland
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Christel M Middeldorp
- Child Health Research Centre, University of Queensland, Brisbane, Queensland, Australia
- Child and Youth Mental Health Service, Children's Health Queensland Hospital and Health Service, Brisbane, Queensland, Australia
- Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands
- Eske M Derks
- Translational Neurogenomics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
6
Micheletti SJ, Bryc K, Ancona Esselmann SG, Freyman WA, Moreno ME, Poznik GD, Shastri AJ, Beleza S, Mountain JL, Agee M, Aslibekyan S, Auton A, Bell R, Clark S, Das S, Elson S, Fletez-Brant K, Fontanillas P, Gandhi P, Heilbron K, Hicks B, Hinds D, Huber K, Jewett E, Jiang Y, Kleinman A, Lin K, Litterman N, McCreight J, McIntyre M, McManus K, Mozaffari S, Nandakumar P, Noblin L, Northover C, O’Connell J, Petrakovitz A, Pitts S, Shelton J, Shringarpure S, Tian C, Tung J, Tunney R, Vacic V, Wang X, Zare A. Genetic Consequences of the Transatlantic Slave Trade in the Americas. Am J Hum Genet 2020; 107:265-277. [PMID: 32707084 PMCID: PMC7413858 DOI: 10.1016/j.ajhg.2020.06.012] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 03/12/2020] [Accepted: 06/15/2020] [Indexed: 01/07/2023]
Abstract
According to historical records of transatlantic slavery, traders forcibly deported an estimated 12.5 million people from ports along the Atlantic coastline of Africa between the 16th and 19th centuries, with global impacts reaching to the present day, more than a century and a half after slavery's abolition. Such records have fueled a broad understanding of the forced migration from Africa to the Americas yet remain underexplored in concert with genetic data. Here, we analyzed genotype array data from 50,281 research participants, which, combined with historical shipping documents, illustrate that the current genetic landscape of the Americas is largely concordant with expectations derived from documentation of slave voyages. For instance, genetic connections between people in slave trading regions of Africa and disembarkation regions of the Americas generally mirror the proportion of individuals forcibly moved between those regions. While some discordances can be explained by additional records of deportations within the Americas, other discordances yield insights into variable survival rates and timing of arrival of enslaved people from specific regions of Africa. Furthermore, the greater contribution of African women to the gene pool compared to African men varies across the Americas, consistent with literature documenting regional differences in slavery practices. This investigation of the transatlantic slave trade, which is broad in scope in terms of both datasets and analyses, establishes genetic links between individuals in the Americas and populations across Atlantic Africa, yielding a more comprehensive understanding of the African roots of peoples of the Americas.
7
Wojcik GL, Fuchsberger C, Taliun D, Welch R, Martin AR, Shringarpure S, Carlson CS, Abecasis G, Kang HM, Boehnke M, Bustamante CD, Gignoux CR, Kenny EE. Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies. G3 (Bethesda) 2018; 8:3255-3267. [PMID: 30131328 PMCID: PMC6169386 DOI: 10.1534/g3.118.200502] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/14/2018] [Accepted: 08/03/2018] [Indexed: 01/26/2023]
Abstract
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association analyses. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy, and reveals population-specific biases in pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts with single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
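The evaluation metric named above, mean imputed r² at untyped sites, can be sketched as follows. This is a minimal illustration under stated assumptions: the imputation step itself is out of scope, so each site is given as a pair of true genotypes (0/1/2 alternate-allele counts) and the dosages imputed for the same held-out individuals; r² here is the squared Pearson correlation against known truth, not an imputation program's own estimated quality score. The helper names and the choice to skip sites with undefined r² are hypothetical, not the paper's exact procedure.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def squared_correlation(true_genotypes: Sequence[float], imputed_dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages at one site.
    Returns NaN when either vector has zero variance, because r^2 is undefined there."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least two samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return math.nan
    return cov * cov / (var_t * var_d)


def minor_allele_frequency(true_genotypes: Sequence[float]) -> float:
    """Minor allele frequency computed from diploid alternate-allele counts."""
    p = sum(true_genotypes) / (2 * len(true_genotypes))
    return min(p, 1 - p)


def mean_imputed_r2(
    sites: Iterable[Tuple[Sequence[float], Sequence[float]]],
    max_maf: Optional[float] = None,
) -> float:
    """Mean r^2 over untyped sites. Each site is (true genotypes, imputed dosages), where the
    dosages were imputed for held-out individuals without that site on the array.
    If max_maf is given (e.g. 0.01 for rare variants), only sites below it are averaged.
    Sites with undefined r^2 (no variation) are skipped."""
    values = []
    for true_g, dosages in sites:
        if max_maf is not None and minor_allele_frequency(true_g) >= max_maf:
            continue
        r2 = squared_correlation(true_g, dosages)
        if not math.isnan(r2):
            values.append(r2)
    if not values:
        raise ValueError("no sites with a defined r^2")
    return sum(values) / len(values)
```

Because r² is squared, an imputation that is perfectly anti-correlated with the truth would also score 1; in practice imputed dosages track the truth in the same direction, so this does not arise.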
Affiliation(s)
- Genevieve L Wojcik
- Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Christian Fuchsberger
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Center for Biomedicine, European Academy of Bolzano/Bozen (EURAC), affiliated with the University of Lübeck, Bolzano, Bozen, 39100, Italy
- Daniel Taliun
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Ryan Welch
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Alicia R Martin
- Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Suyash Shringarpure
- Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Christopher S Carlson
- Fred Hutchinson Cancer Center, University of Washington, 1100 Fairview Ave. N., Seattle, WA 98109
- Goncalo Abecasis
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Hyun Min Kang
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Carlos D Bustamante
- Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Department of Biomedical Data Science, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Christopher R Gignoux
- Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Eimear E Kenny
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
- The Icahn Institute of Multiscale Biology and Genomics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
- The Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
8
Raisaro JL, Tramèr F, Ji Z, Bu D, Zhao Y, Carey K, Lloyd D, Sofia H, Baker D, Flicek P, Shringarpure S, Bustamante C, Wang S, Jiang X, Ohno-Machado L, Tang H, Wang X, Hubaux JP. Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks. J Am Med Inform Assoc 2017; 24:799-805. [PMID: 28339683 PMCID: PMC5881894 DOI: 10.1093/jamia/ocw167] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 05/27/2016] [Revised: 09/27/2016] [Accepted: 12/01/2016] [Indexed: 12/21/2022]
Abstract
The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or "beacon") is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertains and in accordance with the GA4GH policy and standards.
While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual's whole genome sequence), the individual's membership in a beacon can be inferred through repeated queries for variants present in the individual's genome.
In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.
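The third mitigation strategy, a per-user access budget for each individual genome, can be illustrated with a toy beacon. The sketch below is a deliberately simplified, hypothetical version: the `BudgetedBeacon` class, the `(chromosome, position, allele)` variant tuple, and the rule that every "yes" answer costs each contributing carrier one unit of that user's budget are assumptions for illustration, not the accounting used in the paper (which may, for example, weight queries by allele rarity). Once an individual's budget for a given user is exhausted, that individual no longer contributes to that user's answers, which caps how many membership-revealing responses one adversary can collect per genome.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, alternate allele).
Variant = Tuple[str, int, str]


class BudgetedBeacon:
    """Toy beacon answering yes/no allele-presence queries. Each individual may
    contribute to at most `budget_per_individual` 'yes' answers for each user."""

    def __init__(self, genomes: Dict[str, Set[Variant]], budget_per_individual: int):
        self._genomes = genomes
        # remaining[user][individual] -> 'yes' answers this individual can still support for this user
        self._remaining: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
            lambda: defaultdict(lambda: budget_per_individual)
        )

    def query(self, user: str, variant: Variant) -> bool:
        remaining = self._remaining[user]
        # Only carriers whose budget for this user is not yet exhausted can make the answer 'yes'.
        carriers = [
            individual
            for individual, variants in self._genomes.items()
            if variant in variants and remaining[individual] > 0
        ]
        # Charge every contributing carrier one unit of this user's budget.
        for individual in carriers:
            remaining[individual] -= 1
        return bool(carriers)


if __name__ == "__main__":
    genomes = {
        "A": {("1", 100, "T"), ("1", 200, "G")},
        "B": {("1", 100, "T")},
    }
    beacon = BudgetedBeacon(genomes, budget_per_individual=1)
    print(beacon.query("adversary", ("1", 200, "G")))  # True: A contributes, A's budget is now spent
    print(beacon.query("adversary", ("1", 200, "G")))  # False: A no longer contributes for this user
    print(beacon.query("other_user", ("1", 200, "G")))  # True: budgets are tracked per user
```

A cost of this simple rule is that queries for common alleles also drain budgets; charging only for rare alleles would be one obvious refinement.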
Affiliation(s)
- Jean Louis Raisaro
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Florian Tramèr
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Zhanglong Ji
- Health Science Department of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
- Diyue Bu
- School of Informatics and Computing, Indiana University Bloomington, Bloomington, IN, USA
- Yongan Zhao
- School of Informatics and Computing, Indiana University Bloomington, Bloomington, IN, USA
- David Lloyd
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
- Heidi Sofia
- Division of Genomic Medicine, National Institutes of Health, Bethesda, MD, USA
- Dixie Baker
- Martin, Blanck and Associates, Alexandria, VA, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shuang Wang
- Health Science Department of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
- Xiaoqian Jiang
- Health Science Department of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
- Lucila Ohno-Machado
- Health Science Department of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
- Haixu Tang
- School of Informatics and Computing, Indiana University Bloomington, Bloomington, IN, USA
- XiaoFeng Wang
- School of Informatics and Computing, Indiana University Bloomington, Bloomington, IN, USA
- Jean-Pierre Hubaux
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
9
Baharian S, Barakatt M, Gignoux CR, Shringarpure S, Errington J, Blot WJ, Bustamante CD, Kenny EE, Williams SM, Aldrich MC, Gravel S. The Great Migration and African-American Genomic Diversity. PLoS Genet 2016; 12:e1006059. [PMID: 27232753 PMCID: PMC4883799 DOI: 10.1371/journal.pgen.1006059] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1] [Received: 12/23/2015] [Accepted: 04/26/2016] [Indexed: 12/23/2022]
Abstract
We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus tracks north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ∼15–16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.
Genetic studies of African-Americans identify functional variants, elucidate historical and genealogical mysteries, and reveal basic biology. However, African-Americans have been under-represented in genetic studies, and relatively little is known about nation-wide patterns of genomic diversity in the population. Here, we study African-American genomic diversity using genotype data from nationally and regionally representative cohorts.
Access to these unique cohorts allows us to clarify the role of population structure, admixture, and recent massive migrations in shaping African-American genomic diversity and sheds new light on the genetic history of this population.
Affiliation(s)
- Soheil Baharian
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Maxime Barakatt
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- School of Computer Science, McGill University, Montreal, Quebec, Canada
- Christopher R. Gignoux
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Suyash Shringarpure
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Jacob Errington
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- William J. Blot
- Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- International Epidemiology Institute, Rockville, Maryland, United States of America
- Carlos D. Bustamante
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Eimear E. Kenny
- Department of Genetics and Genomic Sciences, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Charles Bronfman Institute for Personalized Medicine, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Icahn Institute for Genomics and Multiscale Biology, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Center for Statistical Genetics, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Scott M. Williams
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Melinda C. Aldrich
- Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Thoracic Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
|
10
|
Shringarpure S, Xing EP. Effects of sample selection bias on the accuracy of population structure and ancestry inference. G3 (Bethesda) 2014; 4:901-11. [PMID: 24637351 PMCID: PMC4025489 DOI: 10.1534/g3.113.007633] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 05/18/2013] [Accepted: 03/10/2014] [Indexed: 01/01/2023]
Abstract
Inferring population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that only a small number of individuals from each geographical/ethnic population are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and also propose a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data.
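As a simplified illustration of why sample composition matters, the sketch below estimates population-level allele frequencies (one ingredient of ADMIXTURE-style models) from a sample that over-represents one of two populations. It is not the correction framework developed in the paper: the two-population data are hypothetical, and the true population proportions are assumed to be known as auxiliary information and used to reweight each group's contribution.

```python
import random


def simulate_genotypes(freqs, n, rng):
    """Draw n diploid genotypes (0, 1 or 2 copies of the allele) at each locus."""
    return [[int(rng.random() < p) + int(rng.random() < p) for p in freqs] for _ in range(n)]


def naive_frequency(samples):
    """Pooled estimate: every sampled individual gets equal weight."""
    n = len(samples)
    return [sum(col) / (2 * n) for col in zip(*samples)]


def reweighted_frequency(groups, target_props):
    """Weight each sampled group by its (assumed known) share of the underlying
    population instead of by its share of the sample."""
    n_loci = len(groups[0][0])
    est = [0.0] * n_loci
    for samples, weight in zip(groups, target_props):
        for locus, f in enumerate(naive_frequency(samples)):
            est[locus] += weight * f
    return est


def mean_abs_error(est, truth):
    return sum(abs(e - t) for e, t in zip(est, truth)) / len(truth)


rng = random.Random(0)
n_loci = 200
p1 = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
# Hypothetical: population 2 differs from population 1 by a random shift per locus.
p2 = [min(0.99, max(0.01, p + rng.uniform(-0.3, 0.3))) for p in p1]
true_props = [0.5, 0.5]  # assumed equal-sized underlying populations
truth = [true_props[0] * a + true_props[1] * b for a, b in zip(p1, p2)]

# Biased sample: 900 individuals from population 1, only 100 from population 2.
g1 = simulate_genotypes(p1, 900, rng)
g2 = simulate_genotypes(p2, 100, rng)

naive = naive_frequency(g1 + g2)
corrected = reweighted_frequency([g1, g2], true_props)
print(f"naive error:      {mean_abs_error(naive, truth):.3f}")
print(f"reweighted error: {mean_abs_error(corrected, truth):.3f}")
```

Reweighting by group is equivalent to giving each individual in group k the weight π_k / (n_k / n). It trades bias for variance: the under-sampled group's individuals receive large weights, so their sampling noise is amplified.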
Affiliation(s)
- Eric P Xing
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
|
11
|
Abstract
Motivation: Clustering of genotype data is an important way of understanding similarities and differences between populations. A summary of populations through clustering allows us to make inferences about the evolutionary history of the populations. Many methods have been proposed to perform clustering on multilocus genotype data. However, most of these methods do not directly address the question of how many clusters the data should be divided into and leave that choice to the user.
Methods: We present StructHDP, which is a method for automatically inferring the number of clusters from genotype data in the presence of admixture. Our method is an extension of two existing methods, Structure and Structurama. Using a Hierarchical Dirichlet Process (HDP), we model the presence of admixture of an unknown number of ancestral populations in a given sample of genotype data. We use a Gibbs sampler to perform inference on the resulting model and infer the ancestry proportions and the number of clusters that best explain the data.
Results: To demonstrate our method, we simulated data from an island model using the neutral coalescent. Comparing the results of StructHDP with Structurama shows the utility of combining HDPs with the Structure model. We used StructHDP to analyze a dataset of 155 Taita thrush, Turdus helleri, which has been previously analyzed using Structure and Structurama. StructHDP correctly picks the optimal number of populations to cluster the data. The clustering based on the inferred ancestry proportions also agrees with that inferred using Structure for the optimal number of populations. We also analyzed data from 1048 individuals from the Human Genome Diversity Project from 53 world populations. We found that the clusters obtained correspond with major geographical divisions of the world, which is in agreement with previous analyses of the dataset.
Availability: StructHDP is written in C++. The code will be available for download at http://www.sailing.cs.cmu.edu/structhdp.
Contact: suyash@cs.cmu.edu; epxing@cs.cmu.edu
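StructHDP places a hierarchical Dirichlet process prior over ancestral populations, so the number of clusters is inferred rather than fixed in advance. The basic mechanism can be illustrated with the Chinese restaurant process, the sequential form of a single Dirichlet process. The sketch below is not StructHDP's C++ code or its Gibbs sampler; the concentration parameter `alpha` and the function names are illustrative.

```python
import random


def chinese_restaurant_process(n, alpha, rng):
    """Seat n customers sequentially; return the size of each occupied table (cluster).

    Customer i (0-based) joins existing table k with probability size_k / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    """
    tables = []
    for i in range(n):
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:  # no existing table was chosen: open a new one
            tables.append(1)
    return tables


def expected_num_tables(n, alpha):
    """E[number of clusters] = sum over i = 0..n-1 of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(7)
    n = 155  # same size as the Taita thrush sample, for scale only
    for alpha in (0.5, 1.0, 5.0):
        runs = [len(chinese_restaurant_process(n, alpha, rng)) for _ in range(2000)]
        mean = sum(runs) / len(runs)
        print(f"alpha={alpha}: mean clusters {mean:.2f}, expected {expected_num_tables(n, alpha):.2f}")
```

Larger `alpha` favors more clusters, and the expected count grows only logarithmically with n; in a full model the genotype likelihood then determines how many clusters the data actually support.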
Collapse
Affiliation(s)
- Suyash Shringarpure
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
|
12
|
Ray P, Shringarpure S, Kolar M, Xing EP. CSMET: comparative genomic motif detection via multi-resolution phylogenetic shadowing. PLoS Comput Biol 2008; 4:e1000090. [PMID: 18535663 PMCID: PMC2396503 DOI: 10.1371/journal.pcbi.1000090] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 06/29/2007] [Accepted: 04/28/2008] [Indexed: 11/19/2022] Open access
Abstract
Functional turnover events of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, are common during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at the nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny conditioned on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders.
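CSMET's site-to-site dependence comes from a hidden Markov model over successive alignment columns. The generic computation that yields per-site posteriors in such a chain is the forward-backward algorithm, sketched below for two hidden states (background and motif). In CSMET the per-column likelihoods would come from its background and motif phylogenies; here they are hypothetical numbers, and the transition matrix and initial distribution are likewise made up for illustration.

```python
def forward_backward(site_lik, trans, init):
    """Posterior P(state_t | all columns) for a discrete HMM.

    site_lik[t][s]: likelihood of alignment column t under hidden state s
    trans[r][s]:    P(state s at column t+1 | state r at column t)
    init[s]:        P(state s at the first column)
    Each pass is rescaled per column to avoid numerical underflow.
    """
    n_states = len(init)
    n_sites = len(site_lik)

    # Forward pass: alpha[t][s] is proportional to P(columns 0..t, state_t = s).
    alpha = []
    prev = [init[s] * site_lik[0][s] for s in range(n_states)]
    total = sum(prev)
    alpha.append([a / total for a in prev])
    for t in range(1, n_sites):
        cur = [
            site_lik[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
        total = sum(cur)
        alpha.append([c / total for c in cur])

    # Backward pass: beta[t][s] is proportional to P(columns t+1.. | state_t = s).
    beta = [[1.0] * n_states for _ in range(n_sites)]
    for t in range(n_sites - 2, -1, -1):
        nxt = [
            sum(trans[s][r] * site_lik[t + 1][r] * beta[t + 1][r] for r in range(n_states))
            for s in range(n_states)
        ]
        total = sum(nxt)
        beta[t] = [b / total for b in nxt]

    # Combine forward and backward messages, normalizing per column.
    posterior = []
    for t in range(n_sites):
        joint = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        total = sum(joint)
        posterior.append([j / total for j in joint])
    return posterior


# Hypothetical example: state 0 = background, state 1 = motif.
trans = [[0.95, 0.05], [0.10, 0.90]]
init = [0.9, 0.1]
site_lik = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5 + [[0.9, 0.1]] * 5
for t, (p_bg, p_motif) in enumerate(forward_backward(site_lik, trans, init)):
    print(f"column {t:2d}: P(motif) = {p_motif:.2f}")
```

Because motif columns tend to follow motif columns under this transition matrix, a run of motif-favoring columns is reinforced as a block rather than scored site by site.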
Affiliation(s)
- Pradipta Ray
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Suyash Shringarpure
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Mladen Kolar
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Eric P. Xing
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
|
13
|
Affiliation(s)
- K Ramu
- Department of Medicinal Chemistry, University of Mississippi, University, Mississippi 38677, USA
|
14
|
Affiliation(s)
- K Ramu
- Department of Medicinal Chemistry, School of Pharmacy, University of Mississippi, University, Mississippi 38677
|