1
Guo J, Guo Q, Zhong T, Xu C, Xia Z, Fang H, Chen Q, Zhou Y, Xie J, Jin D, Yang Y, Wu X, Zhu H, Hour A, Jin X, Zhou Y, Li Q. Phenome-wide association study in 25,639 pregnant Chinese women reveals loci associated with maternal comorbidities and child health. Cell Genomics 2024; 4:100632. [PMID: 39389020 DOI: 10.1016/j.xgen.2024.100632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 12/02/2023] [Accepted: 07/19/2024] [Indexed: 10/12/2024]
Abstract
Phenome-wide association studies (PheWAS) have been less focused on maternal diseases and maternal-newborn comorbidities, especially in the Chinese population. To enhance our understanding of the genetic basis of these related diseases, we conducted a PheWAS on 25,639 pregnant women and 14,151 newborns in the Chinese Han population using ultra-low-coverage whole-genome sequencing (ulcWGS). We identified 2,883 maternal trait-associated SNPs associated with 26 phenotypes, among which 99.5% were near established genome-wide association study (GWAS) loci. Further refinement delineated these SNPs to 442 unique trait-associated loci (TALs) predicated on linkage disequilibrium R² > 0.8, revealing that 75.6% demonstrated pleiotropy and 50.9% were located in genes implicated in analogous phenotypes. Notably, we discovered 21 maternal SNPs associated with 35 neonatal phenotypes, including two SNPs associated with identical complications in both mothers and children. These findings underscore the importance of integrating ulcWGS data to enrich the discoveries derived from traditional PheWAS approaches.
Affiliation(s)
- Jintao Guo
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China; Weifang People's Hospital, Shandong Second Medical University, Shandong 261041, China
- Qiwei Guo
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- Taoling Zhong
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Chaoqun Xu
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Zhongmin Xia
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- Hongkun Fang
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Weifang People's Hospital, Shandong Second Medical University, Shandong 261041, China
- Qinwei Chen
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China
- Ying Zhou
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Jieqiong Xie
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- Dandan Jin
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- You Yang
- BGI-Shenzhen, Shenzhen 518103, China
- Xin Wu
- BGI-Shenzhen, Shenzhen 518103, China
- Ailing Hour
- Department of Life Science, Fu-Jen Catholic University, Xinzhuang Dist., New Taipei City 242, Taiwan
- Xin Jin
- BGI-Shenzhen, Shenzhen 518103, China
- Yulin Zhou
- United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China.
- Qiyuan Li
- Department of Pediatrics, School of Medicine, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China.
2
Ji A, Sui Y, Xue X, Ji X, Shi W, Shi Y, Terkeltaub R, Dalbeth N, Takei R, Yan F, Sun M, Li M, Lu J, Cui L, Liu Z, Wang C, Li X, Han L, Fang Z, Sun W, Liang Y, He Y, Zheng G, Wang X, Wang J, Zhang H, Pang L, Qi H, Li Y, Cheng Z, Li Z, Xiao J, Zeng C, Merriman TR, Qu H, Fang X, Li C. Novel Genetic Loci in Early-Onset Gout Derived From Whole-Genome Sequencing of an Adolescent Gout Cohort. Arthritis Rheumatol 2024. [PMID: 39118347 DOI: 10.1002/art.42969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 08/01/2024] [Accepted: 08/05/2024] [Indexed: 08/10/2024]
Abstract
OBJECTIVE: Mechanisms underlying adolescent-onset and early-onset gout are unclear. This study aimed to discover variants associated with early-onset gout. METHODS: We conducted whole-genome sequencing in a discovery adolescent-onset gout cohort of 905 individuals (gout onset 12 to 19 years) to discover common and low-frequency single-nucleotide variants (SNVs) associated with gout. Candidate common SNVs were genotyped in an early-onset gout cohort of 2,834 individuals (gout onset ≤30 years old), and meta-analysis was performed with the discovery and replication cohorts to identify loci associated with early-onset gout. Transcriptome and epigenomic analyses, quantitative real-time polymerase chain reaction and RNA sequencing in human peripheral blood leukocytes, and knock-down experiments in human THP-1 macrophage cells investigated the regulation and function of candidate gene RCOR1. RESULTS: In addition to ABCG2, a urate transporter previously linked to pediatric-onset and early-onset gout, we identified two novel loci (P_meta < 5.0 × 10⁻⁸): rs12887440 (RCOR1) and rs35213808 (FSTL5-MIR4454). Additionally, we found associations at ABCG2 and SLC22A12 that were driven by low-frequency SNVs. SNVs in RCOR1 were linked to elevated blood leukocyte messenger RNA levels. THP-1 macrophage culture studies revealed the potential of decreased RCOR1 to suppress gouty inflammation. CONCLUSION: This is the first comprehensive genetic characterization of adolescent-onset gout. The identified risk loci of early-onset gout mediate inflammatory responsiveness to crystals that could mediate gouty arthritis. This study will contribute to risk prediction and therapeutic interventions to prevent adolescent-onset gout.
Affiliation(s)
- Aichang Ji
- Affiliated Hospital of Qingdao University, Qingdao, China
- Yang Sui
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
- Xiaomei Xue
- Affiliated Hospital of Qingdao University, Qingdao, China
- Xiapeng Ji
- Affiliated Hospital of Qingdao University, Qingdao, China
- Wenrui Shi
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
- Yongyong Shi
- Affiliated Hospital of Qingdao University and Biomedical Sciences Institute of Qingdao University (Qingdao Branch of SJTU Bio-X Institutes), Qingdao University, Qingdao, China, and Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Riku Takei
- Asia Pacific Gout Consortium and University of Alabama at Birmingham
- Fei Yan
- Affiliated Hospital of Qingdao University, Qingdao, China
- Mingshu Sun
- Shandong Provincial Clinical Research Center for Immune Diseases and Gout & Shandong Provincial Key Laboratory of Metabolic Diseases, The Affiliated Hospital of Qingdao University, Qingdao, China
- Maichao Li
- Affiliated Hospital of Qingdao University, Qingdao, China
- Jie Lu
- Affiliated Hospital of Qingdao University, Qingdao, China
- Lingling Cui
- Affiliated Hospital of Qingdao University, Qingdao, China
- Zhen Liu
- Affiliated Hospital of Qingdao University, Qingdao, China
- Can Wang
- Affiliated Hospital of Qingdao University, Qingdao, China
- Xinde Li
- Affiliated Hospital of Qingdao University, Qingdao, China
- Lin Han
- Affiliated Hospital of Qingdao University, Qingdao, China
- Zhanjie Fang
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
- Wenyan Sun
- Affiliated Hospital of Qingdao University, Qingdao, China
- Yue Liang
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
- Yuwei He
- Affiliated Hospital of Qingdao University, Qingdao, China
- Guangmin Zheng
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
- Xuefeng Wang
- Affiliated Hospital of Qingdao University, Qingdao, China
- Jiayi Wang
- Development Center for Medical Science & Technology, National Health Commission of the People's Republic of China, Beijing, China
- Hui Zhang
- Institute of Metabolic Diseases, Qingdao University, Qingdao, China
- Lei Pang
- Affiliated Hospital of Qingdao University, Qingdao, China
- Han Qi
- Affiliated Hospital of Qingdao University, Qingdao, China
- Yushuang Li
- Affiliated Hospital of Qingdao University, Qingdao, China
- Zan Cheng
- Affiliated Hospital of Qingdao University, Qingdao, China
- Zhiqiang Li
- The Biomedical Sciences Institute and The Affiliated Hospital of Qingdao University, Qingdao University, Qingdao, Shandong, China
- Jingfa Xiao
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
- Changqing Zeng
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
- Tony R Merriman
- Asia Pacific Gout Consortium, University of Alabama at Birmingham, Institute of Metabolic Diseases, Qingdao University, Qingdao, China, and University of Otago, Dunedin, New Zealand
- Hongzhu Qu
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, and Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing, China
- Xiangdong Fang
- China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, and Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing, China
- Changgui Li
- The Affiliated Hospital of Qingdao University, Qingdao, China, Asia Pacific Gout Consortium, and Institute of Metabolic Diseases, Qingdao University, Qingdao, China
3
Bougiouri K, Aninta SG, Charlton S, Harris A, Carmagnini A, Piličiauskienė G, Feuerborn TR, Scarsbrook L, Tabadda K, Blaževičius P, Parker HG, Gopalakrishnan S, Larson G, Ostrander EA, Irving-Pease EK, Frantz LA, Racimo F. Imputation of ancient canid genomes reveals inbreeding history over the past 10,000 years. bioRxiv: the preprint server for biology 2024:2024.03.15.585179. [PMID: 38903121 PMCID: PMC11188068 DOI: 10.1101/2024.03.15.585179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
The multi-millennia-long history between dogs and humans has placed them at the forefront of archeological and genomic research. Despite ongoing efforts including the analysis of ancient dog and wolf genomes, many questions remain regarding their geographic and temporal origins, and the microevolutionary processes that led to the diversity of breeds today. Although ancient genomes provide valuable information, their use is hindered by low depth of coverage and post-mortem damage, which inhibits confident genotype calling. In the present study, we assess how genotype imputation of ancient dog and wolf genomes, utilising a large reference panel, can improve the resolution provided by ancient datasets. Imputation accuracy was evaluated by down-sampling high coverage dog and wolf genomes to 0.05-2x coverage and comparing concordance between imputed and high coverage genotypes. We measured the impact of imputation on principal component analyses and runs of homozygosity. Our findings show high (R² > 0.9) imputation accuracy for dogs with coverage as low as 0.5x and for wolves as low as 1.0x. We then imputed a dataset of 90 ancient dog and wolf genomes, to assess changes in inbreeding during the last 10,000 years of dog evolution. Ancient dog and wolf populations generally exhibited lower inbreeding levels than present-day individuals. Interestingly, regions with low ROH density maintained across ancient and present-day samples were significantly associated with genes related to olfaction and immune response. Our study indicates that imputing ancient canine genomes is a viable strategy that allows for the use of analytical methods previously limited to high-quality genetic data.
Affiliation(s)
- Katia Bougiouri
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Sabhrina Gita Aninta
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sophy Charlton
- BioArCh, Department of Archaeology, University of York, York, UK
- Alex Harris
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Alberto Carmagnini
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, Munich, Germany
- Giedrė Piličiauskienė
- Department of Archeology, Faculty of History, Vilnius University, Vilnius, Lithuania
- Tatiana R. Feuerborn
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lachie Scarsbrook
- The Palaeogenomics and Bio-archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Kristina Tabadda
- The Palaeogenomics and Bio-archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Povilas Blaževičius
- Department of Archeology, Faculty of History, Vilnius University, Vilnius, Lithuania
- National Museum of Lithuania, Vilnius, Lithuania
- Heidi G. Parker
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Shyam Gopalakrishnan
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Greger Larson
- The Palaeogenomics and Bio-archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Elaine A. Ostrander
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Evan K. Irving-Pease
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Laurent A.F. Frantz
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, Munich, Germany
- Fernando Racimo
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
4
French JN, Pua VB, Laboulaye R, Leal TP, Olivas MC, Lima-Costa MF, Horta BL, Barreto ML, Tarazona-Santos E, Mata I, O’Connor TD. Comparing the effect of imputation reference panel composition in four distinct Latin American cohorts. bioRxiv: the preprint server for biology 2024:2024.04.11.589057. [PMID: 38659746 PMCID: PMC11042191 DOI: 10.1101/2024.04.11.589057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Genome-wide association studies have been useful in identifying genetic risk factors for various phenotypes. These studies rely on imputation, and many existing panels are largely composed of individuals of European ancestry, resulting in lower levels of imputation quality in underrepresented populations. We aim to analyze how the composition of imputation reference panels affects imputation quality in four target Latin American cohorts. We compared imputation quality for chromosomes 7 and X when altering the imputation reference panel by: 1) increasing the number of Latin American individuals; 2) excluding either Latin American, African, or European individuals; or 3) increasing the Indigenous American (IA) admixture proportions of included Latin Americans. We found that increasing the number of Latin Americans in the reference panel improved imputation quality in the four populations; however, there were differences between chromosomes 7 and X in some cohorts. Excluding Latin Americans from analysis resulted in worse imputation quality in every cohort, while differential effects were seen when excluding Europeans and Africans between and within cohorts and between chromosomes 7 and X. Finally, increasing IA-like admixture proportions in the reference panel increased imputation quality at different levels in different populations. The difference in results between populations and chromosomes suggests that existing and future reference panels containing Latin American individuals are likely to perform differently in different Latin American populations.
Affiliation(s)
- Jennifer N French
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Victor Borda Pua
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- University of Maryland Institute for Health Computing, Rockville, MD
- Roland Laboulaye
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Thiago Peixoto Leal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Mario Cornejo Olivas
- Neurogenetics Working Group, Universidad Cientifica del Sur, Lima, Peru
- Neurogenetics Research Center, Instituto Nacional de Ciencias Neurologicas, Lima, Peru
- Bernardo L Horta
- Postgraduate Program in Epidemiology, Federal University of Pelotas, Pelotas, Brazil
- Mauricio L Barreto
- Center for Data and Knowledge Integration for Health (CIDACS), Gonçalo Moniz Institute (IGM), Oswaldo Cruz Foundation (FIOCRUZ-BA), Salvador, Bahia, Brazil
- Collective Health Institute, Federal University of Bahia (UFBA), Salvador, Bahia, Brazil
- Eduardo Tarazona-Santos
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Brazil
- Ignacio Mata
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Timothy D. O’Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD
- Program in Health Equity and Population Health, University of Maryland School of Medicine
5
Bhérer C, Eveleigh R, Trajanoska K, St-Cyr J, Paccard A, Nadukkalam Ravindran P, Caron E, Bader Asbah N, McClelland P, Wei C, Baumgartner I, Schindewolf M, Döring Y, Perley D, Lefebvre F, Lepage P, Bourgey M, Bourque G, Ragoussis J, Mooser V, Taliun D. A cost-effective sequencing method for genetic studies combining high-depth whole exome and low-depth whole genome. NPJ Genom Med 2024; 9:8. [PMID: 38326393 PMCID: PMC10850497 DOI: 10.1038/s41525-024-00390-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 12/07/2023] [Indexed: 02/09/2024] Open
Abstract
Whole genome sequencing (WGS) at high-depth (30X) allows the accurate discovery of variants in the coding and non-coding DNA regions and helps elucidate the genetic underpinnings of human health and diseases. Yet, due to the prohibitive cost of high-depth WGS, most large-scale genetic association studies use genotyping arrays or high-depth whole exome sequencing (WES). Here we propose a cost-effective method which we call "Whole Exome Genome Sequencing" (WEGS), that combines low-depth WGS and high-depth WES with up to 8 samples pooled and sequenced simultaneously (multiplexed). We experimentally assess the performance of WEGS with four different depth of coverage and sample multiplexing configurations. We show that the optimal WEGS configurations are 1.7-2.0 times cheaper than standard WES (no-plexing), 1.8-2.1 times cheaper than high-depth WGS, reach similar recall and precision rates in detecting coding variants as WES, and capture more population-specific variants in the rest of the genome that are difficult to recover when using genotype imputation methods. We apply WEGS to 862 patients with peripheral artery disease and show that it directly assesses more known disease-associated variants than a typical genotyping array and thousands of non-imputable variants per disease-associated locus.
Affiliation(s)
- Claude Bhérer
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Robert Eveleigh
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Katerina Trajanoska
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Janick St-Cyr
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Antoine Paccard
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Praveen Nadukkalam Ravindran
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Elizabeth Caron
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Nimara Bader Asbah
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Peyton McClelland
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Clare Wei
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Iris Baumgartner
- Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Marc Schindewolf
- Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Yvonne Döring
- Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Institute for Cardiovascular Prevention (IPEK), Ludwig-Maximilians University Munich, Pettenkoferstr 9, 80336, Munich, Germany
- Danielle Perley
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- François Lefebvre
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Pierre Lepage
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Guillaume Bourque
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Jiannis Ragoussis
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Vincent Mooser
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Daniel Taliun
- Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada.
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada.
- Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada.
6
Herzig AF, Velo-Suárez L, Dina C, Redon R, Deleuze JF, Génin E. How local reference panels improve imputation in French populations. Sci Rep 2024; 14:370. [PMID: 38172507 PMCID: PMC10764714 DOI: 10.1038/s41598-023-49931-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024] Open
Abstract
Imputation servers offer the exclusive possibility to harness the largest public reference panels, which have been shown to deliver very high precision in the imputation of European genomes. Many studies have nonetheless stressed the importance of 'study specific panels' (SSPs) as an alternative and have shown the benefits of combining public reference panels with SSPs. But such combined approaches are not attainable when using external imputation servers. To investigate how to confront this challenge, we imputed 550 French individuals using either the University of Michigan imputation server with the Haplotype Reference Consortium (HRC) panel or an in-house SSP of 850 whole-genome sequenced French individuals. With approximate geo-localization of both our target and SSP individuals we are able to pinpoint different scenarios where SSP-based imputation would be preferred over server-based imputation or vice versa. This is achieved by showing to a high degree of resolution the importance of the proximity of the reference panel to target individuals, with a focus on the clear added value of SSPs for estimating haplotype phase and for the imputation of rare variants (minor allele frequency below 0.01). Such benefits were most evident for individuals from the same geographical regions in France as the SSP individuals. Overall, only 42.3% of all 125,442 variants evaluated were better imputed with an SSP from France compared to an external reference panel; however, this rises to 58.1% for individuals from geographic regions well covered by the SSP. By investigating haplotype sharing and population fine-structure in France, we show the importance of including SSP haplotypes for imputation but also that they should ideally be combined with large public panels.
In the absence of the unattainable results from a combined panel of the HRC and our French SSP, we put forward a pragmatic solution where server-based and SSP-based imputation outcomes can be combined based on comparing posterior genotype probabilities. We show that such an approach can give a level of imputation accuracy in excess of what could be achieved with either strategy alone. The results presented provide detailed insights into the accuracy of imputation that should be expected from different strategies for European populations.
Affiliation(s)
- Lourdes Velo-Suárez
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- CHRU Brest, Brest, France
- Christian Dina
- Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du thorax, Nantes, France
- Richard Redon
- Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du thorax, Nantes, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, CEA, Evry, France
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain (CEPH), Paris, France
- Emmanuelle Génin
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- CHRU Brest, Brest, France
7
Xu J, Liu D, Hassan A, Genovese G, Cote AC, Fennessy B, Cheng E, Charney AW, Knowles JA, Ayub M, Peterson RE, Bigdeli TB, Huckins LM. Evaluation of imputation performance of multiple reference panels in a Pakistani population. medRxiv: the preprint server for health sciences 2023:2023.12.22.23300448. [PMID: 38234809 PMCID: PMC10793543 DOI: 10.1101/2023.12.22.23300448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Genotype imputation is crucial for GWAS, but reference panels and existing benchmarking studies prioritize European individuals. Consequently, it is unclear which publicly available reference panel should be used for Pakistani individuals, and whether ancestry composition or sample size of the panel matters more for imputation accuracy. Our study compared different reference panels to impute genotype data in 1814 Pakistani individuals, finding the best performance balancing accuracy and coverage with meta-imputation with TOPMed and the expanded 1000 Genomes (ex1KG) reference. Imputation accuracy of ex1KG outperformed TOPMed despite its 30-fold smaller sample size, supporting efforts to create future panels with diverse populations.
Affiliation(s)
- Jiayi Xu
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Dongjing Liu
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Arsalan Hassan
- University of Peshawar, Peshawar, Khyber Pakhtunkhwa, Pakistan
- Institute of Omics and Health Research, Lahore, Pakistan
- Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Alanna C. Cote
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Brian Fennessy
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Esther Cheng
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- James A. Knowles
- The Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Roseann E. Peterson
- Department of Psychiatry and Behavioral Sciences, Institute for Genomics in Health, State University of New York Downstate Health Sciences University, Brooklyn, NY, USA
- Tim B. Bigdeli
- Department of Psychiatry and Behavioral Sciences, Institute for Genomics in Health, State University of New York Downstate Health Sciences University, Brooklyn, NY, USA
- Laura M. Huckins
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
8
|
Bakulski KM, Blostein F, London SJ. Linking Prenatal Environmental Exposures to Lifetime Health with Epigenome-Wide Association Studies: State-of-the-Science Review and Future Recommendations. ENVIRONMENTAL HEALTH PERSPECTIVES 2023; 131:126001. [PMID: 38048101 PMCID: PMC10695268 DOI: 10.1289/ehp12956] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/23/2023] [Revised: 10/06/2023] [Accepted: 10/16/2023] [Indexed: 12/05/2023]
Abstract
BACKGROUND The prenatal environment influences lifetime health; epigenetic mechanisms likely predominate. In 2016, the first international consortium paper on cigarette smoking during pregnancy and offspring DNA methylation identified extensive, reproducible exposure signals. This finding raised expectations for epigenome-wide association studies (EWAS) of other exposures. OBJECTIVE We review the current state-of-the-science for DNA methylation associations across prenatal exposures in humans and provide future recommendations. METHODS We reviewed 134 prenatal environmental EWAS of DNA methylation in newborns, focusing on 51 epidemiological studies with meta-analysis or replication testing. Exposures spanned cigarette smoking, alcohol consumption, air pollution, dietary factors, psychosocial stress, metals, other chemicals, and other exogenous factors. Of the reproducible DNA methylation signatures, we examined implementation as exposure biomarkers. RESULTS Only 19 (14%) of these prenatal EWAS were conducted in cohorts of 1,000 or more individuals, reflecting the still early stage of the field. To date, the largest perinatal EWAS sample size was 6,685 participants. For comparison, the most recent genome-wide association study for birth weight included more than 300,000 individuals. Replication, at some level, was successful with exposures to cigarette smoking, folate, dietary glycemic index, particulate matter with aerodynamic diameter < 10 μm and < 2.5 μm, nitrogen dioxide, mercury, cadmium, arsenic, electronic waste, PFAS, and DDT. Reproducible effects of a more limited set of prenatal exposures (smoking, folate) enabled robust methylation biomarker creation. DISCUSSION Current evidence demonstrates the scientific premise for reproducible DNA methylation exposure signatures. Better powered EWAS could identify signatures across many exposures and enable comprehensive biomarker development. Whether methylation biomarkers of exposures themselves cause health effects remains unclear. We expect that larger EWAS with enhanced coverage of epigenome and exposome, along with improved single-cell technologies and evolving methods for integrative multi-omics analyses and causal inference, will expand mechanistic understanding of causal links between environmental exposures, the epigenome, and health outcomes throughout the life course. https://doi.org/10.1289/EHP12956.
Affiliation(s)
- Freida Blostein
- University of Michigan, Ann Arbor, Michigan, USA
- Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Stephanie J. London
- National Institute of Environmental Health Sciences, National Institutes of Health, U.S. Department of Health and Human Services, Research Triangle Park, North Carolina, USA
9
|
Bebo A, Jarmul JA, Pletcher MJ, Hasbani NR, Couper D, Nambi V, Ballantyne CM, Fornage M, Morrison AC, Avery CL, de Vries PS. Coronary heart disease and ischemic stroke polygenic risk scores and atherosclerotic cardiovascular disease in a diverse, population-based cohort study. PLoS One 2023; 18:e0285259. [PMID: 37327218 PMCID: PMC10275447 DOI: 10.1371/journal.pone.0285259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 04/18/2023] [Indexed: 06/18/2023]
Abstract
The predictive ability of coronary heart disease (CHD) and ischemic stroke (IS) polygenic risk scores (PRS) has been evaluated individually, but whether they predict the combined outcome of atherosclerotic cardiovascular disease (ASCVD) remains insufficiently researched. It is also unclear whether associations of the CHD and IS PRS with ASCVD are independent of subclinical atherosclerosis measures. 7,286 White and 2,016 Black participants from the population-based Atherosclerosis Risk in Communities study who were free of cardiovascular disease and type 2 diabetes at baseline were included. We computed previously validated CHD and IS PRS consisting of 1,745,179 and 3,225,583 genetic variants, respectively. Cox proportional hazards models were used to test the association between each PRS and ASCVD, adjusting for traditional risk factors, ankle-brachial index, carotid intima media thickness, and carotid plaque. The hazard ratios (HR) for the CHD and IS PRS were significant with HR of 1.50 (95% CI: 1.36-1.66) and 1.31 (95% CI: 1.18-1.45) respectively for the risk of incident ASCVD per standard deviation increase in CHD and IS PRS among White participants after adjusting for traditional risk factors. The HR for the CHD PRS was not significant with an HR of 0.95 (95% CI: 0.79-1.13) for the risk of incident ASCVD in Black participants. The HR for the IS PRS was significant with an HR of 1.26 (95% CI: 1.05-1.51) for the risk of incident ASCVD in Black participants. The association of the CHD and IS PRS with ASCVD was not attenuated in White participants after adjustment for ankle-brachial index, carotid intima media thickness, and carotid plaque. The CHD and IS PRS do not cross-predict well, and predict better the outcome for which they were created than the composite ASCVD outcome. Thus, the use of the composite outcome of ASCVD may not be ideal for genetic risk prediction.
Affiliation(s)
- Allison Bebo
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- Jamie A. Jarmul
- Gillings School of Public Health, Department of Health Policy and Management, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- School of Medicine, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Mark J. Pletcher
- Departments of Epidemiology and Biostatistics and Medicine, University of California, San Francisco, San Francisco, CA, United States of America
- Natalie R. Hasbani
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- David Couper
- Gillings School of Public Health, Department of Biostatistics, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Collaborative Studies Coordinating Center, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Vijay Nambi
- Baylor College of Medicine, Houston, TX, United States of America
- Michael E DeBakey Veterans Affairs Medical Center, Houston, TX, United States of America
- Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- McGovern Medical School Institute of Molecular Medicine Research Center for Human Genetics, Houston, TX, United States of America
- Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- Christy L. Avery
- Gillings School of Public Health, Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Carolina Population Center, University of North Carolina–Chapel Hill, Chapel Hill, NC, United States of America
- Paul S. de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
10
|
Li S, Yan B, Li TKT, Lu J, Gu Y, Tan Y, Gong F, Lam TW, Xie P, Wang Y, Lin G, Luo R. Ultra-low-coverage genome-wide association study-insights into gestational age using 17,844 embryo samples with preimplantation genetic testing. Genome Med 2023; 15:10. [PMID: 36788602 PMCID: PMC9926832 DOI: 10.1186/s13073-023-01158-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/15/2022] [Accepted: 01/26/2023] [Indexed: 02/16/2023]
Abstract
BACKGROUND Very low-coverage (0.1 to 1×) whole genome sequencing (WGS) has become a promising and affordable approach to discover genomic variants of human populations for genome-wide association study (GWAS). To support genetic screening using preimplantation genetic testing (PGT) in a large population, the sequencing coverage goes below 0.1× to an ultra-low level. However, the feasibility and effectiveness of ultra-low-coverage WGS (ulcWGS) for GWAS remains undetermined. METHODS We built a pipeline to carry out analysis of ulcWGS data for GWAS. To examine its effectiveness, we benchmarked the accuracy of genotype imputation at the combination of different coverages below 0.1× and sample sizes from 2000 to 16,000, using 17,844 embryo PGT samples with approximately 0.04× average coverage and the standard Chinese sample HG005 with known genotypes. We then applied the imputed genotypes of 1744 transferred embryos who have gestational ages and complete follow-up records to GWAS. RESULTS The accuracy of genotype imputation under ultra-low coverage can be improved by increasing the sample size and applying a set of filters. From 1744 born embryos, we identified 11 genomic risk loci associated with gestational ages and 166 genes mapped to these loci according to positional, expression quantitative trait locus, and chromatin interaction strategies. Among these mapped genes, CRHBP, ICAM1, and OXTR were more frequently reported as preterm birth related. By joint analysis of gene expression data from previous studies, we constructed interrelationships of mainly CRHBP, ICAM1, PLAGL1, DNMT1, CNTLN, DKK1, and EGR2 with preterm birth, infant disease, and breast cancer. CONCLUSIONS This study not only demonstrates that ulcWGS could achieve relatively high accuracy of adequate genotype imputation and is capable of GWAS, but also provides insights into the associations between gestational age and genetic variations of the fetal embryos from Chinese population.
Affiliation(s)
- Shumin Li
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Bin Yan
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Thomas K. T. Li
- Department of Obstetrics & Gynecology, Queen Mary Hospital, The University of Hong Kong, Hong Kong, China
- Jianliang Lu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Yifan Gu
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
- Yueqiu Tan
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
- Fei Gong
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
- Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Pingyuan Xie
- Hunan Normal University School of Medicine, Changsha, 410013, Hunan, China
- National Engineering and Research Center of Human Stem Cell, Changsha, Hunan, China
- Yuexuan Wang
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Ge Lin
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
- National Engineering and Research Center of Human Stem Cell, Changsha, Hunan, China
- Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
11
|
Dekeyser T, Génin E, Herzig AF. Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance. Genes (Basel) 2023; 14:410. [PMID: 36833337 PMCID: PMC9956390 DOI: 10.3390/genes14020410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 01/22/2023] [Accepted: 01/30/2023] [Indexed: 02/09/2023]
Abstract
Genotype imputation is widely used to enrich genetic datasets. The operation relies on panels of known reference haplotypes, typically with whole-genome sequencing data. How to choose a reference panel has been widely studied and it is essential to have a panel that is well matched to the individuals who require missing genotype imputation. However, it is broadly accepted that such an imputation panel will have an enhanced performance with the inclusion of diversity (haplotypes from many different populations). We investigate this observation by examining, in fine detail, exactly which reference haplotypes are contributing at different regions of the genome. This is achieved using a novel method of inserting synthetic genetic variation into the reference panel in order to track the performance of leading imputation algorithms. We show that while diversity may globally improve imputation accuracy, there can be occasions where incorrect genotypes are imputed following the inclusion of more diverse haplotypes in the reference panel. We, however, demonstrate a technique for retaining and benefitting from the diversity in the reference panel whilst avoiding the occasional adverse effects on imputation accuracy. What is more, our results more clearly elucidate the role of diversity in a reference panel than has been shown in previous studies.
Affiliation(s)
- Thibault Dekeyser
- Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
- CHRU Brest, F-29200 Brest, France
- Emmanuelle Génin
- Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
- CHRU Brest, F-29200 Brest, France
- Anthony F. Herzig
- Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
12
|
De Marino A, Mahmoud AA, Bose M, Bircan KO, Terpolovsky A, Bamunusinghe V, Bohn S, Khan U, Novković B, Yazdi PG. A comparative analysis of current phasing and imputation software. PLoS One 2022; 17:e0260177. [PMID: 36260643 PMCID: PMC9581364 DOI: 10.1371/journal.pone.0260177] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Accepted: 09/01/2022] [Indexed: 12/02/2022]
Abstract
Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed to both reduced sequencing costs and imputation models which make it possible to obtain nearly whole-genome data from less expensive genotyping methods, such as microarray chips. Although there are many different approaches to imputation, the Hidden Markov Model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, SHAPEIT4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation software on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can result in different interpretations. The highest average concordance rate was achieved by Beagle5.4, followed by Impute5 and Minimac4, using a reference-based approach during phasing and the highest density chip. IQS and R2 metrics revealed that Impute5 and Minimac4 obtained better results for low frequency markers, while Beagle5.4 remained more accurate for common markers (MAF > 5%). Computational load as measured by run time was lower for Beagle5.4 than Minimac4 and Impute5, while Minimac4 utilized the least memory of the imputation tools we compared. SHAPEIT4 used the least memory of the phasing tools examined with genotype chip data, while Eagle2.4.1 used the least memory phasing WGS data. Finally, we determined the combination of phasing software, imputation software, and reference panel best suited for different situations and analysis needs, and created an automated pipeline that provides a way for users to create customized chips designed to optimize their imputation results.
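The point that the choice of accuracy metric can change the interpretation can be illustrated with a small sketch. It compares the raw concordance rate with the imputation quality score (IQS), taken here in its usual chance-corrected form, IQS = (Po - Pc) / (1 - Pc), where Po is the observed concordance and Pc is the agreement expected by chance from the genotype frequencies. The function names and toy genotypes are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter
from typing import Sequence


def concordance(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of genotypes (coded 0, 1, 2) that are imputed correctly."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def iqs(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Chance-corrected agreement: (Po - Pc) / (1 - Pc)."""
    n = len(true)
    po = concordance(true, imputed)
    true_counts, imputed_counts = Counter(true), Counter(imputed)
    # Agreement expected by chance from the marginal genotype frequencies.
    pc = sum(true_counts[g] * imputed_counts[g] for g in (0, 1, 2)) / (n * n)
    if pc == 1.0:
        return float("nan")  # undefined when both sets are monomorphic
    return (po - pc) / (1.0 - pc)


# Toy rare variant: 2 heterozygotes among 100 samples, all imputed as
# homozygous reference.
true_genotypes = [0] * 98 + [1] * 2
imputed_genotypes = [0] * 100

rare_concordance = concordance(true_genotypes, imputed_genotypes)
rare_iqs = iqs(true_genotypes, imputed_genotypes)
```

For this toy rare variant the concordance is 0.98 while the IQS is 0, because always guessing the major genotype already agrees 98% of the time by chance. This is one reason a concordance-based ranking and an IQS-based ranking of tools can disagree for low-frequency markers.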
Affiliation(s)
- Adriano De Marino
- Research & Development, SelfDecode, Miami, FL, United States of America
- Madhuchanda Bose
- Research & Development, SelfDecode, Miami, FL, United States of America
- Sandra Bohn
- Research & Development, SelfDecode, Miami, FL, United States of America
- Umar Khan
- Research & Development, SelfDecode, Miami, FL, United States of America
- Biljana Novković
- Research & Development, SelfDecode, Miami, FL, United States of America
- Puya G. Yazdi
- Research & Development, SelfDecode, Miami, FL, United States of America
13
|
Developing CIRdb as a catalog of natural genetic variation in the Canary Islanders. Sci Rep 2022; 12:16132. [PMID: 36168029 PMCID: PMC9514705 DOI: 10.1038/s41598-022-20442-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 09/13/2022] [Indexed: 11/29/2022]
Abstract
The current inhabitants of the Canary Islands have a unique genetic makeup in the European diversity landscape due to the existence of African footprints from recent admixture events, especially of North African components (> 20%). The underrepresentation of non-Europeans in genetic studies and the sizable North African ancestry, which is nearly absent from all existing catalogs of worldwide genetic diversity, justify the need to develop CIRdb, a population-specific reference catalog of natural genetic variation in the Canary Islanders. Based on array genotyping of the selected unrelated donors and comparisons against available datasets from European, sub-Saharan, and North African populations, we illustrate the intermediate genetic differentiation of Canary Islanders between Europeans and North Africans and the existence of within-population differences that are likely driven by genetic isolation. Here we describe the overall design and the methods that are being implemented to further develop CIRdb. This resource will help to strengthen the implementation of Precision Medicine in this population by contributing to increase the diversity in genetic studies. Among others, this will translate into improved ability to fine map disease genes and simplify the identification of causal variants and estimate the prevalence of unattended Mendelian diseases.
14
|
Avery CL, Howard AG, Ballou AF, Buchanan VL, Collins JM, Downie CG, Engel SM, Graff M, Highland HM, Lee MP, Lilly AG, Lu K, Rager JE, Staley BS, North KE, Gordon-Larsen P. Strengthening Causal Inference in Exposomics Research: Application of Genetic Data and Methods. ENVIRONMENTAL HEALTH PERSPECTIVES 2022; 130:55001. [PMID: 35533073 PMCID: PMC9084332 DOI: 10.1289/ehp9098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Revised: 04/08/2022] [Accepted: 04/12/2022] [Indexed: 05/11/2023]
Abstract
Advances in technologies to measure a broad set of exposures have led to a range of exposome research efforts. Yet, these efforts have insufficiently integrated methods that incorporate genetic data to strengthen causal inference, despite evidence that many exposome-associated phenotypes are heritable. Objective: We demonstrate how integration of methods and study designs that incorporate genetic data can strengthen causal inference in exposomics research by helping address six challenges: reverse causation and unmeasured confounding, comprehensive examination of phenotypic effects, low efficiency, replication, multilevel data integration, and characterization of tissue-specific effects. Examples are drawn from studies of biomarkers and health behaviors, exposure domains where the causal inference methods we describe are most often applied. Discussion: Technological, computational, and statistical advances in genotyping, imputation, and analysis, combined with broad data sharing and cross-study collaborations, offer multiple opportunities to strengthen causal inference in exposomics research. Full application of these opportunities will require an expanded understanding of genetic variants that predict exposome phenotypes as well as an appreciation that the utility of genetic variants for causal inference will vary by exposure and may depend on large sample sizes. However, several of these challenges can be addressed through international scientific collaborations that prioritize data sharing. Ultimately, we anticipate that efforts to better integrate methods that incorporate genetic data will extend the reach of exposomics research by helping address the challenges of comprehensively measuring the exposome and its health effects across studies, the life course, and in varied contexts and diverse populations. https://doi.org/10.1289/EHP9098.
Affiliation(s)
- Christy L Avery
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Annie Green Howard
- Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Anna F Ballou
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Victoria L Buchanan
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jason M Collins
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina G Downie
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Stephanie M Engel
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Heather M Highland
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Moa P Lee
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Adam G Lilly
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Sociology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kun Lu
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Julia E Rager
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Brooke S Staley
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Penny Gordon-Larsen
- Department of Nutrition, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
15
|
Sun Q, Liu W, Rosen JD, Huang L, Pace RG, Dang H, Gallins PJ, Blue EE, Ling H, Corvol H, Strug LJ, Bamshad MJ, Gibson RL, Pugh EW, Blackman SM, Cutting GR, O'Neal WK, Zhou YH, Wright FA, Knowles MR, Wen J, Li Y. Leveraging TOPMed imputation server and constructing a cohort-specific imputation reference panel to enhance genotype imputation among cystic fibrosis patients. HGG ADVANCES 2022; 3:100090. [PMID: 35128485 PMCID: PMC8804187 DOI: 10.1016/j.xhgg.2022.100090] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/21/2021] [Accepted: 01/06/2022] [Indexed: 11/25/2022]
Abstract
Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system and beyond. In our previous genome-wide association studies (GWAS), we genotyped approximately 8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (approximately 30×) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among patients with CF. For mixtures of GWAS array and WGS data, genotype imputation has proven effective in increasing effective sample size. Therefore, we first performed imputation for the approximately 8,000 CF samples with GWAS array genotype using the Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for patients with CF, boosting genomic coverage from approximately 0.3-4.2 million genotyped markers to approximately 11-43 million well-imputed markers, and significantly improving polygenic risk score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of patients with CF. We demonstrate that despite having approximately 3% the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely owing to allele and haplotype differences between patients with CF and general populations. We anticipate our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jonathan D. Rosen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Le Huang
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Rhonda G. Pace
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hong Dang
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Paul J. Gallins
- Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Elizabeth E. Blue
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute, Seattle, WA 98195, USA
- Hua Ling
- Center for Inherited Disease Research (CIDR), Johns Hopkins University, Baltimore, MD 21205, USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Harriet Corvol
- Sorbonne Université, Inserm, Centre de Recherche Saint-Antoine, Assistance Publique-Hôpitaux de Paris (APHP), Hôpital Trousseau, Service de Pneumologie Pédiatrique, Paris, France
- Lisa J. Strug
- Departments of Statistical Sciences and Computer Science and Division of Biostatistics, University of Toronto, Toronto, ON, Canada
- Program in Genetics and Genome Biology and The Centre for Applied Genomics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
- Michael J. Bamshad
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98105, USA
- Brotman Baty Institute, Seattle, WA 98195, USA
- Ronald L. Gibson
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
- Elizabeth W. Pugh
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Scott M. Blackman
- Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Garry R. Cutting
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Wanda K. O'Neal
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yi-Hui Zhou
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
| | - Fred A. Wright
- Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
| | - Michael R. Knowles
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Cystic Fibrosis Genome Project
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Center for Inherited Disease Research (CIDR), Johns Hopkins University, Baltimore, MD 21205, USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Sorbonne Université, Inserm, Centre de Recherche Saint-Antoine, Assistance Publique-Hôpitaux de Paris (APHP), Hôpital Trousseau, Service de Pneumologie Pédiatrique, Paris, France
- Departments of Statistical Sciences and Computer Science and Division of Biostatistics, University of Toronto, Toronto, ON, Canada
- Program in Genetics and Genome Biology and The Centre for Applied Genomics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98105, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Brotman Baty Institute, Seattle, WA 98195, USA
|
16
|
He KY, Kelly TN, Wang H, Liang J, Zhu L, Cade BE, Assimes TL, Becker LC, Beitelshees AL, Bielak LF, Bress AP, Brody JA, Chang YPC, Chang YC, de Vries PS, Duggirala R, Fox ER, Franceschini N, Furniss AL, Gao Y, Guo X, Haessler J, Hung YJ, Hwang SJ, Irvin MR, Kalyani RR, Liu CT, Liu C, Martin LW, Montasser ME, Muntner PM, Mwasongwe S, Naseri T, Palmas W, Reupena MS, Rice KM, Sheu WHH, Shimbo D, Smith JA, Snively BM, Yanek LR, Zhao W, Blangero J, Boerwinkle E, Chen YDI, Correa A, Cupples LA, Curran JE, Fornage M, He J, Hou L, Kaplan RC, Kardia SLR, Kenny EE, Kooperberg C, Lloyd-Jones D, Loos RJF, Mathias RA, McGarvey ST, Mitchell BD, North KE, Peyser PA, Psaty BM, Raffield LM, Rao DC, Redline S, Reiner AP, Rich SS, Rotter JI, Taylor KD, Tracy R, Vasan RS, Morrison AC, Levy D, Chakravarti A, Arnett DK, Zhu X. Rare coding variants in RCN3 are associated with blood pressure. BMC Genomics 2022; 23:148. [PMID: 35183128 PMCID: PMC8858539 DOI: 10.1186/s12864-022-08356-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 02/01/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND While large genome-wide association studies have identified nearly one thousand loci associated with variation in blood pressure, rare variant identification is still a challenge. In family-based cohorts, genome-wide linkage scans have been successful in identifying rare genetic variants for blood pressure. This study aims to identify low-frequency and rare genetic variants within previously reported linkage regions on chromosomes 1 and 19 in African American families from the Trans-Omics for Precision Medicine (TOPMed) program. Genetic association analyses weighted by linkage evidence were completed with whole genome sequencing data within and across TOPMed ancestral groups consisting of 60,388 individuals of European, African, East Asian, Hispanic, and Samoan ancestries. RESULTS Associations of low-frequency and rare variants in RCN3 and multiple other genes were observed for blood pressure traits in TOPMed samples. The association of low-frequency and rare coding variants in RCN3 was further replicated in UK Biobank samples (N = 403,522), and reached genome-wide significance for diastolic blood pressure (p = 2.01 × 10⁻⁷). CONCLUSIONS Low-frequency and rare variants in RCN3 contribute to blood pressure variation. This study demonstrates that focusing association analyses in linkage regions greatly reduces the multiple-testing burden and improves power to identify novel rare variants associated with blood pressure traits.
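The multiple-testing point in the conclusions can be illustrated with a plain Bonferroni calculation. This is only a sketch under stated assumptions: the abstract does not say which correction was applied (the study weighted tests by linkage evidence, which is not reproduced here), and the gene counts below are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Hypothetical numbers of gene-based tests, not taken from the study.
GENOME_WIDE_GENES = 20_000
LINKAGE_REGION_GENES = 500

genome_wide = bonferroni_threshold(0.05, GENOME_WIDE_GENES)
linkage_only = bonferroni_threshold(0.05, LINKAGE_REGION_GENES)

# How much less stringent the per-test threshold becomes when testing
# is restricted to the (hypothetical) linkage regions.
relaxation = linkage_only / genome_wide
```

With fewer tests, the same family-wise error rate allows a less stringent per-test threshold, which is the sense in which restricting the search space can improve power.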
Affiliation(s)
- Karen Y He
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH, 44106, USA
| | - Tanika N Kelly
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
| | - Heming Wang
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Jingjing Liang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH, 44106, USA
| | - Luke Zhu
- Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY, USA
| | - Brian E Cade
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Themistocles L Assimes
- Department of Medicine (Division of Cardiovascular Medicine), Stanford University, Palo Alto, CA, USA
| | - Lewis C Becker
- GeneSTAR Research Program, Department of Medicine, Divisions of Cardiology and General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Amber L Beitelshees
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Lawrence F Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Adam P Bress
- Department of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Yen-Pei Christy Chang
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Yi-Cheng Chang
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University, Taipei City, Taiwan
- Institute of Biomedical Sciences, Academia Sinica, Taipei City, Taiwan
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
| | - Paul S de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Ravindranath Duggirala
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Ervin R Fox
- Division of Cardiovascular Diseases, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Nora Franceschini
- Department of Epidemiology, UNC Gillings School of Global Public Health, Chapel Hill, NC, USA
| | - Anna L Furniss
- Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
| | - Yan Gao
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Jeffrey Haessler
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Yi-Jen Hung
- Institute of Preventive Medicine, National Defense Medical Center, New Taipei City, Taiwan
| | - Shih-Jen Hwang
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
| | - Marguerite Ryan Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Rita R Kalyani
- GeneSTAR Research Program, Department of Medicine, Division of Endocrinology, Diabetes and Metabolism, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Ching-Ti Liu
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
| | - Chunyu Liu
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
| | - Lisa Warsinger Martin
- Division of Cardiology, Department of Medicine, George Washington University, Washington, DC, USA
| | - May E Montasser
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Paul M Muntner
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | | | - Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
| | - Walter Palmas
- Division of Cardiology, Columbia University Irving Medical Center, New York, NY, USA
| | | | - Kenneth M Rice
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Wayne H-H Sheu
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung City, Taiwan
| | - Daichi Shimbo
- Division of Cardiology, Columbia University Irving Medical Center, New York, NY, USA
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Beverly M Snively
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Division of Genomic Outcomes, Department of Pediatrics, Harbor-UCLA Medical Center Professor of Pediatrics, UCLA, Torrance, CA, USA
| | - Adolfo Correa
- Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
| | - L Adrienne Cupples
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
| | - Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
| | - Lifang Hou
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University Chicago, Evanston, IL, USA
| | - Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, USA
| | - Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Donald Lloyd-Jones
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, USA
| | - Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Rasika A Mathias
- GeneSTAR Research Program, Department of Medicine, Divisions of Allergy and Clinical Immunology and General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Stephen T McGarvey
- International Health Institute and Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
- Department of Anthropology, Brown University, Providence, RI, USA
| | - Braxton D Mitchell
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Veterans Affairs Medical Center, Baltimore, MD, USA
| | - Kari E North
- Department of Epidemiology, UNC Gillings School of Global Public Health, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - D C Rao
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Alex P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Russell Tracy
- Department of Pathology & Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT, USA
- Department of Biochemistry, University of Vermont, Burlington, VT, USA
| | - Ramachandran S Vasan
- Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Department of Medicine, School of Medicine, Boston University, Boston, MA, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Daniel Levy
- Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Aravinda Chakravarti
- Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY, USA
| | - Donna K Arnett
- University of Kentucky College of Public Health, Lexington, KY, USA
| | - Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH, 44106, USA.
|
17
|
Xu ZM, Rüeger S, Zwyer M, Brites D, Hiza H, Reinhard M, Rutaihwa L, Borrell S, Isihaka F, Temba H, Maroa T, Naftari R, Hella J, Sasamalo M, Reither K, Portevin D, Gagneux S, Fellay J. Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations. PLoS Comput Biol 2022; 18:e1009628. [PMID: 35025869 PMCID: PMC8791479 DOI: 10.1371/journal.pcbi.1009628] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/10/2021] [Revised: 01/26/2022] [Accepted: 11/10/2021] [Indexed: 12/13/2022]
Abstract
Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array. Genome-wide association studies, which study the association between genetic variants and various phenotypes, typically rely on genotyping arrays. Only a small proportion of genetic variants within the genome are typed on genotyping arrays. Untyped variants are statistically inferred through a process known as genotype imputation, where correlations between variants (haplotypes) observed in external reference panels are leveraged to infer untyped variants in the study population. However, for study populations that are underrepresented in existing reference panels, the quality of imputation is often sub-optimal. This is because typed variants incorporated on existing genotyping arrays can be unsuitable for the study population, and haplotype structures can be different between the reference and the study population. Here, we illustrate an approach to select a custom set of population-specific typed variants to improve genotype imputation in such underrepresented populations.
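One common way to quantify imputation accuracy is the squared Pearson correlation between imputed allele dosages and the true genotypes at a held-out variant. The abstract does not name the metric the authors used, so the sketch below is an assumption for illustration, and all dosage values are hypothetical.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed dosages (0.0 to 2.0) at a single variant."""
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("inputs must have the same length")
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined for monomorphic input
    return (cov * cov) / (var_t * var_i)

# Hypothetical data for six individuals at one variant.
truth = [0, 1, 2, 1, 0, 2]
base_array = [0.4, 0.9, 1.2, 1.1, 0.6, 1.5]    # imputed from a base array
with_add_on = [0.1, 1.0, 1.9, 1.1, 0.0, 1.8]   # imputed after adding tag SNPs

r2_base = dosage_r2(truth, base_array)
r2_add_on = dosage_r2(truth, with_add_on)
```

In this toy example the dosages imputed with the add-on tags track the true genotypes more closely, so their squared correlation is higher.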
Affiliation(s)
- Zhi Ming Xu
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Sina Rüeger
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Michaela Zwyer
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Daniela Brites
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Hellen Hiza
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Ifakara Health Institute, Dar es Salaam, Tanzania
| | - Miriam Reinhard
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Liliana Rutaihwa
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Sonia Borrell
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | | | | | - Thomas Maroa
- Ifakara Health Institute, Dar es Salaam, Tanzania
| | | | - Jerry Hella
- Ifakara Health Institute, Dar es Salaam, Tanzania
| | | | - Klaus Reither
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Damien Portevin
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Sebastien Gagneux
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Jacques Fellay
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
|
18
|
Jones AV, Curtiss D, Harris C, Southerington T, Hautalahti M, Wihuri P, Mäkelä J, Kallionpää RE, Makkonen E, Knopp T, Mannermaa A, Mäkinen E, Moilanen AM, Tezel TH, Waheed NK. An assessment of prevalence of Type 1 CFI rare variants in European AMD, and why lack of broader genetic data hinders development of new treatments and healthcare access. PLoS One 2022; 17:e0272260. [PMID: 36067162 PMCID: PMC9447915 DOI: 10.1371/journal.pone.0272260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 07/14/2022] [Indexed: 11/19/2022]
Abstract
PURPOSE Advanced age-related macular degeneration (AAMD) risk is associated with rare complement Factor I (FI) genetic variants associated with low FI protein levels (termed 'Type 1'), but it is unclear how variant prevalences differ among AMD patients of different ethnicities. METHODS The collective prevalence of Type 1 CFI rare variant genotypes was examined in four European AAMD datasets. Collective minor allele frequencies (MAFs) were sourced from the natural history study SCOPE, the UK Biobank, the International AMD Genomics Consortium (IAMDGC), and the Finnish Biobank Cooperative (FINBB), and compared to paired control MAFs or background population prevalence rates from the Genome Aggregation Database (gnomAD). Due to a lack of available genetic data in non-European AAMD, power calculations were undertaken to estimate the AAMD population sizes required to identify a statistically significant association between Type 1 CFI rare variants and disease risk in different ethnicities, using gnomAD populations as controls. RESULTS Type 1 CFI rare variants were enriched in all European AAMD cohorts, with odds ratios (ORs) ranging between 3.1 and 7.8, and a greater enrichment was observed in dry AMD from FINBB (OR 8.9, 95% CI 1.49-53.31). The lack of available non-European AAMD datasets prevented us from exploring this relationship more globally; however, a statistical association may be detectable by future sequencing studies that sample approximately 2,000 AAMD individuals from Ashkenazi Jewish and Latino/Admixed American ethnicities. CONCLUSIONS The relationship between Type 1 CFI rare variants and increased odds of AAMD is well established in Europeans; however, the lack of broader genetic data in AAMD has adverse implications for clinical development and future commercialisation strategies of targeted FI therapies in AAMD. These findings emphasise the importance of generating more diverse genetic data in AAMD to improve equity of access to new treatments and address the bias in health care.
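An odds ratio with a 95% confidence interval, as reported in the results above, can be computed from a 2×2 table of carrier counts. The sketch below uses the Wald interval on the log odds ratio, which is an assumption: the abstract does not state how the intervals were derived, and the counts are hypothetical rather than taken from the study.

```python
import math

def odds_ratio_wald_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.
    z=1.96 corresponds to an approximate 95% interval."""
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if any(count <= 0 for count in cells):
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 10 carriers among 1,000 cases, 5 among 2,000 controls.
or_value, ci_lower, ci_upper = odds_ratio_wald_ci(10, 990, 5, 1995)
```

With rare variants the carrier counts are small, so the standard error is large and the interval is wide, which is consistent with the broad interval quoted for the FINBB dry AMD subgroup.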
Affiliation(s)
- Amy V. Jones
- Gyroscope Therapeutics Limited, London, United Kingdom
| | - Darin Curtiss
- Gyroscope Therapeutics Limited, London, United Kingdom
| | - Claire Harris
- Gyroscope Therapeutics Limited, London, United Kingdom
| | - Tom Southerington
- Finnish Biobank Cooperative–FINBB, Turku, Finland
- University of Turku, Turku, Finland
| | | | - Pauli Wihuri
- Finnish Biobank Cooperative–FINBB, Turku, Finland
| | | | - Roosa E. Kallionpää
- Auria Biobank, Turku University Hospital and University of Turku, Turku, Finland
| | | | - Theresa Knopp
- Helsinki Biobank, HUS, Helsinki University Hospital, Helsinki, Finland
| | | | - Erna Mäkinen
- Biobank of Central Finland, Hospital Nova of Central Finland, Jyväskylä, Finland
| | - Anne-Mari Moilanen
- Biobank Borealis of Northern Finland, Oulu University Hospital, Oulu, Finland
| | - Tongalp H. Tezel
- Department of Ophthalmology, Edward S. Harkness Eye Institute, Columbia University College of Physicians and Surgeons, Columbia University Medical Center, New York, NY, United States of America
| | | | - Nadia K. Waheed
- Gyroscope Therapeutics Limited, London, United Kingdom
- Department of Ophthalmology, Tufts University School of Medicine, Boston, Massachusetts, United States of America
|
19
|
Suitability of GWAS as a Tool to Discover SNPs Associated with Tick Resistance in Cattle: A Review. Pathogens 2021; 10:1604. [PMID: 34959558 PMCID: PMC8707706 DOI: 10.3390/pathogens10121604] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/22/2021] [Accepted: 12/01/2021] [Indexed: 12/22/2022]
Abstract
Understanding the biological mechanisms underlying tick resistance in cattle holds the potential to facilitate genetic improvement through selective breeding. Genome-wide association studies (GWAS) are widely used to unravel the genetic determinants underlying complex traits such as tick resistance. To date, various studies have been published on single nucleotide polymorphisms (SNPs) associated with tick resistance in cattle. The discovery of SNPs related to tick resistance has led to the mapping of associated candidate genes. Despite the success of these studies, information on genetic determinants associated with tick resistance in cattle is still limited, which warrants further studies. In Africa, genotyping is still relatively expensive; thus, conducting GWAS is a challenge, as the minimum recommended number of animals cannot be genotyped. These population size and genotyping cost challenges may be overcome through the establishment of collaborations. The current review therefore discusses GWAS as a tool to uncover SNPs associated with tick resistance, focusing on study design, association analysis, factors influencing the success of GWAS, and progress in cattle tick resistance studies.
|
20
|
Kim S, Shin JY, Kwon NJ, Kim CU, Kim C, Lee CS, Seo JS. Evaluation of low-pass genome sequencing in polygenic risk score calculation for Parkinson's disease. Hum Genomics 2021; 15:58. [PMID: 34454617 PMCID: PMC8403377 DOI: 10.1186/s40246-021-00357-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/25/2021] [Accepted: 08/22/2021] [Indexed: 12/02/2022]
Abstract
Background Low-pass sequencing (LPS) has been extensively investigated for its applicability to various genetic studies because of its advantages over genotype array data, including cost-effectiveness. Predicting the risk of complex diseases such as Parkinson’s disease (PD) using a polygenic risk score (PRS) based on genetic variation has shown decent prediction accuracy. Although ultra-LPS has been shown to be effective in PRS calculation, array data have been favored in the majority of PRS analyses, especially for PD.
Results Using eight high-coverage WGS samples, we assessed imputation approaches for downsampled LPS data ranging from 0.5× to 7.0×. We demonstrated that uncertain genotype calls of LPS diminished imputation accuracy, and an imputation approach using genotype likelihoods was plausible for LPS. Additionally, comparing imputation accuracies between LPS and simulated array data illustrated that LPS had higher accuracies, particularly at rare frequencies. To evaluate ultra-low coverage data in PRS calculation for PD, we prepared low-coverage WGS and genotype array data of 87 PD cases and 101 controls. Genotype imputation of array and downsampled LPS data was conducted using a population-specific reference panel, and we calculated risk scores based on the PD-associated SNPs from an East Asian meta-GWAS. The PRS models discriminated cases and controls as previously reported when both LPS and genotype array were used. Strong correlations in PRS models for PD between LPS and genotype array were also observed.
Conclusions Overall, this study highlights the potential of LPS under 1.0× followed by genotype imputation in PRS calculation and suggests LPS as an attractive alternative to genotype arrays in the area of precision medicine for PD.
Supplementary Information The online version contains supplementary material available at 10.1186/s40246-021-00357-w.
Affiliation(s)
- Sungjae Kim
- Precision Medicine Institute, Seoul, 08511, Republic of Korea; Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, 03080, Republic of Korea
- Jong-Yeon Shin
- Precision Medicine Institute, Seoul, 08511, Republic of Korea
- Nak-Jung Kwon
- Precision Medicine Institute, Seoul, 08511, Republic of Korea
- Changhoon Kim
- Precision Medicine Institute, Seoul, 08511, Republic of Korea
- Chong Sik Lee
- Department of Neurology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Pungnap 2(i)-dong, Songpa-gu, Seoul, 05505, Republic of Korea
- Jeong-Sun Seo
- Precision Medicine Institute, Seoul, 08511, Republic of Korea; Asian Genome Institute, Seoul National University Bundang Hospital, 172 Dolma-ro, Seongnam, Bundang-gu, Gyeonggi-do, 13605, Republic of Korea
|
21
|
Charon C, Allodji R, Meyer V, Deleuze JF. Impact of pre- and post-variant filtration strategies on imputation. Sci Rep 2021; 11:6214. [PMID: 33737531 PMCID: PMC7973508 DOI: 10.1038/s41598-021-85333-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2020] [Accepted: 02/22/2021] [Indexed: 01/04/2023]
Abstract
Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 NCBI recorded individuals for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04 to 1E-03) and rare variants (1E-03 to 5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Thus, to maintain confidence and retain enough SNVs, we propose here a two-step filtering procedure that allows less stringent filtering prior to imputation and post-imputation in order to increase the number of very rare and rare variants compared to conservative filtration methods.
Affiliation(s)
- Céline Charon
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Rodrigue Allodji
- Radiation Epidemiology Group CESP, Inserm Unit 1018, Gustave Roussy Université Paris Saclay, 114 rue Edouard Vaillant, Villejuif, 94805, France
- Vincent Meyer
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Jean-François Deleuze
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
|
22
|
Quick C, Anugu P, Musani S, Weiss ST, Burchard EG, White MJ, Keys KL, Cucca F, Sidore C, Boehnke M, Fuchsberger C. Sequencing and imputation in GWAS: Cost-effective strategies to increase power and genomic coverage across diverse populations. Genet Epidemiol 2020; 44:537-549. [PMID: 32519380 PMCID: PMC7449570 DOI: 10.1002/gepi.22326] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/07/2019] [Revised: 04/02/2020] [Accepted: 05/22/2020] [Indexed: 01/03/2023]
Abstract
A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.
Affiliation(s)
- Corbin Quick
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Pramod Anugu
- University of Mississippi Medical Center, Jackson, Mississippi
- Solomon Musani
- University of Mississippi Medical Center, Jackson, Mississippi
- Scott T. Weiss
- Harvard Medical School, Boston, Massachusetts
- Channing Department of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Partners HealthCare Personalized Medicine, Boston, Massachusetts
- Esteban G. Burchard
- Department of Medicine, University of California San Francisco, San Francisco, California
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California
- Marquitta J. White
- Department of Medicine, University of California San Francisco, San Francisco, California
- Kevin L. Keys
- Department of Medicine, University of California San Francisco, San Francisco, California
- Francesco Cucca
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Carlo Sidore
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Christian Fuchsberger
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Department of Genetics and Pharmacology, Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Bolzano, Italy
|