1. Martinez KL, Klein A, Martin JR, Sampson CU, Giles JB, Beck ML, Bhakta K, Quatraro G, Farol J, Karnes JH. Disparities in ABO blood type determination across diverse ancestries: a systematic review and validation in the All of Us Research Program. J Am Med Inform Assoc 2024:ocae161. [PMID: 38917427 DOI: 10.1093/jamia/ocae161] [Received: 03/20/2024] [Revised: 05/02/2024] [Accepted: 06/19/2024] [Indexed: 06/27/2024]
Abstract
OBJECTIVES: ABO blood types have widespread clinical use and robust associations with disease. The purpose of this study is to evaluate the portability and suitability of tag single-nucleotide polymorphisms (tSNPs) used to determine ABO alleles and blood types across diverse populations in published literature.
MATERIALS AND METHODS: Bibliographic databases were searched for studies using tSNPs to determine ABO alleles. We calculated linkage between tSNPs and functional variants across inferred continental ancestry groups from 1000 Genomes. We compared r² across ancestry and assessed real-world consequences by comparing tSNP-derived blood types to serology in a diverse population from the All of Us Research Program.
RESULTS: Linkage between functional variants and O allele tSNPs was significantly lower in African (median r² = 0.443) compared to East Asian (r² = 0.946, P = 1.1 × 10⁻⁵) and European (r² = 0.869, P = .023) populations. In All of Us, discordance between tSNP-derived blood types and serology was high across all SNPs in African ancestry individuals and linkage was strongly correlated with discordance across all ancestries (ρ = -0.90, P = 3.08 × 10⁻²³).
DISCUSSION: Many studies determine ABO blood types using tSNPs. However, tSNPs with low linkage disequilibrium promote misinference of ABO blood types, particularly in diverse populations. We observe common use of inappropriate tSNPs to determine ABO blood type, particularly for O alleles and with some tSNPs mistyping up to 58% of individuals.
CONCLUSION: Our results highlight the lack of transferability of tSNPs across ancestries and potential exacerbation of disparities in genomic research for underrepresented populations. This is especially relevant as more diverse cohorts are made publicly available.
Affiliation(s)
- Kiana L Martinez
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
- Andrew Klein
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
- Jennifer R Martin
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
  - Department of the University of Arizona Health Sciences Library, The University of Arizona, Tucson, AZ 85721, United States
- Chinwuwanuju U Sampson
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
- Jason B Giles
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
- Madison L Beck
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
- Krupa Bhakta
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
- Gino Quatraro
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
- Juvie Farol
  - Department of Clinical and Translational Science, The University of Arizona College of Medicine, Tucson, AZ 85721, United States
- Jason H Karnes
  - Department of Pharmacy Practice and Science, The University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ 85721, United States
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, United States
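
The linkage statistic reported in the Martinez et al. abstract, r², is the squared correlation between alleles at two sites. The following is a minimal sketch of that calculation, not the authors' pipeline. It assumes phased haplotypes coded as 0/1 at a tag SNP and at a functional variant, and the helper name `ld_r2` is hypothetical.

```python
import math


def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic sites.

    hap_a and hap_b are equal-length sequences of 0/1 alleles, one entry
    per phased haplotype (an assumption made for this sketch).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at site A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                       # undefined when a site is monomorphic
    return d * d / denom


# A tag SNP that always travels with the functional allele gives r^2 = 1,
# while a tag SNP independent of the functional allele gives r^2 = 0.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because allele frequencies and haplotype structure differ between populations, the same pair of sites can give a high value in one ancestry group and a low value in another, which is the transferability problem the abstract describes.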
2. Santos R, Moreno-Torres V, Pintos I, Corral O, de Mendoza C, Soriano V, Corpas M. Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients. GigaByte 2024; 2024:gigabyte127. [PMID: 38948510 PMCID: PMC11211761 DOI: 10.46471/gigabyte.127] [Received: 09/25/2023] [Accepted: 06/04/2024] [Indexed: 07/02/2024]
Abstract
Despite the advances in genetic marker identification associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, showcasing GLIMPSE1's ability to confidently impute variants with minor allele frequencies as low as 2% in individuals with Spanish ancestry. We carried out a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here can be leveraged for future genomic projects to gain vital insights into health challenges like COVID-19.
Affiliation(s)
- Renato Santos
  - National Heart & Lung Institute, Imperial College London, London, UK
- Víctor Moreno-Torres
  - Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Ilduara Pintos
  - Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Octavio Corral
  - Health Sciences School & Medical Centre, Universidad Internacional La Rioja (UNIR), Madrid, Spain
- Carmen de Mendoza
  - Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Vicente Soriano
  - Health Sciences School & Medical Centre, Universidad Internacional La Rioja (UNIR), Madrid, Spain
- Manuel Corpas
  - School of Life Sciences, University of Westminster, London, UK
3. Ardiansyah E, Riza AL, Dian S, Ganiem AR, Alisjahbana B, Setiabudiawan TP, van Laarhoven A, van Crevel R, Kumar V. Sequencing whole genomes of the West Javanese population in Indonesia reveals novel variants and improves imputation accuracy. bioRxiv 2024:2024.06.14.598981. [PMID: 38915501 PMCID: PMC11195206 DOI: 10.1101/2024.06.14.598981] [Indexed: 06/26/2024]
Abstract
Existing genotype imputation reference panels are mainly derived from European populations, limiting their accuracy in non-European populations. To improve imputation accuracy for the population of Indonesia, the world's fourth most populous country, we combined Whole Genome Sequencing (WGS) data from 227 West Javanese individuals with East Asian data from the 1000 Genomes Project. This created three reference panels: EAS 1KGP3 (EASp), Indonesian (INDp), and a combined panel (EASp+INDp). We also used ten West Javanese samples with WGS and SNP-typing data for benchmarking. We identified 1.8 million novel single nucleotide variants (SNVs) in the West Javanese population, which, while similar to East Asians, is distinct from the Central Indonesian Flores population. Adding INDp to the EASp reference panel improved imputation accuracy (R²) from 0.85 to 0.90, and concordance from 87.88% to 91.13%. These findings underscore the importance of including Indonesian genetic data in reference panels, advocating for broader WGS of diverse Indonesian populations to enhance genomic studies.
Affiliation(s)
- Edwin Ardiansyah
  - Research Center for Care and Control of Infectious Diseases, Universitas Padjadjaran, Bandung, Indonesia
- Anca-Lelia Riza
  - Laboratory of Human Genomics, University of Medicine and Pharmacy of Craiova, 200638 Craiova, Romania
- Sofiati Dian
  - Research Center for Care and Control of Infectious Diseases, Universitas Padjadjaran, Bandung, Indonesia
  - Department of Neurology, Hasan Sadikin Hospital, Faculty of Medicine, Universitas Padjadjaran, Bandung, Indonesia
- Ahmad Rizal Ganiem
  - Research Center for Care and Control of Infectious Diseases, Universitas Padjadjaran, Bandung, Indonesia
  - Department of Neurology, Hasan Sadikin Hospital, Faculty of Medicine, Universitas Padjadjaran, Bandung, Indonesia
- Bachti Alisjahbana
  - Research Center for Care and Control of Infectious Diseases, Universitas Padjadjaran, Bandung, Indonesia
  - Department of Internal Medicine, Hasan Sadikin Hospital, Faculty of Medicine, Universitas Padjadjaran, Bandung, Indonesia
- Todia P Setiabudiawan
  - Department of Internal Medicine and Radboud Center of Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, Netherlands
- Arjan van Laarhoven
  - Department of Internal Medicine and Radboud Center of Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, Netherlands
- Reinout van Crevel
  - Department of Internal Medicine and Radboud Center of Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, Netherlands
- Vinod Kumar
  - Department of Internal Medicine and Radboud Center of Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, Netherlands
  - University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, the Netherlands
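
The Ardiansyah et al. abstract reports two imputation metrics, R² and concordance, without defining them. A common reading is sketched below under stated assumptions: R² is taken as the squared Pearson correlation between true genotypes (0, 1 or 2 alternate alleles) and imputed dosages, and concordance as the fraction of samples whose rounded dosage matches the true genotype. The function names are hypothetical and the paper's exact definitions may differ.

```python
import math


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return math.nan                      # undefined for a constant vector
    return cov * cov / (var_t * var_d)


def concordance(true_genotypes, imputed_dosages):
    """Fraction of samples whose hard call (rounded dosage) equals the truth."""
    if not true_genotypes or len(true_genotypes) != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and equal length")
    # Round half up and clamp to the valid genotype range 0..2.
    calls = [min(2, max(0, int(d + 0.5))) for d in imputed_dosages]
    matches = sum(1 for t, c in zip(true_genotypes, calls) if t == c)
    return matches / len(true_genotypes)


truth = [0, 1, 2, 1]
dosages = [0.1, 0.9, 1.8, 1.6]      # illustrative values, not data from the paper
print(dosage_r2(truth, dosages))
print(concordance(truth, dosages))  # 0.75: the last dosage rounds to 2, truth is 1
```

The two metrics answer different questions: R² rewards dosages that track the truth even when uncertain, while concordance only counts exact hard-call matches, so both are usually reported together.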
4. Croock D, Swart Y, Schurz H, Petersen DC, Möller M, Uren C. Data Harmonization Guidelines to Combine Multi-platform Genomic Data from Admixed Populations and Boost Power in Genome-Wide Association Studies. Curr Protoc 2024; 4:e1055. [PMID: 38837690 DOI: 10.1002/cpz1.1055] [Indexed: 06/07/2024]
Abstract
Data harmonization involves combining data from multiple independent sources and processing the data to produce one uniform dataset. Merging separate genotypes or whole-genome sequencing datasets has been proposed as a strategy to increase the statistical power of association tests by increasing the effective sample size. However, data harmonization is not a widely adopted strategy due to the difficulties with merging data (including confounding produced by batch effects and population stratification). Detailed data harmonization protocols are scarce and are often conflicting. Moreover, data harmonization protocols that accommodate samples of admixed ancestry are practically non-existent. Existing data harmonization procedures must be modified to ensure the heterogeneous ancestry of admixed individuals is incorporated into additional downstream analyses without confounding results. Here, we propose a set of guidelines for merging multi-platform genetic data from admixed samples that can be adopted by any investigator with elementary bioinformatics experience. We have applied these guidelines to aggregate 1544 tuberculosis (TB) case-control samples from six separate in-house datasets and conducted a genome-wide association study (GWAS) of TB susceptibility. The GWAS performed on the merged dataset had improved power over analyzing the datasets individually and produced summary statistics free from bias introduced by batch effects and population stratification. © 2024 Wiley Periodicals LLC.
- Basic Protocol 1: Processing separate datasets comprising array genotype data
- Alternate Protocol 1: Processing separate datasets comprising array genotype and whole-genome sequencing data
- Alternate Protocol 2: Performing imputation using a local reference panel
- Basic Protocol 2: Merging separate datasets
- Basic Protocol 3: Ancestry inference using ADMIXTURE and RFMix
- Basic Protocol 4: Batch effect correction using pseudo-case-control comparisons
Affiliation(s)
- Dayna Croock
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Yolandi Swart
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Haiko Schurz
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Desiree C Petersen
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Marlo Möller
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
  - Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa
- Caitlin Uren
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
  - Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa
5. Bellavance J, Wang L, Gagliano Taliun SA. Eight quick tips for including chromosome X in genome-wide association studies. PLoS Comput Biol 2024; 20:e1012160. [PMID: 38843110 PMCID: PMC11156303 DOI: 10.1371/journal.pcbi.1012160] [Indexed: 06/09/2024]
Affiliation(s)
- Justin Bellavance
  - Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
  - Research Centre, Montréal Heart Institute, Montréal, Québec, Canada
- Linda Wang
  - Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
  - Research Centre, Montréal Heart Institute, Montréal, Québec, Canada
- Sarah A. Gagliano Taliun
  - Research Centre, Montréal Heart Institute, Montréal, Québec, Canada
  - Department of Medicine, Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
  - Department of Neurosciences, Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
6. Schurz H, Naranbhai V, Yates TA, Gilchrist JJ, Parks T, Dodd PJ, Möller M, Hoal EG, Morris AP, Hill AVS. Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture. eLife 2024; 13:e84394. [PMID: 38224499 PMCID: PMC10789494 DOI: 10.7554/elife.84394] [Received: 10/23/2022] [Accepted: 11/23/2023] [Indexed: 01/17/2024]
Abstract
The heritability of susceptibility to tuberculosis (TB) disease has been well recognized. Over 100 genes have been studied as candidates for TB susceptibility, and several variants were identified by genome-wide association studies (GWAS), but few replicate. We established the International Tuberculosis Host Genetics Consortium to perform a multi-ancestry meta-analysis of GWAS, including 14,153 cases and 19,536 controls of African, Asian, and European ancestry. Our analyses demonstrate a substantial degree of heritability (pooled polygenic h² = 26.3%, 95% CI 23.7-29.0%) for susceptibility to TB that is shared across ancestries, highlighting an important host genetic influence on disease. We identified one global host genetic correlate for TB at genome-wide significance (p < 5 × 10⁻⁸) in the human leukocyte antigen (HLA)-II region (rs28383206, p-value = 5.2 × 10⁻⁹) but failed to replicate variants previously associated with TB susceptibility. These data demonstrate the complex shared genetic architecture of susceptibility to TB and the importance of large-scale GWAS analysis across multiple ancestries experiencing different levels of infection pressure.
Affiliation(s)
- Haiko Schurz
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Vivek Naranbhai
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Massachusetts General Hospital, Boston, United States
  - Dana-Farber Cancer Institute, Boston, United States
  - Centre for the AIDS Programme of Research in South Africa, Durban, South Africa
  - Harvard Medical School, Boston, United States
- Tom A Yates
  - Division of Infection and Immunity, Faculty of Medical Sciences, University College London, London, United Kingdom
- James J Gilchrist
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Department of Paediatrics, University of Oxford, Oxford, United Kingdom
- Tom Parks
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Department of Infectious Diseases, Imperial College London, London, United Kingdom
- Peter J Dodd
  - School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom
- Marlo Möller
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Eileen G Hoal
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Andrew P Morris
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, United Kingdom
- Adrian VS Hill
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Jenner Institute, University of Oxford, Oxford, United Kingdom
7. Nanjala R, Mbiyavanga M, Hashim S, de Villiers S, Mulder N. Assessing HLA imputation accuracy in a West African population. PLoS One 2023; 18:e0291437. [PMID: 37768905 PMCID: PMC10538777 DOI: 10.1371/journal.pone.0291437] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/30/2023]
Abstract
The Human Leukocyte Antigen (HLA) region plays an important role in autoimmune and infectious diseases. HLA is a highly polymorphic region and thus difficult to impute. We, therefore, sought to evaluate HLA imputation accuracy, specifically in a West African population, since they are understudied and are known to harbor high genetic diversity. The study sets were selected from 315 Gambian individuals within the Gambian Genome Variation Project (GGVP) Whole Genome Sequence datasets. Two different arrays, Illumina Omni 2.5 and Human Hereditary and Health in Africa (H3Africa), were assessed for the appropriateness of their markers, and these were used to test several imputation panels and tools. The reference panels were chosen from the 1000 Genomes (1kg-All), 1000 Genomes African (1kg-Afr), 1000 Genomes Gambian (1kg-Gwd), H3Africa, and the HLA Multi-ethnic datasets. HLA-A, HLA-B, and HLA-C alleles were imputed using HIBAG, SNP2HLA, CookHLA, and Minimac4, and concordance rate was used as an assessment metric. The best performing tool was found to be HIBAG, with a concordance rate of 0.84, while the best performing reference panel was the H3Africa panel, with a concordance rate of 0.62. Minimac4 (0.75) was shown to increase HLA-B allele imputation accuracy compared to HIBAG (0.71), SNP2HLA (0.51) and CookHLA (0.17). The H3Africa and Illumina Omni 2.5 array performances were comparable, showing that genotyping arrays have less influence on HLA imputation in West African populations. The findings show that using a larger population-specific reference panel and the HIBAG tool improves the accuracy of HLA imputation in a West African population.
Affiliation(s)
- Ruth Nanjala
  - Department of Biochemistry and Biotechnology, Pwani University, Kilifi, Kenya
  - Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Mamana Mbiyavanga
  - Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Suhaila Hashim
  - Department of Biochemistry and Biotechnology, Pwani University, Kilifi, Kenya
  - Pwani University Biosciences Research Centre, Pwani University, Kilifi, Kenya
- Santie de Villiers
  - Department of Biochemistry and Biotechnology, Pwani University, Kilifi, Kenya
  - Pwani University Biosciences Research Centre, Pwani University, Kilifi, Kenya
- Nicola Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
8. Tan T, Atkinson EG. Strategies for the Genomic Analysis of Admixed Populations. Annu Rev Biomed Data Sci 2023; 6:105-127. [PMID: 37127050 PMCID: PMC10871708 DOI: 10.1146/annurev-biodatasci-020722-014310] [Indexed: 05/03/2023]
Abstract
Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and the performance of genomic medicine across populations. Admixed populations have particular statistical challenges, as they inherit genomic segments from multiple source populations, which is the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in the context of genomics analyses. Here, we provide a survey of such computational strategies for the informed consideration of admixture to allow for the well-calibrated inclusion of mixed ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.
Affiliation(s)
- Taotao Tan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Elizabeth G Atkinson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
9. Sengupta D, Botha G, Meintjes A, Mbiyavanga M, Hazelhurst S, Mulder N, Ramsay M, Choudhury A. Performance and accuracy evaluation of reference panels for genotype imputation in sub-Saharan African populations. Cell Genom 2023; 3:100332. [PMID: 37388906 PMCID: PMC10300601 DOI: 10.1016/j.xgen.2023.100332] [Received: 08/21/2022] [Revised: 02/11/2023] [Accepted: 05/02/2023] [Indexed: 07/01/2023]
Abstract
Based on evaluations of imputation performed on a genotype dataset consisting of about 11,000 sub-Saharan African (SSA) participants, we show Trans-Omics for Precision Medicine (TOPMed) and the African Genome Resource (AGR) to be currently the best panels for imputing SSA datasets. We report notable differences in the number of single-nucleotide polymorphisms (SNPs) that are imputed by different panels in datasets from East, West, and South Africa. Comparisons with a subset of 95 SSA high-coverage whole-genome sequences (WGSs) show that despite being about 20-fold smaller, the AGR imputed dataset has higher concordance with the WGSs. Moreover, the level of concordance between imputed and WGS datasets was strongly influenced by the extent of Khoe-San ancestry in a genome, highlighting the need for integration of not only geographically but also ancestrally diverse WGS data in reference panels for further improvement in imputation of SSA datasets. Approaches that integrate imputed data from different panels could also lead to better imputation.
Affiliation(s)
- Dhriti Sengupta
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Gerrit Botha
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Ayton Meintjes
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Mamana Mbiyavanga
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Scott Hazelhurst
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
- Nicola Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Michèle Ramsay
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Ananyo Choudhury
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
10. Chattopadhyay A, Lee CY, Shen YC, Lu KC, Hsiao TH, Lin CH, La LC, Tsai MH, Lu TP, Chuang EY. Multi-ethnic imputation system (MI-System): a genotype imputation server for high-dimensional data. J Biomed Inform 2023:104423. [PMID: 37308034 DOI: 10.1016/j.jbi.2023.104423] [Received: 12/05/2022] [Revised: 05/11/2023] [Accepted: 06/09/2023] [Indexed: 06/14/2023]
Abstract
OBJECTIVE: Genotype imputation is a commonly used technique that infers untyped variants into a study's genotype data, allowing better identification of causal variants in disease studies. However, due to overrepresentation of Caucasian studies, there is a lack of understanding of the genetic basis of health outcomes in other ethnic populations. Therefore, facilitating imputation of missing key predictor variants that can potentially improve a health-outcome risk prediction model, specifically for Asian ancestry, is of utmost relevance.
METHODS: We aimed to construct an imputation and analysis web platform that primarily facilitates, but is not limited to, genotype imputation on East Asians. The goal is to provide a collaborative imputation platform for researchers in the public domain, towards rapidly and efficiently conducting accurate genotype imputation.
RESULTS: We present an online genotype imputation platform, Multi-ethnic Imputation System (MI-System) (https://misystem.cgm.ntu.edu.tw/), that offers users 3 established pipelines, SHAPEIT2-IMPUTE2, SHAPEIT4-IMPUTE5, and Beagle5.1, for conducting imputation analyses. In addition to 1000 Genomes and Hapmap3, a new customized Taiwan Biobank (TWB) reference panel, specifically created for Taiwanese-Chinese ancestry, is provided. MI-System further offers functions to create customized reference panels to be used for imputation, conduct quality control, split whole genome data into chromosomes, and convert genome builds.
CONCLUSION: Users can upload their genotype data and perform imputation with minimum effort and resources. The utility functions can further be used to preprocess user-uploaded data with a few clicks. MI-System potentially contributes to Asian-population genetics research, while eliminating the requirement for high-performance computational resources and bioinformatics expertise. It will enable an increased pace of research and provide a knowledge base for genetic carriers of complex diseases, therefore greatly enhancing patient-driven research.
STATEMENT OF SIGNIFICANCE: Multi-ethnic Imputation System (MI-System) primarily facilitates, but is not limited to, imputation on East Asians, through 3 established prephasing-imputation pipelines, SHAPEIT2-IMPUTE2, SHAPEIT4-IMPUTE5, and Beagle5.1, where users can upload their genotype data and perform imputation and other utility functions with minimum effort and resources. A new customized Taiwan Biobank (TWB) reference panel, specifically created for Taiwanese-Chinese ancestry, is provided. Utility functions include (a) creating customized reference panels, (b) conducting quality control, (c) splitting whole genome data into chromosomes, and (d) converting genome builds. Users can also combine 2 reference panels using the system and use the combined panels as a reference to conduct imputation using MI-System.
Affiliation(s)
- Amrita Chattopadhyay
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan
- Chien-Yueh Lee
  - Master Program for Biomedical Engineering, College of Biomedical Engineering, China Medical University, Taichung 40402, Taiwan
- Ying-Cheng Shen
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Kuan-Chen Lu
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10055, Taiwan
- Tzu-Hung Hsiao
  - Department of Medical Research, Taichung Veterans General Hospital, Taiwan
- Ching-Heng Lin
  - Department of Medical Research, Taichung Veterans General Hospital, Taiwan
- Liang-Chuan La
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan
  - Graduate Institute of Physiology, College of Medicine, National Taiwan University, Taipei 10051, Taiwan
- Mong-Hsun Tsai
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan
  - Institute of Biotechnology, National Taiwan University, Taipei 10672, Taiwan
  - Center of Biotechnology, National Taiwan University, Taipei 10672, Taiwan
- Tzu-Pin Lu
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10055, Taiwan
- Eric Y Chuang
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
  - Biomedical Technology and Device Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
| |
Collapse
|
11
|
Sun L, Wang Z, Lu T, Manolio TA, Paterson AD. eXclusionarY: 10 years later, where are the sex chromosomes in GWASs? Am J Hum Genet 2023; 110:903-912. [PMID: 37267899 DOI: 10.1016/j.ajhg.2023.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/04/2023] Open
Abstract
10 years ago, a detailed analysis showed that only 33% of genome-wide association study (GWAS) results included the X chromosome. Multiple recommendations were made to combat such exclusion. Here, we re-surveyed the research landscape to determine whether these earlier recommendations had been translated. Unfortunately, among the genome-wide summary statistics reported in 2021 in the NHGRI-EBI GWAS Catalog, only 25% provided results for the X chromosome and 3% for the Y chromosome, suggesting that the exclusion phenomenon not only persists but has also expanded into an exclusionary problem. Normalizing by physical length of the chromosome, the average number of studies published through November 2022 with genome-wide-significant findings on the X chromosome is ∼1 study/Mb. By contrast, it ranges from ∼6 to ∼16 studies/Mb for chromosomes 4 and 19, respectively. Compared with the autosomal growth rate of ∼0.086 studies/Mb/year over the last decade, studies of the X chromosome grew at less than one-seventh that rate, only ∼0.012 studies/Mb/year. Among the studies that reported significant associations on the X chromosome, we noted extreme heterogeneities in data analysis and reporting of results, suggesting the need for clear guidelines. Unsurprisingly, among the 430 scores sampled from the PolyGenic Score Catalog, 0% contained weights for sex chromosomal SNPs. To overcome the dearth of sex chromosome analyses, we provide five sets of recommendations and future directions. Finally, until the sex chromosomes are included in a whole-genome study, instead of GWASs, we propose such studies would more properly be referred to as "AWASs," meaning "autosome-wide scans."
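The "less than one-seventh" comparison in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the two growth rates quoted above; the variable names are illustrative and not taken from the paper.

```python
# Growth rates quoted in the abstract, in studies per Mb per year.
autosomal_rate = 0.086
x_chromosome_rate = 0.012

# Ratio of X-chromosome growth to autosomal growth.
ratio = x_chromosome_rate / autosomal_rate

print(f"X/autosome growth ratio: {ratio:.3f}")
print(f"one-seventh:             {1 / 7:.3f}")
print("below one-seventh:", ratio < 1 / 7)
```

Because both rates are already normalized by chromosome length, the ratio compares like with like without any further adjustment for the size of the X chromosome.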
Affiliation(s)
- Lei Sun
  - Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Zhong Wang
  - Department of Statistics and Data Science, Faculty of Science, National University of Singapore, Singapore
- Tianyuan Lu
  - Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Teri A Manolio
  - Division of Genomic Medicine, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Andrew D Paterson
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
|
12
|
Why does the X chromosome lag behind autosomes in GWAS findings? PLoS Genet 2023; 19:e1010472. [PMID: 36848382 PMCID: PMC9997976 DOI: 10.1371/journal.pgen.1010472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Revised: 03/09/2023] [Accepted: 02/15/2023] [Indexed: 03/01/2023] Open
Abstract
The X-chromosome is among the largest human chromosomes. It differs from autosomes by a number of important features including hemizygosity in males, an almost complete inactivation of one copy in females, and unique patterns of recombination. We used data from the Catalog of Published Genome Wide Association Studies to compare densities of the GWAS-detected SNPs on the X-chromosome and autosomes. The density of GWAS-detected SNPs on the X-chromosome is 6-fold lower compared to the density of the GWAS-detected SNPs on autosomes. Differences between the X-chromosome and autosomes cannot be explained by differences in the overall SNP density, lower X-chromosome coverage by genotyping platforms or low call rate of X-chromosomal SNPs. Similar differences in the density of GWAS-detected SNPs were found in female-only GWASs (e.g. ovarian cancer GWASs). We hypothesized that the lower density of GWAS-detected SNPs on the X-chromosome compared to autosomes is not a result of a methodological bias, e.g. differences in coverage or call rates, but has a real underlying biological reason-a lower density of functional SNPs on the X-chromosome versus autosomes. This hypothesis is supported by the observation that (i) the overall SNP density of X-chromosome is lower compared to the SNP density on autosomes and that (ii) the density of genic SNPs on the X-chromosome is lower compared to autosomes while densities of intergenic SNPs are similar.
|
13
|
Nanjala R, Mbiyavanga M, Hashim S, de Villiers S, Mulder N. Assessing HLA imputation accuracy in a West African population. bioRxiv 2023:2023.01.23.525129. [PMID: 36747714 PMCID: PMC9900754 DOI: 10.1101/2023.01.23.525129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
The Human Leukocyte Antigen (HLA) region plays an important role in autoimmune and infectious diseases. HLA is a highly polymorphic region and thus difficult to impute. We therefore sought to evaluate HLA imputation accuracy, specifically in a West African population, since they are understudied and are known to harbor high genetic diversity. The study sets were selected from Gambian individuals within the Gambian Genome Variation Project (GGVP) Whole Genome Sequence datasets. Two different arrays, Illumina Omni 2.5 and Human Hereditary and Health in Africa (H3Africa), were assessed for the appropriateness of their markers, and these were used to test several imputation panels and tools. The reference panels were chosen from the 1000 Genomes dataset (1kg-All), 1000 Genomes African dataset (1kg-Afr), 1000 Genomes Gambian dataset (1kg-Gwd), H3Africa dataset and the HLA Multi-ethnic dataset. HLA-A, HLA-B and HLA-C alleles were imputed using HIBAG, SNP2HLA, CookHLA and Minimac4, and concordance rate was used as an assessment metric. Overall, the best performing tool was found to be HIBAG, with a concordance rate of 0.84, while the best performing reference panel was the H3Africa panel with a concordance rate of 0.62. Minimac4 (0.75) was shown to increase HLA-B allele imputation accuracy compared to HIBAG (0.71), SNP2HLA (0.51) and CookHLA (0.17). The H3Africa and Illumina Omni 2.5 array performances were comparable, showing that genotyping arrays have less influence on HLA imputation in West African populations. The findings show that using a larger population-specific reference panel and the HIBAG tool improves the accuracy of HLA imputation in West African populations. Author Summary For studies that associate a particular HLA type to a phenotypic trait for instance HIV susceptibility or control, genotype imputation remains the main method for acquiring a larger sample size. 
Genotype imputation, the process of inferring unobserved genotypes, is a statistical technique and thus deals with probabilities. Also, the HLA region is highly variable and therefore difficult to impute. In view of this, it is important to assess HLA imputation accuracy, especially in African populations. This is because the African genome has high diversity, and such studies have hardly been conducted in African populations. This work highlights that using the HIBAG imputation tool and a larger population-specific reference panel increases HLA imputation accuracy in an African population.
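The concordance rate used as the assessment metric above can be illustrated with a minimal sketch. It assumes the simplest definition, the fraction of imputed allele calls that match the sequencing-derived calls; the study may compute it differently (for example per individual or per locus). The function name and the toy allele calls are hypothetical and are not data from the paper.

```python
def concordance_rate(imputed, truth):
    """Fraction of allele calls where the imputed call equals the true call."""
    if len(imputed) != len(truth):
        raise ValueError("imputed and truth must have the same length")
    if not truth:
        raise ValueError("at least one allele call is required")
    matches = sum(i == t for i, t in zip(imputed, truth))
    return matches / len(truth)


# Made-up HLA-A calls for four chromosomes (illustration only).
truth_calls = ["A*02:01", "A*01:01", "A*23:01", "A*30:01"]
imputed_calls = ["A*02:01", "A*01:01", "A*23:01", "A*68:02"]

print(concordance_rate(imputed_calls, truth_calls))  # 3 of 4 calls match
```

A rate computed this way weights every call equally, so common alleles dominate the result; that is one reason tool rankings can differ between loci such as HLA-A and HLA-B.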
Affiliation(s)
- Ruth Nanjala
  - Department of Biochemistry and Biotechnology, Pwani University, Kenya
  - Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, South Africa
- Mamana Mbiyavanga
  - Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, South Africa
- Suhaila Hashim
  - Department of Biochemistry and Biotechnology, Pwani University, Kenya
  - Pwani University Biosciences Research Centre, Pwani University, Kenya
- Santie de Villiers
  - Department of Biochemistry and Biotechnology, Pwani University, Kenya
  - Pwani University Biosciences Research Centre, Pwani University, Kenya
- Nicola Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, South Africa
|
14
|
Baldrighi GN, Nova A, Bernardinelli L, Fazia T. A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software. Life (Basel) 2022; 12:life12122030. [PMID: 36556394 PMCID: PMC9781110 DOI: 10.3390/life12122030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/09/2022]
Abstract
Genotype imputation has become an essential prerequisite when performing association analysis. It is a computational technique that allows us to infer genetic markers that have not been directly genotyped, thereby increasing statistical power in subsequent association studies, which consequently has a crucial impact on the identification of causal variants. Many features need to be considered when choosing the proper algorithm for imputation, including the target sample on which it is performed, i.e., related individuals, unrelated individuals, or both. Problems could arise when dealing with a target sample made up of mixed data, composed of both related and unrelated individuals, especially since the scientific literature on this topic is not sufficiently clear. To shed light on this issue, we examined existing algorithms and software for performing phasing and imputation on mixed human data from SNP arrays, specifically when related subjects belong to trios. By discussing the advantages and limitations of the current algorithms, we identified LD-based methods as being the most suitable for reconstruction of haplotypes in this specific context, and we proposed a feasible pipeline that can be used for imputing genotypes in both phased and unphased human data.
|
15
|
Sun Q, Yang Y, Rosen JD, Jiang MZ, Chen J, Liu W, Wen J, Raffield LM, Pace RG, Zhou YH, Wright FA, Blackman SM, Bamshad MJ, Gibson RL, Cutting GR, Knowles MR, Schrider DR, Fuchsberger C, Li Y. MagicalRsq: Machine-learning-based genotype imputation quality calibration. Am J Hum Genet 2022; 109:1986-1997. [PMID: 36198314 PMCID: PMC9674945 DOI: 10.1016/j.ajhg.2022.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/19/2022] [Accepted: 09/16/2022] [Indexed: 01/26/2023] Open
Abstract
Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP), and whole-exome sequence data from UK BioBank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R2 than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained from 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where net gains were gained numbers of correctly distinguished variants by MagicalRsq over standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from those poorly imputed. MagicalRsq is freely available on GitHub.
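The "true R2" that this abstract uses as its benchmark is commonly computed as the squared Pearson correlation between imputed dosages and genotypes observed by sequencing. The sketch below assumes that definition; it does not reproduce MagicalRsq or the standard Rsq estimate, and the function name and toy values are hypothetical.

```python
from statistics import mean


def true_r2(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    if len(dosages) != len(genotypes):
        raise ValueError("dosages and genotypes must have the same length")
    mean_d, mean_g = mean(dosages), mean(genotypes)
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    if var_d == 0 or var_g == 0:
        # Undefined when either vector is constant (e.g. a monomorphic variant).
        return float("nan")
    return cov * cov / (var_d * var_g)


# Made-up values for one variant in six samples (illustration only).
sequenced_genotypes = [0, 1, 2, 0, 1, 0]        # alternate-allele counts
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 0.3]  # expected allele counts

print(round(true_r2(imputed_dosages, sequenced_genotypes), 3))
```

Computing this quantity requires sequenced genotypes for the same samples, which is why a calibrated estimate is useful for samples that were only array-genotyped.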
Affiliation(s)
- Quan Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yingxi Yang
  - Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Jonathan D Rosen
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Min-Zhi Jiang
  - Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jiawen Chen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Weifang Liu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jia Wen
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Laura M Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Rhonda G Pace
  - Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yi-Hui Zhou
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Fred A Wright
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA; Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Scott M Blackman
  - Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Michael J Bamshad
  - Department of Pediatrics, University of Washington, Seattle, WA 98105, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Ronald L Gibson
  - Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
- Garry R Cutting
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Michael R Knowles
  - Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Daniel R Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Christian Fuchsberger
  - Institute for Biomedicine, Eurac Research (affiliated with the University of Lübeck), Bolzano, Italy
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
16
|
De Marino A, Mahmoud AA, Bose M, Bircan KO, Terpolovsky A, Bamunusinghe V, Bohn S, Khan U, Novković B, Yazdi PG. A comparative analysis of current phasing and imputation software. PLoS One 2022; 17:e0260177. [PMID: 36260643 PMCID: PMC9581364 DOI: 10.1371/journal.pone.0260177] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Accepted: 09/01/2022] [Indexed: 12/02/2022] Open
Abstract
Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed to both reduced sequencing costs and imputation models which make it possible to obtain nearly whole-genome data from less expensive genotyping methods, such as microarray chips. Although there are many different approaches to imputation, the Hidden Markov Model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, Shapeit4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation software on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can result in different interpretations. The highest average concordance rate was achieved by Beagle5.4, followed by Impute5 and Minimac4, using a reference-based approach during phasing and the highest density chip. IQS and R2 metrics revealed that Impute5 and Minimac4 obtained better results for low frequency markers, while Beagle5.4 remained more accurate for common markers (MAF>5%). Computational load as measured by run time was lower for Beagle5.4 than Minimac4 and Impute5, while Minimac4 utilized the least memory of the imputation tools we compared. ShapeIT4, used the least memory of the phasing tools examined with genotype chip data, while Eagle2.4.1 used the least memory phasing WGS data. Finally, we determined the combination of phasing software, imputation software, and reference panel, best suited for different situations and analysis needs and created an automated pipeline that provides a way for users to create customized chips designed to optimize their imputation results.
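The remark that the choice of accuracy metric can change the interpretation is easiest to see on a rare variant. The sketch below compares raw concordance with a chance-corrected score, assuming the usual kappa-style form of IQS, (observed agreement minus chance agreement) divided by (1 minus chance agreement). The function names and toy genotypes are hypothetical, and this is not the benchmarking code used in the study.

```python
from collections import Counter


def concordance(imputed, truth):
    """Observed agreement: fraction of genotype calls that match."""
    return sum(i == t for i, t in zip(imputed, truth)) / len(truth)


def iqs(imputed, truth):
    """Chance-corrected agreement (kappa-style), assumed form of IQS."""
    if len(imputed) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(truth)
    p_obs = concordance(imputed, truth)
    imp_counts, true_counts = Counter(imputed), Counter(truth)
    # Agreement expected if calls were drawn independently from the marginals.
    p_chance = sum(imp_counts[g] * true_counts[g] for g in true_counts) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # undefined when both vectors are the same constant
    return (p_obs - p_chance) / (1.0 - p_chance)


# Made-up rare variant: one carrier in ten samples, imputed as all-reference.
rare_truth = [0] * 9 + [1]
rare_imputed = [0] * 10

print("concordance:", concordance(rare_imputed, rare_truth))
print("IQS:        ", iqs(rare_imputed, rare_truth))
```

In this toy case nine of ten calls agree, yet the chance-corrected score is zero because calling every sample homozygous reference achieves the same agreement without any information. That is the kind of divergence between metrics the abstract describes for low-frequency markers.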
Affiliation(s)
- Adriano De Marino
  - Research & Development, SelfDecode, Miami, FL, United States of America
- Madhuchanda Bose
  - Research & Development, SelfDecode, Miami, FL, United States of America
- Sandra Bohn
  - Research & Development, SelfDecode, Miami, FL, United States of America
- Umar Khan
  - Research & Development, SelfDecode, Miami, FL, United States of America
- Biljana Novković
  - Research & Development, SelfDecode, Miami, FL, United States of America
- Puya G. Yazdi
  - Research & Development, SelfDecode, Miami, FL, United States of America
|
17
|
Kang KW, Cho YW, Lee SK, Jung KY, Kim JH, Kim DW, Lee SA, Hong SB, Na IS, Lee SH, Baek WK, Choi SY, Kim MK. Multidimensional Early Prediction Score for Drug-Resistant Epilepsy. J Clin Neurol 2022; 18:553-561. [PMID: 36062773 PMCID: PMC9444554 DOI: 10.3988/jcn.2022.18.5.553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 03/16/2022] [Accepted: 03/16/2022] [Indexed: 11/17/2022] Open
Abstract
Background and Purpose Achieving favorable postoperative outcomes in patients with drug-resistant epilepsy (DRE) requires early referrals for preoperative examinations. The purpose of this study was to investigate the possibility of a user-friendly early DRE prediction model that is easy for nonexperts to utilize. Methods A two-step genotype analysis was performed by applying 1) whole-exome sequencing (WES) to the initial test set (n=243) and 2) target sequencing to the validation set (n=311). Based on a multicenter case–control study design using the WES data set, 11 genetic and 2 clinical predictors were selected to develop the DRE risk prediction model. The early prediction score for DRE (EPS-DRE) was calculated for each group of the selected genetic predictors (EPS-DREgen), clinical predictors (EPS-DREcln), and two types of predictor mix (EPS-DREmix) in both the initial test set and the validation set. Results The multidimensional EPS-DREmix of the predictor mix group provided a better match to the outcome data than did the unidimensional EPS-DREgen or EPS-DREcln. Unlike previous studies, the EPS-DREmix model was developed using only 11 genetic and 2 clinical predictors, but it exhibited good discrimination ability in distinguishing DRE from drug-responsive epilepsy. These results were verified using an unrelated validation set. Conclusions Our results suggest that EPS-DREmix has good performance in early DRE prediction and is a user-friendly tool that is easy to apply in real clinical trials, especially by nonexperts who do not have detailed knowledge or equipment for assessing DRE. Further studies are needed to improve the performance of the EPS-DREmix model.
Affiliation(s)
- Kyung Wook Kang
  - Department of Neurology, Chonnam National University Hospital, Chonnam National University Medical School, Gwangju, Korea
- Yong Won Cho
  - Department of Neurology, Dongsan Medical Center, Keimyung University School of Medicine, Daegu, Korea
- Sang Kun Lee
  - Department of Neurology, Comprehensive Epilepsy Center, Laboratory for Neurotherapeutics, Biomedical Research Institute, Seoul National University Hospital, Seoul, Korea; Program in Neuroscience, Seoul National University College of Medicine, Seoul, Korea
- Ki-Young Jung
  - Department of Neurology, Comprehensive Epilepsy Center, Laboratory for Neurotherapeutics, Biomedical Research Institute, Seoul National University Hospital, Seoul, Korea; Program in Neuroscience, Seoul National University College of Medicine, Seoul, Korea
- Ji Hyun Kim
  - Department of Neurology, Korea University Guro Hospital, Korea University College of Medicine, Seoul, Korea
- Dong Wook Kim
  - Department of Neurology, Konkuk University School of Medicine, Seoul, Korea
- Sang-Ahm Lee
  - Department of Neurology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Seung Bong Hong
  - Department of Neurology, Samsung Medical Center, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University School of Medicine, Samsung Biomedical Research Institute (SBRI), Seoul, Korea; National Epilepsy Care Center, Seoul, Korea
- In-Seop Na
  - National Program of Excellence in Software Centre, Chosun University, Gwangju, Korea
- So-Hyun Lee
  - Department of Biomedical Science, Chonnam National University Medical School, Hwasun, Korea
- Won-Ki Baek
  - Department of Microbiology, Keimyung University School of Medicine, Daegu, Korea
- Seok-Yong Choi
  - Department of Biomedical Science, Chonnam National University Medical School, Hwasun, Korea
- Myeong-Kyu Kim
  - Department of Neurology, Chonnam National University Hospital, Chonnam National University Medical School, Gwangju, Korea
|
18
|
Examining Barriers and Opportunities of Conducting Genome-Wide Association Studies in Developing Countries. Curr Epidemiol Rep 2022. [DOI: 10.1007/s40471-022-00303-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
19
|
Barua M, Paterson AD. Population-based studies reveal an additive role of type IV collagen variants in hematuria and albuminuria. Pediatr Nephrol 2022; 37:253-262. [PMID: 33635378 DOI: 10.1007/s00467-021-04934-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/01/2020] [Revised: 10/31/2020] [Accepted: 01/07/2021] [Indexed: 02/08/2023]
Abstract
Specific variants in genes that encode the α3α4α5 chains of type IV collagen cause Alport syndrome (AS), which encompasses a clinical spectrum from isolated hematuria to multisystem disease affecting sight, hearing and kidney function. The commonest form is X-linked Alport syndrome (XLAS; COL4A5), with autosomal AS (COL4A3 and COL4A4) comprising a minority of cases. While historic data estimate the frequency of AS at 1:5000-10,000, recent population-based genetic studies suggest the prevalence is considerably higher. Genome-wide association studies (GWAS) have been performed in the Icelandic (deCODE) and UK (UK Biobank) populations, demonstrating an association of type IV collagen gene variants with AS-relevant kidney traits. In the Icelandic population, 1 in 600 carries a 2.5-kb COL4A3 coding deletion or a COL4A3 missense variant (rs200287952[A], Gly695Arg), both of which are strongly associated with hematuria and albuminuria (P values = 1.9 × 10⁻⁵ to 2.5 × 10⁻²⁰). In the UK Biobank, COL4A4 rs35138315 (Ser969X; carrier frequency 0.13%) is strongly associated with both hematuria and albuminuria (P = 1.5 × 10⁻⁷³). Thus, the frequency for autosomal AS is 5-16 times higher than the historic prevalence of all forms of the disorder. Furthermore, COL4A4 rs35138315 (Ser969X) is also a reported founder mutation in families with autosomal dominant focal and segmental glomerulosclerosis and autosomal recessive forms of AS. This supports an additive mode of inheritance for specific variants, wherein the number of copies of a mutation influences disease severity in a cumulative fashion. These studies did not include the X chromosome, excluding analysis of COL4A5, which represents an area for future study.
Affiliation(s)
- Moumita Barua
  - Division of Nephrology, Toronto General Hospital, 200 Elizabeth Street, 8NU-855, Toronto, ON, M5G 2C4, Canada; Department of Medicine, University of Toronto, Toronto, Canada; Toronto General Hospital Research Institute, University Health Network, Toronto, Canada; Institute of Medical Sciences, University of Toronto, Toronto, Canada
- Andrew D Paterson
  - Institute of Medical Sciences, University of Toronto, Toronto, Canada; Divisions of Epidemiology and Biostatistics, Dalla Lana School of Public Health, Toronto, Canada; Genetics and Genome Biology, Research Institute at Hospital for Sick Children, Toronto, Canada
|
20
|
Deng T, Zhang P, Garrick D, Gao H, Wang L, Zhao F. Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data. Front Genet 2022; 12:704118. [PMID: 35046990 PMCID: PMC8762119 DOI: 10.3389/fgene.2021.704118] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/01/2021] [Accepted: 12/03/2021] [Indexed: 11/17/2022] Open
Abstract
Genotype imputation is the term used to describe the process of inferring unobserved genotypes in a sample of individuals. It is a key step prior to a genome-wide association study (GWAS) or genomic prediction. The imputation accuracy will directly influence the results from subsequent analyses. In this simulation-based study, we investigate the accuracy of genotype imputation in relation to some factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data. The factors included the imputation reference population size, the proportion of target markers /SNP density, the genetic relationship (distance) between the target population and the reference population, and the imputation method. Simulations of genotypes were based on coalescence theory accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants. The genomic data of 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers or SNP density was the most critical factor affecting imputation accuracy under all imputation situations. Compared with Minimac4, Beagle5.1 reproduced higher-accuracy imputed data in most cases, more notably when imputing from the LCWGS data. Compared with SNP chip data, LCWGS provided more accurate genotype imputation. Our findings provided a relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.
Affiliation(s)
- Tianyu Deng
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Pengfei Zhang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Dorian Garrick
  - A. L. Rae Centre of Genetics and Breeding, Massey University, Hamilton, New Zealand
- Huijiang Gao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Lixian Wang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Fuping Zhao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
|
21
|
Genome and transcriptome profiling of spontaneous preterm birth phenotypes. Sci Rep 2022; 12:1003. [PMID: 35046466 PMCID: PMC8770724 DOI: 10.1038/s41598-022-04881-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/11/2021] [Accepted: 12/23/2021] [Indexed: 12/27/2022] Open
Abstract
Preterm birth (PTB) occurs before 37 weeks of gestation. Risk factors include genetics and infection/inflammation. Different mechanisms have been reported for spontaneous preterm birth (SPTB) and preterm birth following preterm premature rupture of membranes (PPROM). This study aimed to identify early pregnancy biomarkers of SPTB and PPROM from the maternal genome and transcriptome. Pregnant women were recruited at the Liverpool Women’s Hospital. Pregnancy outcomes were categorised as SPTB, PPROM (≤ 34 weeks gestation, n = 53), high-risk term (HTERM, ≥ 37 weeks, n = 126) or low-risk (no history of SPTB/PPROM) term (LTERM, ≥ 39 weeks, n = 188). Blood samples were collected at 16 and 20 weeks gestation, from which genome (UK Biobank Axiom array) and transcriptome (Clariom D Human assay) data were acquired. PLINK and R were used to perform genetic association and differential expression analyses and expression quantitative trait loci (eQTL) mapping. Several significant molecular signatures were identified across the analyses in preterm cases. The genome-wide significant SNP rs14675645 (ASTN1) was associated with SPTB, whereas the microRNA-142 transcript and the PPARG1-FOXP3 gene set were associated with PPROM at week 20 of gestation and are related to inflammation and immune response. This study has determined genomic and transcriptomic candidate biomarkers of SPTB and PPROM that require validation in diverse populations.
|
22
|
Xu ZM, Rüeger S, Zwyer M, Brites D, Hiza H, Reinhard M, Rutaihwa L, Borrell S, Isihaka F, Temba H, Maroa T, Naftari R, Hella J, Sasamalo M, Reither K, Portevin D, Gagneux S, Fellay J. Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations. PLoS Comput Biol 2022; 18:e1009628. [PMID: 35025869 PMCID: PMC8791479 DOI: 10.1371/journal.pcbi.1009628] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/10/2021] [Revised: 01/26/2022] [Accepted: 11/10/2021] [Indexed: 12/13/2022]
Abstract
Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.
Genome-wide association studies, which study the association between genetic variants and various phenotypes, typically rely on genotyping arrays. Only a small proportion of genetic variants within the genome are typed on genotyping arrays. Untyped variants are statistically inferred through a process known as genotype imputation, where correlations between variants (haplotypes) observed in external reference panels are leveraged to infer untyped variants in the study population. However, for study populations that are underrepresented in existing reference panels, the quality of imputation is often sub-optimal. This is because typed variants incorporated on existing genotyping arrays can be unsuitable for the study population, and haplotype structures can be different between the reference and the study population. Here, we illustrate an approach to select a custom set of population-specific typed variants to improve genotype imputation in such underrepresented populations.
Affiliation(s)
- Zhi Ming Xu
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Sina Rüeger
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Michaela Zwyer
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Daniela Brites
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Hellen Hiza
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Ifakara Health Institute, Dar es Salaam, Tanzania
| | - Miriam Reinhard
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Liliana Rutaihwa
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Sonia Borrell
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | | | | | - Thomas Maroa
- Ifakara Health Institute, Dar es Salaam, Tanzania
| | | | - Jerry Hella
- Ifakara Health Institute, Dar es Salaam, Tanzania
| | | | - Klaus Reither
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Damien Portevin
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Sebastien Gagneux
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Jacques Fellay
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
|
23
|
Swart Y, Uren C, van Helden PD, Hoal EG, Möller M. Local Ancestry Adjusted Allelic Association Analysis Robustly Captures Tuberculosis Susceptibility Loci. Front Genet 2021; 12:716558. [PMID: 34721521 PMCID: PMC8554120 DOI: 10.3389/fgene.2021.716558] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Accepted: 10/01/2021] [Indexed: 11/13/2022]
Abstract
Pulmonary tuberculosis (TB), caused by Mycobacterium tuberculosis, is a complex disease. The risk of developing active TB is in part determined by host genetic factors. Most genetic studies investigating TB susceptibility fail to replicate association signals, particularly across diverse populations. South African populations arose because of multi-wave genetic admixture from the indigenous KhoeSan, Bantu-speaking African, European, Southeast Asian and East Asian populations. This has led to complex genetic admixture with heterogeneous patterns of linkage disequilibrium and associated traits. As a result, precise estimation of both global and local ancestry is required to prevent both false-positive and false-negative associations. Here, 820 individuals from South Africa were genotyped on the SNP-dense Illumina Multi-Ethnic Genotyping Array (∼1.7M SNPs) followed by local and global ancestry inference using RFMix. Local ancestry adjusted allelic association (LAAA) models were utilized owing to the extensive genetic heterogeneity present in this population. Hence, an interaction term, comprising the identification of the minor allele that corresponds to the ancestry present at the specific locus under investigation, was included as a covariate. One SNP (rs28647531) located on chromosome 4q22 was significantly associated with TB susceptibility and displayed a SNP minor allelic effect (G allele, frequency = 0.204) whilst correcting for local ancestry for Bantu-speaking African ancestry (p-value = 5.518 × 10⁻⁷; OR = 3.065; SE = 0.224). Although no other variants passed the significance threshold, clear differences were observed between the lead variants identified for each ancestry. Furthermore, the LAAA model robustly captured the source of association signals in multi-way admixed individuals from South Africa and allowed the identification of ancestry-specific disease risk alleles associated with TB susceptibility that have previously been missed.
Affiliation(s)
- Yolandi Swart
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Caitlin Uren
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.,Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa
| | - Paul D van Helden
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Eileen G Hoal
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Marlo Möller
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.,Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa
|
24
|
Elsayed I, Martinez-Carrasco A, Cornejo-Olivas M, Bandres-Ciga S. Mapping the Diverse and Inclusive Future of Parkinson's Disease Genetics and Its Widespread Impact. Genes (Basel) 2021; 12:1681. [PMID: 34828286 PMCID: PMC8624537 DOI: 10.3390/genes12111681] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/23/2021] [Revised: 10/19/2021] [Accepted: 10/20/2021] [Indexed: 12/27/2022]
Abstract
Over the last decades, genetics has been the engine that has pushed us along on our voyage to understand the etiology of Parkinson's disease (PD). Although a large number of risk loci and causative mutations for PD have been identified, it is clear that much more needs to be done to solve the missing heritability mystery. Despite remarkable efforts, as a field, we have failed in terms of diversity and inclusivity. The vast majority of genetic studies in PD have focused on individuals of European ancestry, leading to a gap of knowledge on the existing genetic differences across populations and PD as a whole. As we move forward, shedding light on the genetic architecture contributing to PD in non-European populations is essential, and will provide novel insight into the generalized genetic map of the disease. In this review, we discuss how better representation of understudied ancestral groups in PD genetics research requires addressing and resolving all the challenges that hinder the inclusion of these populations. We further provide an overview of PD genetics in the clinics, covering the current challenges and limitations of genetic testing and counseling. Finally, we describe the impact of worldwide collaborative initiatives in the field, shaping the future of the new era of PD genetics as we advance in our understanding of the genetic architecture of PD.
Affiliation(s)
- Inas Elsayed
- Faculty of Pharmacy, University of Gezira, Wad Medani P.O. Box 20, Sudan;
- International Parkinson Disease Genomics Consortium (IPDGC)-Africa, University of Gezira, Wad Medani P.O. Box 20, Sudan
| | | | - Mario Cornejo-Olivas
- Neurogenetics Research Center, Instituto Nacional de Ciencias Neurológicas, Lima 15003, Peru;
- Center for Global Health, Universidad Peruana Cayetano Heredia, Lima 15103, Peru
| | - Sara Bandres-Ciga
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, NIH, Bethesda, MD 20892, USA
|
25
|
Stahl K, Gola D, König IR. Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data. Front Genet 2021; 12:724037. [PMID: 34630519 PMCID: PMC8493217 DOI: 10.3389/fgene.2021.724037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/11/2021] [Accepted: 08/26/2021] [Indexed: 01/02/2023]
Abstract
Despite the widespread use of genotype imputation tools and the availability of different approaches, recent developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, for genetic imputation of completely missing SNPs with an HRC reference panel regarding quality and speed. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were utilized for comparison between imputed and sequenced genotypes. We found that all tested programs with the exception of PBWT impute genotypes with very high accuracy (mean error rate < 0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele <0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency <0.05. Even though overall concordance is high, concordance drops with genotype probability, indicating that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability <95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, a combination of Eagle2.4.1 using a reference panel for phasing and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 results in a small gain in accuracy at a high cost of speed.
Affiliation(s)
- Katharina Stahl
- Department of Genetic Epidemiology, University Medical Center, University of Göttingen, Göttingen, Germany.,Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany
| | - Damian Gola
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany
| | - Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany.,German Center for Cardiovascular Research, Partner Site Hamburg/Kiel/Lübeck, Lübeck, Germany
|
26
|
Gupta J, Care A, Goodfellow L, Alfirevic Z, Lian LY, Müller-Myhsok B, Alfirevic A, Phelan M. Metabolic profiling of maternal serum of women at high-risk of spontaneous preterm birth using NMR and MGWAS approach. Biosci Rep 2021; 41:BSR20210759. [PMID: 34402867 PMCID: PMC8415214 DOI: 10.1042/bsr20210759] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/01/2021] [Revised: 07/28/2021] [Accepted: 08/17/2021] [Indexed: 12/26/2022]
Abstract
Preterm birth (PTB) is a leading global cause of infant mortality. Risk factors include genetics, lifestyle choices and infection. Understanding the mechanism of PTB could aid the development of novel approaches to prevent PTB. This study aimed to investigate the metabolic biomarkers of PTB in early pregnancy and the association of significant metabolites with participant genotypes. Maternal sera collected at 16 and 20 weeks of gestation, from women who previously experienced PTB (high-risk) and women who did not (low-risk controls), were analysed using 1H nuclear magnetic resonance (NMR) metabolomics and genome-wide screening microarray. ANOVA and probabilistic neural network (PNN) modelling were performed on the spectral bins. Metabolomics genome-wide association (MGWAS) of the spectral bins and genotype data from the same participants was applied to determine potential metabolite-gene pathways. Phenylalanine, acetate and lactate metabolite differences between PTB cases and controls were obtained by ANOVA and PNN showed strong prediction at week 20 (AUC = 0.89). MGWAS identified several metabolite bins with strong genetic associations. Cis-eQTL analysis highlighted TRAF1 (involved in the inflammatory pathway) local to a non-coding SNP associated with lactate at week 20 of gestation. MGWAS of a well-defined cohort of participants highlighted a lactate-TRAF1 relationship that could potentially contribute to PTB.
Affiliation(s)
- Juhi K. Gupta
- Wolfson Centre for Personalised Medicine, Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 3GL, UK
- Harris-Wellbeing Research Centre, University Department, Liverpool Women’s Hospital, Liverpool, L8 7SS, UK
| | - Angharad Care
- Harris-Wellbeing Research Centre, University Department, Liverpool Women’s Hospital, Liverpool, L8 7SS, UK
| | - Laura Goodfellow
- Harris-Wellbeing Research Centre, University Department, Liverpool Women’s Hospital, Liverpool, L8 7SS, UK
| | - Zarko Alfirevic
- Harris-Wellbeing Research Centre, University Department, Liverpool Women’s Hospital, Liverpool, L8 7SS, UK
| | - Lu-Yun Lian
- NMR Centre for Structural Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
| | - Bertram Müller-Myhsok
- Wolfson Centre for Personalised Medicine, Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 3GL, UK
- Max Planck Institute of Psychiatry, Munich 80804, Germany
| | - Ana Alfirevic
- Wolfson Centre for Personalised Medicine, Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 3GL, UK
- Harris-Wellbeing Research Centre, University Department, Liverpool Women’s Hospital, Liverpool, L8 7SS, UK
| | - Marie M. Phelan
- NMR Centre for Structural Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
|
27
|
Care AG, Gupta JK, Goodfellow L, Zhang G, Monangi N, Belling E, Landero J, Chappell J, Sharp A, Alfirevic A, Müller-Myhsok B, Muglia LJ, Alfirevic Z. Maternal selenium levels and whole genome screen in recurrent spontaneous preterm birth population: A nested case control study. Eur J Obstet Gynecol Reprod Biol 2021; 265:203-211. [PMID: 34534736 DOI: 10.1016/j.ejogrb.2021.08.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 08/11/2021] [Indexed: 10/20/2022]
Abstract
OBJECTIVE To establish whether low maternal selenium (Se) was associated with sPTB in women with recurrent sPTB and to identify a genetic link with maternal Se levels. DESIGN Nested case-control study. SETTING Tertiary Maternity Hospital. POPULATION Plasma and whole blood from pregnant women with a history of early sPTB/PPROM < 34+0 weeks and European ancestry were obtained at 20 weeks (range 15-24 weeks). 'Cases' were recurrent PTB/PPROM < 34+0 weeks, and term (≥ 37+0) deliveries were classified as 'high-risk controls'. Women with previous term births and index birth ≥ 39 weeks were 'low-risk controls'. METHODS Maternal plasma Se measured by ICP-MS was used as a continuous phenotype in a GWAS analysis. Se was added to a logistic regression model using PTB predictor variables. MAIN OUTCOME MEASURES Maternal Se concentration, recurrent early sPTB/PPROM. RESULTS 53/177 high-risk women had a recurrent sPTB/PPROM < 34+0 weeks and were 2.7 times more likely to have a Se level < 83.3 ppm at 20 weeks of pregnancy compared with low-risk term controls (n = 179) (RR 2.7, 95%CI 1.5-4.8; p = .001). One SNP from a non-coding region (FOXN3 intron variant, rs55793422) reached genome-wide significance level (p = 3.73E-08). Targeted analysis of Se gene variants did not show a difference between preterm and term births (χ² test, OR = 0.95; 95%CI = 0.59-1.56; p = 0.82). When Se levels were added to a clinical prediction model, only an additional 5% of cases (n = 3) and 0.6% (n = 1) of controls were correctly identified. CONCLUSIONS Low plasma Se is associated with sPTB risk but is not sufficiently predictive at individual patient level. We did not find a genetic association between maternal Se levels and Se-related genes.
Affiliation(s)
- Angharad G Care
- Harris Wellbeing Preterm Birth Research Centre, Department of Women's and Children's Health, University of Liverpool, Liverpool Women's Hospital, Liverpool, United Kingdom.
| | - Juhi K Gupta
- Harris Wellbeing Preterm Birth Research Centre, Department of Women's and Children's Health, University of Liverpool, Liverpool Women's Hospital, Liverpool, United Kingdom; Wolfson Centre for Personalised Medicine, Department of Molecular and Clinical Pharmacology, University of Liverpool, Liverpool, United Kingdom
| | - Laura Goodfellow
- Harris Wellbeing Preterm Birth Research Centre, Department of Women's and Children's Health, University of Liverpool, Liverpool Women's Hospital, Liverpool, United Kingdom
| | - Ge Zhang
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Department of Pediatrics and University of Cincinnati College of Medicine, Cincinnati, OH, United States
| | - Nagendra Monangi
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Department of Pediatrics and University of Cincinnati College of Medicine, Cincinnati, OH, United States
| | - Elizabeth Belling
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Department of Pediatrics and University of Cincinnati College of Medicine, Cincinnati, OH, United States
| | - Julio Landero
- Department of Chemistry, University of Cincinnati, Cincinnati, OH, United States
| | - Joanne Chappell
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Department of Pediatrics and University of Cincinnati College of Medicine, Cincinnati, OH, United States
| | - Andrew Sharp
- Harris Wellbeing Preterm Birth Research Centre, Department of Women's and Children's Health, University of Liverpool, Liverpool Women's Hospital, Liverpool, United Kingdom
| | - Ana Alfirevic
- Harris Wellbeing Preterm Birth Research Centre, Department of Women's and Children's Health, University of Liverpool, Liverpool Women's Hospital, Liverpool, United Kingdom; Wolfson Centre for Personalised Medicine, Department of Molecular and Clinical Pharmacology, University of Liverpool, Liverpool, United Kingdom
| | - Bertram Müller-Myhsok
- Waterhouse Building, University of Liverpool, Liverpool, United Kingdom; Max Planck Institute of Psychiatry, Munich, Germany
| | - Louis J Muglia
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Department of Pediatrics and University of Cincinnati College of Medicine, Cincinnati, OH, United States
| | - Zarko Alfirevic
- Harris Wellbeing Preterm Birth Research Centre, Department of Women's and Children's Health, University of Liverpool, Liverpool Women's Hospital, Liverpool, United Kingdom
|
28
|
Machipisa T, Chong M, Muhamed B, Chishala C, Shaboodien G, Pandie S, de Vries J, Laing N, Joachim A, Daniels R, Ntsekhe M, Hugo-Hamman CT, Gitura B, Ogendo S, Lwabi P, Okello E, Damasceno A, Novela C, Mocumbi AO, Madeira G, Musuku J, Mtaja A, ElSayed A, Elhassan HHM, Bode-Thomas F, Okeahialam BN, Zühlke LJ, Mulder N, Ramesar R, Lesosky M, Parks T, Cordell HJ, Keavney B, Engel ME, Paré G. Association of Novel Locus With Rheumatic Heart Disease in Black African Individuals: Findings From the RHDGen Study. JAMA Cardiol 2021; 6:1000-1011. [PMID: 34106200 PMCID: PMC8190704 DOI: 10.1001/jamacardio.2021.1627] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/24/2020] [Accepted: 03/25/2021] [Indexed: 01/02/2023]
Abstract
Importance Rheumatic heart disease (RHD), a sequela of rheumatic fever characterized by permanent heart valve damage, is the leading cause of cardiac surgery in Africa. However, its pathophysiologic characteristics and genetics are poorly understood. Understanding genetic susceptibility may aid in prevention, control, and interventions to eliminate RHD. Objective To identify common genetic loci associated with RHD susceptibility in Black African individuals. Design, Setting, and Participants This multicenter case-control genome-wide association study (GWAS), the Genetics of Rheumatic Heart Disease, examined more than 7 million genotyped and imputed single-nucleotide variations. The 4809 GWAS participants and 116 independent trio families were enrolled from 8 African countries between December 31, 2012, and March 31, 2018. All GWAS participants and trio probands were screened by use of echocardiography. Data analyses took place from May 15, 2017, until March 14, 2021. Main Outcomes and Measures Genetic associations with RHD. Results This study included 4809 African participants (2548 RHD cases and 2261 controls; 3301 women [69%]; mean [SD] age, 36.5 [16.3] years). The GWAS identified a single RHD risk locus, 11q24.1 (rs1219406 [odds ratio, 1.65; 95% CI, 1.48-1.82; P = 4.36 × 10⁻⁸]), which reached genome-wide significance in Black African individuals. Our meta-analysis of Black (n = 3179) and admixed (n = 1055) African individuals revealed several suggestive loci. The study also replicated a previously reported association in Pacific Islander individuals (rs11846409) at the immunoglobulin heavy chain locus, in the meta-analysis of Black and admixed African individuals (odds ratio, 1.16; 95% CI, 1.06-1.27; P = 1.19 × 10⁻³). The HLA (rs9272622) associations reported in Aboriginal Australian individuals could not be replicated. In support of the known polygenic architecture for RHD, overtransmission of a polygenic risk score from unaffected parents to affected probands was observed (polygenic transmission disequilibrium testing mean [SE], 0.27 [0.16] SDs; P = .04996), and the chip-based heritability was estimated to be high at 0.49 (SE = 0.12; P = 3.28 × 10⁻⁵) in Black African individuals. Conclusions and Relevance This study revealed a novel candidate susceptibility locus exclusive to Black African individuals and an important heritable component to RHD susceptibility in African individuals.
Affiliation(s)
- Tafadzwa Machipisa
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
- Hatter Institute for Cardiovascular Diseases Research in Africa and Cape Heart Institute, Department of Medicine, University of Cape Town, Cape Town, South Africa
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Ontario, Canada
| | - Michael Chong
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Ontario, Canada
| | - Babu Muhamed
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
- Hatter Institute for Cardiovascular Diseases Research in Africa and Cape Heart Institute, Department of Medicine, University of Cape Town, Cape Town, South Africa
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Ontario, Canada
| | - Chishala Chishala
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
- Hatter Institute for Cardiovascular Diseases Research in Africa and Cape Heart Institute, Department of Medicine, University of Cape Town, Cape Town, South Africa
| | - Gasnat Shaboodien
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
- Hatter Institute for Cardiovascular Diseases Research in Africa and Cape Heart Institute, Department of Medicine, University of Cape Town, Cape Town, South Africa
| | - Shahiemah Pandie
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Jantina de Vries
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Nakita Laing
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Alexia Joachim
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Rezeen Daniels
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Mpiko Ntsekhe
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Christopher T. Hugo-Hamman
- Rheumatic Heart Disease Clinic, Windhoek Central Hospital, Ministry of Health and Social Services, Windhoek, Republic of Namibia
| | - Bernard Gitura
- Cardiology Department of Medicine, Kenyatta National Hospital, University of Nairobi, Nairobi, Kenya
| | - Stephen Ogendo
- Cardiology Department of Medicine, Kenyatta National Hospital, University of Nairobi, Nairobi, Kenya
| | | | | | - Albertino Damasceno
- Faculty of Medicine, Eduardo Mondlane University/Nucleo de Investigaçao, Departamento de Medicina, Hospital Central de Maputo, Maputo, Mozambique
| | - Celia Novela
- Faculty of Medicine, Eduardo Mondlane University/Nucleo de Investigaçao, Departamento de Medicina, Hospital Central de Maputo, Maputo, Mozambique
| | - Ana O. Mocumbi
- Instituto Nacional de Saúde Ministério da Saúde, Maputo, Moçambique
| | - Goeffrey Madeira
- Emergency Department, World Health Organization Mozambique, Maputo, Mozambique
| | - John Musuku
- Department of Paediatrics and Child Health, University Teaching Hospital–Children’s Hospital, University of Zambia, Lusaka, Zambia
| | - Agnes Mtaja
- Department of Paediatrics and Child Health, University Teaching Hospital–Children’s Hospital, University of Zambia, Lusaka, Zambia
| | - Ahmed ElSayed
- Department of Cardiothoracic Surgery, Alshaab Teaching Hospital, Alazhari Health Research Center, Alzaiem Alazhari University, Khartoum, Sudan
| | - Huda H. M. Elhassan
- Department of Cardiothoracic Surgery, Alshaab Teaching Hospital, Alazhari Health Research Center, Alzaiem Alazhari University, Khartoum, Sudan
| | - Fidelia Bode-Thomas
- Department of Paediatrics, Jos University Teaching Hospital and University of Jos, Jos, Plateau State Nigeria
| | - Basil N. Okeahialam
- Department of Paediatrics, Jos University Teaching Hospital and University of Jos, Jos, Plateau State Nigeria
| | - Liesl J. Zühlke
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
- Division of Paediatric Cardiology, Department of Paediatrics and Child Health, Red Cross War Memorial Children’s Hospital and University of Cape Town, South Africa
| | - Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Raj Ramesar
- Department of Pathology, University of Cape Town, Cape Town, South Africa
| | - Maia Lesosky
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Tom Parks
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Heather J. Cordell
- Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, International Centre for Life, Newcastle upon Tyne, United Kingdom
| | - Bernard Keavney
- Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine, and Health, The University of Manchester, Manchester, United Kingdom
- Manchester University National Health Service Foundation Trust, Manchester Academic Health Science Centre, Manchester, United Kingdom
| | - Mark E. Engel
- Department of Medicine, University of Cape Town and Groote Schuur Hospital, Cape Town, South Africa
| | - Guillaume Paré
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Ontario, Canada
- Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
|
29
|
Abstract
Interpreting the effects of genetic variants is key to understanding individual susceptibility to disease and designing personalized therapeutic approaches. Modern experimental technologies are enabling the generation of massive compendia of human genome sequence data and associated molecular and phenotypic traits, together with genome-scale expression, epigenomics and other functional genomic data. Integrative computational models can leverage these data to understand variant impact, elucidate the effect of dysregulated genes on biological pathways in specific disease and tissue contexts, and interpret disease risk beyond what is feasible with experiments alone. In this Review, we discuss recent developments in machine learning algorithms for genome interpretation and for integrative molecular-level modelling of cells, tissues and organs relevant to disease. More specifically, we highlight existing methods and key challenges and opportunities in identifying specific disease-causing genetic variants and linking them to molecular pathways and, ultimately, to disease phenotypes.
|
30
|
Zhou YH, Saghapour E. ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data. Front Genet 2021; 12:691274. [PMID: 34276792 PMCID: PMC8283820 DOI: 10.3389/fgene.2021.691274] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/06/2021] [Accepted: 05/25/2021] [Indexed: 11/13/2022] Open
Abstract
Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly requires scripting skills, and the methods are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered a separate exercise from exploratory data analysis, but should be considered part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
Affiliation(s)
- Yi-Hui Zhou
- Department of Biological Science, North Carolina State University, Raleigh, NC, United States
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States
| | - Ehsan Saghapour
- Department of Biological Science, North Carolina State University, Raleigh, NC, United States
|
31
|
Swart PC, van den Heuvel LL, Lewis CM, Seedat S, Hemmings SMJ. A Genome-Wide Association Study and Polygenic Risk Score Analysis of Posttraumatic Stress Disorder and Metabolic Syndrome in a South African Population. Front Neurosci 2021; 15:677800. [PMID: 34177453 PMCID: PMC8222611 DOI: 10.3389/fnins.2021.677800] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/08/2021] [Accepted: 05/07/2021] [Indexed: 11/13/2022] Open
Abstract
Posttraumatic stress disorder (PTSD) is a trauma-related disorder that frequently co-occurs with metabolic syndrome (MetS). MetS is characterized by obesity, dyslipidemia, and insulin resistance. To provide insight into these co-morbidities, we performed a genome-wide association study (GWAS) meta-analysis to identify genetic variants associated with PTSD, and determined if PTSD polygenic risk scores (PRS) could predict PTSD and MetS in a South African mixed-ancestry sample. The GWAS meta-analysis of PTSD participants (n = 260) and controls (n = 343) revealed no SNPs of genome-wide significance. However, several independent loci, as well as five SNPs in the PARK2 gene, were suggestively associated with PTSD (p < 5 × 10⁻⁶). PTSD-PRS was associated with PTSD diagnosis (Nagelkerke's pseudo R² = 0.0131, p = 0.00786), PTSD symptom severity [as measured by CAPS-5 total score (R² = 0.00856, p = 0.0367) and PCL-5 score (R² = 0.00737, p = 0.0353)], and MetS (Nagelkerke's pseudo R² = 0.00969, p = 0.0217). These findings suggest an association between PTSD and PARK2, corresponding with results from the largest PTSD-GWAS conducted to date. PRS analysis suggests that genetic variants associated with PTSD are also involved in the development of MetS. Overall, the results contribute to a broader goal of increasing diversity in psychiatric genetics.
Affiliation(s)
- Patricia C. Swart
- Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- South African Medical Research Council, Stellenbosch University Genomics of Brain Disorders Research Unit, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Leigh L. van den Heuvel
- Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- South African Medical Research Council, Stellenbosch University Genomics of Brain Disorders Research Unit, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Cathryn M. Lewis
- Social, Genetic and Developmental Psychiatry Centre, King’s College London, London, United Kingdom
| | - Soraya Seedat
- Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- South African Medical Research Council, Stellenbosch University Genomics of Brain Disorders Research Unit, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Sian M. J. Hemmings
- Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- South African Medical Research Council, Stellenbosch University Genomics of Brain Disorders Research Unit, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
|
32
|
Aleknonytė-Resch M, Szymczak S, Freitag-Wolf S, Dempfle A, Krawczak M. Genotype imputation in case-only studies of gene-environment interaction: validity and power. Hum Genet 2021; 140:1217-1228. [PMID: 34041609 PMCID: PMC8263402 DOI: 10.1007/s00439-021-02294-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Accepted: 05/10/2021] [Indexed: 11/26/2022]
Abstract
Case-only (CO) studies are a powerful means to uncover gene-environment (G × E) interactions for complex human diseases. Moreover, such studies may in principle also draw upon genotype imputation to increase statistical power even further. However, genotype imputation usually employs healthy controls such as the Haplotype Reference Consortium (HRC) data as an imputation base, which may systematically perturb CO studies in genomic regions with main effects upon disease risk. Using genotype data from 719 German Crohn Disease (CD) patients, we investigated the level of imputation accuracy achievable for single nucleotide polymorphisms (SNPs) with or without a genetic main effect, and with varying minor allele frequency (MAF). Genotypes were imputed from neighbouring SNPs at different levels of linkage disequilibrium (LD) to the target SNP using the HRC data as an imputation base. Comparison of the true and imputed genotypes revealed lower imputation accuracy for SNPs with strong main effects. We also simulated different levels of G × E interaction to evaluate the potential loss of statistical validity and power incurred by the use of imputed genotypes. Simulations under the null hypothesis revealed that genotype imputation does not inflate the type I error rate of CO studies of G × E. However, the statistical power was found to be reduced by imputation, particularly for SNPs with low MAF, and a gradual loss of statistical power resulted when the level of LD to the SNPs driving the imputation decreased. Our study thus highlights that genotype imputation should be employed with great care in CO studies of G × E interaction.
Affiliation(s)
| | - Silke Szymczak
- Institute of Medical Informatics and Statistics, Kiel University, Kiel, Germany
- Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
| | - Sandra Freitag-Wolf
- Institute of Medical Informatics and Statistics, Kiel University, Kiel, Germany
| | - Astrid Dempfle
- Institute of Medical Informatics and Statistics, Kiel University, Kiel, Germany
| | - Michael Krawczak
- Institute of Medical Informatics and Statistics, Kiel University, Kiel, Germany.
|
33
|
Müller SJ, Schurz H, Tromp G, van der Spuy GD, Hoal EG, van Helden PD, Owusu-Dabo E, Meyer CG, Muntau B, Thye T, Niemann S, Warren RM, Streicher E, Möller M, Kinnear C. A multi-phenotype genome-wide association study of clades causing tuberculosis in a Ghanaian- and South African cohort. Genomics 2021; 113:1802-1815. [PMID: 33862184 DOI: 10.1016/j.ygeno.2021.04.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/20/2020] [Revised: 03/26/2021] [Accepted: 04/11/2021] [Indexed: 01/31/2023]
Abstract
Despite decades of research and advancements in diagnostics and treatment, tuberculosis remains a major public health concern. New computational methods are needed to interrogate the intersection of host and bacterial genomes. Paired host genotype data and infecting bacterial isolate information were analysed for associations using a multinomial logistic regression framework implemented in SNPTest. A cohort of 853 admixed South African participants and a Ghanaian cohort of 1359 participants were included. Two directly genotyped variants, namely rs529920 and rs41472447, were identified in the Ghanaian cohort as being statistically significantly associated with risk for infection with strains of different members of the MTBC. Thus, a multinomial logistic regression using paired host-pathogen data may prove valuable for investigating the complex relationships driving infectious disease.
Affiliation(s)
- Stephanie J Müller
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa; South African Tuberculosis Bioinformatics Initiative (SATBBI), Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.
| | - Haiko Schurz
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa; South African Tuberculosis Bioinformatics Initiative (SATBBI), Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Gerard Tromp
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa; South African Tuberculosis Bioinformatics Initiative (SATBBI), Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Gian D van der Spuy
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa; South African Tuberculosis Bioinformatics Initiative (SATBBI), Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Eileen G Hoal
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Paul D van Helden
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Ellis Owusu-Dabo
- School of Public Health, College of Health Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
| | - Christian G Meyer
- Institute of Tropical Medicine, Eberhard-Karls University, Tübingen, Germany; Faculty of Medicine, Duy Tan University, Da Nang, Vietnam
| | - Birgit Muntau
- National Reference Centre for Tropical Pathogens, Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany
| | - Thorsten Thye
- Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany
| | - Stefan Niemann
- German Centre for Infection Research (DZIF), Partner site Hamburg-Lübeck-Borstel, Borstel, Germany
| | - Robin M Warren
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Elizabeth Streicher
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Marlo Möller
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Craig Kinnear
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
|
34
|
Roberts R, Chang CC, Hadley T. Genetic Risk Stratification: A Paradigm Shift in Prevention of Coronary Artery Disease. JACC Basic Transl Sci 2021; 6:287-304. [PMID: 33778213 PMCID: PMC7987546 DOI: 10.1016/j.jacbts.2020.09.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/28/2020] [Revised: 09/08/2020] [Accepted: 09/13/2020] [Indexed: 12/12/2022]
Abstract
CAD is a pandemic that can be prevented. Conventional risk factors are inadequate to detect who is at risk early in the asymptomatic stage. Genetic risk for CAD can be determined at birth, and those at highest genetic risk have been shown to respond to lifestyle changes and statin therapy with a 40% to 50% reduction in cardiac events. Genetic risk stratification for CAD should be brought to the bedside in an attempt to prevent this pandemic disease.
Coronary artery disease (CAD) is a pandemic disease that is highly preventable, as shown by secondary prevention. Primary prevention is preferred, given that 50% of the population can expect a cardiac event in their lifetime. Risk stratification for primary prevention using the American Heart Association/American College of Cardiology predicted 10-year risk based on conventional risk factors for CAD is less than optimal. Conventional risk factors such as hypertension, cholesterol, and age are age-dependent and not present until the sixth or seventh decade of life. The genetic risk score (GRS), which is estimated from the recently discovered genetic variants that predispose to CAD, offers a potential solution to this dilemma. The GRS, which is derived from genotyping the population with a microarray containing these genetic risk variants, has indicated that genetic risk stratification is superior to stratification by conventional risk factors in detecting those who are at high risk and would benefit most from statin therapy. Studies performed in >1 million individuals confirmed that genetic risk stratification is superior and primarily independent of conventional risk factors. Prospective clinical trials based on risk stratification for CAD using the GRS have shown that lifestyle changes, physical activity, and statin therapy are associated with a 40% to 50% reduction in cardiac events in the high genetic risk group (20%). Genetic risk stratification has the advantage of being innate to an individual’s DNA, and because DNA does not change in a lifetime, it is independent of age. Genetic risk stratification is inexpensive and can be performed worldwide, providing risk analysis at any age, and thus has the potential to revolutionize primary prevention.
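As an illustration of the idea in the abstract above, the following minimal Python sketch computes a genetic risk score and flags a high-risk group. It assumes the common convention of a weighted sum of risk-allele counts; the abstract does not state the exact formula. The function names, the toy weights, and the genotypes are all hypothetical, and the 20% cutoff simply mirrors the "high genetic risk group (20%)" mentioned in the text.

```python
import math

def genetic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant).

    Assumption: weights are per-variant effect sizes (e.g. log odds ratios);
    the values used below are invented for illustration only.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(count * weight for count, weight in zip(allele_counts, weights))

def high_risk_cutoff(scores, top_fraction=0.20):
    """Return the lowest score that still falls in the top fraction."""
    ranked = sorted(scores, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[k - 1]

# Hypothetical effect sizes for three variants and five toy individuals.
toy_weights = [0.1, 0.2, 0.3]
toy_genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [2, 2, 2],
    [0, 1, 1],
    [1, 1, 0],
]

scores = [genetic_risk_score(g, toy_weights) for g in toy_genotypes]
cutoff = high_risk_cutoff(scores)
high_risk_flags = [score >= cutoff for score in scores]
```

With five individuals and a 20% fraction, exactly one person (the one carrying two risk alleles at every variant) is flagged. A real analysis would take its weights from published association studies and calibrate the cutoff in a reference population.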
Key Words
- ACC, American College of Cardiology
- AHA, American Heart Association
- ANRIL, antisense non-coding RNA in the INK4 locus
- CAD, coronary artery disease
- GRS, genetic risk score
- GWAS, genome-wide association study
- LDL-C, low-density lipoprotein cholesterol
- MR, Mendelian randomization
- SNP, single nucleotide polymorphism
- bp, base pair
- cardiovascular genetics
- coronary artery disease
- genetic risk score for CAD
- genome-wide association studies
- prevention of CAD
Affiliation(s)
- Robert Roberts
- Department of Medicine, Dignity Health at St. Joseph's Hospital and Medical Center, Phoenix, Arizona, USA
| | - Chih Chao Chang
- Department of Medicine, Dignity Health at St. Joseph's Hospital and Medical Center, Phoenix, Arizona, USA
|
35
|
Impact of pre- and post-variant filtration strategies on imputation. Sci Rep 2021; 11:6214. [PMID: 33737531 PMCID: PMC7973508 DOI: 10.1038/s41598-021-85333-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2020] [Accepted: 02/22/2021] [Indexed: 01/04/2023] Open
Abstract
Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 NCBI recorded individuals for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequencies < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Thus, to maintain confidence and enough SNVs, we propose here a two-step filtering procedure which allows less stringent filtering prior to imputation and post-imputation in order to increase the number of very rare and rare variants compared to conservative filtration methods.
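The effect of the post-imputation quality threshold described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `Variant` record, the function names, and the example variants are hypothetical, and the frequency bins are treated as half-open intervals, which the abstract does not specify. Only the thresholds (0.3 and 0.8) and the bin boundaries come from the text.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    variant_id: str  # hypothetical identifier
    maf: float       # minor allele frequency
    quality: float   # imputation quality score between 0 and 1

def post_imputation_filter(variants, min_quality):
    """Keep variants whose imputation quality score is at least min_quality."""
    return [v for v in variants if v.quality >= min_quality]

def count_by_frequency_bin(variants):
    """Count variants in the 'very rare' and 'rare' bins named in the abstract.

    Assumption: bins are half-open, [5e-4, 1e-3) and [1e-3, 5e-3).
    """
    counts = {"very_rare": 0, "rare": 0, "other": 0}
    for v in variants:
        if 5e-4 <= v.maf < 1e-3:
            counts["very_rare"] += 1
        elif 1e-3 <= v.maf < 5e-3:
            counts["rare"] += 1
        else:
            counts["other"] += 1
    return counts

# Invented variants, chosen so that the two thresholds give different results.
toy_variants = [
    Variant("v1", 7e-4, 0.35),
    Variant("v2", 7e-4, 0.85),
    Variant("v3", 2e-3, 0.60),
    Variant("v4", 0.2, 0.95),
]

lenient = post_imputation_filter(toy_variants, 0.3)
strict = post_imputation_filter(toy_variants, 0.8)
lenient_counts = count_by_frequency_bin(lenient)
strict_counts = count_by_frequency_bin(strict)
```

In this toy set the stricter threshold removes one of the two very rare variants and the only rare variant, which is the kind of loss the abstract attributes to conservative filtering.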
|
36
|
Li JH, Mazur CA, Berisa T, Pickrell JK. Low-pass sequencing increases the power of GWAS and decreases measurement error of polygenic risk scores compared to genotyping arrays. Genome Res 2021; 31:529-537. [PMID: 33536225 PMCID: PMC8015847 DOI: 10.1101/gr.266486.120] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 05/26/2020] [Accepted: 02/01/2021] [Indexed: 01/08/2023]
Abstract
Low-pass sequencing (sequencing a genome to an average depth less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping (using the Illumina Global Screening Array [GSA]) on 120 DNA samples derived from African- and European-ancestry individuals that are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. We evaluated overall imputation accuracy from these different assays as well as overall power for GWAS from imputed data and computed polygenic risk scores for coronary artery disease and breast cancer using previously derived weights. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5× and higher compared to the Illumina GSA.
Affiliation(s)
- Jeremiah H Li
- Gencove, Incorporated, New York, New York 10016, USA
| | - Chase A Mazur
- Gencove, Incorporated, New York, New York 10016, USA
| | - Tomaz Berisa
- Gencove, Incorporated, New York, New York 10016, USA
|
37
|
Prospective avenues for human population genomics and disease mapping in southern Africa. Mol Genet Genomics 2020; 295:1079-1089. [PMID: 32440765 PMCID: PMC7240165 DOI: 10.1007/s00438-020-01684-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Accepted: 05/06/2020] [Indexed: 12/22/2022]
Abstract
Population substructure within human populations is globally evident and a well-known confounding factor in many genetic studies. In contrast, admixture mapping exploits population stratification to detect genotype-phenotype correlations in admixed populations. Southern Africa has untapped potential for disease mapping of ancestry-specific disease risk alleles due to the distinct genetic diversity in its populations compared to other populations worldwide. This diversity contributes to a number of phenotypes, including ancestry-specific disease risk and response to pathogens. Although the 1000 Genomes Project significantly improved our understanding of genetic variation globally, southern African populations are still severely underrepresented in biomedical and human genetic studies due to insufficient large-scale publicly available data. In addition to a lack of genetic data in public repositories, existing software, algorithms and resources used for imputation and phasing of genotypic data (amongst others) are largely ineffective for populations with a complex genetic architecture such as that seen in southern Africa. This review article, therefore, aims to summarise the current limitations of conducting genetic studies on populations with a complex genetic architecture to identify potential areas for further research and development.
|
38
|
Magdy T, Kuo HH, Burridge PW. Precise and Cost-Effective Nanopore Sequencing for Post-GWAS Fine-Mapping and Causal Variant Identification. iScience 2020; 23:100971. [PMID: 32203907 PMCID: PMC7096756 DOI: 10.1016/j.isci.2020.100971] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/30/2019] [Revised: 01/13/2020] [Accepted: 03/05/2020] [Indexed: 01/01/2023] Open
Abstract
Fine-mapping of interesting loci discovered by genome-wide association study (GWAS) is mandatory to pinpoint causal variants. Traditionally, this fine-mapping is completed through increasing the genotyping density at candidate loci, for which imputation is the current standard approach. Although imputation is a useful technique, it has a number of limitations that impede accuracy. In this work, we describe the development of a precise and cost-effective Nanopore sequencing-based pipeline that provides comprehensive and accurate information at candidate loci to identify potential causal single-nucleotide polymorphisms (SNPs). We demonstrate the utility of this technique via the fine-mapping of a GWAS positive hit comprising a synonymous SNP that is associated with doxorubicin-induced cardiotoxicity. In this work, we provide a proof of principle for the application of Nanopore sequencing in post-GWAS fine-mapping and pinpointing of potential causal SNPs with a minimal cost of just ~$10/100 kb/sample.
Affiliation(s)
- Tarek Magdy
- Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA; Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Hui-Hsuan Kuo
- Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA; Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Paul W Burridge
- Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA; Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.
|
39
|
Bentley AR, Callier SL, Rotimi CN. Evaluating the promise of inclusion of African ancestry populations in genomics. NPJ Genom Med 2020; 5:5. [PMID: 32140257 PMCID: PMC7042246 DOI: 10.1038/s41525-019-0111-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 07/03/2019] [Accepted: 12/16/2019] [Indexed: 12/24/2022] Open
Abstract
The lack of representation of diverse ancestral backgrounds in genomic research is well-known, and the resultant scientific and ethical limitations are becoming increasingly appreciated. The paucity of data on individuals with African ancestry is especially noteworthy as Africa is the birthplace of modern humans and harbors the greatest genetic diversity. It is expected that greater representation of those with African ancestry in genomic research will bring novel insights into human biology, and lead to improvements in clinical care and improved understanding of health disparities. Now that major efforts have been undertaken to address this failing, is there evidence of these anticipated advances? Here, we evaluate the promise of including diverse individuals in genomic research in the context of recent literature on individuals of African ancestry. In addition, we discuss progress and achievements on related technological challenges and diversity among scientists conducting genomic research.
Affiliation(s)
- Amy R Bentley
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shawneequa L Callier
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA; Department of Clinical Research and Leadership, The George Washington University School of Medicine and Health Sciences, Washington, DC, USA
| | - Charles N Rotimi
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
|
40
|
Hyun DY, Sebastin R, Lee KJ, Lee GA, Shin MJ, Kim SH, Lee JR, Cho GT. Genotyping-by-Sequencing Derived Single Nucleotide Polymorphisms Provide the First Well-Resolved Phylogeny for the Genus Triticum (Poaceae). Front Plant Sci 2020; 11:688. [PMID: 32625218 PMCID: PMC7311657 DOI: 10.3389/fpls.2020.00688] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/10/2020] [Accepted: 04/30/2020] [Indexed: 05/17/2023]
Abstract
Wheat (Triticum spp.) has been an important staple food crop for mankind since the beginning of agriculture. The genus Triticum L. is composed of diploid, tetraploid, and hexaploid species, the majority of which have not yet been discriminated clearly, and hence their phylogeny and classification remain unresolved. Genotyping-by-sequencing (GBS) is an easy and affordable method that allows us to generate genome-wide single nucleotide polymorphism (SNP) markers. In this study, we used GBS to obtain SNPs covering all seven chromosomes from 283 accessions of Triticum-related genera. After filtering low-quality and redundant SNPs based on haplotype information, the GBS assay provided 14,188 high-quality SNPs that were distributed across the A (71%), B (26%), and D (2.4%) genomes. Cluster analysis and discriminant analysis of principal components (DAPC) allowed us to distinguish six distinct groups that matched well with Triticum species complexity. We constructed a Bayesian phylogenetic tree using 14,188 SNPs, in which 17 Triticum species and subspecies were discriminated. Dendrogram analysis revealed that the polyploid wheat species could be divided into groups according to the presence of A, B, D, and G genomes with strong nodal support and provided new insight into the evolution of spelt wheat. A total of 2,692 species-specific SNPs were identified to discriminate the common (T. aestivum) and durum (T. turgidum) wheat cultivars and landraces. In principal component analysis grouping, the two wheat species formed individual clusters and the SNPs were able to distinguish up to nine groups of 10 subspecies. This study demonstrated that GBS-derived SNPs could be used efficiently in genebank management to classify Triticum species and subspecies that are very difficult to distinguish by their morphological characters.
|