1
de Barros Rodrigues ML, Rodrigues MP, Norton HL, Mendes-Junior CT, Simões AL, Lawson DJ. Large-scale selection of highly informative microhaplotypes for ancestry inference and population specific informativeness. Forensic Sci Int Genet 2024; 74:103153. [PMID: 39378714 DOI: 10.1016/j.fsigen.2024.103153] [Received: 05/14/2024] [Revised: 09/30/2024] [Accepted: 10/01/2024] [Indexed: 10/10/2024]
Abstract
Microhaplotypes (MHs) describe physically close genetic markers that are inherited together and are gaining prominence due to their efficiency in forensic, clinical, and population studies. They excel in kinship analysis, DNA mixture detection, and ancestry inference, offering advantages in precision over individual SNPs and STRs. In this study, a pipeline was developed to efficiently select highly informative MHs from large-scale genomic datasets. Over 120,000 MHs were identified from almost a million markers, which allow this non-independent information to be efficiently used for inference. The MHs were compared to SNPs in terms of their informativeness and performance of their subsets in ancestry inference and all the results consistently favored MHs. A method for ranking markers by specific population informativeness was also introduced, which showed improvement in the accuracy of Native American ancestry estimation, overcoming the challenges of its underrepresentation in datasets. In conclusion, this study presents a comprehensive way for selecting highly informative MHs for accurate ancestry inference. The proposed approach and the subsets selected by specific population informativeness offer valuable tools for improving ancestry inference accuracy, particularly for admixed populations as demonstrated for a Brazilian dataset.
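The abstract does not name the statistic used to rank markers. To illustrate why a multi-allelic microhaplotype can carry more ancestry information than a biallelic SNP, the sketch below computes Rosenberg et al.'s (2003) informativeness for assignment, In, a widely used marker-ranking statistic. Treating In as this pipeline's actual metric is an assumption, and the marker names and frequencies are hypothetical.

```python
import math


def informativeness(freqs):
    """Informativeness for assignment, In (Rosenberg et al. 2003).

    freqs: one row per population (K rows), each row giving the
    frequencies of the same N alleles or haplotypes and summing to 1.
    Equals the mutual information between population label and allele
    when all populations are weighted equally, so 0 <= In <= ln(min(K, N)).
    """
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        column = [pop[j] for pop in freqs]
        p_bar = sum(column) / k  # mean frequency of allele j across populations
        if p_bar > 0:
            total -= p_bar * math.log(p_bar)
        for p in column:
            if p > 0:  # convention: 0 * log(0) = 0
                total += (p / k) * math.log(p)
    return total


def rank_markers(markers):
    """Return marker names sorted by decreasing In (markers: name -> freqs)."""
    return sorted(markers, key=lambda name: informativeness(markers[name]), reverse=True)


# Hypothetical markers typed in four populations.
snp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
microhaplotype = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(rank_markers({"snp_1": snp, "mh_1": microhaplotype}))  # ['mh_1', 'snp_1']
```

With four populations each fixed for a different haplotype, the microhaplotype reaches In = ln 4, whereas a biallelic SNP is capped at ln 2 because it can only split the populations into two groups.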
Affiliation(s)
- Maria Luisa de Barros Rodrigues
- Programa de Pós-Graduação em Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900, Brazil.
- Heather L Norton
- Department of Anthropology, University of Cincinnati, Cincinnati, OH 45221, United States
- Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil
- Aguinaldo Luiz Simões
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900, Brazil
- Daniel John Lawson
- Institute of Statistical Sciences, School of Mathematics, Woodland Road, University of Bristol, Bristol BS8 1UG, UK; MRC Integrative Epidemiology Unit, School of Medicine, Oakfield Grove, University of Bristol, Bristol BS8 2BN, UK.
2
Wei Y, Zhi D, Zhang S. Fast and accurate local ancestry inference with Recomb-Mix. bioRxiv 2024:2023.11.17.567650. [PMID: 38014185 PMCID: PMC10680832 DOI: 10.1101/2023.11.17.567650] [Indexed: 11/29/2023]
Abstract
The availability of large genotyped cohorts brings new opportunities for revealing the high-resolution genetic structure of admixed populations via local ancestry inference (LAI), the process of identifying the ancestry of each segment of an individual haplotype. Though current methods achieve high accuracy in standard cases, LAI is still challenging when reference populations are more similar (e.g., intra-continental), when there are many reference populations, or when the admixture events are deep in time, all of which are increasingly unavoidable in large biobanks. Here, we present a new LAI method, Recomb-Mix. Recomb-Mix integrates elements of existing methods built on the site-based Li and Stephens model and introduces a new graph-collapsing trick that simplifies counting paths with the same ancestry label readout. Through comprehensive benchmarking on various simulated datasets, we show that Recomb-Mix is more accurate than existing methods in diverse sets of scenarios while being competitive in terms of resource efficiency. We expect that Recomb-Mix will be a useful method for advancing genetics studies of admixed populations.
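Recomb-Mix's graph-collapsing step is not described here in enough detail to reproduce. The sketch below shows only the underlying idea it builds on: a simplified Li and Stephens copying HMM in which a query haplotype copies from labelled reference haplotypes, and per-site posteriors over the copied template are summed by ancestry label. The constant switch probability `rho`, the mismatch probability `eps`, and the function name are illustrative assumptions, not Recomb-Mix parameters.

```python
def li_stephens_lai(query, refs, labels, rho=0.01, eps=0.01):
    """Per-site ancestry posteriors under a simplified Li and Stephens HMM.

    query:  list of 0/1 alleles for the haplotype being labelled.
    refs:   list of reference haplotypes (each a 0/1 list, same length).
    labels: ancestry label of each reference haplotype.
    rho:    probability of jumping to a uniformly chosen template between
            adjacent sites (constant here, not derived from a genetic map).
    eps:    probability of a mismatch between query and copied template.
    Returns one dict per site mapping ancestry label -> posterior.
    """
    n_ref, n_sites = len(refs), len(query)

    def emit(h, s):
        return 1.0 - eps if refs[h][s] == query[s] else eps

    # Forward pass, normalized at each site to avoid underflow.
    prev = [emit(h, 0) / n_ref for h in range(n_ref)]
    z = sum(prev)
    fwd = [[v / z for v in prev]]
    for s in range(1, n_sites):
        a = fwd[-1]  # sums to 1, so the jump term is simply rho / n_ref
        cur = [((1.0 - rho) * a[h] + rho / n_ref) * emit(h, s) for h in range(n_ref)]
        z = sum(cur)
        fwd.append([v / z for v in cur])

    # Backward pass, also normalized per site (posteriors are renormalized below).
    bwd = [None] * n_sites
    bwd[-1] = [1.0] * n_ref
    for s in range(n_sites - 2, -1, -1):
        eb = [emit(h, s + 1) * bwd[s + 1][h] for h in range(n_ref)]
        total = sum(eb)
        cur = [(1.0 - rho) * eb[h] + rho / n_ref * total for h in range(n_ref)]
        z = sum(cur)
        bwd[s] = [v / z for v in cur]

    # Posterior over templates at each site, then summed by ancestry label.
    result = []
    for s in range(n_sites):
        post = [fwd[s][h] * bwd[s][h] for h in range(n_ref)]
        z = sum(post)
        by_label = {}
        for h, p in enumerate(post):
            by_label[labels[h]] = by_label.get(labels[h], 0.0) + p / z
        result.append(by_label)
    return result


refs = [[0] * 10, [1] * 10]  # hypothetical: one template per ancestry
post = li_stephens_lai([0] * 5 + [1] * 5, refs, ["A", "B"])
print(round(post[0]["A"], 3), round(post[9]["B"], 3))
```

Real LAI tools differ in how they model recombination, mismatch and the number of templates; the point here is only that ancestry is read off as the label of whichever references the query is most likely copying at each site.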
Affiliation(s)
- Yuan Wei
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Degui Zhi
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
3
Smith JL, Schaid DJ, Kullo IJ. Implementing Reporting Standards for Polygenic Risk Scores for Atherosclerotic Cardiovascular Disease. Curr Atheroscler Rep 2023; 25:323-330. [PMID: 37223852 PMCID: PMC10495216 DOI: 10.1007/s11883-023-01104-3] [Accepted: 04/13/2023] [Indexed: 05/25/2023]
Abstract
PURPOSE OF REVIEW There is considerable interest in using polygenic risk scores (PRSs) for assessing risk of atherosclerotic cardiovascular disease (ASCVD). A barrier to the clinical use of PRSs is heterogeneity in how PRS studies are reported. In this review, we summarize approaches to establish a uniform reporting framework for PRSs for coronary heart disease (CHD), the most common form of ASCVD. RECENT FINDINGS Reporting standards for PRSs need to be contextualized for disease-specific applications. In addition to metrics of predictive performance, reporting standards for PRSs for CHD should include how cases and controls were ascertained, the degree of adjustment for conventional CHD risk factors, portability to diverse genetic ancestry groups and admixed individuals, and quality control measures for clinical deployment. Such a framework will enable PRSs to be optimized and benchmarked for clinical use.
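For readers less familiar with PRSs, a score is typically a weighted sum of effect-allele dosages. The sketch below assumes per-variant weights (for example, GWAS effect sizes) keyed by hypothetical variant IDs. The optional standardization against a reference sample illustrates why portability matters: the mean and spread of raw scores can differ between genetic ancestry groups, so the choice of reference distribution changes how a score is interpreted.

```python
from statistics import mean, stdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: variant ID -> effect-allele dosage (0, 1, 2, or an imputed value).
    weights: variant ID -> per-allele weight (e.g. a GWAS log odds ratio).
    """
    missing = weights.keys() - dosages.keys()
    if missing:
        raise ValueError(f"no genotype for variants: {sorted(missing)}")
    return sum(beta * dosages[v] for v, beta in weights.items())


def standardize(score, reference_scores):
    """Express a score as a z-score relative to a reference sample."""
    return (score - mean(reference_scores)) / stdev(reference_scores)


# Hypothetical three-variant score.
weights = {"var1": 0.2, "var2": -0.1, "var3": 0.05}
raw = polygenic_score({"var1": 2, "var2": 1, "var3": 0}, weights)
print(round(raw, 3))  # 0.3
```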
Affiliation(s)
- Johanna L Smith
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Daniel J Schaid
- Department of Quantitative Health Sciences, Rochester, MN, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.
- Gonda Vascular Center, Rochester, MN, USA.
4
Wagner JK, Yu JH, Fullwiley D, Moore C, Wilson JF, Bamshad MJ, Royal CD. Guidelines for genetic ancestry inference created through roundtable discussions. HGG Adv 2023; 4:100178. [PMID: 36798092 PMCID: PMC9926022 DOI: 10.1016/j.xhgg.2023.100178] [Received: 09/02/2022] [Accepted: 01/03/2023] [Indexed: 01/15/2023]
Abstract
The use of genetic and genomic technology to infer ancestry is commonplace in a variety of contexts, particularly in biomedical research and for direct-to-consumer genetic testing. In 2013 and 2015, two roundtables engaged a diverse group of stakeholders toward the development of guidelines for inferring genetic ancestry in academia and industry. This report shares the stakeholder groups' work and provides an analysis of, commentary on, and views from the groundbreaking and sustained dialogue. We describe the engagement processes and the stakeholder groups' resulting statements and proposed guidelines. The guidelines focus on five key areas: application of genetic ancestry inference, assumptions and confidence/laboratory and statistical methods, terminology and population identifiers, impact on individuals and groups, and communication or translation of genetic ancestry inferences. We delineate the terms and limitations of the guidelines and discuss their critical role in advancing the development and implementation of best practices for inferring genetic ancestry and reporting the results. These efforts should inform both governmental regulation and self-regulation.
Affiliation(s)
- Jennifer K. Wagner
- School of Engineering Design and Innovation, Pennsylvania State University, University Park, PA 16802, USA
- Institute for Computational and Data Science, Pennsylvania State University, University Park, PA 16802, USA
- Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA
- Rock Ethics Institute, Pennsylvania State University, University Park, PA 16802, USA
- Penn State Law, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Joon-Ho Yu
- Department of Pediatrics and Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA
- Treuman Katz Center for Pediatric Bioethics, Seattle Children’s Hospital and Research Institute, Seattle, WA 98101, USA
- Duana Fullwiley
- Department of Anthropology, Stanford University, Stanford, CA 94305, USA
- James F. Wilson
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh EH8 9AG, Scotland
- Michael J. Bamshad
- Department of Pediatrics and Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Division of Genetic Medicine, Seattle Children’s Hospital, Seattle, WA 98101, USA
- Charmaine D. Royal
- Departments of African and African American Studies, Biology, Global Health, and Family Medicine and Community Health, Duke University, Durham, NC 27708, USA
- Genetic Ancestry Inference Roundtable Participants
- School of Engineering Design and Innovation, Pennsylvania State University, University Park, PA 16802, USA
- Institute for Computational and Data Science, Pennsylvania State University, University Park, PA 16802, USA
- Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA
- Rock Ethics Institute, Pennsylvania State University, University Park, PA 16802, USA
- Penn State Law, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Department of Pediatrics and Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA
- Treuman Katz Center for Pediatric Bioethics, Seattle Children’s Hospital and Research Institute, Seattle, WA 98101, USA
- Department of Anthropology, Stanford University, Stanford, CA 94305, USA
- The DNA Detectives, Dana Point, CA, USA
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh EH8 9AG, Scotland
- Department of Pediatrics and Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Division of Genetic Medicine, Seattle Children’s Hospital, Seattle, WA 98101, USA
- Departments of African and African American Studies, Biology, Global Health, and Family Medicine and Community Health, Duke University, Durham, NC 27708, USA
5
Challenges in selecting admixture models and marker sets to infer genetic ancestry in a Brazilian admixed population. Sci Rep 2022; 12:21240. [PMID: 36481695 PMCID: PMC9731996 DOI: 10.1038/s41598-022-25521-7] [Received: 07/04/2022] [Accepted: 11/30/2022] [Indexed: 12/13/2022]
Abstract
The inference of genetic ancestry plays an increasingly prominent role in clinical, population, and forensic genetics studies. Several genotyping strategies and analytical methodologies have been developed over the last few decades to assign individuals to specific biogeographic regions. However, despite these efforts, ancestry inference in populations with a recent history of admixture, such as those in Brazil, remains a challenge. In admixed populations, the proportions and components of genetic ancestry vary at different levels: (i) between populations; (ii) between individuals of the same population; and (iii) throughout the individual's genome. The present study evaluated 1171 admixed Brazilian samples to compare the genetic ancestry inferred by tri-/tetra-hybrid admixture models, and evaluated marker sets ranging from small ancestry informative marker (AIM) panels to high-density SNPs (HDSNP) and whole-genome sequence (WGS) data. Analyses revealed greater variation in the correlation coefficient of ancestry components within and between admixed populations, especially for minority ancestral components. We also observed a positive correlation between the number of markers in the AIM panels and HDSNP/WGS. Furthermore, the greater the number of markers, the more accurate the tri-/tetra-hybrid admixture models.
6
Swart Y, van Eeden G, Uren C, van der Spuy G, Tromp G, Möller M. GWAS in the southern African context. PLoS One 2022; 17:e0264657. [PMID: 36170230 PMCID: PMC9518849 DOI: 10.1371/journal.pone.0264657] [Received: 02/07/2022] [Accepted: 08/06/2022] [Indexed: 11/18/2022]
Abstract
Researchers generally adjust for the possible confounding effect of population structure by considering global ancestry proportions or top principal components. Alternatively, researchers conduct admixture mapping to increase the power to detect variants with an ancestry effect. This is sufficient in simple admixture scenarios; however, populations from southern Africa can be complex multi-way admixed populations. Duan et al. (2018) first described local ancestry adjusted allelic (LAAA) analysis as a robust method for discovering association signals while producing minimal false positive hits. Their simulation study, however, was limited to a two-way admixed population. Realizing that their findings might not translate to other admixture scenarios, we simulated three- and five-way admixed populations to compare the LAAA model to other models commonly used in genome-wide association studies (GWAS). We found that, given our admixture scenarios, the LAAA model identified the most causal variants in most of the phenotypes we tested across both the three-way and five-way admixed populations. The LAAA model also produced a high number of false positive hits, potentially caused by the ancestry effect size that we assumed. Considering the extent to which the various models tested differed in their results, and that the source of a given association is unknown, we recommend that researchers use multiple GWAS models when analysing populations with complex ancestry.
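The LAAA idea can be made concrete through the per-individual covariates it relies on at a single variant. The sketch below assumes phased genotypes with one local ancestry call per haplotype and builds the total allele dosage, the local ancestry dosage, and the ancestry-specific allele dosage for every ancestry in a multi-way model. The ancestry labels are hypothetical, and the regression itself (including exactly which terms Duan et al. fit) is not reproduced here.

```python
def laaa_covariates(alleles, ancestries, ancestry_labels):
    """Covariates at one variant for a LAAA-style association model.

    alleles:         the two phased alleles (0 = reference, 1 = tested allele).
    ancestries:      local ancestry call for each of those two haplotypes.
    ancestry_labels: every ancestry in the admixture model.
    """
    if len(alleles) != 2 or len(ancestries) != 2:
        raise ValueError("expected one allele and one ancestry call per haplotype")
    covariates = {"G": sum(alleles)}  # total tested-allele dosage
    for k in ancestry_labels:
        # local ancestry dosage: haplotypes called as ancestry k (0, 1 or 2)
        covariates[f"L_{k}"] = sum(1 for a in ancestries if a == k)
        # ancestry-specific allele dosage: tested alleles on ancestry-k haplotypes
        covariates[f"G_{k}"] = sum(g for g, a in zip(alleles, ancestries) if a == k)
    return covariates


# Hypothetical three-way admixed individual, heterozygous at the variant.
print(laaa_covariates((1, 0), ("AFR", "EUR"), ["AFR", "EUR", "SAS"]))
```

Because the local ancestry dosages always sum to 2, one ancestry would normally be left out as the reference level when these columns enter a regression.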
Affiliation(s)
- Yolandi Swart
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Gerald van Eeden
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Caitlin Uren
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa
- Gian van der Spuy
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- SAMRC-SHIP South African Tuberculosis Bioinformatics Initiative (SATBBI), Center for Bioinformatics and Computational Biology, Cape Town, South Africa
- Gerard Tromp
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa
- SAMRC-SHIP South African Tuberculosis Bioinformatics Initiative (SATBBI), Center for Bioinformatics and Computational Biology, Cape Town, South Africa
- Marlo Möller
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa
7
Oriol Sabat B, Mas Montserrat D, Giro-i-Nieto X, Ioannidis AG. SALAI-Net: species-agnostic local ancestry inference network. Bioinformatics 2022; 38:ii27-ii33. [PMID: 36124792 PMCID: PMC9486591 DOI: 10.1093/bioinformatics/btac464] [Indexed: 12/25/2022]
Abstract
MOTIVATION Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications. RESULTS We present SALAI-Net, a portable statistical LAI method that can be applied to any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity-by-descent methods, SALAI-Net estimates population labels for each segment of DNA by performing a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on human whole-genome data and test these models' ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM than competing methods. AVAILABILITY AND IMPLEMENTATION We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).
SUPPLEMENTARY INFORMATION Supplementary data are available from Bioinformatics online.
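SALAI-Net's exact architecture is not described in this abstract. The sketch below is a much simpler, non-learned analogue of the reference-matching idea it mentions: within fixed windows, the query haplotype is compared with every reference haplotype, each ancestry is scored by its best-matching reference, and the window calls are smoothed by a majority vote over neighboring windows. The window size, the max-similarity scoring, and the smoothing rule are all assumptions, not SALAI-Net's design.

```python
from collections import Counter


def window_reference_matching(query, refs, labels, window=4, smooth=1):
    """Assign one ancestry label per window by best reference match.

    Each ancestry is scored by the highest fraction of identical alleles
    between the query and any reference haplotype carrying that label.
    Calls are then smoothed by a majority vote over `smooth` windows on
    each side, keeping the window's own call when it ties for the lead.
    """
    calls = []
    for start in range(0, len(query), window):
        q = query[start:start + window]
        best = {}
        for ref, lab in zip(refs, labels):
            r = ref[start:start + window]
            similarity = sum(a == b for a, b in zip(q, r)) / len(q)
            best[lab] = max(best.get(lab, 0.0), similarity)
        calls.append(max(best, key=best.get))

    smoothed = []
    for i, own in enumerate(calls):
        votes = Counter(calls[max(0, i - smooth): i + smooth + 1])
        top = max(votes.values())
        smoothed.append(own if votes[own] == top else votes.most_common(1)[0][0])
    return smoothed


refs = [[0] * 12, [1] * 12]  # hypothetical single-template references
print(window_reference_matching([0] * 4 + [1] * 4 + [0] * 4, refs, ["A", "B"]))
# ['A', 'A', 'A']: the isolated 'B' window is smoothed away
```

Setting `smooth=0` returns the raw per-window calls, which makes the effect of the smoothing step easy to inspect.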
Affiliation(s)
- Benet Oriol Sabat
- Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain
- Department of Biomedical Data Science, Stanford Medical School
- Xavier Giro-i-Nieto
- Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain
- Alexander G Ioannidis
- Department of Biomedical Data Science, Stanford Medical School
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
8
Suarez-Pajes E, Díaz-García C, Rodríguez-Pérez H, Lorenzo-Salazar JM, Marcelino-Rodríguez I, Corrales A, Zheng X, Callero A, Perez-Rodriguez E, Garcia-Robaina JC, González-Montelongo R, Flores C, Guillen-Guio B. Targeted analysis of genomic regions enriched in African ancestry reveals novel classical HLA alleles associated with asthma in Southwestern Europeans. Sci Rep 2021; 11:23686. [PMID: 34880287 PMCID: PMC8654850 DOI: 10.1038/s41598-021-02893-w] [Received: 07/23/2021] [Accepted: 11/24/2021] [Indexed: 12/30/2022]
Abstract
Although asthma has a considerable genetic component, an important proportion of its genetic risk remains unknown, especially for non-European populations. Canary Islanders have the highest proportion of African genetic ancestry observed among Southwestern Europeans and the highest asthma prevalence in Spain. Here we examined broad chromosomal regions previously associated with an excess of African genetic ancestry in Canary Islanders, with the aim of identifying novel risk variants associated with asthma susceptibility. In a two-stage case-control study, we revealed a variant within HLA-DQB1 significantly associated with asthma risk (rs1049213, meta-analysis p = 1.30 × 10⁻⁷, OR [95% CI] = 1.74 [1.41–2.13]) that had previously been associated with asthma and a broad allergic phenotype. Subsequent fine-mapping analyses of classical HLA alleles revealed a novel allele significantly associated with asthma protection (HLA-DQA1*01:02, meta-analysis p = 3.98 × 10⁻⁴, OR [95% CI] = 0.64 [0.50–0.82]) that had been linked to infectious and autoimmune diseases and to peanut allergy. HLA haplotype analyses revealed a novel haplotype DQA1*01:02-DQB1*06:04 conferring asthma protection (meta-analysis p = 4.71 × 10⁻⁴, OR [95% CI] = 0.47 [0.29–0.73]).
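The reported odds ratios, confidence intervals and p-values can be cross-checked against one another. Assuming each 95% CI is a Wald interval on the log-odds scale (an assumption; the abstract does not say how the intervals were computed), the standard error is (ln U − ln L) / (2 × 1.96) and the two-sided p-value follows from z = ln(OR) / SE. The function name below is hypothetical.

```python
import math


def wald_p_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Two-sided p-value implied by an odds ratio and its 95% CI.

    Assumes the interval was built on the log scale as
    exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z


print(f"{wald_p_from_ci(1.74, 1.41, 2.13):.2e}")  # rs1049213 values from the abstract
print(f"{wald_p_from_ci(0.64, 0.50, 0.82):.2e}")  # HLA-DQA1*01:02 values
```

These come out at roughly 1.4 × 10⁻⁷ and 4 × 10⁻⁴, close to the reported 1.30 × 10⁻⁷ and 3.98 × 10⁻⁴. Exact agreement is not expected, because the published bounds are rounded and the meta-analysis may not use a plain Wald interval.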
Affiliation(s)
- Eva Suarez-Pajes
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
- Claudio Díaz-García
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
- Héctor Rodríguez-Pérez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
- Jose M Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico Y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Itahisa Marcelino-Rodríguez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
- Almudena Corrales
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Xiuwen Zheng
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ariel Callero
- Allergy Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
- Eva Perez-Rodriguez
- Allergy Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
- Jose C Garcia-Robaina
- Allergy Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
- Carlos Flores
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
- Genomics Division, Instituto Tecnológico Y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain.
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain.
- Beatriz Guillen-Guio
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
- Department of Health Sciences, University of Leicester, Leicester, UK.