1
French JN, Pua VB, Laboulaye R, Leal TP, Olivas MC, Lima-Costa MF, Horta BL, Barreto ML, Tarazona-Santos E, Mata I, O’Connor TD. Comparing the effect of imputation reference panel composition in four distinct Latin American cohorts. bioRxiv: The Preprint Server for Biology 2024:2024.04.11.589057. [PMID: 38659746 PMCID: PMC11042191 DOI: 10.1101/2024.04.11.589057] [Indexed: 04/26/2024]
Abstract
Genome-wide association studies have been useful in identifying genetic risk factors for various phenotypes. These studies rely on genotype imputation, and many existing reference panels are composed largely of individuals of European ancestry, which lowers imputation quality in underrepresented populations. We aim to analyze how the composition of imputation reference panels affects imputation quality in four target Latin American cohorts. We compared imputation quality for chromosomes 7 and X when altering the imputation reference panel by 1) increasing the number of Latin American individuals; 2) excluding Latin American, African, or European individuals; or 3) increasing the Indigenous American (IA) admixture proportions of the included Latin Americans. Increasing the number of Latin Americans in the reference panel improved imputation quality in all four populations, although results differed between chromosomes 7 and X in some cohorts. Excluding Latin Americans from the reference panel worsened imputation quality in every cohort, whereas excluding Europeans or Africans had different effects between and within cohorts and between chromosomes 7 and X. Finally, increasing IA-like admixture proportions in the reference panel improved imputation quality to different degrees in different populations. These differences between populations and chromosomes suggest that existing and future reference panels containing Latin American individuals are likely to perform differently across Latin American populations.
Affiliation(s)
- Jennifer N French
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Victor Borda Pua
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
  - University of Maryland Institute for Health Computing, Rockville, MD
- Roland Laboulaye
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Thiago Peixoto Leal
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Mario Cornejo Olivas
  - Neurogenetics Working Group, Universidad Cientifica del Sur, Lima, Peru
  - Neurogenetics Research Center, Instituto Nacional de Ciencias Neurologicas, Lima, Peru
- Bernardo L Horta
  - Postgraduate Program in Epidemiology, Federal University of Pelotas, Pelotas, Brazil
- Mauricio L Barreto
  - Center for Data and Knowledge Integration for Health (CIDACS), Gonçalo Moniz Institute (IGM), Oswaldo Cruz Foundation (FIOCRUZ-BA), Salvador, Bahia, Brazil
  - Collective Health Institute, Federal University of Bahia (UFBA), Salvador, Bahia, Brazil
- Eduardo Tarazona-Santos
  - Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Brazil
- Ignacio Mata
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Timothy D. O’Connor
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD
  - Program in Health Equity and Population Health, University of Maryland School of Medicine
2
Childebayeva A, Zavala EI. Review: Computational analysis of human skeletal remains in ancient DNA and forensic genetics. iScience 2023; 26:108066. [PMID: 37927550 PMCID: PMC10622734 DOI: 10.1016/j.isci.2023.108066] [Indexed: 11/07/2023]
Abstract
Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics. While aDNA studies typically center on human evolution and past history, and forensic genetics is more often concerned with identifying a specific individual, scientists in both fields face similar challenges. The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration between the fields toward mutually beneficial methodological advances. However, most of these have focused on wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In this review, we focus on the computational side of the analytical workflow. We discuss limitations and considerations relevant to working with degraded DNA. We hope this review gives researchers new to computational workflows a framework for thinking about how to analyze highly degraded DNA, and that it prompts increased collaboration between the forensic genetics and aDNA fields.
Affiliation(s)
- Ainash Childebayeva
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Anthropology, University of Kansas, Lawrence, KS, USA
- Elena I. Zavala
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Biology, University of Oregon, Eugene, OR, USA
3
Baldrighi GN, Nova A, Bernardinelli L, Fazia T. A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software. Life (Basel, Switzerland) 2022; 12:life12122030. [PMID: 36556394 PMCID: PMC9781110 DOI: 10.3390/life12122030] [Received: 09/30/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/09/2022]
Abstract
Genotype imputation has become an essential prerequisite for association analysis. It is a computational technique for inferring genetic markers that have not been directly genotyped, thereby increasing statistical power in subsequent association studies, which in turn has a crucial impact on the identification of causal variants. Many features need to be considered when choosing an appropriate imputation algorithm, including the target sample on which it is performed, i.e., related individuals, unrelated individuals, or both. Problems can arise when the target sample consists of mixed data, composed of both related and unrelated individuals, especially since the scientific literature on this topic is not sufficiently clear. To shed light on this issue, we examined existing algorithms and software for performing phasing and imputation on mixed human data from SNP arrays, specifically when the related subjects belong to trios. By discussing the advantages and limitations of current algorithms, we identified LD-based methods as the most suitable for reconstructing haplotypes in this specific context, and we proposed a feasible pipeline for imputing genotypes in both phased and unphased human data.
4
Multivariate genome-wide association study models to improve prediction of Crohn’s disease risk and identification of potential novel variants. Comput Biol Med 2022; 145:105398. [DOI: 10.1016/j.compbiomed.2022.105398] [Received: 11/20/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 12/21/2022]