1. Elhaik E. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Sci Rep 2022; 12:14683. [PMID: 36038559; PMCID: PMC9424212; DOI: 10.1038/s41598-022-14395-4] [Received: 01/14/2022; Accepted: 06/06/2022]
Abstract
Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be as reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
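To make concrete what a genotype PCA computes, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the EIGENSOFT or PLINK implementation (those tools may apply additional per-SNP scaling and other options). It assumes a matrix of allele counts (0, 1, 2) with individuals as rows and SNPs as columns, centres each SNP, and takes a thin SVD. The helper name `genotype_pca` and the simulated toy data are hypothetical. The sign of each component is arbitrary, and the projection depends entirely on which samples and markers are included, which is the kind of data dependence the study above examines.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2):
    """Hypothetical helper: PCA of an (individuals x SNPs) allele-count matrix.

    Returns (scores, explained_ratio): each individual's coordinates on the
    top principal components and the fraction of total variance each explains.
    """
    # Centre each SNP (column) so the decomposition reflects covariance
    # between individuals rather than mean allele frequencies.
    centred = genotypes - genotypes.mean(axis=0)
    # Thin SVD: centred = U @ diag(s) @ Vt; rows of Vt are the principal axes.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Principal component scores are the projections U * s.
    scores = u[:, :n_components] * s[:n_components]
    # Squared singular values are proportional to the variance along each axis.
    variance = s ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return scores, explained_ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data (assumption): two simulated groups of 20 individuals at 50 SNPs
    # with very different allele frequencies (0.1 vs 0.9).
    group_a = rng.binomial(2, 0.1, size=(20, 50))
    group_b = rng.binomial(2, 0.9, size=(20, 50))
    genotypes = np.vstack([group_a, group_b]).astype(float)

    scores, explained = genotype_pca(genotypes)
    print("PC1 explains", round(float(explained[0]), 3), "of the variance")
    print("PC1 group means:", scores[:20, 0].mean(), scores[20:, 0].mean())
```

In this toy setup PC1 separates the two simulated groups; adding, removing, or resampling individuals or SNPs changes the axes and coordinates.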
Affiliation(s)
- Eran Elhaik
  - Department of Biology, Lund University, 22362 Lund, Sweden

2. Behnamian S, Esposito U, Holland G, Alshehab G, Dobre AM, Pirooznia M, Brimacombe CS, Elhaik E. Temporal population structure, a genetic dating method for ancient Eurasian genomes from the past 10,000 years. Cell Rep Methods 2022; 2:100270. [PMID: 36046618; PMCID: PMC9421539; DOI: 10.1016/j.crmeth.2022.100270] [Received: 07/27/2021; Revised: 06/17/2022; Accepted: 07/19/2022]
Abstract
Radiocarbon dating is the gold standard in archeology for estimating the age of skeletons, a key to studying their origins. Many published ancient genomes lack reliable and direct dates, which results in obscure and contradictory reports. We developed the temporal population structure (TPS), a DNA-based dating method for genomes ranging from the Late Mesolithic to today, and applied it to 3,591 ancient and 1,307 modern Eurasians. TPS predictions aligned with the known dates and correctly accounted for kin relationships. TPS dating of poorly dated Eurasian samples resolved conflicting reports in the literature, as illustrated by one test case. We also demonstrated how TPS improved the ability to study phenotypic traits over time. TPS can be used when radiocarbon dating is infeasible or uncertain, or to develop alternative hypotheses, for samples younger than 10,000 years; this age limit may be relaxed over time as ancient data accumulate.
Affiliation(s)
- Sara Behnamian
  - Department of Biology, Lund University, 22362 Lund, Sweden
- Umberto Esposito
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
- Grace Holland
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
- Ghadeer Alshehab
  - Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 3JD, UK
- Ann M. Dobre
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
- Mehdi Pirooznia
  - National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD 20892, USA
- Conrad S. Brimacombe
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
  - Department of Anthropology and Archaeology, University of Bristol, Bristol BS8 1TH, UK
- Eran Elhaik
  - Department of Biology, Lund University, 22362 Lund, Sweden

3. Recapitulating whole genome based population genetic structure for Indian wild tigers through an ancestry informative marker panel. Heredity (Edinb) 2022; 128:88-96. [PMID: 34857925; PMCID: PMC8813985; DOI: 10.1038/s41437-021-00477-y] [Received: 10/29/2020; Revised: 09/30/2021; Accepted: 10/01/2021]
Abstract
Identifying genetic structure within wildlife populations has implications for their conservation and management. Accurately inferring population genetic structure requires whole-genome data across the geographical range of the species, which can be resource-intensive. A cheaper strategy is to employ a subset of markers that efficiently recapitulates the population genetic structure inferred from whole-genome data. Such ancestry informative markers (AIMs) based on single nucleotide polymorphisms (SNPs) have rarely been developed for endangered species such as tigers. Here, we first identify the population structure of the Indian tiger using whole-genome sequences and then develop an AIMs panel with a minimum number of SNPs that can recapitulate this structure. We identified four population clusters of Indian tigers: North-East, North-West, and South Indian tigers form three separate groups, while Terai and Central Indian tigers form a single cluster. To evaluate the robustness of our AIMs, we applied the panel to a separate dataset of tigers from across India. Of the 92 SNPs in our AIMs panel, 49 were present in the new dataset, and these 49 SNPs were sufficient to recapitulate the population genetic structure obtained from the whole-genome data. To the best of our knowledge, this is the first SNP-based AIMs panel for big cats, and it can be used as a cost-effective alternative to whole-genome sequencing for identifying the biogeographical origin of Indian tigers. Our study can also serve as a guideline for developing AIMs panels for the management of other endangered species for which whole-genome sequences are difficult to obtain.

4. Undercutting efforts of precision medicine: roadblocks to minority representation in breast cancer clinical trials. Breast Cancer Res Treat 2021; 187:605-611. [PMID: 34080093; DOI: 10.1007/s10549-021-06264-x] [Received: 02/18/2021; Accepted: 05/18/2021]
Abstract
Precision (or personalized) medicine holds great promise in the treatment of breast cancer. The success of personalized medicine is contingent upon inclusivity and representation for minority groups in clinical trials. In this article, we focus on the roadblocks for the African American demographic, including the barriers to access and enrollment in breast oncology trials, the prevailing classification of race and ethnicity, and the need to refine monolithic categorization by employing genetic ancestry mapping tools for a more accurate determination of race or ethnicity.

5. Carress H, Lawson DJ, Elhaik E. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics 2021; 22:351. [PMID: 34001009; PMCID: PMC8127217; DOI: 10.1186/s12864-021-07618-x] [Received: 10/16/2020; Accepted: 04/14/2021]
Abstract
The past years have seen the rise of genomic biobanks and mega-scale meta-analysis of genomic data, which promises to reveal the genetic underpinnings of health and disease. However, the over-representation of Europeans in genomic studies not only limits the global understanding of disease risk but also inhibits viable research into the genomic differences between carriers and patients. Whilst the community has agreed that more diverse samples are required, it is not enough to blindly increase diversity; the diversity must be quantified, compared and annotated to lead to insight. Genetic annotations from separate biobanks need to be comparable and computable and to operate without access to raw data due to privacy concerns. Comparability is key both for regular research and to allow international comparison in response to pandemics. Here, we evaluate the appropriateness of the most common genomic tools used to depict population structure in a standardized and comparable manner. The end goal is to reduce the effects of confounding and learn from genuine variation in genetic effects on phenotypes across populations, which will improve the value of biobanks (locally and internationally), increase the accuracy of association analyses and inform developmental efforts.
Affiliation(s)
- Hannah Carress
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Daniel John Lawson
  - School of Mathematics and Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Eran Elhaik
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
  - Department of Biology, Lund University, Lund, Sweden

6. Developmental validations of a self-developed 39 AIM-InDel panel and its forensic efficiency evaluations in the Shaanxi Han population. Int J Legal Med 2021; 135:1359-1367. [PMID: 33907868; DOI: 10.1007/s00414-021-02600-4] [Received: 11/29/2020; Accepted: 04/06/2021]
Abstract
Most insertion/deletion (InDel) polymorphisms are diallelic markers characterized by small amplicon sizes, high inter-population diversity, and low mutation rates, which makes them promising markers for biogeographic ancestry inference. The developmental validation of a 39-locus ancestry informative marker-insertion/deletion (AIM-InDel) panel and a survey of its genetic polymorphism were performed in the Shaanxi Han population of China. The developmental validation covered PCR optimization, repeatability, reproducibility, precision, accuracy, sensitivity, species specificity, and stability of the panel, as well as its performance on degraded, casework, and mixture samples; the results demonstrated that the 39 AIM-InDel panel was robust, sensitive, and accurate. In the population diversity analyses, the combined discrimination power of 38 AIM-InDel loci (all except rs36038238) was 0.999999999931257, indicating that the panel is highly polymorphic and biogeographically informative and could also be used for individual identification in the Shaanxi Han population.
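For reference, combined discrimination power over k loci is conventionally computed as one minus the product of the per-locus match probabilities, assuming the loci are independent. The abstract does not state the formula used, so the following is the standard textbook definition rather than necessarily the authors' exact computation:

```latex
\mathrm{CPD} = 1 - \prod_{i=1}^{k} \bigl(1 - \mathrm{PD}_i\bigr),
\qquad
\mathrm{PD}_i = 1 - \sum_{g} f_{i,g}^{2}
```

Here f_{i,g} is the observed frequency of genotype g at locus i. For the k = 38 loci reported above, a CPD of 0.999999999931257 corresponds to a combined match probability of 1 − CPD ≈ 6.87 × 10⁻¹¹, roughly one chance in 1.5 × 10¹⁰ that two unrelated individuals share the same 38-locus genotype.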
7. Flores-Alvarado S, Orellana-Soto M, Moraga M. Ancestry and admixture of a southernmost Chilean population: The reflection of a migratory history. Am J Hum Biol 2021; 34:e23598. [PMID: 33763944; DOI: 10.1002/ajhb.23598] [Received: 07/06/2020; Revised: 02/08/2021; Accepted: 02/25/2021]
Abstract
OBJECTIVES: Punta Arenas is a Chilean city situated on ancestral Aönikenk territory. The city was founded by 19th- and 20th-century colonists from Chile (Chiloé) and Europe (Croatia). This work uses uniparental and ancestry-informative markers (AIMs) to explore the effects of historic migratory and admixture patterns on the current genetic composition of Punta Arenas.
METHODS: We analyzed mitochondrial DNA (mtDNA), Y-chromosome single-nucleotide polymorphisms (SNPs), and 141 AIMs obtained from 129 DNA samples from male residents with regional ancestry. After characterizing uniparental lineages and ancestry proportions, we used multivariate analysis to explore relationships among the various types of data.
RESULTS: Punta Arenas has an admixed population with three main genetic components: European (56.5%), northern Native (11.3%), and south-central Native (28.6%). The Native component predominates in the mtDNA (83.76%), while the foreign component predominates in the Y-chromosome (92.25%). Non-Native mtDNA lineages are associated with European genetic ancestry, and Native mtDNA lineages originated mainly in the southern and southernmost regions of Chile. Most non-Native Y-chromosome SNPs originated in Spain and, secondarily, in Croatia.
CONCLUSIONS: The population of Punta Arenas is mainly of Chilote origin, with south-central Native and Spanish ancestral components as well as some Croatian components. The persistence of local Native lineages is notable, suggesting continuity with the ancestral populations of the region, such as the Kawésqar, Aönikenk, Yámana, and Selknam peoples. This study contributes to our knowledge of local history and its links to national and global developments in genetic ancestry.
Affiliation(s)
- Sandra Flores-Alvarado
  - Programa de Bioestadística, Instituto de Salud Poblacional, Facultad de Medicina, Universidad de Chile, Santiago, Chile
  - Programa de Genética Humana, ICBM, Facultad de Medicina, Universidad de Chile, Santiago, Chile
  - Departamento de Antropología, Facultad de Ciencias Sociales, Universidad de Chile, Santiago, Chile
- Michael Orellana-Soto
  - Programa de Genética Humana, ICBM, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Mauricio Moraga
  - Programa de Genética Humana, ICBM, Facultad de Medicina, Universidad de Chile, Santiago, Chile
  - Departamento de Antropología, Facultad de Ciencias Sociales, Universidad de Chile, Santiago, Chile

8. Yang HC, Chen CW, Lin YT, Chu SK. Genetic ancestry plays a central role in population pharmacogenomics. Commun Biol 2021; 4:171. [PMID: 33547344; PMCID: PMC7864978; DOI: 10.1038/s42003-021-01681-6] [Received: 10/27/2019; Accepted: 01/06/2021]
Abstract
Recent studies have pointed out the essential role of genetic ancestry in population pharmacogenetics. In this study, we analyzed the whole-genome sequencing data from the 1000 Genomes Project (Phase 3) and the pharmacogenetic information from DrugBank, PharmGKB, PharmaADME, and Biotransformation. Here we show that ancestry-informative markers are enriched in pharmacogenetic loci, suggesting that trans-ancestry differentiation must be carefully considered in population pharmacogenetics studies. Ancestry-informative pharmacogenetic loci are located in both protein-coding and non-protein-coding regions, illustrating that a whole-genome analysis is necessary for an unbiased examination of pharmacogenetic loci. Finally, ancestry-informative pharmacogenetic loci that target multiple drugs are often functional variants, which reflects their importance in biological functions and pathways. In summary, we develop an efficient algorithm for ultrahigh-dimensional principal component analysis. We create genetic catalogs of ancestry-informative markers and genes. We explore pharmacogenetic patterns and establish a high-accuracy prediction panel of genetic ancestry. Moreover, we construct a genetic ancestry pharmacogenomic database, Genetic Ancestry PhD (http://hcyang.stat.sinica.edu.tw/databases/genetic_ancestry_phd/). Hsin-Chou Yang et al. examine population structure in several genomic databases and identify that pharmacogenetic loci are enriched for markers of genetic ancestry. Their results suggest that genetic ancestry must be carefully considered in population pharmacogenetics studies.
Affiliation(s)
- Hsin-Chou Yang
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
  - Institute of Statistics, National Cheng Kung University, Tainan, Taiwan
  - Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
- Chia-Wei Chen
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Yu-Ting Lin
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Shih-Kai Chu
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan

9. Freeman L, Brimacombe CS, Elhaik E. aYChr-DB: a database of ancient human Y haplogroups. NAR Genom Bioinform 2020; 2:lqaa081. [PMID: 33575627; PMCID: PMC7671346; DOI: 10.1093/nargab/lqaa081] [Received: 05/15/2020; Revised: 08/21/2020; Accepted: 09/16/2020]
Abstract
Ancient Y-chromosomal DNA is an invaluable tool for dating and discerning the origins of migration routes and demographic processes that occurred thousands of years ago. Driven by the adoption of high-throughput sequencing and capture enrichment methods in paleogenomics, the number of published ancient genomes nearly quadrupled within the last three years (2018-2020). Whereas repositories of ancient mtDNA haplogroups are available, no similar resource exists for ancient Y-chromosomal haplogroups. Here, we present aYChr-DB, a comprehensive collection of 1,797 ancient Eurasian human Y-chromosome haplogroups ranging from 44,930 BC to 1945 AD. We include descriptors of age, location, genomic coverage, and associated archaeological cultures. We also produced a visualization of ancient Y haplogroup distribution over time. The aYChr-DB database is a valuable resource for population genomic and paleogenomic studies.
Affiliation(s)
- Laurence Freeman
  - University of Sheffield, Department of Animal and Plant Sciences, Sheffield S10 2TN, UK
- Eran Elhaik
  - University of Sheffield, Department of Animal and Plant Sciences, Sheffield S10 2TN, UK

10. Mendisco F, Pemonge MH, Romon T, Lafleur G, Richard G, Courtaud P, Deguilloux MF. Tracing the genetic legacy in the French Caribbean islands: A study of mitochondrial and Y-chromosome lineages in the Guadeloupe archipelago. Am J Phys Anthropol 2019; 170:507-518. [PMID: 31599974; DOI: 10.1002/ajpa.23931] [Received: 09/29/2018; Revised: 08/22/2019; Accepted: 09/11/2019]
Abstract
OBJECTIVES: The history of the Caribbean region is marked by numerous successive migration waves that resulted in an overall blending of African, European, and Amerindian lineages. As the origin and genetic composition of the current population of the French Caribbean islands have not been studied to date, we used both mitochondrial DNA and Y-chromosome markers to complete the characterization of the dynamics of admixture in the Guadeloupe archipelago.
MATERIALS AND METHODS: We sequenced the mitochondrial hypervariable regions and genotyped mitochondrial and Y-chromosomal single nucleotide polymorphisms (SNPs) of 198 individuals from five localities of the Guadeloupe archipelago.
RESULTS: The maternal haplogroups revealed a blend of 85% African lineages (mainly traced to Western, West-Central, and South-Eastern Africa), 12.5% Eurasian lineages, and 0.5% Amerindian lineages. We found an imbalance between the European paternal contribution (44%) and the European maternal contribution (7%), pointing to a marked sex-biased asymmetry. Finally, the estimated Native American component was strikingly low, supporting the near-extinction of native lineages in the region.
DISCUSSION: We confirmed that all historically known migratory events left a visible genetic imprint in contemporary Caribbean populations. The data clearly demonstrate the significant impact of the transatlantic slave trade on the constitution of the Guadeloupean population. Altogether, our data confirm that in the Caribbean region, human population variation is correlated with colonial and postcolonial policies and unique island histories.
Affiliation(s)
- Fanny Mendisco
  - University of Bordeaux, UMR 5199 PACEA, Allée Geoffroy de St Hilaire, Pessac, France
- Marie-Hélène Pemonge
  - University of Bordeaux, UMR 5199 PACEA, Allée Geoffroy de St Hilaire, Pessac, France
- Thomas Romon
  - University of Bordeaux, UMR 5199 PACEA, Allée Geoffroy de St Hilaire, Pessac, France
  - Centre de Gourbeyre, Institut National de Recherches Archéologiques Préventives Guadeloupe, Gourbeyre, France
- Gérard Lafleur
  - Archives Départementales de la Guadeloupe, Société D'histoire de la Guadeloupe, Basse-Terre, France
- Gérard Richard
  - Centre de Gourbeyre, Institut National de Recherches Archéologiques Préventives Guadeloupe, Gourbeyre, France
- Patrice Courtaud
  - University of Bordeaux, UMR 5199 PACEA, Allée Geoffroy de St Hilaire, Pessac, France

11. On Peopling of India: Ancient DNA perspectives By K Thangaraj and Niraj Rai. J Biosci 2019. [DOI: 10.1007/s12038-019-9889-z]

12. Das R. On Peopling of India: Ancient DNA perspectives By K Thangaraj and Niraj Rai. J Biosci 2019; 44:71. [PMID: 31389360]
Affiliation(s)
- Ranajit Das
  - Manipal Academy of Higher Education (MAHE), Manipal, India