1
Yüncü E, Işıldak U, Williams MP, Huber CD, Flegontova O, Vyazov LA, Changmai P, Flegontov P. False discovery rates of qpAdm-based screens for genetic admixture. bioRxiv 2023:2023.04.25.538339. [PMID: 37904998 PMCID: PMC10614728 DOI: 10.1101/2023.04.25.538339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Although a broad range of methods exists for reconstructing population history from genome-wide single nucleotide polymorphism data, just a few methods have gained popularity in archaeogenetics: principal component analysis (PCA); ADMIXTURE, an algorithm that models individuals as mixtures of multiple ancestral sources represented by actual or inferred populations; formal tests for admixture such as f3-statistics and D/f4-statistics; and qpAdm, a tool for fitting two-component and more complex admixture models to groups or individuals. Despite their popularity in archaeogenetics, which is explained by modest computational requirements and the ability to analyze data of various types and qualities, protocols relying on qpAdm that screen numerous alternative models of varying complexity and find "fitting" models (often considering both estimated admixture proportions and p-values as a composite criterion of model fit) remain untested on complex simulated population histories in the form of admixture graphs of random topology. We analyzed genotype data extracted from such simulations and tested various types of high-throughput qpAdm protocols ("rotating" and "non-rotating", with or without temporal stratification of target groups and proxy ancestry sources, and with or without a "model competition" step). We caution that high-throughput qpAdm protocols may be inappropriate for exploratory analyses in poorly studied regions/periods since their false discovery rates varied between 12% and 68% depending on the details of the protocol and on the amount and quality of simulated data (i.e., >12% of fitting two-way admixture models imply gene flows that were not simulated).
We demonstrate that for reducing false discovery rates of qpAdm protocols to nearly 0% it is advisable to use large SNP sets with low missing data rates, the rotating qpAdm protocol with a strictly enforced rule that target groups do not pre-date their proxy sources, and an unsupervised ADMIXTURE analysis as a way to verify feasible qpAdm models. Our study has a number of limitations: for instance, these recommendations depend on the assumption that the underlying genetic history is a complex admixture graph and not a stepping-stone model.
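The false discovery rate described in the abstract is the share of fitting two-way models that imply gene flows absent from the simulated history. A minimal sketch of that ratio follows; the `ModelOutcome` record, its field names, and the toy screen are hypothetical illustrations and are not taken from the study's code or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelOutcome:
    """One fitting two-way admixture model from a simulated screen (hypothetical record)."""
    target: str
    sources: Tuple[str, str]
    gene_flows_simulated: bool  # True if the implied gene flows exist in the simulated history


def false_discovery_rate(fitting_models: List[ModelOutcome]) -> float:
    """Fraction of fitting models whose implied gene flows were not simulated."""
    if not fitting_models:
        raise ValueError("no fitting models: the false discovery rate is undefined")
    false_hits = sum(1 for m in fitting_models if not m.gene_flows_simulated)
    return false_hits / len(fitting_models)


# Toy screen (made-up numbers): 25 fitting models, 3 of which imply
# gene flows that are absent from the simulated history.
toy_screen = [
    ModelOutcome(f"target{i}", ("sourceA", "sourceB"), gene_flows_simulated=(i >= 3))
    for i in range(25)
]
fdr = false_discovery_rate(toy_screen)
```

Only models that already passed the protocol's fit criteria enter the denominator, which is why the rate can be high even when most tested models are correctly rejected.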
Affiliation(s)
- Eren Yüncü
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Ulaş Işıldak
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Matthew P. Williams
  - Department of Biology, Eberly College of Science, Pennsylvania State University, PA, USA
- Christian D. Huber
  - Department of Biology, Eberly College of Science, Pennsylvania State University, PA, USA
- Olga Flegontova
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czechia
- Leonid A. Vyazov
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Piya Changmai
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Pavel Flegontov
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
2
Işıldak U, Dönertaş HM. Evolutionary paths to mammalian longevity through the lens of gene expression. EMBO J 2023; 42:e114879. [PMID: 37519235 PMCID: PMC10476271 DOI: 10.15252/embj.2023114879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 07/18/2023] [Indexed: 08/01/2023] Open
Abstract
The natural variation in mammalian longevity and its underlying mechanisms remain an active area of aging research. In the latest issue of The EMBO Journal, Liu et al (2023) analyze gene expression levels in 103 mammalian species across three tissues, revealing tissue-specific associations between gene expression patterns and longevity. Remarkably, the study suggests that methionine restriction, a strategy shown to increase lifespan, may extend beyond artificial interventions and is similarly employed by natural selection.
Affiliation(s)
- Ulaş Işıldak
  - Leibniz Institute on Aging ‐ Fritz Lipmann Institute (FLI), Germany
3
Flegontov P, Işıldak U, Maier R, Yüncü E, Changmai P, Reich D. Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes. PLoS Genet 2023; 19:e1010931. [PMID: 37676865 PMCID: PMC10508636 DOI: 10.1371/journal.pgen.1010931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Revised: 09/19/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023] Open
Abstract
f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data (that is, all single nucleotide polymorphisms (SNPs) in the individuals being analyzed), but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array" which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human groups (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories, and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases.
However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
Affiliation(s)
- Pavel Flegontov
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Kalmyk Research Center of the Russian Academy of Sciences, Elista, Russia
- Ulaş Işıldak
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Robert Maier
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eren Yüncü
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Piya Changmai
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
  - Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
4
Işıldak U, Somel M, Thornton JM, Dönertaş HM. Author Correction: Temporal changes in the gene expression heterogeneity during brain development and aging. Sci Rep 2023; 13:10157. [PMID: 37349363 DOI: 10.1038/s41598-023-37105-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023] Open
Affiliation(s)
- Ulaş Işıldak
  - Department of Biological Sciences, Middle East Technical University, 06800, Ankara, Turkey
- Mehmet Somel
  - Department of Biological Sciences, Middle East Technical University, 06800, Ankara, Turkey
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Handan Melike Dönertaş
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
5
Flegontov P, Işıldak U, Maier R, Yüncü E, Changmai P, Reich D. Modeling of African population history using f-statistics can be highly biased and is not addressed by previously suggested SNP ascertainment schemes. bioRxiv 2023:2023.01.22.525077. [PMID: 36711923 PMCID: PMC9882349 DOI: 10.1101/2023.01.22.525077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2023]
Abstract
f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. These statistics can provide strong evidence for either admixture or cladality, which can be robust to substantial rates of errors or missing data. f-statistics are guaranteed to be unbiased under "SNP ascertainment" (analyzing non-randomly chosen subsets of single nucleotide polymorphisms) only if it relies on a population that is an outgroup for all groups analyzed. However, ascertainment on a true outgroup that is not co-analyzed with other populations is often impractical and uncommon in the literature. In this study focused on practical rather than theoretical aspects of SNP ascertainment, we show that many non-outgroup ascertainment schemes lead to false rejection of true demographic histories, as well as to failure to reject incorrect models. But the bias introduced by common ascertainments such as the 1240K panel is mostly limited to situations when more than one sub-Saharan African and/or archaic human groups (Neanderthals and Denisovans) or non-human outgroups are co-modelled, for example, f4-statistics involving one non-African group, two African groups, and one archaic group. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, cannot fix all these problems since for some classes of f-statistics it is not a clean outgroup ascertainment, and in other cases it demonstrates relatively low power to reject incorrect demographic models since it provides a relatively small number of variants common in anatomically modern humans. And due to the paucity of high-coverage archaic genomes, archaic individuals used for ascertainment often act as sole representatives of the respective groups in an analysis, and we show that this approach is highly problematic.
By carrying out large numbers of simulations of diverse demographic histories, we find that bias in inferences based on f-statistics introduced by non-outgroup ascertainment can be minimized if the derived allele frequency spectrum in the population used for ascertainment approaches the spectrum that existed at the root of all groups being co-analyzed. Ascertaining on sites with variants common in a diverse group of African individuals provides a good approximation to such a set of SNPs, addressing the great majority of biases and also retaining high statistical power for studying population history. Such a "pan-African" ascertainment, although not completely problem-free, allows unbiased exploration of demographic models for the widest set of archaic and modern human populations, as compared to the other ascertainment schemes we explored.
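For readers unfamiliar with the quantities involved, the sketch below shows the standard definition of an f4-statistic as the average over SNPs of the product of two allele-frequency differences, together with a simple way to restrict the SNP set to variants common in one population. The function names, the `min_maf` threshold, the population labels, and the toy frequencies are all hypothetical; this is not the authors' pipeline and it does not reproduce any result from the study.

```python
from typing import Dict, List, Sequence


def f4(p_a: Sequence[float], p_b: Sequence[float],
       p_c: Sequence[float], p_d: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over SNPs of (p_A - p_B) * (p_C - p_D)."""
    n = len(p_a)
    if n == 0 or not (len(p_b) == len(p_c) == len(p_d) == n):
        raise ValueError("all populations need the same non-zero number of SNPs")
    return sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / n


def ascertain(freqs: Dict[str, List[float]], panel_pop: str,
              min_maf: float) -> Dict[str, List[float]]:
    """Keep only SNPs whose minor allele frequency in panel_pop is at least min_maf."""
    keep = [i for i, p in enumerate(freqs[panel_pop]) if min(p, 1.0 - p) >= min_maf]
    return {pop: [values[i] for i in keep] for pop, values in freqs.items()}


# Toy allele frequencies at four SNPs (made-up numbers for illustration only).
freqs = {
    "A": [0.1, 0.5, 0.9, 0.3],
    "B": [0.1, 0.4, 0.7, 0.3],
    "C": [0.2, 0.6, 0.5, 0.0],
    "D": [0.2, 0.2, 0.1, 0.0],
}

f4_all = f4(freqs["A"], freqs["B"], freqs["C"], freqs["D"])

# Ascertaining on population "C" drops the SNP that is fixed in "C",
# so the same statistic is now averaged over a different SNP set.
subset = ascertain(freqs, panel_pop="C", min_maf=0.05)
f4_ascertained = f4(subset["A"], subset["B"], subset["C"], subset["D"])
```

The two values differ because the SNP set differs, which is the mechanical reason the choice of ascertainment population can shift f-statistics.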
Affiliation(s)
- Pavel Flegontov
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Kalmyk Research Center of the Russian Academy of Sciences, Elista, Russia
- Ulaş Işıldak
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Robert Maier
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Eren Yüncü
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Piya Changmai
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
6
Işıldak U, Somel M, Thornton JM, Dönertaş HM. Temporal changes in the gene expression heterogeneity during brain development and aging. Sci Rep 2020; 10:4080. [PMID: 32139741 PMCID: PMC7058021 DOI: 10.1038/s41598-020-60998-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/15/2019] [Accepted: 02/11/2020] [Indexed: 01/06/2023] Open
Abstract
Cells in largely non-mitotic tissues such as the brain are prone to stochastic (epi-)genetic alterations that may cause increased variability between cells and individuals over time. Although increased inter-individual heterogeneity in gene expression was previously reported, whether this process starts during development or if it is restricted to the aging period has not yet been studied. The regulatory dynamics and functional significance of putative aging-related heterogeneity are also unknown. Here we address these by a meta-analysis of 19 transcriptome datasets from three independent studies, covering diverse human brain regions. We observed a significant increase in inter-individual heterogeneity during aging (20+ years) compared to postnatal development (0 to 20 years). Increased heterogeneity during aging was consistent among different brain regions at the gene level and associated with lifespan regulation and neuronal functions. Overall, our results show that increased expression heterogeneity is a characteristic of the aging human brain, and may influence aging-related changes in brain functions.
Affiliation(s)
- Ulaş Işıldak
  - Department of Biological Sciences, Middle East Technical University, 06800, Ankara, Turkey
- Mehmet Somel
  - Department of Biological Sciences, Middle East Technical University, 06800, Ankara, Turkey
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Handan Melike Dönertaş
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK