1
Tallman S, Sungo MDD, Saranga S, Beleza S. Whole genomes from Angola and Mozambique inform about the origins and dispersals of major African migrations. Nat Commun 2023; 14:7967. [PMID: 38042927 PMCID: PMC10693643 DOI: 10.1038/s41467-023-43717-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 11/17/2023] [Indexed: 12/04/2023] Open
Abstract
As the continent of origin for our species, Africa harbours the highest levels of diversity anywhere on Earth. However, many regions of Africa remain under-sampled genetically. Here we present 350 whole genomes from Angola and Mozambique belonging to ten Bantu ethnolinguistic groups, enabling the construction of a reference variation catalogue including 2.9 million novel SNPs. We investigate the emergence of Bantu speaker population structure, admixture involving migrations across sub-Saharan Africa and model the demographic histories of Angolan and Mozambican Bantu speakers. Our results bring together concordant views from genomics, archaeology, and linguistics to paint an updated view of the complexity of the Bantu Expansion. Moreover, we generate reference panels that better represent the diversity of African populations involved in the trans-Atlantic slave trade, improving imputation accuracy in African Americans and Brazilians. We anticipate that our collection of genomes will form the foundation for future African genomic healthcare initiatives.
Affiliation(s)
- Sam Tallman
- University of Leicester, Department of Genetics & Genome Biology, University Road, Leicester, LE1 7RH, UK
- Genomics England, 1 Canada Square, London, E14 5AB, UK
- Sílvio Saranga
- Universidade Pedagógica, Avenida Eduardo Mondlane, CP 2107, Maputo, Mozambique
- Sandra Beleza
- University of Leicester, Department of Genetics & Genome Biology, University Road, Leicester, LE1 7RH, UK
2
Genome-wide data from medieval German Jews show that the Ashkenazi founder event pre-dated the 14th century. Cell 2022; 185:4703-4716.e16. [PMID: 36455558 PMCID: PMC9793425 DOI: 10.1016/j.cell.2022.11.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/08/2022] [Revised: 08/26/2022] [Accepted: 11/01/2022] [Indexed: 12/05/2022]
Abstract
We report genome-wide data from 33 Ashkenazi Jews (AJ), dated to the 14th century, obtained following a salvage excavation at the medieval Jewish cemetery of Erfurt, Germany. The Erfurt individuals are genetically similar to modern AJ, but they show more variability in Eastern European-related ancestry than modern AJ. A third of the Erfurt individuals carried a mitochondrial lineage common in modern AJ and eight carried pathogenic variants known to affect AJ today. These observations, together with high levels of runs of homozygosity, suggest that the Erfurt community had already experienced the major reduction in size that affected modern AJ. The Erfurt bottleneck was more severe, implying substructure in medieval AJ. Overall, our results suggest that the AJ founder event and the acquisition of the main sources of ancestry pre-dated the 14th century and highlight late medieval genetic heterogeneity no longer present in modern AJ.
3
Elhaik E. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Sci Rep 2022; 12:14683. [PMID: 36038559 PMCID: PMC9424212 DOI: 10.1038/s41598-022-14395-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 01/14/2022] [Accepted: 06/06/2022] [Indexed: 12/29/2022] Open
Abstract
Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify, and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
Affiliation(s)
- Eran Elhaik
- Department of Biology, Lund University, 22362, Lund, Sweden.
4
Middle eastern genetic legacy in the paternal and maternal gene pools of Chuetas. Sci Rep 2020; 10:21428. [PMID: 33293675 PMCID: PMC7722846 DOI: 10.1038/s41598-020-78487-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2020] [Accepted: 11/19/2020] [Indexed: 11/08/2022] Open
Abstract
Chuetas are a group of descendants of Majorcan Crypto-Jews (Balearic Islands, Spain) who were socially stigmatized and segregated by their Majorcan neighbours until recently; generating a community that, although after the seventeenth century no longer contained Judaic religious elements, maintained strong group cohesion, Jewishness consciousness, and endogamy. Collective memory fixed 15 surnames as a most important defining element of Chueta families. Previous studies demonstrated Chuetas were a differentiated population, with a considerable proportion of their original genetic make-up. Genetic data of Y-chromosome polymorphism and mtDNA control region showed, in Chuetas’ paternal lineages, high prevalence of haplogroups J2-M172 (33%) and J1-M267 (18%). In maternal lineages, the Chuetas hallmark is the presence of a new sub-branching of the rare haplogroup R0a2m as their modal haplogroup (21%). Genetic diversity in both Y-chromosome and mtDNA indicates the Chueta community has managed to avoid the expected heterogeneity decrease in their gene pool after centuries of isolation and inbreeding. Moreover, the composition of their uniparentally transmitted lineages demonstrates a remarkable signature of Middle Eastern ancestry—despite some degree of host admixture—confirming Chuetas have retained over the centuries a considerable degree of ancestral genetic signature along with the cultural memory of their Jewish origin.
5
Sanchez T, Cury J, Charpiat G, Jay F. Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation. Mol Ecol Resour 2020; 21:2645-2660. [DOI: 10.1111/1755-0998.13224] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 01/17/2020] [Revised: 06/19/2020] [Accepted: 07/02/2020] [Indexed: 12/28/2022]
Affiliation(s)
- Théophile Sanchez
- Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Jean Cury
- Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Guillaume Charpiat
- Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Flora Jay
- Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
6
Adrion JR, Cole CB, Dukler N, Galloway JG, Gladstein AL, Gower G, Kyriazis CC, Ragsdale AP, Tsambos G, Baumdicker F, Carlson J, Cartwright RA, Durvasula A, Gronau I, Kim BY, McKenzie P, Messer PW, Noskova E, Ortega-Del Vecchyo D, Racimo F, Struck TJ, Gravel S, Gutenkunst RN, Lohmueller KE, Ralph PL, Schrider DR, Siepel A, Kelleher J, Kern AD. A community-maintained standard library of population genetic models. eLife 2020; 9:e54967. [PMID: 32573438 PMCID: PMC7438115 DOI: 10.7554/elife.54967] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 01/07/2020] [Accepted: 06/15/2020] [Indexed: 12/18/2022] Open
Abstract
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
Affiliation(s)
- Jeffrey R Adrion
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Christopher B Cole
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Jared G Galloway
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Ariella L Gladstein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Christopher C Kyriazis
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Georgia Tsambos
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Franz Baumdicker
- Department of Mathematical Stochastics, University of Freiburg, Freiburg, Germany
- Jedidiah Carlson
- Department of Genome Sciences, University of Washington, Seattle, United States
- Reed A Cartwright
- The Biodesign Institute and The School of Life Sciences, Arizona State University, Tempe, United States
- Arun Durvasula
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
- Ilan Gronau
- The Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center, Herzliya, Israel
- Bernard Y Kim
- Department of Biology, Stanford University, Stanford, United States
- Patrick McKenzie
- Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, United States
- Philipp W Messer
- Department of Computational Biology, Cornell University, Ithaca, United States
- Ekaterina Noskova
- Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russian Federation
- Diego Ortega-Del Vecchyo
- International Laboratory for Human Genome Research, National Autonomous University of Mexico, Juriquilla, Mexico
- Fernando Racimo
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Travis J Struck
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, Canada
- Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
- Peter L Ralph
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Department of Mathematics, University of Oregon, Eugene, United States
- Daniel R Schrider
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Andrew D Kern
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
7
Deelman E, Vahi K, Rynge M, Mayani R, da Silva RF, Papadimitriou G, Livny M. The Evolution of the Pegasus Workflow Management Software. Comput Sci Eng 2019. [DOI: 10.1109/mcse.2019.2919690] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/06/2022]
Affiliation(s)
- Ewa Deelman
- Information Sciences Institute, University of Southern California
- Karan Vahi
- Information Sciences Institute, University of Southern California
- Mats Rynge
- Information Sciences Institute, University of Southern California
- Rajiv Mayani
- Information Sciences Institute, University of Southern California