1
Song M, Zhou Y, Zhao C, Song F, Hou Y. YHP: Y-chromosome Haplogroup Predictor for predicting male lineages based on Y-STRs. Forensic Sci Int 2024; 361:112113. [PMID: 38936202 DOI: 10.1016/j.forsciint.2024.112113] [Received: 03/18/2024] [Revised: 05/24/2024] [Accepted: 06/16/2024] [Indexed: 06/29/2024]
Abstract
The human Y chromosome reflects the evolutionary process of males. Male lineage tracing by the Y chromosome is of great use in evolutionary, forensic, and anthropological studies. Identifying the male lineage based on the specific distribution of Y haplogroups narrows down the investigation scope, an approach that has been used in forensic scenarios. However, although existing software aids familial searching by using Y-STRs (Y-chromosome short tandem repeats) to predict Y-SNP (Y-chromosome single nucleotide polymorphism) haplogroups, it often lacks resolution. In this study, we developed YHP (Y Haplogroup Predictor), a novel software tool offering high-resolution haplogroup inference without requiring extensive Y-SNP sequencing. Leveraging existing datasets (219 haplogroups, 4064 samples in total) and employing a random forest algorithm, YHP predicts haplogroups with 0.923 accuracy at the highest haplogroup resolution. YHP, available on GitHub (https://github.com/cissy123/YHP-Y-Haplogroup-Predictor-), facilitates high-resolution haplogroup prediction, haplotype mismatch analysis, and haplotype similarity comparison. Notably, it demonstrates efficacy in East Asian populations, benefiting from training data from eight distinct East Asian ethnic populations. Moreover, it enables seamless integration of additional training sets, extending its utility to diverse populations.
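The haplotype mismatch and similarity comparisons named in the abstract can be pictured with a minimal Python sketch. It is not YHP's implementation: the function names and the allele values are hypothetical, and similarity is assumed here to mean the fraction of shared loci with identical repeat counts.

```python
def mismatched_loci(hap_a, hap_b):
    """Return (sorted mismatching loci, number of loci typed in both haplotypes)."""
    shared = sorted(set(hap_a) & set(hap_b))
    diffs = [locus for locus in shared if hap_a[locus] != hap_b[locus]]
    return diffs, len(shared)


def similarity(hap_a, hap_b):
    """Fraction of shared loci with identical repeat counts (assumed definition)."""
    diffs, n_shared = mismatched_loci(hap_a, hap_b)
    if n_shared == 0:
        return 0.0
    return 1 - len(diffs) / n_shared


# Illustrative repeat counts only; not taken from the paper's data.
query = {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS393": 13}
reference = {"DYS19": 15, "DYS390": 24, "DYS391": 11, "DYS393": 13}

diffs, n_shared = mismatched_loci(query, reference)
print(diffs, n_shared, similarity(query, reference))
```

Only loci typed in both profiles are compared, so partial profiles can still be scored; whether YHP handles missing loci this way is not stated in the abstract.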
Affiliation(s)
- Mengyuan Song
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China; Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, China
- Yuxiang Zhou
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Chenxi Zhao
- College of Computer Science, Sichuan University, Chengdu, China
- Feng Song
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Yiping Hou
- Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
2
Hong S, Choi YA, Joo DS, Gürsoy G. Privacy-preserving model evaluation for logistic and linear regression using homomorphically encrypted genotype data. J Biomed Inform 2024; 156:104678. [PMID: 38936565 PMCID: PMC11272436 DOI: 10.1016/j.jbi.2024.104678] [Received: 04/16/2024] [Revised: 05/29/2024] [Accepted: 06/19/2024] [Indexed: 06/29/2024]
Abstract
OBJECTIVE: Linear and logistic regression are widely used statistical techniques in population genetics for analyzing genetic data and uncovering patterns and associations in large genetic datasets, such as identifying genetic variations linked to specific diseases or traits. However, obtaining statistically significant results from these studies requires large amounts of sensitive genotype and phenotype information from thousands of patients, which raises privacy concerns. Although cryptographic techniques such as homomorphic encryption offer a potential solution to these privacy concerns because they allow computations on encrypted data, previous methods leveraging homomorphic encryption have not addressed the confidentiality of shared models, which can leak information about the training data. METHODS: In this work, we present a secure model evaluation method for linear and logistic regression using homomorphic encryption for six prediction tasks, where input genotypes, output phenotypes, and model parameters are all encrypted. RESULTS: Our method ensures no private information leakage during inference and achieves high accuracy (≥93% for all outcomes), with each inference taking less than ten seconds for ∼200 genomes. CONCLUSION: Our study demonstrates that it is possible to perform linear and logistic regression model evaluation while protecting patient confidentiality with theoretical security guarantees. Our implementation and test data are available at https://github.com/G2Lab/privateML/.
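To make "computation on encrypted data" concrete, the toy Python sketch below evaluates a linear score on encrypted genotypes. It is a substitution, not the paper's method: it uses the additively homomorphic Paillier scheme with tiny hard-coded primes (the abstract does not name its scheme), and the model weights here stay in plaintext, whereas the paper also encrypts model parameters. All names and numbers are hypothetical, and the key size offers no real security.

```python
import random
from math import gcd

# Toy primes for readability only; real deployments need large keys.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to N + 1


def encrypt(m):
    """Encrypt an integer m (reduced mod N) under the public key N."""
    r = random.randrange(1, N)
    while gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt with the private values LAM and MU; map back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def encrypted_linear_score(enc_genotypes, weights, bias):
    """Compute Enc(bias + sum(w_i * x_i)) without decrypting any x_i."""
    acc = encrypt(bias)
    for c, w in zip(enc_genotypes, weights):
        # Raising a ciphertext to w scales its plaintext by w;
        # multiplying ciphertexts adds their plaintexts.
        acc = acc * pow(c, w % N, N2) % N2
    return acc


genotypes = [0, 1, 2, 1]   # allele dosages, illustrative
weights = [3, -2, 5, 1]    # integer-scaled model weights, illustrative
bias = -4

enc = [encrypt(g) for g in genotypes]
score = decrypt(encrypted_linear_score(enc, weights, bias))
print(score)
```

The party computing the score sees only ciphertexts; the key holder decrypts the result. A logistic model would additionally need the sigmoid, which an additive scheme like this one cannot evaluate on ciphertexts.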
Affiliation(s)
- Seungwan Hong
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA
- Yoolim A Choi
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA
- Daniel S Joo
- New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10032, USA
- Gamze Gürsoy
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10032, USA
3
Sandercock AM, Westbrook JW, Zhang Q, Holliday JA. A genome-guided strategy for climate resilience in American chestnut restoration populations. Proc Natl Acad Sci U S A 2024; 121:e2403505121. [PMID: 39012830 PMCID: PMC11287244 DOI: 10.1073/pnas.2403505121] [Received: 02/19/2024] [Accepted: 06/11/2024] [Indexed: 07/18/2024]
Abstract
American chestnut (Castanea dentata) is a deciduous tree species of eastern North America that was decimated by the introduction of the chestnut blight fungus (Cryphonectria parasitica) in the early 20th century. Although millions of American chestnuts survive as root collar sprouts, these trees rarely reproduce. Thus, the species is considered functionally extinct. American chestnuts with improved blight resistance have been developed through interspecific hybridization followed by conspecific backcrossing, and by genetic engineering. Incorporating adaptive genomic diversity into these backcross families and transgenic lines is important for restoring the species across broad climatic gradients. To develop sampling recommendations for ex situ conservation of wild adaptive genetic variation, we coupled whole-genome resequencing of 384 stump sprouts with genotype-environment association analyses and found that the species range can be subdivided into three seed zones characterized by relatively homogeneous adaptive allele frequencies. We estimated that 21 to 29 trees per seed zone will need to be conserved to capture most extant adaptive diversity. We also resequenced the genomes of 269 backcross trees to understand the extent to which the breeding program has already captured wild adaptive diversity, and to estimate optimal reintroduction sites for specific families on the basis of their adaptive portfolio and future climate projections. Taken together, these results inform the development of an ex situ germplasm conservation and breeding plan to target blight-resistant breeding populations to specific environments and provide a blueprint for developing restoration plans for other imperiled tree species.
Affiliation(s)
- Qian Zhang
- Department of Forest Resources and Environmental Conservation, Virginia Tech, Blacksburg, VA 24060
- Jason A. Holliday
- Department of Forest Resources and Environmental Conservation, Virginia Tech, Blacksburg, VA 24060
4
Smith CCR, Patterson G, Ralph PL, Kern AD. Estimation of spatial demographic maps from polymorphism data using a neural network. bioRxiv 2024:2024.03.15.585300. [PMID: 38559192 PMCID: PMC10980082 DOI: 10.1101/2024.03.15.585300] [Indexed: 04/04/2024]
Abstract
A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and barriers to dispersal should shape genetic variation over space; however, few tools currently available can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity by descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic-based methods like ours complement other, direct methods for estimating past and present demography, and we believe they will serve as valuable tools for applications in conservation, ecology, and evolutionary biology. An open source software package implementing our method is available from https://github.com/kr-colab/mapNN.
5
Lasky JR, Takou M, Gamba D, Keitt TH. Estimating scale-specific and localized spatial patterns in allele frequency. Genetics 2024; 227:iyae082. [PMID: 38758968 DOI: 10.1093/genetics/iyae082] [Received: 09/07/2023] [Revised: 09/07/2023] [Accepted: 04/28/2024] [Indexed: 05/19/2024]
Abstract
Characterizing spatial patterns in allele frequencies is fundamental to evolutionary biology because these patterns contain evidence of underlying processes. However, the spatial scales at which gene flow, changing selection, and drift act are often unknown. Many of these processes can operate inconsistently across space, causing nonstationary patterns. We present a wavelet approach to characterize spatial pattern in allele frequency that helps solve these problems. We show how our approach can characterize spatial patterns in relatedness at multiple spatial scales, i.e. a multilocus wavelet genetic dissimilarity. We also develop wavelet tests of spatial differentiation in allele frequency and quantitative trait loci (QTL). With simulation, we illustrate these methods under different scenarios. We also apply our approach to natural populations of Arabidopsis thaliana to characterize population structure and identify locally adapted loci across scales. We find, for example, that Arabidopsis flowering time QTL show significantly elevated genetic differentiation at 300-1,300 km scales. Wavelet transforms of allele frequencies offer a flexible way to reveal geographic patterns and underlying evolutionary processes.
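As a rough illustration of how a wavelet decomposition separates spatial scales, the Python sketch below applies an unnormalized Haar transform to allele frequencies sampled at equally spaced points along a one-dimensional transect. This is a simplified stand-in chosen for brevity, not the wavelet or spatial setup used in the paper; the function name and the frequencies are hypothetical.

```python
def haar_details(freqs):
    """Return {scale: detail coefficients} for a sequence whose length is a power of two.

    At each level, neighbouring values are replaced by their mean (the smooth
    part) and half their difference (the detail at that scale).
    """
    n = len(freqs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(freqs)
    details = {}
    scale = 1  # number of original sampling steps spanned by each half of a pair
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details[scale] = [(a - b) / 2 for a, b in pairs]
        approx = [(a + b) / 2 for a, b in pairs]
        scale *= 2
    return details


# A sharp frequency shift halfway along the transect (illustrative values).
transect = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
for scale, coeffs in haar_details(transect).items():
    print(scale, coeffs)
```

For this transect the detail coefficients are zero at the two finest scales and nonzero only at the coarsest one, which is the sense in which a wavelet transform localizes differentiation to a particular spatial scale.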
Affiliation(s)
- Jesse R Lasky
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Margarita Takou
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Diana Gamba
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Timothy H Keitt
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA
6
Librado P, Tressières G, Chauvey L, Fages A, Khan N, Schiavinato S, Calvière-Tonasso L, Kusliy MA, Gaunitz C, Liu X, Wagner S, Der Sarkissian C, Seguin-Orlando A, Perdereau A, Aury JM, Southon J, Shapiro B, Bouchez O, Donnadieu C, Collin YRH, Gregersen KM, Jessen MD, Christensen K, Claudi-Hansen L, Pruvost M, Pucher E, Vulic H, Novak M, Rimpf A, Turk P, Reiter S, Brem G, Schwall C, Barrey É, Robert C, Degueurce C, Horwitz LK, Klassen L, Rasmussen U, Kveiborg J, Johannsen NN, Makowiecki D, Makarowicz P, Szeliga M, Ilchyshyn V, Rud V, Romaniszyn J, Mullin VE, Verdugo M, Bradley DG, Cardoso JL, Valente MJ, Telles Antunes M, Ameen C, Thomas R, Ludwig A, Marzullo M, Prato O, Bagnasco Gianni G, Tecchiati U, Granado J, Schlumbaum A, Deschler-Erb S, Mráz MS, Boulbes N, Gardeisen A, Mayer C, Döhle HJ, Vicze M, Kosintsev PA, Kyselý R, Peške L, O'Connor T, Ananyevskaya E, Shevnina I, Logvin A, Kovalev AA, Iderkhangai TO, Sablin MV, Dashkovskiy PK, Graphodatsky AS, Merts I, Merts V, Kasparov AK, Pitulko VV, Onar V, Öztan A, Arbuckle BS, McColl H, Renaud G, Khaskhanov R, Demidenko S, Kadieva A, Atabiev B, Sundqvist M, Lindgren G, López-Cachero FJ, Albizuri S, Trbojević Vukičević T, Rapan Papeša A, Burić M, Rajić Šikanjić P, Weinstock J, Asensio Vilaró D, Codina F, García Dalmau C, Morer de Llorens J, Pou J, de Prado G, Sanmartí J, Kallala N, Torres JR, Maraoui-Telmini B, Belarte Franco MC, Valenzuela-Lamas S, Zazzo A, Lepetz S, Duchesne S, Alexeev A, Bayarsaikhan J, Houle JL, Bayarkhuu N, Turbat T, Crubézy É, Shingiray I, Mashkour M, Berezina NY, Korobov DS, Belinskiy A, Kalmykov A, Demoule JP, Reinhold S, Hansen S, Wallner B, Roslyakova N, Kuznetsov PF, Tishkin AA, Wincker P, Kanne K, Outram A, Orlando L. Widespread horse-based mobility arose around 2200 BCE in Eurasia. Nature 2024; 631:819-825. 
[PMID: 38843826 PMCID: PMC11269178 DOI: 10.1038/s41586-024-07597-5] [Received: 07/17/2023] [Accepted: 05/23/2024] [Indexed: 07/19/2024]
Abstract
Horses revolutionized human history with fast mobility [1]. However, the timeline between their domestication and their widespread integration as a means of transport remains contentious [2-4]. Here we assemble a collection of 475 ancient horse genomes to assess the period when these animals were first reshaped by human agency in Eurasia. We find that reproductive control of the modern domestic lineage emerged around 2200 BCE, through close-kin mating and shortened generation times. Reproductive control emerged following a severe domestication bottleneck starting no earlier than approximately 2700 BCE, and coincided with a sudden expansion across Eurasia that ultimately resulted in the replacement of nearly every local horse lineage. This expansion marked the rise of widespread horse-based mobility in human history, which refutes the commonly held narrative of large horse herds accompanying the massive migration of steppe peoples across Europe around 3000 BCE and earlier [3,5]. Finally, we detect significantly shortened generation times at Botai around 3500 BCE, a settlement from central Asia associated with corrals and a subsistence economy centred on horses [6,7]. This supports local horse husbandry before the rise of modern domestic bloodlines.
Affiliation(s)
- Pablo Librado
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France.
- Institut de Biologia Evolutiva (CSIC - Universitat Pompeu Fabra), Barcelona, Spain.
- Gaetan Tressières
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Lorelei Chauvey
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Antoine Fages
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Zoological institute, Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Naveed Khan
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Department of Biotechnology, Abdul Wali Khan University, Mardan, Pakistan
- Stéphanie Schiavinato
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Laure Calvière-Tonasso
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Mariya A Kusliy
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Department of the Diversity and Evolution of Genomes, Institute of Molecular and Cellular Biology, Novosibirsk, Russia
- Charleen Gaunitz
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Xuexue Liu
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Stefanie Wagner
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- INRAE Division Ecology and Biodiversity (ECODIV), Plant Genomic Resources Center (CNRGV), Castanet Tolosan Cedex, France
- Clio Der Sarkissian
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Andaine Seguin-Orlando
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Aude Perdereau
- Genoscope, Institut de Biologie François Jacob, CEA, CNRS, Université d'Évry, Université Paris-Saclay, Évry, France
- Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université d'Évry, Université Paris-Saclay, Évry, France
- John Southon
- Department of Earth System Science, University of California, Irvine, CA, USA
- Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
- Yvette Running Horse Collin
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Taku Skan Skan Wasakliyapi: Global Institute for Traditional Sciences, Rapid City, SD, USA
- Mads Dengsø Jessen
- Department for Prehistory Middle Ages and Renaissance, National Museum of Denmark, Copenhagen K, Denmark
- Mélanie Pruvost
- UMR 5199 De la Préhistoire à l'Actuel: Culture, Environnement et Anthropologie (PACEA), CNRS, Université de Bordeaux, Pessac Cédex, France
- Mario Novak
- Centre for Applied Bioanthropology, Institute for Anthropological Research, Zagreb, Croatia
- Peter Turk
- Narodni muzej Slovenije, Ljubljana, Slovenia
- Simone Reiter
- Institute of Animal Breeding and Genetics, Department of Biomedical Sciences, University of Veterinary Medicine Vienna, Vienna, Austria
- Gottfried Brem
- Institute of Animal Breeding and Genetics, Department of Biomedical Sciences, University of Veterinary Medicine Vienna, Vienna, Austria
- Christoph Schwall
- Leibniz-Zentrum für Archäologie (LEIZA), Mainz, Germany
- Department of Prehistory & Western Asian/Northeast African Archaeology, Austrian Archaeological Institute (OeAI), Austrian Academy of Sciences (OeAW), Vienna, Austria
- Éric Barrey
- Université Paris-Saclay, AgroParisTech, INRAE GABI UMR1313, Jouy-en-Josas, France
- Céline Robert
- Université Paris-Saclay, AgroParisTech, INRAE GABI UMR1313, Jouy-en-Josas, France
- Ecole Nationale Vétérinaire d'Alfort, Maisons-Alfort, France
- Liora Kolska Horwitz
- National Natural History Collections, Edmond J. Safra Campus, Givat Ram, The Hebrew University, Jerusalem, Israel
- Uffe Rasmussen
- Department of Archaeology, Moesgaard Museum, Højbjerg, Denmark
- Jacob Kveiborg
- Department of Archaeological Science and Conservation, Moesgaard Museum, Højbjerg, Denmark
- Daniel Makowiecki
- Institute of Archaeology, Faculty of History, Nicolaus Copernicus University, Toruń, Poland
- Marcin Szeliga
- Institute of Archaeology, Maria Curie-Skłodowska University, Lublin, Poland
- Vasyl Ilchyshyn
- Kremenetsko-Pochaivskii Derzhavnyi Istoriko-arkhitekturnyi Zapovidnik, Kremenets, Ukraine
- Vitalii Rud
- Institute of Archaeology, National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Jan Romaniszyn
- Faculty of Archaeology, Adam Mickiewicz University, Poznań, Poland
- Victoria E Mullin
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Marta Verdugo
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Daniel G Bradley
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- João L Cardoso
- ICArEHB, Campus de Gambelas, University of Algarve, Faro, Portugal
- Universidade Aberta, Lisbon, Portugal
- Maria J Valente
- Faculdade de Ciências Humanas e Sociais, Centro de Estudos de Arqueologia, Artes e Ciências do Património, Universidade do Algarve, Faro, Portugal
- Miguel Telles Antunes
- Centre for Research on Science and Geological Engineering, Universidade Nova de Lisboa, Lisbon, Portugal
- Carly Ameen
- Department of Archaeology and History, University of Exeter, Exeter, UK
- Richard Thomas
- School of Archaeology and Ancient History, University of Leicester, Leicester, UK
- Arne Ludwig
- Department of Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, Berlin, Germany
- Albrecht Daniel Thaer-Institute, Faculty of Life Sciences, Humboldt University Berlin, Berlin, Germany
- Matilde Marzullo
- Dipartimento di Beni Culturali e Ambientali, Università degli Studi di Milano, Milan, Italy
- Ornella Prato
- Dipartimento di Beni Culturali e Ambientali, Università degli Studi di Milano, Milan, Italy
- Umberto Tecchiati
- Dipartimento di Beni Culturali e Ambientali, Università degli Studi di Milano, Milan, Italy
- José Granado
- Department of Environmental Sciences, Integrative Prehistory and Archaeological Science, Basel University, Basel, Switzerland
- Angela Schlumbaum
- Department of Environmental Sciences, Integrative Prehistory and Archaeological Science, Basel University, Basel, Switzerland
- Sabine Deschler-Erb
- Department of Environmental Sciences, Integrative Prehistory and Archaeological Science, Basel University, Basel, Switzerland
- Monika Schernig Mráz
- Department of Environmental Sciences, Integrative Prehistory and Archaeological Science, Basel University, Basel, Switzerland
- Nicolas Boulbes
- Institut de Paléontologie Humaine, Fondation Albert Ier, Paris/UMR 7194 HNHP, MNHN-CNRS-UPVD/EPCC Centre Européen de Recherche Préhistorique, Tautavel, France
- Armelle Gardeisen
- Archéologie des Sociétés Méditerranéennes, Archimède IA-ANR-11-LABX-0032-01, CNRS UMR 5140, Université Paul Valéry, Montpellier, France
- Christian Mayer
- Department for Digitalization and Knowledge Transfer, Federal Monuments Authority Austria, Vienna, Austria
- Hans-Jürgen Döhle
- Landesamt für Denkmalpflege und Archäologie Sachsen-Anhalt - Landesmuseum für Vorgeschichte, Halle (Saale), Germany
- Magdolna Vicze
- National Institute of Archaeology, Hungarian National Museum, Budapest, Hungary
- Pavel A Kosintsev
- Paleoecology Laboratory, Institute of Plant and Animal Ecology, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russia
- Department of History of the Institute of Humanities, Ural Federal University, Ekaterinburg, Russia
- René Kyselý
- Department of Natural Sciences and Archaeometry, Institute of Archaeology of the Czech Academy of Sciences, Prague, Czechia
- Elina Ananyevskaya
- Department of Archaeology, History Faculty, Vilnius University, Vilnius, Lithuania
- Irina Shevnina
- Laboratory for Archaeological Research, Akhmet Baitursynuly Kostanay Regional University, Kostanay, Kazakhstan
- Andrey Logvin
- Laboratory for Archaeological Research, Akhmet Baitursynuly Kostanay Regional University, Kostanay, Kazakhstan
- Alexey A Kovalev
- Department of Archaeological Heritage Preservation, Institute of Archaeology of the Russian Academy of Sciences, Moscow, Russia
- Tumur-Ochir Iderkhangai
- Department of Innovation and Technology, Ulaanbaatar Science and Technology Park, National University of Mongolia, Ulaanbaatar, Mongolia
- Mikhail V Sablin
- Zoological Institute, Russian Academy of Sciences, St Petersburg, Russia
- Petr K Dashkovskiy
- Department of Russian Regional Studies, National and State-confessional Relations, Altai State University, Barnaul, Russia
- Alexander S Graphodatsky
- Department of the Diversity and Evolution of Genomes, Institute of Molecular and Cellular Biology, Novosibirsk, Russia
- Ilia Merts
- Toraighyrov University, Joint Research Center for Archeological Studies, Pavlodar, Kazakhstan
- Department of Archaeology, Ethnography and Museology, Altai State University, Barnaul, Russia
- Viktor Merts
- Toraighyrov University, Joint Research Center for Archeological Studies, Pavlodar, Kazakhstan
- Aleksei K Kasparov
- Institute of the History of Material Culture, Russian Academy of Sciences, St. Petersburg, Russia
- Vladimir V Pitulko
- Institute of the History of Material Culture, Russian Academy of Sciences, St. Petersburg, Russia
- Peter the Great Museum of Anthropology and Ethnography (Kunstkamera), Russian Academy of Sciences, St Petersburg, Russia
- Vedat Onar
- Osteoarchaeology Practice and Research Center and Department of Anatomy, Faculty of Veterinary Medicine, Istanbul University-Cerrahpaşa, Istanbul, Türkiye
- Aliye Öztan
- Archaeology Department, Ankara University, Ankara, Türkiye
- Benjamin S Arbuckle
- Department of Anthropology, Alumni Building, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hugh McColl
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Gabriel Renaud
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Copenhagen, Denmark
- Ruslan Khaskhanov
- Kh. Ibragimov Complex Institute of the Russian Academy of Sciences (CI RAS), Grozny, Russia
- Sergey Demidenko
- Institute of Archaeology, Russian Academy of Sciences, Moscow, Russia
- Anna Kadieva
- Department of Archaeological Monuments, State Historical Museum, Moscow, Russian Federation
- Gabriella Lindgren
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Center for Animal Breeding and Genetics, Department of Biosystems, KU Leuven, Leuven, Belgium
- F Javier López-Cachero
- Institut d'Arqueologia de la Universitat de Barcelona (IAUB), Seminari d'Estudis i Recerques Prehistoriques (SERP-UB), Universitat de Barcelona (UB), Barcelona, Spain
- Silvia Albizuri
- Institut d'Arqueologia de la Universitat de Barcelona (IAUB), Seminari d'Estudis i Recerques Prehistoriques (SERP-UB), Universitat de Barcelona (UB), Barcelona, Spain
- Tajana Trbojević Vukičević
- Department of Anatomy, Histology and Embryology, Faculty of Veterinary Medicine, University of Zagreb, Zagreb, Croatia
- Marcel Burić
- Department of Archaeology, Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb, Croatia
- Jaco Weinstock
- Faculty of Arts and Humanities (Archaeology), University of Southampton, Southampton, UK
- David Asensio Vilaró
- Secció de Prehistòria i Arqueologia, IAUB Institut d'Arqueologia de la Universitat de Barcelona, Barcelona, Spain
- Ferran Codina
- C/Major, 20, Norfeu, Arqueologia Art i Patrimoni S.C., La Tallada d'Empordà, Spain
- Josep Pou
- Ajuntament de Calafell, Calafell (Tarragona), Spain
- Gabriel de Prado
- Museu d'Arqueologia de Catalunya (MAC-Ullastret), Ullastret, Spain
- Joan Sanmartí
- IEC-Institut d'Estudis Catalans (Union Académique Internationale), Barcelona, Spain
- Departament d'Història i Arqueologia, Facultat de Geografia i Història, Universitat de Barcelona, Barcelona, Spain
- Nabil Kallala
- Ecole Tunisienne d'Histoire et d'Anthropologie, Tunis, Tunisia
- University of Tunis, Institut National du Patrimoine, Tunis, Tunisia
- Maria-Carme Belarte Franco
- IEC-Institut d'Estudis Catalans (Union Académique Internationale), Barcelona, Spain
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- ICAC (Catalan Institute of Classical Archaeology), Tarragona, Spain
- Silvia Valenzuela-Lamas
- Archaeology of Social Dynamics (ASD), Institució Milà i Fontanals, Consejo Superior de Investigaciones Científicas (IMF-CSIC), Barcelona, Spain
- UNIARQ - Unidade de Arqueologia, Universidade de Lisboa, Alameda da Universidade, Lisboa, Portugal
- Antoine Zazzo
- Centre National de Recherche Scientifique, Muséum national d'Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France
- Sébastien Lepetz
- Centre National de Recherche Scientifique, Muséum national d'Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France
- Sylvie Duchesne
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Anatoly Alexeev
- Institute for Humanities Research and Indigenous Studies of the North (IHRISN), Yakutsk, Russia
- Jamsranjav Bayarsaikhan
- Max Planck Institute of Geoanthropology, Jena, Germany
- Institute of Archaeology, Mongolian Academy of Science, Ulaanbaatar, Mongolia
- Jean-Luc Houle
- Department of Folk Studies and Anthropology, Western Kentucky University, Bowling Green, KY, USA
- Noost Bayarkhuu
- Archaeological Research Center and Department of Anthropology and Archaeology, National University of Mongolia, Ulaanbaatar, Mongolia
- Tsagaan Turbat
- Archaeological Research Center and Department of Anthropology and Archaeology, National University of Mongolia, Ulaanbaatar, Mongolia
- Éric Crubézy
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France
- Marjan Mashkour
- Centre National de Recherche Scientifique, Muséum national d'Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France
- Central Laboratory, Bioarchaeology Laboratory, Archaeozoology section, University of Tehran, Tehran, Iran
| | - Natalia Ya Berezina
- Research Institute and Museum of Anthropology, Lomonosov Moscow State University, Moscow, Russia
| | - Dmitriy S Korobov
- Institute of Archaeology, Russian Academy of Sciences, Moscow, Russia
| | | | | | - Jean-Paul Demoule
- UMR du CNRS 8215 Trajectoires, Institut d'Art et Archéologie, Paris, France
| | - Sabine Reinhold
- Eurasia Department of the German Archaeological Institute, Berlin, Germany
| | - Svend Hansen
- Eurasia Department of the German Archaeological Institute, Berlin, Germany
| | - Barbara Wallner
- Institute of Animal Breeding and Genetics, Department of Biomedical Sciences, University of Veterinary Medicine Vienna, Vienna, Austria
| | - Natalia Roslyakova
- Department of Russian History and Archaeology, Samara State University of Social Sciences and Education, Samara, Russia
| | - Pavel F Kuznetsov
- Department of Russian History and Archaeology, Samara State University of Social Sciences and Education, Samara, Russia
| | - Alexey A Tishkin
- Department of Archaeology, Ethnography and Museology, Altai State University, Barnaul, Russia
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université d'Évry, Université Paris-Saclay, Évry, France
| | - Katherine Kanne
- Department of Archaeology and History, University of Exeter, Exeter, UK
- School of Archaeology, University College Dublin, Dublin, Ireland
| | - Alan Outram
- Department of Archaeology and History, University of Exeter, Exeter, UK
| | - Ludovic Orlando
- Centre d'Anthropobiologie et de Génomique de Toulouse, CNRS UMR 5288, Université Paul Sabatier, Faculté de Médecine Purpan, Toulouse, France.
| |
Collapse
|
7
Thompson A, Liebeskind BJ, Scully EJ, Landis MJ. Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong. Syst Biol 2024; 73:183-206. [PMID: 38189575 PMCID: PMC11249978 DOI: 10.1093/sysbio/syad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 11/22/2023] [Accepted: 01/05/2024] [Indexed: 01/09/2024] Open
Abstract
Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, which we demonstrate has similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) intervals; its intervals greatly overlap with HPDs but have lower precision (are more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
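For readers unfamiliar with conformalized quantile regression, the calibration step can be sketched roughly as below. This is a generic illustration of the split-conformal recipe, not the authors' implementation: the function names `cqr_offset` and `cqr_interval` and the toy numbers are hypothetical, and the lower and upper quantile predictions are assumed to come from some already-trained model.

```python
import math


def cqr_offset(cal_lo, cal_hi, cal_y, alpha=0.1):
    """Return the correction Q learned from a held-out calibration set.

    cal_lo / cal_hi are a model's predicted lower / upper quantiles and
    cal_y the true values (all hypothetical inputs for this sketch).
    """
    # Conformity score: how far each true value falls outside its interval
    # (negative when the value lies inside).
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(cal_lo, cal_hi, cal_y))
    n = len(scores)
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((1 - alpha) * (n + 1))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return scores[k - 1]


def cqr_interval(lo, hi, q):
    """Widen (or shrink, if q < 0) a predicted interval by the offset q."""
    return lo - q, hi + q


q = cqr_offset([0, 0, 0, 0], [10, 10, 10, 10], [5, 12, -1, 13], alpha=0.5)
print(q, cqr_interval(1, 9, q))
```

Because the offset is estimated from held-out data rather than from the model's own likelihood, the adjusted intervals tend to err on the wide side, which is consistent with the "more conservative" behavior described in the abstract.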
Affiliation(s)
- Ammon Thompson
- Participant in an Education Program Sponsored by U.S. Department of Defense (DOD) at the National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Erik J Scully
- National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Michael J Landis
- Department of Biology, Washington University in St. Louis, Rebstock Hall, St. Louis, MO 63130, USA
8
Rehmann CT, Ralph PL, Kern AD. Evaluating evidence for co-geography in the Anopheles-Plasmodium host-parasite system. G3 (Bethesda) 2024; 14:jkae008. [PMID: 38230808 PMCID: PMC10917517 DOI: 10.1093/g3journal/jkae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 11/08/2023] [Accepted: 12/22/2023] [Indexed: 01/18/2024]
Abstract
The often tight association between parasites and their hosts means that under certain scenarios, the evolutionary histories of the two species can become closely coupled both through time and across space. Using spatial genetic inference, we identify a potential signal of common dispersal patterns in the Anopheles gambiae and Plasmodium falciparum host-parasite system as seen through a between-species correlation of the differences between geographic sampling location and geographic location predicted from the genome. This correlation may be due to coupled dispersal dynamics between host and parasite but may also reflect statistical artifacts due to uneven spatial distribution of sampling locations. Using continuous-space population genetics simulations, we investigate the degree to which uneven distribution of sampling locations leads to bias in prediction of spatial location from genetic data and implement methods to counter this effect. We demonstrate that while algorithmic bias presents a problem in inference from spatio-genetic data, the correlation structure between A. gambiae and P. falciparum predictions cannot be attributed to spatial bias alone and is thus likely a genetic signal of co-dispersal in a host-parasite system.
Affiliation(s)
- Clara T Rehmann
- Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene 97403, USA
- Peter L Ralph
- Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene 97403, USA
- Department of Mathematics, University of Oregon, Eugene 97403, USA
- Andrew D Kern
- Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene 97403, USA
9
Ray DD, Flagel L, Schrider DR. IntroUNET: Identifying introgressed alleles via semantic segmentation. PLoS Genet 2024; 20:e1010657. [PMID: 38377104 PMCID: PMC10906877 DOI: 10.1371/journal.pgen.1010657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/01/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024] Open
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lex Flagel
- Division of Data Science, Gencove Inc., New York, New York, United States of America
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, Minnesota, United States of America
- Daniel R. Schrider
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
10
Ray DD, Flagel L, Schrider DR. IntroUNET: identifying introgressed alleles via semantic segmentation. bioRxiv 2024:2023.02.07.527435. [PMID: 36865105 PMCID: PMC9979274 DOI: 10.1101/2023.02.07.527435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lex Flagel
- Division of Data Science, Gencove Inc., New York, NY 11101, USA
- Department of Plant and Microbial Biology, University of Minnesota, St Paul, MN 55108, USA
- Daniel R. Schrider
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
11
McGaughran A, Dhami MK, Parvizi E, Vaughan AL, Gleeson DM, Hodgins KA, Rollins LA, Tepolt CK, Turner KG, Atsawawaranunt K, Battlay P, Congrains C, Crottini A, Dennis TPW, Lange C, Liu XP, Matheson P, North HL, Popovic I, Rius M, Santure AW, Stuart KC, Tan HZ, Wang C, Wilson J. Genomic Tools in Biological Invasions: Current State and Future Frontiers. Genome Biol Evol 2024; 16:evad230. [PMID: 38109935 PMCID: PMC10776249 DOI: 10.1093/gbe/evad230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/16/2023] [Accepted: 12/12/2023] [Indexed: 12/20/2023] Open
Abstract
Human activities are accelerating rates of biological invasions and climate-driven range expansions globally, yet we understand little of how genomic processes facilitate the invasion process. Although most of the literature has focused on underlying phenotypic correlates of invasiveness, advances in genomic technologies are showing a strong link between genomic variation and invasion success. Here, we consider the ability of genomic tools and technologies to (i) inform mechanistic understanding of biological invasions and (ii) solve real-world issues in predicting and managing biological invasions. For both, we examine the current state of the field and discuss how genomics can be leveraged in the future. In addition, we make recommendations pertinent to broader research issues, such as data sovereignty, metadata standards, collaboration, and science communication best practices that will require concerted efforts from the global invasion genomics community.
Affiliation(s)
- Angela McGaughran
- Te Aka Mātuatua/School of Science, University of Waikato, Hamilton, New Zealand
- Manpreet K Dhami
- Biocontrol and Molecular Ecology, Manaaki Whenua Landcare Research, Lincoln, New Zealand
- School of Biological Sciences, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
- Elahe Parvizi
- Te Aka Mātuatua/School of Science, University of Waikato, Hamilton, New Zealand
- Amy L Vaughan
- Biocontrol and Molecular Ecology, Manaaki Whenua Landcare Research, Lincoln, New Zealand
- Dianne M Gleeson
- Centre for Conservation Ecology and Genomics, Faculty of Science and Technology, University of Canberra, Canberra, ACT, Australia
- Kathryn A Hodgins
- School of Biological Sciences, Monash University, Melbourne, VIC, Australia
- Lee A Rollins
- Evolution and Ecology Research Centre, University of New South Wales, Sydney, NSW, Australia
- Carolyn K Tepolt
- Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
- Kathryn G Turner
- Department of Biological Sciences, Idaho State University, Pocatello, ID, USA
- Kamolphat Atsawawaranunt
- School of Biological Sciences, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
- Paul Battlay
- School of Biological Sciences, Monash University, Melbourne, VIC, Australia
- Carlos Congrains
- Entomology Section, Department of Plant and Environmental Protection Sciences, University of Hawaiʻi at Mānoa, Honolulu, HI 96822, USA
- US Department of Agriculture-Agricultural Research Service, Daniel K. Inouye US Pacific Basin Agricultural Research Center, Hilo, HI 96720, USA
- Angelica Crottini
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto 4169-007, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão 4485-661, Portugal
- Tristan P W Dennis
- Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
- Claudia Lange
- Biocontrol and Molecular Ecology, Manaaki Whenua Landcare Research, Lincoln, New Zealand
- Xiaoyue P Liu
- Department of Marine Science, University of Otago, Dunedin, New Zealand
- Paige Matheson
- Te Aka Mātuatua/School of Science, University of Waikato, Hamilton, New Zealand
- Henry L North
- Department of Zoology, University of Cambridge, Cambridge, UK
- Iva Popovic
- School of the Environment, University of Queensland, Brisbane, QLD, Australia
- Marc Rius
- Centre for Advanced Studies of Blanes (CEAB, CSIC), Accés a la Cala Sant Francesc, Blanes, Spain
- Department of Zoology, Centre for Ecological Genomics and Wildlife Conservation, University of Johannesburg, Johannesburg 2006, South Africa
- Anna W Santure
- School of Biological Sciences, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
- Katarina C Stuart
- School of Biological Sciences, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
- Hui Zhen Tan
- School of Biological Sciences, Waipapa Taumata Rau/University of Auckland, Auckland, New Zealand
- Cui Wang
- The Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
- Jonathan Wilson
- School of Biological Sciences, Monash University, Melbourne, VIC, Australia
12
Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024; 25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]
Abstract
In population genetics, the emergence of large-scale genomic data for various species and populations has provided new opportunities to understand the evolutionary forces that drive genetic diversity using statistical inference. However, the era of population genomics presents new challenges in analysing the massive amounts of genomes and variants. Deep learning has demonstrated state-of-the-art performance for numerous applications involving large-scale data. Recently, deep learning approaches have gained popularity in population genetics; facilitated by the advent of massive genomic data sets, powerful computational hardware and complex deep learning architectures, they have been used to identify population structure, infer demographic history and investigate natural selection. Here, we introduce common deep learning architectures and provide comprehensive guidelines for implementing deep learning models for population genetic inference. We also discuss current challenges and future directions for applying deep learning in population genetics, focusing on efficiency, robustness and interpretability.
Affiliation(s)
- Xin Huang
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Aigerim Rymbekova
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Olga Dolgova
- Integrative Genomics Laboratory, CIC bioGUNE - Centro de Investigación Cooperativa en Biociencias, Derio, Biscaya, Spain
- Oscar Lao
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
13
Rehmann CT, Ralph PL, Kern AD. Evaluating evidence for co-geography in the Anopheles-Plasmodium host-parasite system. bioRxiv 2023:2023.07.17.549405. [PMID: 37503196 PMCID: PMC10370088 DOI: 10.1101/2023.07.17.549405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The often tight association between parasites and their hosts means that under certain scenarios, the evolutionary histories of the two species can become closely coupled both through time and across space. Using spatial genetic inference, we identify a potential signal of common dispersal patterns in the Anopheles gambiae and Plasmodium falciparum host-parasite system as seen through a between-species correlation of the differences between geographic sampling location and geographic location predicted from the genome. This correlation may be due to coupled dispersal dynamics between host and parasite, but may also reflect statistical artifacts due to uneven spatial distribution of sampling locations. Using continuous-space population genetics simulations, we investigate the degree to which uneven distribution of sampling locations leads to bias in prediction of spatial location from genetic data and implement methods to counter this effect. We demonstrate that while algorithmic bias presents a problem in inference from spatio-genetic data, the correlation structure between A. gambiae and P. falciparum predictions cannot be attributed to spatial bias alone, and is thus likely a genetic signal of co-dispersal in a host-parasite system.
Affiliation(s)
- Clara T Rehmann
- University of Oregon, Institute of Ecology and Evolution and Department of Biology
- Peter L Ralph
- University of Oregon, Institute of Ecology and Evolution and Department of Biology
- University of Oregon, Department of Mathematics
- Andrew D Kern
- University of Oregon, Institute of Ecology and Evolution and Department of Biology
14
Kloska A, Giełczyk A, Grzybowski T, Płoski R, Kloska SM, Marciniak T, Pałczyński K, Rogalla-Ładniak U, Malyarchuk BA, Derenko MV, Kovačević-Grujičić N, Stevanović M, Drakulić D, Davidović S, Spólnicka M, Zubańska M, Woźniak M. A Machine-Learning-Based Approach to Prediction of Biogeographic Ancestry within Europe. Int J Mol Sci 2023; 24:15095. [PMID: 37894775 PMCID: PMC10606184 DOI: 10.3390/ijms242015095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/03/2023] [Accepted: 10/07/2023] [Indexed: 10/29/2023] Open
Abstract
Data obtained with the use of massively parallel sequencing (MPS) can be valuable in population genetics studies. In particular, such data harbor the potential for distinguishing samples from different populations, especially those coming from adjacent populations of common origin. Machine learning (ML) techniques seem to be especially well suited for analyzing large datasets obtained using MPS. The Slavic populations constitute about a third of the population of Europe and inhabit a large area of the continent, while being relatively closely related in population genetics terms. In this proof-of-concept study, various ML techniques were used to classify DNA samples from Slavic and non-Slavic individuals. The primary objective of this study was to empirically evaluate the feasibility of discerning the genetic provenance of individuals of Slavic descent who exhibit genetic similarity, with the overarching goal of categorizing DNA specimens derived from diverse Slavic population representatives. Raw sequencing data were pre-processed to obtain a 1200-character-long binary vector. A total of three classifiers were used: Random Forest, Support Vector Machine (SVM), and XGBoost. The most promising results were obtained using SVM with a linear kernel, with 99.9% accuracy and F1-scores of 0.9846-1.000 for all classes.
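As a reminder of what a per-class F1-score measures, the sketch below computes it from true and predicted labels. The function name `per_class_f1`, the class names, and the labels are made up for illustration and are not the study's data or code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class: F1}, where F1 = 2*TP / (2*TP + FP + FN) for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        scores[cls] = 2 * tp[cls] / denom if denom else 0.0
    return scores


# Hypothetical labels for three populations "A", "B", "C".
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
print(per_class_f1(y_true, y_pred))
```

Reporting the range of per-class scores alongside overall accuracy, as the abstract does, shows whether any single population is classified noticeably worse than the others.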
Affiliation(s)
- Anna Kloska
- Department of Forensic Medicine, The Ludwik Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, 85067 Bydgoszcz, Poland
- Faculty of Medical Sciences, Bydgoszcz University of Science and Technology, 85796 Bydgoszcz, Poland
- Agata Giełczyk
- Faculty of Telecommunications, Computer Science and Electrical Engineering, Bydgoszcz University of Science and Technology, 85796 Bydgoszcz, Poland
- Tomasz Grzybowski
- Department of Forensic Medicine, The Ludwik Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, 85067 Bydgoszcz, Poland
- Rafał Płoski
- Department of Medical Genetics, Warsaw Medical University, 02106 Warsaw, Poland
- Sylwester M. Kloska
- Department of Forensic Medicine, The Ludwik Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, 85067 Bydgoszcz, Poland
- Faculty of Medical Sciences, Bydgoszcz University of Science and Technology, 85796 Bydgoszcz, Poland
- Tomasz Marciniak
- Faculty of Telecommunications, Computer Science and Electrical Engineering, Bydgoszcz University of Science and Technology, 85796 Bydgoszcz, Poland
- Krzysztof Pałczyński
- Faculty of Telecommunications, Computer Science and Electrical Engineering, Bydgoszcz University of Science and Technology, 85796 Bydgoszcz, Poland
- Urszula Rogalla-Ładniak
- Department of Forensic Medicine, The Ludwik Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, 85067 Bydgoszcz, Poland
- Boris A. Malyarchuk
- Institute of Biological Problems of the North, Russian Academy of Sciences, 685000 Magadan, Russia
- Miroslava V. Derenko
- Institute of Biological Problems of the North, Russian Academy of Sciences, 685000 Magadan, Russia
- Nataša Kovačević-Grujičić
- Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, 11042 Belgrade, Serbia
- Milena Stevanović
- Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, 11042 Belgrade, Serbia
- Faculty of Biology, University of Belgrade, 11000 Belgrade, Serbia
- Serbian Academy of Sciences and Arts, 11000 Belgrade, Serbia
- Danijela Drakulić
- Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, 11042 Belgrade, Serbia
- Slobodan Davidović
- Institute for Biological Research “Siniša Stanković”, National Institute of Republic of Serbia, University of Belgrade, 11060 Belgrade, Serbia
- Magdalena Zubańska
- Faculty of Law and Administration, Department of Criminology and Forensic Sciences, University of Warmia and Mazury, 10726 Olsztyn, Poland
- Marcin Woźniak
- Department of Forensic Medicine, The Ludwik Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, 85067 Bydgoszcz, Poland
15
Nait Saada J, Tsangalidou Z, Stricker M, Palamara PF. Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol 2023; 40:msad211. [PMID: 37738175 PMCID: PMC10581698 DOI: 10.1093/molbev/msad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023] Open
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Affiliation(s)
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
16
Mantes AD, Montserrat DM, Bustamante CD, Giró-i-Nieto X, Ioannidis AG. Neural ADMIXTURE for rapid genomic clustering. Nat Comput Sci 2023; 3:621-629. [PMID: 37600116 PMCID: PMC10438426 DOI: 10.1038/s43588-023-00482-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/25/2022] [Accepted: 06/06/2023] [Indexed: 08/22/2023]
Abstract
Characterizing the genetic structure of large cohorts has become increasingly important as genetic studies extend to massive, increasingly diverse biobanks. Popular methods decompose individual genomes into fractional cluster assignments, with each cluster representing a vector of DNA variant frequencies. However, with rapidly increasing biobank sizes, these methods have become computationally intractable. Here we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as the current standard algorithm, ADMIXTURE, while reducing the compute time by orders of magnitude, surpassing even the fastest alternatives. One month of continuous compute using ADMIXTURE can be reduced to just hours with Neural ADMIXTURE. A multi-head approach allows Neural ADMIXTURE to offer even further acceleration by calculating multiple cluster numbers in a single run. Furthermore, the models can be stored, allowing cluster assignment to be performed on new data in linear time without needing to share the training samples.
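The decomposition described above can be illustrated with a small sketch: each individual's expected allele frequency at a variant is a mixture of per-cluster frequencies weighted by that individual's fractional assignments. The names `Q` and `F` follow common notation for this kind of model, while the function name and toy values are hypothetical; this is not Neural ADMIXTURE code and involves no neural network.

```python
def expected_frequencies(Q, F):
    """Mix cluster allele frequencies by each individual's cluster fractions.

    Q: individuals x clusters, each row summing to 1 (fractional assignments).
    F: clusters x variants, each value in [0, 1] (cluster allele frequencies).
    Returns an individuals x variants matrix of expected allele frequencies.
    """
    n_variants = len(F[0])
    return [
        [sum(q_k * F[k][j] for k, q_k in enumerate(q_row)) for j in range(n_variants)]
        for q_row in Q
    ]


# Toy example: one unadmixed individual and one 50/50 admixed individual.
Q = [[1.0, 0.0], [0.5, 0.5]]
F = [[0.0, 1.0, 0.25], [1.0, 0.0, 0.75]]
P = expected_frequencies(Q, F)
print(P)
```

Fitting such a model means finding `Q` and `F` that make the expected frequencies agree with the observed genotypes; an autoencoder is one way to learn both at once, which is presumably where the speed-up over iterative fitting comes from.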
Affiliation(s)
- Albert Dominguez Mantes
  - Department of Biomedical Data Science, Stanford Medical School, Stanford, CA, United States
  - Signal Theory and Communications Department, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Vaud, Switzerland
- Daniel Mas Montserrat
  - Department of Biomedical Data Science, Stanford Medical School, Stanford, CA, United States
- Xavier Giró-i-Nieto
  - Signal Theory and Communications Department, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain
- Alexander G. Ioannidis
  - Department of Biomedical Data Science, Stanford Medical School, Stanford, CA, United States
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, United States
|
17
|
Smith CCR, Tittes S, Ralph PL, Kern AD. Dispersal inference from population genetic variation using a convolutional neural network. Genetics 2023; 224:iyad068. [PMID: 37052957 PMCID: PMC10213498 DOI: 10.1093/genetics/iyad068] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/08/2023] [Revised: 02/08/2023] [Accepted: 04/07/2023] [Indexed: 04/14/2023]
Abstract
The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training (including population density, demographic history, habitat size, and sampling area) and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole genome data or reduced representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus genetic tools like ours complement direct methods for improving our understanding of dispersal.
Affiliation(s)
- Chris C R Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Silas Tittes
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
|
18
|
Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023; 19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]
Abstract
Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.
Affiliation(s)
- K. D. Ahlquist
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, United States of America
- Lauren A. Sugden
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
  - Data Science Initiative, Brown University, Providence, Rhode Island, United States of America
|
19
|
Hamid I, Korunes KL, Schrider DR, Goldberg A. Localizing Post-Admixture Adaptive Variants with Object Detection on Ancestry-Painted Chromosomes. Mol Biol Evol 2023; 40:msad074. [PMID: 36947126 PMCID: PMC10116606 DOI: 10.1093/molbev/msad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2022] [Revised: 03/14/2023] [Accepted: 03/20/2023] [Indexed: 03/23/2023]
Abstract
Gene flow between previously differentiated populations during the founding of an admixed or hybrid population has the potential to introduce adaptive alleles into the new population. If the adaptive allele is common in one source population, but not the other, then as the adaptive allele rises in frequency in the admixed population, genetic ancestry from the source containing the adaptive allele will increase nearby as well. Patterns of genetic ancestry have therefore been used to identify post-admixture positive selection in humans and other animals, including examples in immunity, metabolism, and animal coloration. A common method identifies regions of the genome that have local ancestry "outliers" compared with the distribution across the rest of the genome, considering each locus independently. However, we lack theoretical models for expected distributions of ancestry under various demographic scenarios, resulting in potential false positives and false negatives. Further, ancestry patterns between distant sites are often not independent. As a result, current methods tend to infer wide genomic regions containing many genes as under selection, limiting biological interpretation. Instead, we develop a deep learning object detection method applied to images generated from local ancestry-painted genomes. This approach preserves information from the surrounding genomic context and avoids potential pitfalls of user-defined summary statistics. We find the method is robust to a variety of demographic misspecifications using simulated data. Applied to human genotype data from Cabo Verde, we localize a known adaptive locus to a single narrow region compared with multiple or long windows obtained using two other ancestry-based methods.
Affiliation(s)
- Iman Hamid
  - Department of Evolutionary Anthropology, Duke University, Durham, NC
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC
- Amy Goldberg
  - Department of Evolutionary Anthropology, Duke University, Durham, NC
|
20
|
Estimating human mobility in Holocene Western Eurasia with large-scale ancient genomic data. Proc Natl Acad Sci U S A 2023; 120:e2218375120. [PMID: 36821583 PMCID: PMC9992830 DOI: 10.1073/pnas.2218375120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
The recent increase in openly available ancient human DNA samples allows for large-scale meta-analysis applications. Trans-generational past human mobility is one of the key aspects that ancient genomics can contribute to, since changes in genetic ancestry, unlike cultural changes seen in the archaeological record, necessarily reflect movements of people. Here, we present an algorithm for spatiotemporal mapping of genetic profiles, which allows for direct estimates of past human mobility from large ancient genomic datasets. The key idea of the method is to derive a spatial probability surface of genetic similarity for each individual in its respective past. This is achieved by first creating an interpolated ancestry field through space and time based on multivariate statistics and Gaussian process regression and then using this field to map the ancient individuals into space according to their genetic profile. We apply this algorithm to a dataset of 3138 aDNA samples with genome-wide data from Western Eurasia in the last 10,000 years. Finally, we condense this sample-wise record with a simple summary statistic into a diachronic measure of mobility for subregions in Western, Central, and Southern Europe. For regions and periods with sufficient data coverage, our similarity surfaces and mobility estimates show general concordance with previous results and provide a meta-perspective of genetic changes and human mobility.
|
21
|
Korfmann K, Gaggiotti OE, Fumagalli M. Deep Learning in Population Genetics. Genome Biol Evol 2023; 15:6997869. [PMID: 36683406 PMCID: PMC9897193 DOI: 10.1093/gbe/evad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 12/19/2022] [Accepted: 01/16/2023] [Indexed: 01/24/2023]
Abstract
Population genetics is transitioning into a data-driven discipline thanks to the availability of large-scale genomic data and the need to study increasingly complex evolutionary scenarios. With likelihood and Bayesian approaches becoming either intractable or computationally unfeasible, machine learning, and in particular deep learning, algorithms are emerging as popular techniques for population genetic inferences. These approaches rely on algorithms that learn non-linear relationships between the input data and the model parameters being estimated through representation learning from training data sets. Deep learning algorithms currently employed in the field comprise discriminative and generative models with fully connected, convolutional, or recurrent layers. Additionally, a wide range of powerful simulators to generate training data under complex scenarios are now available. The application of deep learning to empirical data sets mostly replicates previous findings of demography reconstruction and signals of natural selection in model organisms. To showcase the feasibility of deep learning to tackle new challenges, we designed a branched architecture to detect signals of recent balancing selection from temporal haplotypic data, which exhibited good predictive performance on simulated data. Investigations on the interpretability of neural networks, their robustness to uncertain training data, and creative representation of population genetic data, will provide further opportunities for technological advancements in the field.
Affiliation(s)
- Kevin Korfmann
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Germany
- Oscar E Gaggiotti
  - Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife KY16 9TF, UK
|
22
|
Image Geo-Site Estimation Using Convolutional Auto-Encoder and Multi-Label Support Vector Machine. Information 2023. [DOI: 10.3390/info14010029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
The estimation of an image geo-site solely based on its contents is a promising task. Compelling image labelling relies heavily on contextual information, which is not as simple as recognizing a single object in an image. An Auto-Encoder-based support vector machine approach is proposed in this work to estimate the image geo-site to address the issue of misclassifying the estimations. The proposed method for geo-site estimation is conducted using a dataset consisting of 125 classes of various images captured within 125 countries. The proposed work uses a convolutional Auto-Encoder for training and dimensionality reduction. After that, the acquired preprocessed input dataset is further processed by a multi-label support vector machine. The performance assessment of the proposed approach has been accomplished using accuracy, sensitivity, specificity, and F1-score as evaluation parameters. Eventually, the proposed approach for image geo-site estimation presented in this article outperforms Auto-Encoder-based K-Nearest Neighbor and Auto-Encoder-Random Forest methods.
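The four evaluation measures named in this abstract have standard definitions in terms of confusion-matrix counts. The sketch below computes them for a single class treated one-vs-rest, which is an assumption, since the abstract does not say how the measures were aggregated over the 125 classes. The function name and the example counts are hypothetical.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and F1 from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives for one class versus all others. Ratios with a zero
    denominator are reported as 0.0.
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall: share of actual positives found
    specificity = ratio(tn, tn + fp)   # share of actual negatives rejected
    precision = ratio(tp, tp + fp)     # share of predicted positives that are correct
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up counts for one class, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

With many classes, accuracy alone can look high for a rare class simply because true negatives dominate, which is one reason sensitivity and F1 are usually reported alongside it.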
|
23
|
Vermant M, Goos T, Gogaert S, De Cock D, Verschueren P, Wuyts WA. Are genes the missing link to detect and prognosticate RA-ILD? Rheumatol Adv Pract 2023; 7:rkad023. [PMID: 36923263 PMCID: PMC10010659 DOI: 10.1093/rap/rkad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/31/2023] [Indexed: 03/09/2023]
Affiliation(s)
- Marie Vermant
  - Laboratory of Respiratory Diseases and Thoracic Surgery (BREATHE), Department of Chronic Diseases and Metabolism, KU Leuven, Leuven, Belgium
  - Pulmonology, University Hospitals Leuven, Leuven, Belgium
- Tinne Goos
  - Laboratory of Respiratory Diseases and Thoracic Surgery (BREATHE), Department of Chronic Diseases and Metabolism, KU Leuven, Leuven, Belgium
  - Pulmonology, University Hospitals Leuven, Leuven, Belgium
- Stefan Gogaert
  - Laboratory of Respiratory Diseases and Thoracic Surgery (BREATHE), Department of Chronic Diseases and Metabolism, KU Leuven, Leuven, Belgium
  - Pulmonology, University Hospitals Leuven, Leuven, Belgium
- Diederik De Cock
  - Biostatistics and Medical Informatics Research Group, Department of Public Health, Vrije Universiteit Brussel, Brussels, Belgium
- Patrick Verschueren
  - Skeletal Biology and Engineering Research Center, Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Rheumatology, University Hospitals Leuven, Leuven, Belgium
- Wim A Wuyts
  - Laboratory of Respiratory Diseases and Thoracic Surgery (BREATHE), Department of Chronic Diseases and Metabolism, KU Leuven, Leuven, Belgium
  - Pulmonology, University Hospitals Leuven, Leuven, Belgium
|
24
|
Deelder W, Manko E, Phelan JE, Campino S, Palla L, Clark TG. Geographical classification of malaria parasites through applying machine learning to whole genome sequence data. Sci Rep 2022; 12:21150. [PMID: 36476815 PMCID: PMC9729610 DOI: 10.1038/s41598-022-25568-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Accepted: 12/01/2022] [Indexed: 12/12/2022]
Abstract
Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity and transmission patterns, and can inform decision making for clinical and surveillance purposes. Advances in sequencing technologies are helping to generate timely and big genomic datasets, with the prospect of applying Artificial Intelligence analytical techniques (e.g., machine learning) to support programmatic malaria control and elimination. Here, we assess the potential of applying deep learning convolutional neural network approaches to predict the geographic origin of infections (continents, countries, GPS locations) using WGS data of P. falciparum (n = 5957; 27 countries) and P. vivax (n = 659; 13 countries) isolates. Using identified high-quality genome-wide single nucleotide polymorphisms (SNPs) (P. falciparum: 750 k, P. vivax: 588 k), an analysis of population structure and ancestry revealed clustering at the country level. When predicting locations for both species, classification (compared to regression) methods had the lowest distance errors, and >90% accuracy at a country level. Our work demonstrates the utility of machine learning approaches for geo-classification of malaria parasites. With timelier WGS data generation across more malaria-affected regions, the performance of machine learning approaches for geo-classification will improve, thereby supporting disease control activities.
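The abstract compares methods by the distance error between predicted and true locations but does not state how that distance was measured. As one plausible reading, the sketch below uses the great-circle (haversine) distance between GPS coordinates and averages it over isolates. The function names, the mean as the summary statistic, and the Earth radius constant are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def mean_distance_error_km(predicted: Sequence[Coordinate],
                           actual: Sequence[Coordinate]) -> float:
    """Average haversine distance between paired predicted and true locations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(haversine_km(p, t) for p, t in zip(predicted, actual)) / len(predicted)


# Sanity checks on known geometry: identical points, and antipodal points on the equator.
same_point = haversine_km((10.0, 20.0), (10.0, 20.0))
half_circumference = haversine_km((0.0, 0.0), (0.0, 180.0))
```

For a classifier, the predicted location would be a representative coordinate for the predicted class (for example a country centroid, which is again an assumption), so the same error measure can be applied to both classification and regression outputs.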
Affiliation(s)
- Wouter Deelder
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Dalberg Advisors, 7 Rue de Chantepoulet, 1201, Geneva, Switzerland
- Emilia Manko
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Jody E Phelan
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Susana Campino
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Luigi Palla
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Department of Public Health and Infectious Diseases, University of Rome La Sapienza, Rome, Italy
- Taane G Clark
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
|
25
|
Sanchez T, Bray EM, Jobic P, Guez J, Letournel AC, Charpiat G, Cury J, Jay F. dnadna: a deep learning framework for population genetics inference. Bioinformatics 2022; 39:6851140. [PMID: 36445000 PMCID: PMC9825738 DOI: 10.1093/bioinformatics/btac765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/17/2021] [Revised: 10/30/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022]
Abstract
MOTIVATION: We present dnadna, a flexible Python-based software for deep learning inference in population genetics. It is task-agnostic and aims at facilitating the development, reproducibility, dissemination and re-usability of neural networks designed for population genetic data. RESULTS: dnadna defines multiple user-friendly workflows. First, users can implement new architectures and tasks, while benefiting from dnadna utility functions, training procedure and test environment, which saves time and decreases the likelihood of bugs. Second, the implemented networks can be re-optimized based on user-specified training sets and/or tasks. Newly implemented architectures and pre-trained networks are easily shareable with the community for further benchmarking or other applications. Finally, users can apply pre-trained networks in order to predict evolutionary history from alternative real or simulated genetic datasets, without requiring extensive knowledge in deep learning or coding in general. dnadna comes with a peer-reviewed, exchangeable neural network, allowing demographic inference from SNP data, that can be used directly or retrained to solve other tasks. Toy networks are also available to ease the exploration of the software, and we expect that the range of available architectures will keep expanding thanks to community contributions. AVAILABILITY AND IMPLEMENTATION: dnadna is a Python (≥3.7) package; its repository is available at gitlab.com/mlgenetics/dnadna and its associated documentation at mlgenetics.gitlab.io/dnadna/.
Affiliation(s)
- Pierre Jobic
  - Université Paris-Saclay, CNRS UMR 9015, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
  - ENS Paris-Saclay, 91190 Gif-sur-Yvette, France
- Jérémy Guez
  - Université Paris-Saclay, CNRS UMR 9015, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
  - UMR7206 Eco-Anthropologie, Muséum National d’Histoire Naturelle, CNRS, Université de Paris, 75016 Paris, France
- Anne-Catherine Letournel
  - Université Paris-Saclay, CNRS UMR 9015, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
- Guillaume Charpiat
  - Université Paris-Saclay, CNRS UMR 9015, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
- Jean Cury
  - To whom correspondence should be addressed.
- Flora Jay
  - To whom correspondence should be addressed.
|
26
|
Gretzinger J, Sayer D, Justeau P, Altena E, Pala M, Dulias K, Edwards CJ, Jodoin S, Lacher L, Sabin S, Vågene ÅJ, Haak W, Ebenesersdóttir SS, Moore KHS, Radzeviciute R, Schmidt K, Brace S, Bager MA, Patterson N, Papac L, Broomandkhoshbacht N, Callan K, Harney É, Iliev L, Lawson AM, Michel M, Stewardson K, Zalzala F, Rohland N, Kappelhoff-Beckmann S, Both F, Winger D, Neumann D, Saalow L, Krabath S, Beckett S, Van Twest M, Faulkner N, Read C, Barton T, Caruth J, Hines J, Krause-Kyora B, Warnke U, Schuenemann VJ, Barnes I, Dahlström H, Clausen JJ, Richardson A, Popescu E, Dodwell N, Ladd S, Phillips T, Mortimer R, Sayer F, Swales D, Stewart A, Powlesland D, Kenyon R, Ladle L, Peek C, Grefen-Peters S, Ponce P, Daniels R, Spall C, Woolcock J, Jones AM, Roberts AV, Symmons R, Rawden AC, Cooper A, Bos KI, Booth T, Schroeder H, Thomas MG, Helgason A, Richards MB, Reich D, Krause J, Schiffels S. The Anglo-Saxon migration and the formation of the early English gene pool. Nature 2022; 610:112-119. [PMID: 36131019 PMCID: PMC9534755 DOI: 10.1038/s41586-022-05247-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 12/12/2021] [Accepted: 08/17/2022] [Indexed: 11/09/2022]
Abstract
The history of the British Isles and Ireland is characterized by multiple periods of major cultural change, including the influential transformation after the end of Roman rule, which precipitated shifts in language, settlement patterns and material culture [1]. The extent to which migration from continental Europe mediated these transitions is a matter of long-standing debate [2-4]. Here we study genome-wide ancient DNA from 460 medieval northwestern Europeans (including 278 individuals from England) alongside archaeological data, to infer contemporary population dynamics. We identify a substantial increase of continental northern European ancestry in early medieval England, which is closely related to the early medieval and present-day inhabitants of Germany and Denmark, implying large-scale substantial migration across the North Sea into Britain during the Early Middle Ages. As a result, the individuals who we analysed from eastern England derived up to 76% of their ancestry from the continental North Sea zone, albeit with substantial regional variation and heterogeneity within sites. We show that women with immigrant ancestry were more often furnished with grave goods than women with local ancestry, whereas men with weapons were as likely not to be of immigrant ancestry. A comparison with present-day Britain indicates that subsequent demographic events reduced the fraction of continental northern European ancestry while introducing further ancestry components into the English gene pool, including substantial southwestern European ancestry most closely related to that seen in Iron Age France [5,6].
Collapse
Affiliation(s)
- Joscha Gretzinger
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | | | | | | | - Maria Pala
- University of Huddersfield, Huddersfield, UK
| | - Katharina Dulias
- University of Huddersfield, Huddersfield, UK
- Institute of Geosystems and Bioindication, Technische Universität Braunschweig, Braunschweig, Germany
| | - Ceiridwen J Edwards
- University of Huddersfield, Huddersfield, UK
- University of Oxford, Oxford, UK
| | | | - Laura Lacher
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Susanna Sabin
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
| | - Åshild J Vågene
- Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Wolfgang Haak
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - S Sunna Ebenesersdóttir
- deCODE Genetics/AMGEN Inc., Reykjavík, Iceland
- Department of Anthropology, School of Social Sciences, University of Iceland, Reykjavík, Iceland
| | | | - Rita Radzeviciute
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | | | - Selina Brace
- Department of Earth Sciences, Natural History Museum, London, UK
| | - Martina Abenhus Bager
- Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Nick Patterson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Luka Papac
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Nasreen Broomandkhoshbacht
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Kimberly Callan
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Éadaoin Harney
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Lora Iliev
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Ann Marie Lawson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Megan Michel
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Kristin Stewardson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Fatma Zalzala
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Nadin Rohland
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | | | - Frank Both
- Landesmuseum Natur und Mensch, Oldenburg, Germany
| | | | | | - Lars Saalow
- Landesamt für Kultur und Denkmalpflege Mecklenburg-Vorpommern, Schwerin, Germany
| | - Stefan Krabath
- Institute for Historical Coastal Research (NIhK), Wilhelmshaven, Germany
| | - Sophie Beckett
- Sedgeford Historical and Archaeological Research Project, Sedgeford, UK
- Cranfield Forensic Institute, Cranfield Defence and Security, Cranfield University, Cranfield, UK
- Melbourne Dental School, University of Melbourne, Melbourne, Victoria, Australia
| | - Melanie Van Twest
- Sedgeford Historical and Archaeological Research Project, Sedgeford, UK
| | - Neil Faulkner
- Sedgeford Historical and Archaeological Research Project, Sedgeford, UK
- Chris Read
- The Atlantic Technological University, Sligo, Ireland
- Verena J Schuenemann
- University of Zurich, Zurich, Switzerland
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences, University of Vienna, Vienna, Austria
- Ian Barnes
- Department of Earth Sciences, Natural History Museum, London, UK
- Andrew Richardson
- Canterbury Archaeological Trust, Canterbury, UK
- Isle Heritage CIC, Sandgate, UK
- Richard Mortimer
- Oxford Archaeology East, Cambridge, UK
- Cotswold Archaeology, Needham Market, UK
- Faye Sayer
- University of Birmingham, Birmingham, UK
- Diana Swales
- Centre for Anatomy and Human Identification (CAHID), University of Dundee, Dundee, UK
- Robert Kenyon
- East Dorset Antiquarian Society (EDAS), West Bexington, UK
- Lilian Ladle
- Department of Archaeology and Anthropology, Bournemouth University, Poole, UK
- Christina Peek
- Institute for Historical Coastal Research (NIhK), Wilhelmshaven, Germany
- Anooshka C Rawden
- Fishbourne Roman Palace, Fishbourne, UK
- South Downs Centre, Midhurst, UK
- Alan Cooper
- BlueSkyGenetics, Adelaide, South Australia, Australia
- Kirsten I Bos
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Hannes Schroeder
- Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Agnar Helgason
- deCODE Genetics/AMGEN Inc., Reykjavík, Iceland
- Department of Anthropology, School of Social Sciences, University of Iceland, Reykjavík, Iceland
- David Reich
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Johannes Krause
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Stephan Schiffels
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
27
Nikolakis ZL, Adams RH, Wade KJ, Lund AJ, Carlton EJ, Castoe TA, Pollock DD. Prospects for genomic surveillance for selection in schistosome parasites. Frontiers in Epidemiology 2022; 2:932021. [PMID: 38455290 PMCID: PMC10910990 DOI: 10.3389/fepid.2022.932021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 09/12/2022] [Indexed: 03/09/2024]
Abstract
Schistosomiasis, a neglected tropical disease caused by multiple parasitic Schistosoma species, affects over 200 million people globally, mainly in low- and middle-income countries. Genomic surveillance to detect evidence of natural selection in schistosome populations represents an emerging and promising approach to identify and interpret schistosome responses to ongoing control efforts or other environmental factors. Here we review how genomic variation is used to detect selection, how these approaches have been applied to schistosomes, and how future studies to detect selection may be improved. We discuss the theory of genomic analyses to detect selection, identify experimental designs for such analyses, and review studies that have applied these approaches to schistosomes. We then consider the biological characteristics of schistosomes that are expected to respond to selection, particularly those that may be impacted by control programs. Examples include drug resistance, host specificity, and life history traits, and we review our current understanding of specific genes that underlie them in schistosomes. We also discuss how inherent features of schistosome reproduction and demography pose substantial challenges for effective identification of these traits and their genomic bases. We conclude by discussing how genomic surveillance for selection should be designed to improve understanding of schistosome biology, and how the parasite changes in response to selection.
Affiliation(s)
- Zachary L. Nikolakis
- Department of Biology, University of Texas at Arlington, Arlington, TX, United States
- Richard H. Adams
- Department of Biological and Environmental Sciences, Georgia College and State University, Milledgeville, GA, United States
- Kristen J. Wade
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, United States
- Andrea J. Lund
- Department of Environmental and Occupational Health, Colorado School of Public Health, University of Colorado, Anschutz, Aurora, CO, United States
- Elizabeth J. Carlton
- Department of Environmental and Occupational Health, Colorado School of Public Health, University of Colorado, Anschutz, Aurora, CO, United States
- Todd A. Castoe
- Department of Biology, University of Texas at Arlington, Arlington, TX, United States
- David D. Pollock
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, United States
28
Qin X, Chiang CWK, Gaggiotti OE. Deciphering signatures of natural selection via deep learning. Brief Bioinform 2022; 23:6686736. [PMID: 36056746 PMCID: PMC9487700 DOI: 10.1093/bib/bbac354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 07/11/2022] [Accepted: 07/28/2022] [Indexed: 11/12/2022]
Abstract
Identifying genomic regions influenced by natural selection provides fundamental insights into the genetic basis of local adaptation. However, it remains challenging to detect loci under complex spatially varying selection. We propose a deep learning-based framework, DeepGenomeScan, which can detect signatures of spatially varying selection. We demonstrate that DeepGenomeScan outperformed principal component analysis- and redundancy analysis-based genome scans in identifying loci underlying quantitative traits subject to complex spatial patterns of selection. Notably, DeepGenomeScan increases statistical power by up to 47.25% under nonlinear environmental selection patterns. We applied DeepGenomeScan to a European human genetic dataset and identified some well-known genes under selection and a substantial number of clinically important genes that were not identified by SPA, iHS, Fst and Bayenv when applied to the same dataset.
Affiliation(s)
- Xinghu Qin
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA
- Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
29
Gloria-Soria A, Faraji A, Hamik J, White G, Amsberry S, Donahue M, Buss B, Pless E, Cosme LV, Powell JR. Origins of high latitude introductions of Aedes aegypti to Nebraska and Utah during 2019. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2022; 103:105333. [PMID: 35817397 DOI: 10.1016/j.meegid.2022.105333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/06/2022] [Revised: 06/27/2022] [Accepted: 07/05/2022] [Indexed: 06/15/2023]
Abstract
Aedes aegypti (L.), the yellow fever mosquito, is also an important vector of dengue and Zika viruses, and an invasive species in North America. Aedes aegypti inhabits tropical and sub-tropical areas of the world and in North America is primarily distributed throughout the southern US states and Mexico. The northern range of Ae. aegypti is limited by cold winter months and establishment in these areas has been mostly unsuccessful. However, frequent introductions of Ae. aegypti to temperate, non-endemic areas during the warmer months can lead to seasonal activity and disease outbreaks. Two Ae. aegypti incursions were reported in the late summer of 2019 into York, Nebraska and Moab, Utah. These states had no history of established populations of this mosquito and no evidence of previous seasonal activity. We genotyped a subset of individuals from each location at 12 microsatellite loci and ~14,000 single nucleotide polymorphic markers to determine their genetic affinities to other populations worldwide and investigate their potential source of introduction. Our results support a single origin for each of the introductions from different sources. Aedes aegypti from Utah likely derived from Tucson, Arizona, or a nearby location. Nebraska specimen results were not as conclusive, but point to an origin from southcentral or southeastern US. In addition to effective, efficient, and sustainable control of invasive mosquitoes, such as Ae. aegypti, identifying the potential routes of introduction will be key to preventing future incursions and assessing their potential health threat based on the ability of the source population to transmit a particular virus and its insecticide resistance profile, which may complicate vector control.
Affiliation(s)
- Andrea Gloria-Soria
- Department of Entomology, Center for Vector Biology & Zoonotic Diseases, The Connecticut Agricultural Experiment Station, 123 Huntington Street, P.O. Box 1106, New Haven, CT 06511, USA; Yale University, Department of Ecology and Evolutionary Biology, 21 Sachem Street, New Haven, CT 06511, USA.
- Ary Faraji
- Salt Lake City Mosquito Abatement District, 2215 North 2200 West, Salt Lake City, UT 84116-1108, USA.
- Jeff Hamik
- Nebraska Department of Health and Human Services, Epidemiology and Informatics Unit, 301 Centennial Mall South, Lincoln, NE 68509, USA; University of Nebraska-Lincoln, Department of Educational Psychology, 114 Teachers College Hall, Lincoln, NE 68588, USA.
- Gregory White
- Salt Lake City Mosquito Abatement District, 2215 North 2200 West, Salt Lake City, UT 84116-1108, USA.
- Shanon Amsberry
- Moab Mosquito Abatement District, 1000 Sand Flats Rd, Moab, UT 84532, USA.
- Matthew Donahue
- Nebraska Department of Health and Human Services, Epidemiology and Informatics Unit, 301 Centennial Mall South, Lincoln, NE 68509, USA; Epidemic Intelligence Service, CDC, USA.
- Bryan Buss
- Nebraska Department of Health and Human Services, Epidemiology and Informatics Unit, 301 Centennial Mall South, Lincoln, NE 68509, USA; Career Epidemiology Field Officer Program, Division of State and Local Readiness, Center for Preparedness and Response, CDC, USA.
- Luciano Veiga Cosme
- Yale University, Department of Ecology and Evolutionary Biology, 21 Sachem Street, New Haven, CT 06511, USA.
- Jeffrey R Powell
- Yale University, Department of Ecology and Evolutionary Biology, 21 Sachem Street, New Haven, CT 06511, USA.
30
Qin X, Chiang CWK, Gaggiotti OE. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis. Brief Bioinform 2022; 23:6596986. [PMID: 35649387 PMCID: PMC9294434 DOI: 10.1093/bib/bbac202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2022] [Revised: 04/05/2022] [Accepted: 04/29/2022] [Indexed: 12/30/2022]
Abstract
Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
Affiliation(s)
- Xinghu Qin
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA
- Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
31
Borowiec ML, Dikow RB, Frandsen PB, McKeeken A, Valentini G, White AE. Deep learning as a tool for ecology and evolution. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13901] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Affiliation(s)
- Marek L. Borowiec
- Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, ID, USA
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID, USA
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Paul B. Frandsen
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Alexander McKeeken
- Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, ID, USA
- Alexander E. White
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
32
33
Hübner S, Sisou D, Mandel T, Todesco M, Matzrafi M, Eizenberg H. Wild sunflower goes viral: citizen science and comparative genomics allow tracking the origin and establishment of invasive sunflower in the Levant. Mol Ecol 2022; 31:2061-2072. [PMID: 35106854 PMCID: PMC9542508 DOI: 10.1111/mec.16380] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/15/2021] [Revised: 01/14/2022] [Accepted: 01/25/2022] [Indexed: 11/28/2022]
Abstract
Globalization and intensified volume of trade and transport around the world are accelerating the rate of biological invasions. It is therefore increasingly important to understand the processes through which invasive species colonize new habitats, often to the detriment of native flora. The initial steps of an invasion are particularly critical, as the introduced species relies on limited genetic diversity to adapt to a new environment. However, our understanding of this critical stage of the invasion is currently limited. We used a citizen science approach and social media to survey the distribution of invasive sunflower in Israel. We then sampled and sequenced a representative collection and compared it with available genomic data sets of North American wild sunflower, landraces and cultivars. We show that invasive wild sunflower is rapidly establishing throughout Israel, probably from a single, recent introduction from Texas, while maintaining high genetic diversity through ongoing gene flow. Since its introduction, invasive sunflower has spread quickly to most regions, and differentiation was detected despite extensive gene flow between clusters. Our findings suggest that rapid spread followed by continuous gene flow between diverging populations can serve as an efficient mechanism for maintaining sufficient genetic diversity at the early stages of invasion, promoting rapid adaptation and establishment in the new territory.
Affiliation(s)
- Sariel Hübner
- Galilee Research Institute (MIGAL), Tel-Hai Academic College, Upper Galilee, 11016, Israel
- Dana Sisou
- Galilee Research Institute (MIGAL), Tel-Hai Academic College, Upper Galilee, 11016, Israel; Department of Phytopathology and Weed Research, Agricultural Research Organization, Newe Ya'ar Research Center, Ramat Yishay, Israel; The Robert H. Smith Institute of Plant Sciences and Genetics, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Tali Mandel
- Galilee Research Institute (MIGAL), Tel-Hai Academic College, Upper Galilee, 11016, Israel
- Marco Todesco
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Maor Matzrafi
- Department of Phytopathology and Weed Research, Agricultural Research Organization, Newe Ya'ar Research Center, Ramat Yishay, Israel
- Hanan Eizenberg
- Department of Phytopathology and Weed Research, Agricultural Research Organization, Newe Ya'ar Research Center, Ramat Yishay, Israel
34
Cronn RC, Finch KN, Hauck LL, Parker-Forney M, Milligan BG, Dowling J, Adventure Scientists. Range-wide assessment of a SNP panel for individualization and geolocalization of bigleaf maple (Acer macrophyllum Pursh). Forensic Science International: Animals and Environments 2021. [DOI: 10.1016/j.fsiae.2021.100033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023]
35
Yang B, Zhang Z, Yang C, Wang Y, Orr MC, Hongbin W, Zhang AB. Identification of Species by Combining Molecular and Morphological Data Using Convolutional Neural Networks. Syst Biol 2021; 71:690-705. [PMID: 34524452 DOI: 10.1093/sysbio/syab076] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/19/2020] [Accepted: 09/08/2021] [Indexed: 11/14/2022]
Abstract
Integrative taxonomy is central to modern taxonomy and systematic biology, including behaviour, niche preference, distribution, morphological analysis and DNA barcoding. However, decades of use demonstrate that these methods can face challenges when used in isolation, for instance, potential misidentifications due to phenotypic plasticity for morphological methods, and incorrect identifications because of introgression, incomplete lineage sorting and horizontal gene transfer for DNA barcoding. Although researchers have advocated the use of integrative taxonomy, few detailed algorithms have been proposed. Here, we develop a convolutional neural network method (morphology-molecule network (MMNet)) that integrates morphological and molecular data for species identification. The newly proposed method (MMNet) worked better than four currently-available alternative methods when tested with 10 independent datasets representing varying genetic diversity from different taxa. High accuracies were achieved for all groups, including beetles (98.1% of 123 species), butterflies (98.8% of 24 species), fishes (96.3% of 214 species) and moths (96.4% of 150 total species). Further, MMNet demonstrated a high degree of accuracy (>98%) in four datasets including closely related species from the same genus. The average accuracy of two modest sub-genomic (single nucleotide polymorphism) datasets, comprising eight putative subspecies respectively, is 90%. Additional tests show that the success rate of species identification under this method most strongly depends on the amount of training data, and is robust to sequence length and image size. Analyses on the contribution of different data types (image versus gene) indicate that both morphological and genetic data are important to the model, and that genetic data contribute slightly more. 
The approaches developed here serve as a foundation for the future integration of multi-modal information for integrative taxonomy, such as image, audio, video, 3D scanning and biosensor data, to characterize organisms more comprehensively as a basis for improved investigation, monitoring and conservation of biodiversity.
Affiliation(s)
- Bing Yang
- College of Life Sciences, Capital Normal University, Beijing 100048, People's Republic of China
- Zhenxin Zhang
- The Key Laboratory of 3D Information Acquisition and Application, MOE, Capital Normal University, Beijing 100048, People's Republic of China; Beijing Laboratory of Water Resources Security, Capital Normal University, Beijing 100048, People's Republic of China; Base of the State Key Laboratory of Urban Environmental Process and Digital, Capital Normal University, Beijing 100048, People's Republic of China
- Caiqing Yang
- College of Life Sciences, Capital Normal University, Beijing 100048, People's Republic of China
- Ying Wang
- College of Life Sciences, Capital Normal University, Beijing 100048, People's Republic of China
- Michael C Orr
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, People's Republic of China
- Wang Hongbin
- Museum of Forest Biodiversity, Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Beijing 100091, People's Republic of China
- Ai-Bing Zhang
- College of Life Sciences, Capital Normal University, Beijing 100048, People's Republic of China
36
Agranat-Tamir L, Waldman S, Rosen N, Yakir B, Carmi S, Carmel L. LINADMIX: Evaluating the effect of ancient admixture events on modern populations. Bioinformatics 2021; 37:4744-4755. [PMID: 34270685 DOI: 10.1093/bioinformatics/btab531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2020] [Revised: 06/25/2021] [Accepted: 07/15/2021] [Indexed: 11/12/2022]
Abstract
Motivation: The rise in the number of genotyped ancient individuals provides an opportunity to estimate population admixture models for many populations. However, in models describing modern populations as mixtures of ancient ones, it is typically difficult to estimate the model mixing coefficients and to evaluate the model's fit to the data. Results: We present LINADMIX, designed to tackle this problem by solving a constrained linear model when both the ancient and the modern genotypes are represented in a low-dimensional space. LINADMIX estimates the mixing coefficients and their standard errors, and computes a p-value for testing the model fit to the data. We quantified the performance of LINADMIX using an extensive set of simulated studies. We show that LINADMIX can accurately estimate admixture coefficients, and is robust to factors such as population size, genetic drift, proportion of missing data, and various types of model misspecification. Availability: LINADMIX is available as Python code at https://github.com/swidler/linadmix. Supplementary information: Supplementary data are available at Bioinformatics online.
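The idea of a constrained linear mixing model in a low-dimensional space can be illustrated with a toy sketch. This is not LINADMIX's code (that is at the GitHub URL given in the abstract) and it does not compute standard errors or a p-value. It assumes only two ancient source populations, each summarised by a hypothetical coordinate vector (for example, principal-component centroids), and finds the coefficient in [0, 1] whose mixture lies closest to the modern population. The function name and all numbers are made up for illustration.

```python
def two_source_mixing(target, source_a, source_b):
    """Return alpha in [0, 1] minimising the squared distance between
    target and alpha * source_a + (1 - alpha) * source_b.

    Illustrative assumption: exactly two sources, so the constrained
    least-squares problem is one-dimensional and has a closed form.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("sources coincide; the coefficient is not identifiable")
    numer = sum((t - b) * d for t, b, d in zip(target, source_b, diff))
    alpha = numer / denom
    # The objective is a convex quadratic in alpha, so clipping the
    # unconstrained optimum to [0, 1] gives the constrained optimum.
    return min(1.0, max(0.0, alpha))


# Hypothetical 2-D coordinates for two ancient sources and one modern population.
ancient_a = [0.0, 0.0]
ancient_b = [1.0, 2.0]
modern = [0.25, 0.5]

alpha = two_source_mixing(modern, ancient_a, ancient_b)
print(f"share from source A: {alpha:.2f}, share from source B: {1 - alpha:.2f}")
```

With more than two sources the same problem becomes a least-squares fit with non-negative coefficients that sum to one, which generally needs a numerical solver rather than a closed form.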
Affiliation(s)
- Lily Agranat-Tamir
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem 9190401, Israel; Department of Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem 9190501, Israel
- Shamam Waldman
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Naomi Rosen
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem 9190401, Israel
- Benjamin Yakir
- Department of Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem 9190501, Israel
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Liran Carmel
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem 9190401, Israel
37
Gibson MJ, Torres MDL, Brandvain Y, Moyle LC. Introgression shapes fruit color convergence in invasive Galápagos tomato. eLife 2021; 10:64165. [PMID: 34165082 PMCID: PMC8294854 DOI: 10.7554/elife.64165] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/20/2020] [Accepted: 06/23/2021] [Indexed: 12/17/2022]
Abstract
Invasive species represent one of the foremost risks to global biodiversity. Here, we use population genomics to evaluate the history and consequences of an invasion of wild tomato (Solanum pimpinellifolium) onto the Galápagos Islands from continental South America. Using >300 archipelago and mainland collections, we infer this invasion was recent and largely the result of a single event from central Ecuador. Patterns of ancestry within the genomes of invasive plants also reveal post-colonization hybridization and introgression between S. pimpinellifolium and the closely related Galápagos endemic Solanum cheesmaniae. Of admixed invasive individuals, those that carry endemic alleles at one of two different carotenoid biosynthesis loci also have orange fruits, characteristic of the endemic species, instead of typical red S. pimpinellifolium fruits. We infer that introgression of two independent fruit color loci explains this observed trait convergence, suggesting that selection has favored repeated transitions of red to orange fruits on the Galápagos.
Affiliation(s)
- Matthew JS Gibson
- Department of Biology, Indiana University, Bloomington, United States
- María de Lourdes Torres
- Universidad San Francisco de Quito (USFQ), Colegio de Ciencias Biológicas y Ambientales, Laboratorio de Biotecnología Vegetal, Campus Cumbayá, Quito, Ecuador; Galapagos Science Center, Universidad San Francisco de Quito and University of North Carolina at Chapel Hill, Galapagos, Ecuador
- Yaniv Brandvain
- Department of Plant Biology, University of Minnesota-Twin Cities, St. Paul, United States
- Leonie C Moyle
- Department of Biology, Indiana University, Bloomington, United States
38
Chafin TK, Zbinden ZD, Douglas MR, Martin BT, Middaugh CR, Gray MC, Ballard JR, Douglas ME. Spatial population genetics in heavily managed species: Separating patterns of historical translocation from contemporary gene flow in white-tailed deer. Evol Appl 2021; 14:1673-1689. [PMID: 34178112 PMCID: PMC8210790 DOI: 10.1111/eva.13233] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/27/2020] [Accepted: 03/10/2021] [Indexed: 01/16/2023]
Abstract
Approximately 100 years ago, unregulated harvest nearly eliminated white-tailed deer (Odocoileus virginianus) from eastern North America, which subsequently served to catalyze wildlife management as a national priority. An extensive stock-replenishment effort soon followed, with deer broadly translocated among states as a means of re-establishment. However, an unintended consequence was that natural patterns of gene flow became obscured and pretranslocation signatures of population structure were replaced. We applied cutting-edge molecular and biogeographic tools to disentangle genetic signatures of historical management from those reflecting spatially heterogeneous dispersal by evaluating 35,099 single nucleotide polymorphisms (SNPs) derived via reduced-representation genomic sequencing from 1143 deer sampled statewide in Arkansas. We then employed Simpson's diversity index to summarize ancestry assignments and visualize spatial genetic transitions. Using sub-sampled transects across these transitions, we tested clinal patterns across loci against theoretical expectations of their response under scenarios of re-colonization and restricted dispersal. Two salient results emerged: (A) Genetic signatures from historic translocations are demonstrably apparent; and (B) Geographic filters (major rivers; urban centers; highways) now act as inflection points for the distribution of this contemporary ancestry. These results yielded a statewide assessment of contemporary population structure in deer as driven by historic translocations as well as ongoing processes. In addition, the analytical framework employed herein to effectively decipher extant/historic drivers of deer distribution in Arkansas is also applicable for other biodiversity elements with similarly complex demographic histories.
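To make the summary statistic mentioned above concrete, here is a minimal sketch of Simpson's diversity index applied to one individual's ancestry proportions. The abstract does not say which variant of the index the authors used; this sketch assumes the common Gini-Simpson form, 1 minus the sum of squared proportions, and the function name and example values are hypothetical.

```python
def simpson_diversity(proportions):
    """Gini-Simpson diversity, 1 - sum(p_i ** 2), of non-negative weights.

    Weights are normalised to sum to 1 first, so raw counts or
    ancestry proportions can both be passed in.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    return 1.0 - sum((p / total) ** 2 for p in proportions)


# Hypothetical ancestry proportions for three deer across four clusters.
individuals = {
    "single_ancestry": [1.0, 0.0, 0.0, 0.0],
    "two_way_mix": [0.5, 0.5, 0.0, 0.0],
    "even_four_way_mix": [0.25, 0.25, 0.25, 0.25],
}

for name, ancestry in individuals.items():
    print(name, round(simpson_diversity(ancestry), 3))
```

Values near zero indicate an individual assigned almost entirely to one ancestry cluster, while higher values indicate mixed ancestry, which is why mapping the index can highlight transition zones between clusters.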
Affiliation(s)
- Tyler K. Chafin
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Present address: Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
- Zachery D. Zbinden
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Marlis R. Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Bradley T. Martin
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- M. Cory Gray
- Research Division, Arkansas Game and Fish Commission, Little Rock, AR, USA
- Michael E. Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
39
Roberts Kingman GA, Vyas DN, Jones FC, Brady SD, Chen HI, Reid K, Milhaven M, Bertino TS, Aguirre WE, Heins DC, von Hippel FA, Park PJ, Kirch M, Absher DM, Myers RM, Di Palma F, Bell MA, Kingsley DM, Veeramah KR. Predicting future from past: The genomic basis of recurrent and rapid stickleback evolution. Science Advances 2021; 7(25):eabg5285. [PMID: 34144992 PMCID: PMC8213234 DOI: 10.1126/sciadv.abg5285] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 01/11/2021] [Accepted: 05/05/2021] [Indexed: 05/30/2023]
Abstract
Similar forms often evolve repeatedly in nature, raising long-standing questions about the underlying mechanisms. Here, we use repeated evolution in stickleback to identify a large set of genomic loci that change recurrently during colonization of freshwater habitats by marine fish. The same loci used repeatedly in extant populations also show rapid allele frequency changes when new freshwater populations are experimentally established from marine ancestors. Marked genotypic and phenotypic changes arise within 5 years, facilitated by standing genetic variation and linkage between adaptive regions. Both the speed and location of changes can be predicted using empirical observations of recurrence in natural populations or fundamental genomic features like allelic age, recombination rates, density of divergent loci, and overlap with mapped traits. A composite model trained on these stickleback features can also predict the location of key evolutionary loci in Darwin's finches, suggesting that similar features are important for evolution across diverse taxa.
Affiliation(s)
- Garrett A Roberts Kingman
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
- Deven N Vyas
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
- Felicity C Jones
- Friedrich Miescher Laboratory of the Max Planck Society, Max-Planck-Ring, Tübingen, Germany
- Shannon D Brady
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
- Heidi I Chen
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
- Kerry Reid
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
- Mark Milhaven
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Thomas S Bertino
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
- Windsor E Aguirre
- Department of Biological Sciences, DePaul University, Chicago, IL 60614-3207, USA
- David C Heins
- Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA
- Frank A von Hippel
- Department of Community, Environment and Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ 85724, USA
- Peter J Park
- Department of Biology, Farmingdale State College, Farmingdale, NY 11735-1021, USA
- Melanie Kirch
- Friedrich Miescher Laboratory of the Max Planck Society, Max-Planck-Ring, Tübingen, Germany
- Devin M Absher
- HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
- Richard M Myers
- HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
- Federica Di Palma
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
- Michael A Bell
- University of California Museum of Paleontology, University of California, Berkeley, Berkeley, CA 94720, USA
- David M Kingsley
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Krishna R Veeramah
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
| |
|
40
|
North HL, McGaughran A, Jiggins CD. Insights into invasive species from whole-genome resequencing. Mol Ecol 2021; 30:6289-6308. [PMID: 34041794 DOI: 10.1111/mec.15999] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 12/18/2020] [Revised: 03/12/2021] [Accepted: 04/30/2021] [Indexed: 12/12/2022]
Abstract
Studies of invasive species can simultaneously inform management strategies and quantify rapid evolution in the wild. The role of genomics in invasion science is increasingly recognised, and the growing availability of reference genomes for invasive species is paving the way for whole-genome resequencing studies in a wide range of systems. Here, we survey the literature to assess the application of whole-genome resequencing data in invasion biology. For some applications, such as the reconstruction of invasion routes in time and space, sequencing the whole genome of many individuals can increase the accuracy of existing methods. In other cases, population genomic approaches such as haplotype analysis can permit entirely new questions to be addressed and new technologies applied. To date whole-genome resequencing has only been used in a handful of invasive systems, but these studies have confirmed the importance of processes such as balancing selection and hybridization in allowing invasive species to reuse existing adaptations and rapidly overcome the challenges of a foreign ecosystem. The use of genomic data does not constitute a paradigm shift per se, but by leveraging new theory, tools, and technologies, population genomics can provide unprecedented insight into basic and applied aspects of invasion science.
Affiliation(s)
- Henry L North
- Department of Zoology, University of Cambridge, Cambridge, UK
| | - Angela McGaughran
- Te Aka Mātuatua/School of Science, University of Waikato, Hamilton, New Zealand
| | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK
| |
|
41
|
Xue AT, Schrider DR, Kern AD. Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning. Mol Biol Evol 2021; 38:1168-1183. [PMID: 33022051 PMCID: PMC7947845 DOI: 10.1093/molbev/msaa259] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/19/2023] Open
Abstract
Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.
Affiliation(s)
- Alexander T Xue
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
| | - Andrew D Kern
- Institute of Ecology and Evolution, 5289 University of Oregon, Eugene, OR
| |
|
42
|
Schmidt TL, Swan T, Chung J, Karl S, Demok S, Yang Q, Field MA, Muzari MO, Ehlers G, Brugh M, Bellwood R, Horne P, Burkot TR, Ritchie S, Hoffmann AA. Spatial population genomics of a recent mosquito invasion. Mol Ecol 2021; 30:1174-1189. [PMID: 33421231 DOI: 10.1111/mec.15792] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/16/2020] [Revised: 12/20/2020] [Accepted: 01/04/2021] [Indexed: 02/06/2023]
Abstract
Population genomic approaches can characterize dispersal across a single generation through to many generations in the past, bridging the gap between individual movement and intergenerational gene flow. These approaches are particularly useful when investigating dispersal in recently altered systems, where they provide a way of inferring long-distance dispersal between newly established populations and their interactions with existing populations. Human-mediated biological invasions represent such altered systems which can be investigated with appropriate study designs and analyses. Here we apply temporally restricted sampling and a range of population genomic approaches to investigate dispersal in a 2004 invasion of Aedes albopictus (the Asian tiger mosquito) in the Torres Strait Islands (TSI) of Australia. We sampled mosquitoes from 13 TSI villages simultaneously and genotyped 373 mosquitoes at genome-wide single nucleotide polymorphisms (SNPs): 331 from the TSI, 36 from Papua New Guinea (PNG) and four incursive mosquitoes detected in uninvaded regions. Within villages, spatial genetic structure varied substantially but overall displayed isolation by distance and a neighbourhood size of 232-577. Close kin dyads revealed recent movement between islands 31-203 km apart, and deep learning inferences showed incursive Ae. albopictus had travelled to uninvaded regions from both adjacent and nonadjacent islands. Private alleles and a co-ancestry matrix indicated direct gene flow from PNG into nearby islands. Outlier analyses also detected four linked alleles introgressed from PNG, with the alleles surrounding 12 resistance-associated cytochrome P450 genes. By treating dispersal as both an intergenerational process and a set of discrete events, we describe a highly interconnected invasive system.
Affiliation(s)
- Thomas L Schmidt
- Pest and Environmental Adaptation Research Group, School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
| | - Tom Swan
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia; College of Public Health, Medical and Veterinary Sciences, James Cook University, Cairns, QLD, Australia
| | - Jessica Chung
- Pest and Environmental Adaptation Research Group, School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia; Melbourne Bioinformatics, University of Melbourne, Parkville, VIC, Australia
| | - Stephan Karl
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia; Vector Borne Diseases Unit, Papua New Guinea Institute of Medical Research, Madang, Papua New Guinea
| | - Samuel Demok
- Vector Borne Diseases Unit, Papua New Guinea Institute of Medical Research, Madang, Papua New Guinea
| | - Qiong Yang
- Pest and Environmental Adaptation Research Group, School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
| | - Matt A Field
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia; John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
| | - Mutizwa Odwell Muzari
- Medical Entomology, Tropical Public Health Services Cairns, Cairns and Hinterland Hospital & Health Services, Cairns, QLD, Australia
| | - Gerhard Ehlers
- Medical Entomology, Tropical Public Health Services Cairns, Cairns and Hinterland Hospital & Health Services, Cairns, QLD, Australia
| | - Mathew Brugh
- Medical Entomology, Tropical Public Health Services Cairns, Cairns and Hinterland Hospital & Health Services, Cairns, QLD, Australia
| | - Rodney Bellwood
- Medical Entomology, Tropical Public Health Services Cairns, Cairns and Hinterland Hospital & Health Services, Cairns, QLD, Australia
| | - Peter Horne
- Medical Entomology, Tropical Public Health Services Cairns, Cairns and Hinterland Hospital & Health Services, Cairns, QLD, Australia
| | - Thomas R Burkot
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Scott Ritchie
- College of Public Health, Medical and Veterinary Sciences, James Cook University, Cairns, QLD, Australia; Institute of Vector-Borne Disease, Monash University, Clayton, VIC, Australia
| | - Ary A Hoffmann
- Pest and Environmental Adaptation Research Group, School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
| |
|
43
|
Librado P, Khan N, Fages A, Kusliy MA, Suchan T, Tonasso-Calvière L, Schiavinato S, Alioglu D, Fromentier A, Perdereau A, Aury JM, Gaunitz C, Chauvey L, Seguin-Orlando A, Der Sarkissian C, Southon J, Shapiro B, Tishkin AA, Kovalev AA, Alquraishi S, Alfarhan AH, Al-Rasheid KAS, Seregély T, Klassen L, Iversen R, Bignon-Lau O, Bodu P, Olive M, Castel JC, Boudadi-Maligne M, Alvarez N, Germonpré M, Moskal-del Hoyo M, Wilczyński J, Pospuła S, Lasota-Kuś A, Tunia K, Nowak M, Rannamäe E, Saarma U, Boeskorov G, Lōugas L, Kyselý R, Peške L, Bălășescu A, Dumitrașcu V, Dobrescu R, Gerber D, Kiss V, Szécsényi-Nagy A, Mende BG, Gallina Z, Somogyi K, Kulcsár G, Gál E, Bendrey R, Allentoft ME, Sirbu G, Dergachev V, Shephard H, Tomadini N, Grouard S, Kasparov A, Basilyan AE, Anisimov MA, Nikolskiy PA, Pavlova EY, Pitulko V, Brem G, Wallner B, Schwall C, Keller M, Kitagawa K, Bessudnov AN, Bessudnov A, Taylor W, Magail J, Gantulga JO, Bayarsaikhan J, Erdenebaatar D, Tabaldiev K, Mijiddorj E, Boldgiv B, Tsagaan T, Pruvost M, Olsen S, Makarewicz CA, Valenzuela Lamas S, Albizuri Canadell S, Nieto Espinet A, Iborra MP, Lira Garrido J, Rodríguez González E, Celestino S, Olària C, Arsuaga JL, Kotova N, Pryor A, Crabtree P, Zhumatayev R, Toleubaev A, Morgunova NL, Kuznetsova T, Lordkipanize D, Marzullo M, Prato O, Bagnasco Gianni G, Tecchiati U, Clavel B, Lepetz S, Davoudi H, Mashkour M, Berezina NY, Stockhammer PW, Krause J, Haak W, Morales-Muñiz A, Benecke N, Hofreiter M, Ludwig A, Graphodatsky AS, Peters J, Kiryushin KY, Iderkhangai TO, Bokovenko NA, Vasiliev SK, Seregin NN, Chugunov KV, Plasteeva NA, Baryshnikov GF, Petrova E, Sablin M, Ananyevskaya E, Logvin A, Shevnina I, Logvin V, Kalieva S, Loman V, Kukushkin I, Merz I, Merz V, Sakenov S, Varfolomeyev V, Usmanova E, Zaibert V, Arbuckle B, Belinskiy AB, Kalmykov A, Reinhold S, Hansen S, Yudin AI, Vybornov AA, Epimakhov A, Berezina NS, Roslyakova N, Kosintsev PA, Kuznetsov PF, Anthony D, Kroonen GJ, Kristiansen K, Wincker P, Outram 
A, Orlando L. The origins and spread of domestic horses from the Western Eurasian steppes. Nature 2021; 598:634-640. [PMID: 34671162 PMCID: PMC8550961 DOI: 10.1038/s41586-021-04018-9] [Citation(s) in RCA: 69] [Impact Index Per Article: 23.0] [Received: 05/04/2021] [Accepted: 09/10/2021] [Indexed: 01/13/2023]
Abstract
Domestication of horses fundamentally transformed long-range mobility and warfare [1]. However, modern domesticated breeds do not descend from the earliest domestic horse lineage associated with archaeological evidence of bridling, milking and corralling [2-4] at Botai, Central Asia around 3500 BC [3]. Other longstanding candidate regions for horse domestication, such as Iberia [5] and Anatolia [6], have also recently been challenged. Thus, the genetic, geographic and temporal origins of modern domestic horses have remained unknown. Here we pinpoint the Western Eurasian steppes, especially the lower Volga-Don region, as the homeland of modern domestic horses. Furthermore, we map the population changes accompanying domestication from 273 ancient horse genomes. This reveals that modern domestic horses ultimately replaced almost all other local populations as they expanded rapidly across Eurasia from about 2000 BC, synchronously with equestrian material culture, including Sintashta spoke-wheeled chariots. We find that equestrianism involved strong selection for critical locomotor and behavioural adaptations at the GSDMC and ZFPM1 genes. Our results reject the commonly held association [7] between horseback riding and the massive expansion of Yamnaya steppe pastoralists into Europe around 3000 BC [8,9] driving the spread of Indo-European languages [10]. This contrasts with the scenario in Asia where Indo-Iranian languages, chariots and horses spread together, following the early second millennium BC Sintashta culture [11,12].
Affiliation(s)
- Pablo Librado
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Naveed Khan
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France; Department of Biotechnology, Abdul Wali Khan University, Mardan, Pakistan
| | - Antoine Fages
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Mariya A. Kusliy
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France; Department of the Diversity and Evolution of Genomes, Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, Russia
| | - Tomasz Suchan
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France; W. Szafer Institute of Botany, Polish Academy of Sciences, Kraków, Poland
| | - Laure Tonasso-Calvière
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Stéphanie Schiavinato
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Duha Alioglu
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Aurore Fromentier
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Aude Perdereau
- Genoscope, Institut de biologie François-Jacob, Commissariat à l’Energie Atomique (CEA), Université Paris-Saclay, Evry, France
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut de biologie François Jacob, CEA, CNRS, Université d’Evry, Université Paris-Saclay, Evry, France
| | - Charleen Gaunitz
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Lorelei Chauvey
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Andaine Seguin-Orlando
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - Clio Der Sarkissian
- Centre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| | - John Southon
- Earth System Science Department, University of California, Irvine, Irvine, CA, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA; Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Alexey A. Tishkin
- Department of Archaeology, Ethnography and Museology, Altai State University, Barnaul, Russia
| | - Alexey A. Kovalev
- Department of Archaeological Heritage Preservation, Institute of Archaeology of the Russian Academy of Sciences, Moscow, Russia
| | - Saleh Alquraishi
- Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | - Ahmed H. Alfarhan
- Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | - Khaled A. S. Al-Rasheid
- Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | - Timo Seregély
- Institute for Archaeology, Heritage Conservation Studies and Art History, University of Bamberg, Bamberg, Germany
| | | | - Rune Iversen
- Saxo Institute, section of Archaeology, University of Copenhagen, Copenhagen, Denmark
| | - Olivier Bignon-Lau
- ArScAn-UMR 7041, Equipe Ethnologie préhistorique, CNRS, MSH-Mondes, Nanterre Cedex, France
| | - Pierre Bodu
- ArScAn-UMR 7041, Equipe Ethnologie préhistorique, CNRS, MSH-Mondes, Nanterre Cedex, France
| | - Monique Olive
- ArScAn-UMR 7041, Equipe Ethnologie préhistorique, CNRS, MSH-Mondes, Nanterre Cedex, France
| | | | - Myriam Boudadi-Maligne
- UMR 5199 De la Préhistoire à l’Actuel : Culture, Environnement et Anthropologie (PACEA), CNRS, Université de Bordeaux, Pessac Cedex, France
| | - Nadir Alvarez
- Geneva Natural History Museum, Geneva, Switzerland; Department of Genetics and Evolution, University of Geneva, Geneva, Switzerland
| | - Mietje Germonpré
- OD Earth & History of Life, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
| | - Magdalena Moskal-del Hoyo
- W. Szafer Institute of Botany, Polish Academy of Sciences, Kraków, Poland
| | - Jarosław Wilczyński
- Institute of Systematics and Evolution of Animals, Polish Academy of Sciences, Kraków, Poland
| | - Sylwia Pospuła
- Institute of Systematics and Evolution of Animals, Polish Academy of Sciences, Kraków, Poland
| | - Anna Lasota-Kuś
- Institute of Archaeology and Ethnology Polish Academy of Sciences, Kraków, Poland
| | - Krzysztof Tunia
- Institute of Archaeology and Ethnology Polish Academy of Sciences, Kraków, Poland
| | - Marek Nowak
- Institute of Archaeology, Jagiellonian University, Kraków, Poland
| | - Eve Rannamäe
- Department of Archaeology, Institute of History and Archaeology, Tartu, Estonia
| | - Urmas Saarma
- Department of Zoology, Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
| | - Gennady Boeskorov
- Diamond and Precious Metals Geology Institute, SB RAS, Yakutsk, Russia
| | - Lembi Lōugas
- Archaeological Research Collection, Tallinn University, Tallinn, Estonia
| | - René Kyselý
- Department of Natural Sciences and Archaeometry, Institute of Archaeology of the Czech Academy of Sciences, Prague, Czechia
| | | | - Adrian Bălășescu
- Vasile Pârvan Institute of Archaeology, Department of Bioarchaeology, Romanian Academy, Bucharest, Romania
| | - Valentin Dumitrașcu
- Vasile Pârvan Institute of Archaeology, Department of Bioarchaeology, Romanian Academy, Bucharest, Romania
| | - Roxana Dobrescu
- Vasile Pârvan Institute of Archaeology, Department of Bioarchaeology, Romanian Academy, Bucharest, Romania
| | - Daniel Gerber
- Institute of Archaeogenomics, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest, Hungary; Department of Genetics, Eötvös Loránd University, Budapest, Hungary
| | - Viktória Kiss
- Institute of Archaeology, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest, Hungary
| | - Anna Szécsényi-Nagy
- Institute of Archaeogenomics, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest, Hungary
| | - Balázs G. Mende
- Institute of Archaeogenomics, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest, Hungary
| | | | | | - Gabriella Kulcsár
- Institute of Archaeology, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest, Hungary
| | - Erika Gál
- Institute of Archaeology, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest, Hungary
| | - Robin Bendrey
- School of History, Classics and Archaeology, University of Edinburgh, Old Medical School, Edinburgh, UK
| | - Morten E. Allentoft
- Trace and Environmental DNA (TrEnD) Lab, School of Molecular and Life Sciences, Curtin University, Perth, Western Australia, Australia; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Ghenadie Sirbu
- Department of Academic Management, Academy of Science of Moldova, Chișinău, Republic of Moldova
| | - Valentin Dergachev
- Center of Archaeology, Institute of Cultural Heritage, Academy of Science of Moldova, Chișinău, Republic of Moldova
| | - Henry Shephard
- Archaeological Institute of America, Boston, MA, USA
| | - Noémie Tomadini
- Centre National de Recherche Scientifique, Muséum national d’Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France
| | - Sandrine Grouard
- Centre National de Recherche Scientifique, Muséum national d’Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France
| | - Aleksei Kasparov
- Institute for the History of Material Culture, Russian Academy of Sciences (IHMC RAS), St Petersburg, Russia
| | | | - Mikhail A. Anisimov
- Arctic and Antarctic Research Institute, St Petersburg, Russia
| | - Pavel A. Nikolskiy
- Geological Institute, Russian Academy of Sciences, Moscow, Russia
| | - Elena Y. Pavlova
- Arctic and Antarctic Research Institute, St Petersburg, Russia
| | - Vladimir Pitulko
- Institute for the History of Material Culture, Russian Academy of Sciences (IHMC RAS), St Petersburg, Russia
| | - Gottfried Brem
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine Vienna, Vienna, Austria
| | - Barbara Wallner
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine Vienna, Vienna, Austria
| | - Christoph Schwall
- Department of Prehistory and Western Asian/Northeast African Archaeology, Austrian Archaeological Institute, Austrian Academy of Sciences, Vienna, Austria
| | - Marcel Keller
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia; Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Keiko Kitagawa
- SFB 1070 Resource Cultures, University of Tübingen, Tübingen, Germany; Department of Early Prehistory and Quaternary Ecology, University of Tübingen, Tübingen, Germany; UMR 7194 Muséum National d’Histoire Naturelle, CNRS, UPVD, Paris, France
| | - Alexander N. Bessudnov
- Semenov-Tyan-Shanskii Lipetsk State Pedagogical University, Lipetsk, Russia
| | - Alexander Bessudnov
- Institute for the History of Material Culture, Russian Academy of Sciences (IHMC RAS), St Petersburg, Russia
| | - William Taylor
- Museum of Natural History, University of Colorado-Boulder, Boulder, CO, USA
| | - Jérome Magail
- Musée d’Anthropologie préhistorique de Monaco, Monaco, Monaco
| | - Jamiyan-Ombo Gantulga
- Institute of Archaeology, Mongolian Academy of Sciences, Ulaanbaatar, Mongolia
| | - Jamsranjav Bayarsaikhan
- Department of Archaeology, Max Planck Institute for the Science of Human History, Jena, Germany; Chinggis Khaan Museum, Ulaanbaatar, Mongolia
| | | | - Kubatbeek Tabaldiev
- Department of History, Kyrgyz-Turkish Manas University, Bishkek, Kyrgyzstan
| | - Enkhbayar Mijiddorj
- Department of Archaeology, Ulaanbaatar State University, Ulaanbaatar, Mongolia
| | - Bazartseren Boldgiv
- Department of Biology, National University of Mongolia, Ulaanbaatar, Mongolia
| | - Turbat Tsagaan
- Institute of Archaeology, Mongolian Academy of Sciences, Ulaanbaatar, Mongolia
| | - Mélanie Pruvost
- UMR 5199 De la Préhistoire à l’Actuel : Culture, Environnement et Anthropologie (PACEA), CNRS, Université de Bordeaux, Pessac Cedex, France
| | - Sandra Olsen
- Division of Archaeology, Biodiversity Institute, University of Kansas, Lawrence, KS, USA
| | - Cheryl A. Makarewicz
- Institute for Prehistoric and Protohistoric Archaeology, Kiel University, Kiel, Germany; ROOTS Excellence Cluster, Kiel University, Kiel, Germany
| | - Silvia Valenzuela Lamas
- Archaeology of Social Dynamics, Institució Milà i Fontanals d’Humanitats, Consejo Superior de Investigaciones Científicas (IMF-CSIC), Barcelona, Spain
| | - Silvia Albizuri Canadell
- Departament d’Història i Arqueologia–SERP, Universitat de Barcelona, Barcelona, Spain
| | - Ariadna Nieto Espinet
- Grup d’Investigació Prehistòrica, Universitat de Lleida, PID2019-110022GB-I00, Lleida, Spain
| | | | - Jaime Lira Garrido
- Departamento de Medicina Animal, Facultad de Veterinaria, Universidad de Extremadura, Cáceres, Spain; Centro Mixto UCM-ISCIII de Evolución y Comportamiento Humanos, Madrid, Spain
| | | | - Sebastián Celestino
- Instituto de Arqueología (CSIC–Junta de Extremadura), Mérida, Spain
| | - Carmen Olària
- Laboratori d’Arqueologia Prehistòrica, Universitat Jaume I, Castelló de la Plana, Spain
| | - Juan Luis Arsuaga
- Centro Mixto UCM-ISCIII de Evolución y Comportamiento Humanos, Madrid, Spain; Departamento de Geodinámica, Estratigrafía y Paleontología, Facultad de Ciencias Geológicas, Universidad Complutense de Madrid, Madrid, Spain
| | - Nadiia Kotova
- Department of Eneolithic and Bronze Age, Institute of Archaeology National Academy of Sciences of Ukraine, Kyiv, Ukraine
| | - Alexander Pryor
- Department of Archaeology, University of Exeter, Exeter, UK
| | - Pam Crabtree
- Center for the Study of Human Origins, Anthropology Department, New York University, New York, NY, USA
| | - Rinat Zhumatayev
- Department of Archaeology, Ethnology and Museology, Al Farabi Kazakh National University, Almaty, Kazakhstan
| | - Abdesh Toleubaev
- Department of Archaeology, Ethnology and Museology, Al Farabi Kazakh National University, Almaty, Kazakhstan
| | - Nina L. Morgunova
- Scientific Research Department, Orenburg State Pedagogical University, Orenburg, Russia
| | - Tatiana Kuznetsova
- Department of paleontology, Faculty of Geology, Moscow State University, Moscow, Russia; Institute of Geology and Petroleum Technologies, Kazan Federal University, Kazan, Russia
| | - David Lordkipanize
- Georgian National Museum, Tbilisi, Georgia; Tbilisi State University, Tbilisi, Georgia
| | - Matilde Marzullo
- Università degli Studi di Milano, Dipartimento di Beni Culturali e Ambientali, Milan, Italy
| | - Ornella Prato
- Università degli Studi di Milano, Dipartimento di Beni Culturali e Ambientali, Milan, Italy
| | - Giovanna Bagnasco Gianni
- Università degli Studi di Milano, Dipartimento di Beni Culturali e Ambientali, Milan, Italy
| | - Umberto Tecchiati
- Università degli Studi di Milano, Dipartimento di Beni Culturali e Ambientali, Milan, Italy
| | - Benoit Clavel
- Centre National de Recherche Scientifique, Muséum national d’Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France
| | - Sébastien Lepetz
- Centre National de Recherche Scientifique, Muséum national d’Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France
| | - Hossein Davoudi
- grid.46072.370000 0004 0612 7950University of Tehran, Central Laboratory, Bioarchaeology Laboratory, Archaeozoology Section, Tehran, Iran
| | - Marjan Mashkour
- Centre National de Recherche Scientifique, Muséum national d’Histoire naturelle, Archéozoologie, Archéobotanique (AASPE), CP 56, Paris, France ,grid.46072.370000 0004 0612 7950University of Tehran, Central Laboratory, Bioarchaeology Laboratory, Archaeozoology Section, Tehran, Iran
| | - Natalia Ya. Berezina
- grid.14476.300000 0001 2342 9668Research Institute and Museum of Anthropology, Lomonosov Moscow State University, Moscow, Russia
| | - Philipp W. Stockhammer
- grid.419518.00000 0001 2159 1813Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany ,grid.5252.00000 0004 1936 973XInstitute for Pre- and Protohistoric Archaeology and Archaeology of the Roman Provinces, Ludwig Maximilian University, Munich, Germany
| | - Johannes Krause
- grid.469873.70000 0004 4914 1197Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany ,grid.419518.00000 0001 2159 1813Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Wolfgang Haak
- grid.469873.70000 0004 4914 1197Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany ,grid.419518.00000 0001 2159 1813Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany ,grid.1010.00000 0004 1936 7304School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
| | - Arturo Morales-Muñiz
- grid.5515.40000000119578126Department of Biology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Norbert Benecke
- grid.424195.f0000 0001 2106 6832Eurasia Department of the German Archaeological Institute, Berlin, Germany
| | - Michael Hofreiter
- grid.11348.3f0000 0001 0942 1117Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Mathematics and Science, University of Potsdam, Potsdam, Germany
| | - Arne Ludwig
- grid.418779.40000 0001 0708 0355Department of Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, Berlin, Germany ,grid.7468.d0000 0001 2248 7639Albrecht Daniel Thaer-Institute, Faculty of Life Sciences, Humboldt University Berlin, Berlin, Germany
| | - Alexander S. Graphodatsky
- grid.415877.80000 0001 2254 1834Department of the Diversity and Evolution of Genomes, Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, Russia
| | - Joris Peters
- grid.5252.00000 0004 1936 973XArchaeoBioCenter and Institute of Palaeoanatomy, Domestication Research and the History of Veterinary Medicine, LMU Munich, Munich, Germany ,grid.452781.d0000 0001 2203 6205SNSB, State Collection of Anthropology and Palaeoanatomy, Munich, Germany
| | - Kirill Yu. Kiryushin
- grid.77225.350000000112611077Department of Archaeology, Ethnography and Museology, Altai State University, Barnaul, Russia
| | - Nikolay A. Bokovenko
- grid.473277.20000 0001 2291 1890Institute for the History of Material Culture, Russian Academy of Sciences (IHMC RAS), St Petersburg, Russia
| | - Sergey K. Vasiliev
- grid.415877.80000 0001 2254 1834ArchaeoZOOlogy in Siberia and Central Asia—ZooSCAn International Research Laboratory, Institute of Archeology and Ethnography of the Siberian Branch of the RAS, Novosibirsk, Russia
| | - Nikolai N. Seregin
- grid.77225.350000000112611077Department of Archaeology, Ethnography and Museology, Altai State University, Barnaul, Russia
| | - Konstantin V. Chugunov
- grid.426493.e0000 0004 1800 742XDepartment of Eastern European and Siberian Archaeology, State Hermitage Museum, St Petersburg, Russia
| | - Natalya A. Plasteeva
- grid.482778.60000 0001 2197 0186Paleoecology Laboratory, Institute of Plant and Animal Ecology, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russia
| | - Gennady F. Baryshnikov
- grid.439287.30000 0001 2314 7601Zoological Institute, Russian Academy of Sciences, St Petersburg, Russia
| | - Ekaterina Petrova
- grid.6441.70000 0001 2243 2806Department of Archaeology, History Faculty, Vilnius University, Vilnius, Lithuania
| | - Mikhail Sablin
- grid.439287.30000 0001 2314 7601Zoological Institute, Russian Academy of Sciences, St Petersburg, Russia
| | - Elina Ananyevskaya
- grid.6441.70000 0001 2243 2806Department of Archaeology, History Faculty, Vilnius University, Vilnius, Lithuania
| | - Andrey Logvin
- grid.443586.8Laboratory for Archaeological Research, Faculty of History and Law, Kostanay State University, Kostanay, Kazakhstan
| | - Irina Shevnina
- grid.443586.8Laboratory for Archaeological Research, Faculty of History and Law, Kostanay State University, Kostanay, Kazakhstan
| | - Victor Logvin
- Department of History and Archaeology, Surgut Governmental University, Surgut, Russia
| | - Saule Kalieva
- Department of History and Archaeology, Surgut Governmental University, Surgut, Russia
| | - Valeriy Loman
- Saryarka Archaeological Institute, Buketov Karaganda University, Karaganda, Kazakhstan
| | - Igor Kukushkin
- Saryarka Archaeological Institute, Buketov Karaganda University, Karaganda, Kazakhstan
| | - Ilya Merz
- Toraighyrov University, Joint Research Center for Archeological Studies, Pavlodar, Kazakhstan
| | - Victor Merz
- Toraighyrov University, Joint Research Center for Archeological Studies, Pavlodar, Kazakhstan
| | - Sergazy Sakenov
- grid.55380.3b0000 0004 0398 5415Faculty of History, L. N. Gumilev Eurasian National University, Nur-Sultan, Kazakhstan
| | - Victor Varfolomeyev
- Saryarka Archaeological Institute, Buketov Karaganda University, Karaganda, Kazakhstan
| | - Emma Usmanova
- Saryarka Archaeological Institute, Buketov Karaganda University, Karaganda, Kazakhstan
| | - Viktor Zaibert
- grid.77184.3d0000 0000 8887 5266Institute of Archaeology and Steppe Civilizations, Al-Farabi Kazakh National University, Almaty, Kazakhstan
| | - Benjamin Arbuckle
- grid.10698.360000000122483208Department of Anthropology, Alumni Building, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Sabine Reinhold
- grid.424195.f0000 0001 2106 6832Eurasia Department of the German Archaeological Institute, Berlin, Germany
| | - Svend Hansen
- grid.424195.f0000 0001 2106 6832Eurasia Department of the German Archaeological Institute, Berlin, Germany
| | - Aleksandr I. Yudin
- Research Center for the Preservation of Cultural Heritage, Saratov, Russia
| | - Aleksandr A. Vybornov
- grid.445790.b0000 0001 2218 2982Department of Russian History and Archaeology, Samara State University of Social Sciences and Education, Samara, Russia
| | - Andrey Epimakhov
- grid.440724.10000 0000 9958 5862Russian and Foreign History Department, South Ural State University, Chelyabinsk, Russia ,grid.465317.20000 0001 2224 8785South Ural Department, Institute of History and Archaeology, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russia
| | - Natalia S. Berezina
- Archaeological School, Chuvash State Institute of Humanities, Cheboksary, Russia
| | - Natalia Roslyakova
- grid.445790.b0000 0001 2218 2982Department of Russian History and Archaeology, Samara State University of Social Sciences and Education, Samara, Russia
| | - Pavel A. Kosintsev
- grid.482778.60000 0001 2197 0186Paleoecology Laboratory, Institute of Plant and Animal Ecology, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russia ,grid.412761.70000 0004 0645 736XDepartment of History of the Institute of Humanities, Ural Federal University, Ekaterinburg, Russia
| | - Pavel F. Kuznetsov
- grid.445790.b0000 0001 2218 2982Department of Russian History and Archaeology, Samara State University of Social Sciences and Education, Samara, Russia
| | - David Anthony
- grid.38142.3c000000041936754XDepartment of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA ,grid.418410.80000 0001 0115 6427Anthropology Faculty, Hartwick College, Oneonta, NY, USA
| | - Guus J. Kroonen
- grid.5254.60000 0001 0674 042XDepartment of Nordic Studies and Linguistics, University of Copenhagen, Copenhagen, Denmark ,grid.5132.50000 0001 2312 1970Leiden University Center for Linguistics, Leiden University, Leiden, The Netherlands
| | - Kristian Kristiansen
- grid.8761.80000 0000 9919 9582Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden ,grid.452548.a0000 0000 9817 5300Present Address: Lundbeck Foundation GeoGenetics Centre, Copenhagen, Denmark
| | - Patrick Wincker
- grid.8390.20000 0001 2180 5818Génomique Métabolique, Genoscope, Institut de biologie François Jacob, CEA, CNRS, Université d’Evry, Université Paris-Saclay, Evry, France
| | - Alan Outram
- grid.8391.30000 0004 1936 8024Department of Archaeology, University of Exeter, Exeter, UK
| | - Ludovic Orlando
- grid.15781.3a0000 0001 0723 035XCentre d’Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, Toulouse, France
| |
|