1
Qin X, Chiang CWK, Gaggiotti OE. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis. Brief Bioinform 2022; 23:6596986. [PMID: 35649387 PMCID: PMC9294434 DOI: 10.1093/bib/bbac202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2022] [Revised: 04/05/2022] [Accepted: 04/29/2022] [Indexed: 12/30/2022]
Abstract
Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
Affiliation(s)
- Xinghu Qin
  - Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
- Charleston W K Chiang
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA
- Oscar E Gaggiotti
  - Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
2
Rengifo‐Correa L, Abad‐Franch F, Martínez‐Hernández F, Salazar‐Schettino PM, Téllez‐Rendón JL, Villalobos G, Morrone JJ. A biogeographic–ecological approach to disentangle reticulate evolution in the Triatoma phyllosoma species group (Heteroptera: Triatominae), vectors of Chagas disease. J Zool Syst Evol Res 2020. [DOI: 10.1111/jzs.12409] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Affiliation(s)
- Laura Rengifo‐Correa
  - Departamento de Biología Evolutiva, Facultad de Ciencias, Museo de Zoología ‘Alfonso L. Herrera’, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Fernando Abad‐Franch
  - Programa de Pós‐graduação em Medicina Tropical, Núcleo de Medicina Tropical, Faculdade de Medicina, Universidade de Brasília, Brasília, Brazil
- Paz M. Salazar‐Schettino
  - Laboratorio de Biología de Parásitos, Departamento de Microbiología y Parasitología, Facultad de Medicina, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Guiehdani Villalobos
  - Departamento de Ecología de Agentes Patógenos, Hospital General Dr. Manuel Gea González, Mexico City, Mexico
- Juan J. Morrone
  - Departamento de Biología Evolutiva, Facultad de Ciencias, Museo de Zoología ‘Alfonso L. Herrera’, Universidad Nacional Autónoma de México, Mexico City, Mexico
3
Pei J, Zhang Y, Nielsen R, Wu Y. Inferring the ancestry of parents and grandparents from genetic data. PLoS Comput Biol 2020; 16:e1008065. [PMID: 32797037 PMCID: PMC7449501 DOI: 10.1371/journal.pcbi.1008065] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/06/2019] [Revised: 08/26/2020] [Accepted: 06/17/2020] [Indexed: 11/18/2022]
Abstract
Inference of admixture proportions is a classical statistical problem in population genetics. Standard methods implicitly assume that both parents of an individual have the same admixture fraction. However, this is rarely the case in real data. In this paper we show that the distribution of admixture tract lengths in a genome contains information about the admixture proportions of the ancestors of an individual. We develop a Hidden Markov Model (HMM) framework for estimating the admixture proportions of the immediate ancestors of an individual, i.e. a type of decomposition of an individual's admixture proportions into further subsets of ancestral proportions in the ancestors. Based on a genealogical model for admixture tracts, we develop an efficient algorithm for computing the sampling probability of the genome from a single individual, as a function of the admixture proportions of the ancestors of this individual. This allows us to perform probabilistic inference of admixture proportions of ancestors only using the genome of an extant individual. We perform extensive simulations to quantify the error in the estimation of ancestral admixture proportions under various conditions. To illustrate the utility of the method, we apply it to real genetic data.
Affiliation(s)
- Jingwen Pei
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Yiming Zhang
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Rasmus Nielsen
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, California, United States of America
  - Museum of Natural History, University of Copenhagen, Copenhagen, Denmark
- Yufeng Wu
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
4
Das R, Upadhyai P. Application of the geographic population structure (GPS) algorithm for biogeographical analyses of wild and captive gorillas. BMC Bioinformatics 2019; 20:35. [PMID: 30717677 PMCID: PMC6362561 DOI: 10.1186/s12859-018-2568-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
Background: The utilization of high resolution genome data has important implications for the phylogeographical evaluation of non-human species. Biogeographical analyses can yield detailed understanding of their population biology and facilitate the geo-localization of individuals to promote their efficacious management, particularly when bred in captivity. The Geographic Population Structure (GPS) algorithm is an admixture based tool for inference of biogeographical affinities and has been employed for the geo-localization of various human populations worldwide. Here, we applied the GPS tool for biogeographical analyses and localization of the ancestral origins of wild and captive gorilla genomes, of unknown geographic source, available in the Great Ape Genome Project (GAGP), employing Gorillas with known ancestral origin as the reference data.
Results: Our findings suggest that GPS was successful in recapitulating the population history and estimating the geographic origins of all gorilla genomes queried and localized the wild gorillas with unknown geographical origin < 150 km of National Parks/Wildlife Reserves within the political boundaries of countries, considered as prominent modern-day abode for gorillas in the wild. Further, the GPS localization of most captive-born gorillas was congruent with their previously presumed ancestral homes.
Conclusions: Currently there is limited knowledge of the ancestral origins of most North American captive gorillas, and our study highlights the usefulness of GPS for inferring ancestry of captive gorillas. Determination of the native geographical source of captive gorillas can provide valuable information to guide breeding programs and ensure their appropriate management at the population level. Finally, our findings shine light on the broader applicability of GPS for protecting the genetic integrity of other endangered non-human species, where controlled breeding is a vital component of their conservation.
Affiliation(s)
- Ranajit Das
  - Manipal Centre for Natural Sciences (MCNS), Manipal Academy of Higher Education (MAHE), University building, Lab 11, Madhav Nagar, Manipal, Karnataka, 576104, India
- Priyanka Upadhyai
  - Department of Medical Genetics, Kasturba Medical College, Manipal Academy of Higher Education, Manipal, Karnataka, India
5
Caye K, Jay F, Michel O, François O. Fast inference of individual admixture coefficients using geographic data. Ann Appl Stat 2018. [DOI: 10.1214/17-aoas1106] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/19/2022]
6
Triska P, Chekanov N, Stepanov V, Khusnutdinova EK, Kumar GPA, Akhmetova V, Babalyan K, Boulygina E, Kharkov V, Gubina M, Khidiyatova I, Khitrinskaya I, Khrameeva EE, Khusainova R, Konovalova N, Litvinov S, Marusin A, Mazur AM, Puzyrev V, Ivanoshchuk D, Spiridonova M, Teslyuk A, Tsygankova S, Triska M, Trofimova N, Vajda E, Balanovsky O, Baranova A, Skryabin K, Tatarinova TV, Prokhortchouk E. Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe. BMC Genet 2017; 18:110. [PMID: 29297395 PMCID: PMC5751809 DOI: 10.1186/s12863-017-0578-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 01/04/2023]
Abstract
BACKGROUND: The history of human populations occupying the plains and mountain ridges separating Europe from Asia has been eventful, as these natural obstacles were crossed westward by multiple waves of Turkic and Uralic-speaking migrants as well as eastward by Europeans. Unfortunately, the material records of history of this region are not dense enough to reconstruct details of population history. These considerations stimulate growing interest to obtain a genetic picture of the demographic history of migrations and admixture in Northern Eurasia.
RESULTS: We genotyped and analyzed 1076 individuals from 30 populations with geographical coverage spanning from Baltic Sea to Baikal Lake. Our dense sampling allowed us to describe in detail the population structure, provide insight into genomic history of numerous European and Asian populations, and significantly increase quantity of genetic data available for modern populations in region of North Eurasia. Our study doubles the amount of genome-wide profiles available for this region. We detected unusually high amount of shared identical-by-descent (IBD) genomic segments between several Siberian populations, such as Khanty and Ket, providing evidence of genetic relatedness across vast geographic distances and between speakers of different language families. Additionally, we observed excessive IBD sharing between Khanty and Bashkir, a group of Turkic speakers from Southern Urals region. While adding some weight to the "Finno-Ugric" origin of Bashkir, our studies highlighted that the Bashkir genepool lacks the main "core", being a multi-layered amalgamation of Turkic, Ugric, Finnish and Indo-European contributions, which points at intricacy of genetic interface between Turkic and Uralic populations. Comparison of the genetic structure of Siberian ethnicities and the geography of the region they inhabit point at existence of the "Great Siberian Vortex" directing genetic exchanges in populations across the Siberian part of Asia. Slavic speakers of Eastern Europe are, in general, very similar in their genetic composition. Ukrainians, Belarusians and Russians have almost identical proportions of Caucasus and Northern European components and have virtually no Asian influence. We capitalized on wide geographic span of our sampling to address intriguing question about the place of origin of Russian Starovers, an enigmatic Eastern Orthodox Old Believers religious group relocated to Siberia in seventeenth century. A comparative reAdmix analysis, complemented by IBD sharing, placed their roots in the region of the Northern European Plain, occupied by North Russians and Finno-Ugric Komi and Karelian people. Russians from Novosibirsk and Russian Starover exhibit ancestral proportions close to that of European Eastern Slavs; however, they also include between 5% and 10% of Central Siberian ancestry, not present at this level in their European counterparts.
CONCLUSIONS: Our project has patched the hole in the genetic map of Eurasia: we demonstrated complexity of genetic structure of Northern Eurasians, existence of East-West and North-South genetic gradients, and assessed different inputs of ancient populations into modern populations.
MESH Headings
- Algorithms
- Asia
- DNA
- Datasets as Topic
- Emigration and Immigration/history
- Ethnicity/genetics
- Europe
- Female
- Genetic Variation
- Genetics, Population
- Genotyping Techniques
- History, 15th Century
- History, 16th Century
- History, 17th Century
- History, 18th Century
- History, 19th Century
- History, 20th Century
- History, 21st Century
- History, Ancient
- History, Medieval
- Humans
- Male
- Russia
Affiliation(s)
- Petr Triska
  - Children's Hospital Los Angeles, Los Angeles, CA, USA
- Nikolay Chekanov
  - Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia
  - "Genoanalytica" CJSC, Moscow, Russia
- Vadim Stepanov
  - Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
- Elza K Khusnutdinova
  - Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
  - Bashkir State University, Ufa, Russia
- Vita Akhmetova
  - Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
- Konstantin Babalyan
  - Moscow Institute of Physics and Technology, Department of Molecular and Bio-Physics, Moscow, Russia
- Vladimir Kharkov
  - Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
- Marina Gubina
  - Institute of Cytology and Genetics, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Irina Khidiyatova
  - Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
  - Bashkir State University, Ufa, Russia
- Irina Khitrinskaya
  - Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
- Ekaterina E Khrameeva
  - "Genoanalytica" CJSC, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia
- Rita Khusainova
  - Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
  - Bashkir State University, Ufa, Russia
- Sergey Litvinov
  - Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
- Andrey Marusin
  - Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
- Alexandr M Mazur
  - Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia
- Valery Puzyrev
  - Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
- Dinara Ivanoshchuk
  - Institute of Cytology and Genetics, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Maria Spiridonova
  - Institute of Medical Genetics, Tomsk National Medical Research Center, Russian Academy of Sciences, Siberian Branch, Tomsk, Russia
- Anton Teslyuk
  - Moscow Institute of Physics and Technology, Department of Molecular and Bio-Physics, Moscow, Russia
- Svetlana Tsygankova
  - Moscow Institute of Physics and Technology, Department of Molecular and Bio-Physics, Moscow, Russia
- Martin Triska
  - Children's Hospital Los Angeles, Los Angeles, CA, USA
- Natalya Trofimova
  - Institute of Biochemistry and Genetics, Russian Academy of Sciences, Ufa Scientific Centre of Russian Academy of Sciences, Ufa, Russia
- Edward Vajda
  - Department of Modern and Classical Languages, Western Washington University, Bellingham, WA, USA
- Oleg Balanovsky
  - Research Centre for Medical Genetics, Moscow, Russia
  - Vavilov Institute of General Genetics, Moscow, Russia
- Ancha Baranova
  - Research Centre for Medical Genetics, Moscow, Russia
  - School of Systems Biology, George Mason University, Fairfax, VA, USA
  - Atlas Biomed Group, Moscow, Russia
- Konstantin Skryabin
  - Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia
  - Russian Scientific Centre "Kurchatov Institute", Moscow, Russia
  - Department of Biology, Lomonosov Moscow State University, Moscow, Russia
- Tatiana V Tatarinova
  - Vavilov Institute of General Genetics, Moscow, Russia
  - School of Systems Biology, George Mason University, Fairfax, VA, USA
  - Atlas Biomed Group, Moscow, Russia
  - Department of Biology, University of La Verne, La Verne, CA, USA
  - A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Egor Prokhortchouk
  - Federal State Institution "Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences", Moscow, Russia
  - Department of Biology, Lomonosov Moscow State University, Moscow, Russia
7
Application of geographic population structure (GPS) algorithm for biogeographical analyses of populations with complex ancestries: a case study of South Asians from 1000 genomes project. BMC Genet 2017; 18:109. [PMID: 29297311 PMCID: PMC5751663 DOI: 10.1186/s12863-017-0579-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/23/2023]
Abstract
Background: The utilization of biological data to infer the geographic origins of human populations has been a long standing quest for biologists and anthropologists. Several biogeographical analysis tools have been developed to infer the geographical origins of human populations utilizing genetic data. However, due to the inherent complexity of genetic information, these approaches are prone to misinterpretations. The Geographic Population Structure (GPS) algorithm is an admixture based tool for biogeographical analyses and has been employed for the geo-localization of various populations worldwide. Here we sought to dissect its sensitivity and accuracy for localizing highly admixed groups. Given the complex history of population dispersal and gene flow in the Indian subcontinent, we have employed the GPS tool to localize five South Asian populations, Punjabi, Gujarati, Tamil, Telugu and Bengali from the 1000 Genomes project, some of whom were recent migrants to USA and UK, using populations from the Indian subcontinent available in Human Genome Diversity Panel (HGDP) and those previously described as reference.
Results: Our findings demonstrate reasonably high accuracy with regards to GPS assignment even for recent migrant populations sampled elsewhere, namely the Tamil, Telugu and Gujarati individuals, where 96%, 87% and 79% of the individuals, respectively, were positioned within 600 km of their native locations. In contrast, the absence of appropriate reference populations resulted in moderate-to-low levels of precision in positioning of Punjabi and Bengali genomes.
Conclusions: Our findings reflect that the GPS approach is useful but likely overtly dependent on the relative proportions of admixture in the reference populations for determination of the biogeographical origins of test individuals. We conclude that further modifications are desired to make this approach more suitable for highly admixed individuals.
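The accuracy figures in this abstract are shares of individuals placed within a distance threshold (600 km) of their native location. A minimal sketch of how such a share could be computed is given below, assuming great-circle (haversine) distances on a spherical Earth. The abstract does not say which distance measure the authors used, and the coordinates, function names and threshold handling here are hypothetical illustrations rather than the study's data or code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def fraction_within(predicted, native, threshold_km):
    """Share of predicted (lat, lon) points within threshold_km of the native point."""
    hits = sum(haversine_km(p[0], p[1], native[0], native[1]) <= threshold_km
               for p in predicted)
    return hits / len(predicted)


# Hypothetical example: one native location and three predicted placements.
native = (13.08, 80.27)
predicted = [(13.0, 80.0), (12.5, 79.0), (22.57, 88.36)]
share = fraction_within(predicted, native, 600.0)
```

In this made-up example two of the three placements fall inside the 600 km radius, so `share` is two thirds; a reported value such as 96% would be the same ratio computed over all individuals of a population.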
8
Das R, Wexler P, Pirooznia M, Elhaik E. The Origins of Ashkenaz, Ashkenazic Jews, and Yiddish. Front Genet 2017; 8:87. [PMID: 28680441 PMCID: PMC5478715 DOI: 10.3389/fgene.2017.00087] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/02/2016] [Accepted: 06/07/2017] [Indexed: 12/11/2022]
Abstract
Recently, the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish were investigated by applying the Geographic Population Structure (GPS) to a cohort of exclusively Yiddish-speaking and multilingual AJs. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that resemble the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a Levantine origin for AJs and German origins for Yiddish. We discuss how these findings advance three ongoing debates concerning (1) the historical meaning of the term "Ashkenaz;" (2) the genetic structure of AJs and their geographical origins as inferred from multiple studies employing both modern and ancient DNA and original ancient DNA analyses; and (3) the development of Yiddish. We provide additional validation to the non-Levantine origin of AJs using ancient DNA from the Near East and the Levant. Due to the rising popularity of geo-localization tools to address questions of origin, we briefly discuss the advantages and limitations of popular tools with focus on the GPS approach. Our results reinforce the non-Levantine origins of AJs.
Affiliation(s)
- Ranajit Das
  - Manipal Centre for Natural Sciences, Manipal University, Manipal, India
- Paul Wexler
  - Department of Linguistics, Tel Aviv University, Tel-Aviv, Israel
- Mehdi Pirooznia
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University, Baltimore, MD, United States
- Eran Elhaik
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
9
Morozova I, Flegontov P, Mikheyev AS, Bruskin S, Asgharian H, Ponomarenko P, Klyuchnikov V, ArunKumar G, Prokhortchouk E, Gankin Y, Rogaev E, Nikolsky Y, Baranova A, Elhaik E, Tatarinova TV. Toward high-resolution population genomics using archaeological samples. DNA Res 2016; 23:295-310. [PMID: 27436340 PMCID: PMC4991838 DOI: 10.1093/dnares/dsw029] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/22/2015] [Accepted: 05/22/2016] [Indexed: 12/30/2022]
Abstract
The term ‘ancient DNA’ (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of ‘molecular paleontology’. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.
Affiliation(s)
- Irina Morozova
  - Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland
- Pavel Flegontov
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
  - Bioinformatics Center, A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Alexander S Mikheyev
  - Ecology and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Sergey Bruskin
  - Vavilov Institute of General Genetics RAS, Moscow, Russia
- Hosseinali Asgharian
  - Department of Computational and Molecular Biology, University of Southern California, Los Angeles, CA, USA
- Petr Ponomarenko
  - Center for Personalized Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Egor Prokhortchouk
  - Research Center of Biotechnology RAS, Moscow, Russia
  - Department of Biology, Lomonosov Moscow State University, Russia
- Evgeny Rogaev
  - Vavilov Institute of General Genetics RAS, Moscow, Russia
  - University of Massachusetts Medical School, Worcester, MA, USA
- Yuri Nikolsky
  - Vavilov Institute of General Genetics RAS, Moscow, Russia
  - F1 Genomics, San Diego, CA, USA
  - School of Systems Biology, George Mason University, VA, USA
- Ancha Baranova
  - School of Systems Biology, George Mason University, VA, USA
  - Research Centre for Medical Genetics, Moscow, Russia
  - Atlas Biomed Group, Moscow, Russia
- Eran Elhaik
  - Department of Animal & Plant Sciences, University of Sheffield, Sheffield, South Yorkshire, UK
- Tatiana V Tatarinova
  - Bioinformatics Center, A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
  - Center for Personalized Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
10
Payseur BA, Rieseberg LH. A genomic perspective on hybridization and speciation. Mol Ecol 2016; 25:2337-60. [PMID: 26836441 PMCID: PMC4915564 DOI: 10.1111/mec.13557] [Citation(s) in RCA: 292] [Impact Index Per Article: 36.5] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/25/2016] [Indexed: 12/13/2022]
Abstract
Hybridization among diverging lineages is common in nature. Genomic data provide a special opportunity to characterize the history of hybridization and the genetic basis of speciation. We review existing methods and empirical studies to identify recent advances in the genomics of hybridization, as well as issues that need to be addressed. Notable progress has been made in the development of methods for detecting hybridization and inferring individual ancestries. However, few approaches reconstruct the magnitude and timing of gene flow, estimate the fitness of hybrids or incorporate knowledge of recombination rate. Empirical studies indicate that the genomic consequences of hybridization are complex, including a highly heterogeneous landscape of differentiation. Inferred characteristics of hybridization differ substantially among species groups. Loci showing unusual patterns - which may contribute to reproductive barriers - are usually scattered throughout the genome, with potential enrichment in sex chromosomes and regions of reduced recombination. We caution against the growing trend of interpreting genomic variation in summary statistics across genomes as evidence of differential gene flow. We argue that converting genomic patterns into useful inferences about hybridization will ultimately require models and methods that directly incorporate key ingredients of speciation, including the dynamic nature of gene flow, selection acting in hybrid populations and recombination rate variation.
Affiliation(s)
- Bret A. Payseur
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Loren H. Rieseberg
  - Department of Botany, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
11
Bradburd GS, Ralph PL, Coop GM. A Spatial Framework for Understanding Population Structure and Admixture. PLoS Genet 2016; 12:e1005703. [PMID: 26771578 PMCID: PMC4714911 DOI: 10.1371/journal.pgen.1005703] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Received: 02/10/2015] [Accepted: 11/05/2015] [Indexed: 01/26/2023]
Abstract
Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build "geogenetic maps," which, when applied to stationary populations, produces a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.
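The central modelling statement in this abstract is that allele frequency covariance is a decreasing function of geogenetic distance. The sketch below illustrates that idea with a powered-exponential decay, which is an assumed functional form chosen for illustration. The abstract does not give the exact parameterization, and the parameter names `a0`, `a1`, `a2` and the example positions are hypothetical, so this should not be read as the SpaceMix implementation.

```python
import math


def decay_covariance(coords, a0=1.0, a1=1.0, a2=1.0):
    """Covariance matrix that decays with pairwise Euclidean distance.

    Assumed form: cov(i, j) = (1 / a0) * exp(-(a1 * d_ij) ** a2),
    where d_ij is the distance between positions i and j.
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            cov[i][j] = (1.0 / a0) * math.exp(-((a1 * d) ** a2))
    return cov


# Hypothetical map positions for three populations on a line.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
cov = decay_covariance(positions)
```

Under a model of this kind, nearby populations (the first two positions) share more covariance than distant ones, so an observed covariance that is unexpectedly strong over a long distance stands out as a candidate signal of admixture.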
Affiliation(s)
- Gideon S. Bradburd
  - Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California, United States of America
- Peter L. Ralph
  - Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Graham M. Coop
  - Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California, United States of America
12
Kozlov K, Chebotarev D, Hassan M, Triska M, Triska P, Flegontov P, Tatarinova TV. Differential Evolution approach to detect recent admixture. BMC Genomics 2015; 16 Suppl 8:S9. [PMID: 26111206 PMCID: PMC4480842 DOI: 10.1186/1471-2164-16-s8-s9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/17/2022]
Abstract
The genetic structure of human populations is extraordinarily complex and of fundamental importance to studies of anthropology, evolution, and medicine. As increasingly many individuals are of mixed origin, there is an unmet need for tools that can infer multiple origins. Misclassification of such individuals can lead to incorrect and costly misinterpretations of genomic data, primarily in disease studies and drug trials. We present an advanced tool to infer ancestry that can identify the biogeographic origins of highly mixed individuals. reAdmix can incorporate individual's knowledge of ancestors (e.g. having some ancestors from Turkey or a Scottish grandmother). reAdmix is an online tool available at http://chcb.saban-chla.usc.edu/reAdmix/.
13
Abstract
Modeling human genetic variation along the continuous geographic space is a new research direction that has been stirring interest in the community during the past few years. Multiple recent works suggested different probabilistic models for the relation between geography and genetic sequence, and applied them to geographic localization, detection of selection, and correction of confounding in Genome-Wide Association Studies (GWAS). Prior to these developments, continuous representations of genetic structure were produced almost exclusively using dimensionality reduction techniques, mostly principal component analysis (PCA). Although fast and effective in some tasks, PCA suffers from multiple disadvantages, primarily stemming from a lack of explicit underlying genetic model. We begin this note by explaining the implicit spatio-genetic model that underlies PCA. Our presentation provides insights into some of the recently proposed spatial models; particularly, we show that two of these models can be formulated as modifications of PCA, each removing one of PCA's limitations in the context of genetic analysis. We build on one of the models to derive a nonsupervised procedure for the inference of spatial structure, and empirically demonstrate that it outperforms PCA in spatial inference. We then go on to review a few additional recent works in this unifying perspective.
Affiliation(s)
- Yael Baran
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Eran Halperin
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
  - Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Tel Aviv, Israel
  - International Computer Science Institute, Berkeley, California