1
Baey C, Smith HG, Rundlöf M, Olsson O, Clough Y, Sahlin U. Calibration of a bumble bee foraging model using Approximate Bayesian Computation. Ecol Modell 2023. [DOI: 10.1016/j.ecolmodel.2022.110251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
2
Polymorphisms and microvariant sequences in the Japanese population for 25 Y-STR markers and their relationships to Y-chromosome haplogroups. Forensic Sci Int Genet 2019; 41:e1-e7. [DOI: 10.1016/j.fsigen.2019.03.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/04/2018] [Revised: 02/04/2019] [Accepted: 03/03/2019] [Indexed: 01/22/2023]
3
Hernández CL, Dugoujon JM, Sánchez-Martínez LJ, Cuesta P, Novelletto A, Calderón R. Paternal lineages in southern Iberia provide time frames for gene flow from mainland Europe and the Mediterranean world. Ann Hum Biol 2019; 46:63-76. [PMID: 30822152 DOI: 10.1080/03014460.2019.1587507] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
BACKGROUND The geography of southern Iberia and its abundant archaeological record of human occupation provide ideal conditions for understanding the genetic history of the area. Recent advances in the phylogeography of Y-chromosome lineages make it possible to set upper bounds on the appearance of different genetic components. AIM To provide a comprehensive overview of the Y haplogroups observed in Andalusia and their Y microsatellite variation, with particular attention to the intense debate about the age, origin and expansion of the R1b-M269 clade and its sub-lineages. SUBJECT AND METHODS Four hundred and fourteen DNA samples from autochthonous males of western and eastern Andalusia were genotyped for a set of Y-SNPs and Y-STRs. Gene diversity, potential population genetic structure and coalescent times were assessed. RESULTS Most of the analysed samples belong to the European haplogroup R1b1a1a2-M269, whereas haplogroups E, J, I, G and T show lower frequencies. A phylogenetic dissection of R1b-M269 yielded younger time frames for its sub-lineages than those previously reported in the literature. CONCLUSION The Andalusian R1b-M269 assemblage confirms the shallow topology of the clade. Moreover, the lineages shared with the rest of Europe indicate that Iberia received diversity that had already arisen elsewhere, with the possible exception of R1b-DF27. Lineages such as J2-M172 and G-M201 highlight the importance of the maritime journeys of early farmers who reached the Iberian Peninsula.
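The abstract states that coalescent times were estimated from Y-STR variation but does not name the estimator. As an illustration only, here is a minimal sketch of the ρ (rho) statistic, one widely used Y-STR dating approach. It assumes single-step repeat-count differences, takes the modal haplotype as the founder, and uses a hypothetical mutation rate of 0.0025 per locus per generation and 25 years per generation; none of these values come from the study.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def modal_haplotype(haplotypes: Sequence[Sequence[int]]) -> tuple[int, ...]:
    """Most frequent repeat count at each locus (assumed founder haplotype)."""
    n_loci = len(haplotypes[0])
    return tuple(
        Counter(h[i] for h in haplotypes).most_common(1)[0][0] for i in range(n_loci)
    )


def rho_tmrca(
    haplotypes: Sequence[Sequence[int]],
    mu_per_locus: float,
    years_per_generation: float = 25.0,
) -> tuple[float, float, float]:
    """Return (rho, TMRCA in generations, TMRCA in years) under the rho statistic.

    rho is the mean number of repeat steps separating each haplotype from the
    assumed founder, averaged over loci; dividing by the per-locus mutation
    rate converts it to generations.
    """
    root = modal_haplotype(haplotypes)
    total_steps = sum(abs(a - r) for h in haplotypes for a, r in zip(h, root))
    rho = total_steps / (len(haplotypes) * len(root))
    generations = rho / mu_per_locus
    return rho, generations, generations * years_per_generation


# Hypothetical repeat counts at three Y-STR loci for four men.
sample = [(14, 12, 29), (14, 12, 29), (15, 12, 29), (14, 13, 29)]
rho, gens, years = rho_tmrca(sample, mu_per_locus=0.0025)
print(f"rho={rho:.4f}, TMRCA ~ {gens:.1f} generations ~ {years:.0f} years")
```

The date scales inversely with the assumed mutation rate, so the choice of rate usually matters more than the arithmetic.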
Affiliation(s)
- Candela L Hernández
- Departamento de Biodiversidad, Ecología y Evolución, Facultad de Biología, Universidad Complutense, Madrid, Spain
- Jean-Michel Dugoujon
- CNRS UMR 5288 Laboratoire d'Anthropologie Moléculaire et d'Imagerie de Synthèse (AMIS), Université Paul Sabatier Toulouse III, Toulouse, France
- Luis J Sánchez-Martínez
- Departamento de Biodiversidad, Ecología y Evolución, Facultad de Biología, Universidad Complutense, Madrid, Spain
- Pedro Cuesta
- Centro de Proceso de Datos, Universidad Complutense, Madrid, Spain
- Rosario Calderón
- Departamento de Biodiversidad, Ecología y Evolución, Facultad de Biología, Universidad Complutense, Madrid, Spain
4
Zhang C, Ogilvie HA, Drummond AJ, Stadler T. Bayesian Inference of Species Networks from Multilocus Sequence Data. Mol Biol Evol 2018; 35:504-517. [PMID: 29220490 PMCID: PMC5850812 DOI: 10.1093/molbev/msx307] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Indexed: 01/12/2023]
Abstract
Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large data sets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.
Affiliation(s)
- Chi Zhang
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland; Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- Huw A Ogilvie
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia; Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
- Alexei J Drummond
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand; Department of Computer Science, University of Auckland, Auckland, New Zealand
- Tanja Stadler
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland
5
Hey J, Chung Y, Sethuraman A, Lachance J, Tishkoff S, Sousa VC, Wang Y. Phylogeny Estimation by Integration over Isolation with Migration Models. Mol Biol Evol 2018; 35:2805-2818. [PMID: 30137463 PMCID: PMC6231491 DOI: 10.1093/molbev/msy162] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/18/2022]
Abstract
Phylogeny estimation is difficult for closely related populations and species, especially if they have been exchanging genes. We present a hierarchical Bayesian, Markov-chain Monte Carlo method with a state space that includes all possible phylogenies in a full Isolation-with-Migration model framework. The method is based on a new type of genealogy augmentation called a "hidden genealogy" that enables efficient updating of the phylogeny. This is the first likelihood-based method to fully incorporate directional gene flow and genetic drift for estimation of a species or population phylogeny. Application to human hunter-gatherer populations from Africa revealed a clear phylogenetic history, with strong support for gene exchange with an unsampled ghost population, and relatively ancient divergence between a ghost population and modern human populations, consistent with human/archaic divergence. In contrast, a study of five chimpanzee populations reveals a clear phylogeny with several pairs of populations having exchanged DNA, but does not support a history with an unsampled ghost population.
Affiliation(s)
- Jody Hey
- Department of Biology, Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA
- Yujin Chung
- Department of Biology, Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA
- The Department of Applied Statistics, Kyonggi University, Suwon, South Korea
- Arun Sethuraman
- Department of Biology, Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA
- Department of Biological Sciences, California State University San Marcos, San Marcos, CA
- Joseph Lachance
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Georgia Institute of Technology, Atlanta, GA
- Sarah Tishkoff
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Vitor C Sousa
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ
- University of Lisbon, Lisboa, Portugal
- Yong Wang
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ
- Ancestry, San Francisco, CA
6
Chung Y, Hey J. Bayesian Analysis of Evolutionary Divergence with Genomic Data under Diverse Demographic Models. Mol Biol Evol 2017; 34:1517-1528. [PMID: 28333230 DOI: 10.1093/molbev/msx070] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus.
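The two-step design described in this abstract rests on reusing one sample for many models. The toy sketch below shows only that reuse principle, with self-normalized importance sampling on one-dimensional Gaussians: draws are taken once from a broad proposal and then reweighted under two candidate densities. The densities, sample size and model names are illustrative assumptions; this is not MIST's genealogy sampler and it does not include MIST's analytic integration over migration paths.

```python
import math
import random


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def proposal_pdf(x: float) -> float:
    # Broad "simple" sampling distribution, fixed before any model is chosen.
    return normal_pdf(x, 0.0, 2.0)


def model_a_pdf(x: float) -> float:
    return normal_pdf(x, -1.0, 1.0)


def model_b_pdf(x: float) -> float:
    return normal_pdf(x, 0.5, 1.0)


def reweighted_mean(samples, target_pdf, proposal=proposal_pdf) -> float:
    """Self-normalized importance-sampling estimate of E[x] under target_pdf."""
    weights = [target_pdf(x) / proposal(x) for x in samples]
    return sum(w * x for w, x in zip(weights, samples)) / sum(weights)


# Step 1: draw from the proposal once.
rng = random.Random(1)
draws = [rng.gauss(0.0, 2.0) for _ in range(50_000)]

# Step 2: reuse the same draws for every candidate model; no new sampling run.
estimates = {
    name: reweighted_mean(draws, pdf)
    for name, pdf in [("model_A", model_a_pdf), ("model_B", model_b_pdf)]
}
print(estimates)
```

The proposal is deliberately wider than either target so that the importance weights stay bounded; a proposal narrower than the targets would make the reweighted estimates unstable.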
Affiliation(s)
- Yujin Chung
- Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
- Jody Hey
- Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
7
Cabrera AA, Palsbøll PJ. Inferring past demographic changes from contemporary genetic data: A simulation-based evaluation of the ABC methods implemented in diyabc. Mol Ecol Resour 2017; 17:e94-e110. [DOI: 10.1111/1755-0998.12696] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 12/07/2016] [Revised: 06/12/2017] [Accepted: 06/20/2017] [Indexed: 01/19/2023]
Affiliation(s)
- Andrea A. Cabrera
- Marine Evolution and Conservation, Groningen Institute of Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Per J. Palsbøll
- Marine Evolution and Conservation, Groningen Institute of Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
8
Zhu W, Marin JM, Leisen F. A Bootstrap Likelihood Approach to Bayesian Computation. Aust N Z J Stat 2016. [DOI: 10.1111/anzs.12156] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]
Affiliation(s)
- Weixuan Zhu
- Departamento de Estadística, Universidad Carlos III de Madrid, Calle Madrid 126, 28903 Getafe, Madrid, Spain
- J. Miguel Marin
- Departamento de Estadística, Universidad Carlos III de Madrid, Calle Madrid 126, 28903 Getafe, Madrid, Spain
- Fabrizio Leisen
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury CT2 7NF, UK
9
Abstract
Stephens and Donnelly (2000) constructed an efficient sequential importance-sampling proposal distribution on coalescent histories of a sample of genes for computing the likelihood of a type configuration of genes in the sample. In the current paper a characterization of their importance-sampling proposal distribution is given in terms of the diffusion-process generator describing the distribution of the population gene frequencies. This characterization leads to a new technique for constructing importance-sampling algorithms in a much more general framework when the distribution of population gene frequencies follows a diffusion process, by approximating the generator of the process.
10
Donnelly P, Wiuf C, Hein J, Slatkin M, Ewens WJ, Kingman JFC. Discussion: Recent Common Ancestors of all Present-Day Individuals. Adv Appl Probab 2016. [DOI: 10.1239/aap/1029955257] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
12
Willems T, Gymrek M, Poznik G, Tyler-Smith C, Erlich Y. Population-Scale Sequencing Data Enable Precise Estimates of Y-STR Mutation Rates. Am J Hum Genet 2016; 98:919-933. [PMID: 27126583 DOI: 10.1016/j.ajhg.2016.04.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 02/13/2016] [Accepted: 04/01/2016] [Indexed: 01/23/2023]
Abstract
Short tandem repeats (STRs) are mutation-prone loci that span nearly 1% of the human genome. Previous studies have estimated the mutation rates of highly polymorphic STRs by using capillary electrophoresis and pedigree-based designs. Although this work has provided insights into the mutational dynamics of highly mutable STRs, the mutation rates of most others remain unknown. Here, we harnessed whole-genome sequencing data to estimate the mutation rates of Y chromosome STRs (Y-STRs) with 2-6 bp repeat units that are accessible to Illumina sequencing. We genotyped 4,500 Y-STRs by using data from the 1000 Genomes Project and the Simons Genome Diversity Project. Next, we developed MUTEA, an algorithm that infers STR mutation rates from population-scale data by using a high-resolution SNP-based phylogeny. After extensive intrinsic and extrinsic validations, we harnessed MUTEA to derive mutation-rate estimates for 702 polymorphic STRs by tracing each locus over 222,000 meioses, resulting in the largest collection of Y-STR mutation rates to date. Using our estimates, we identified determinants of STR mutation rates and built a model to predict rates for STRs across the genome. These predictions indicate that the load of de novo STR mutations is at least 75 mutations per generation, rivaling the load of all other known variant types. Finally, we identified Y-STRs with potential applications in forensics and genetic genealogy, assessed the ability to differentiate between the Y chromosomes of father-son pairs, and imputed Y-STR genotypes.
Affiliation(s)
- Yaniv Erlich
- New York Genome Center, New York, NY 10013, USA; Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02139, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA; Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA.
13
Hall M, Woolhouse M, Rambaut A. Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set. PLoS Comput Biol 2015; 11:e1004613. [PMID: 26717515 PMCID: PMC4701012 DOI: 10.1371/journal.pcbi.1004613] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 02/15/2015] [Accepted: 10/17/2015] [Indexed: 12/14/2022]
Abstract
The use of genetic data to reconstruct the transmission tree of infectious disease epidemics and outbreaks has been the subject of an increasing number of studies, but previous approaches have usually either made assumptions that are not fully compatible with phylogenetic inference, or, where they have based inference on a phylogeny, have employed a procedure that requires this tree to be fixed. At the same time, the coalescent-based models of the pathogen population that are employed in the methods usually used for time-resolved phylogeny reconstruction are a considerable simplification of the epidemic process, as they assume that pathogen lineages mix freely. Here, we contribute a new method that is simultaneously a phylogeny reconstruction method for isolates taken from an epidemic, and a procedure for transmission tree reconstruction. We observe that, if one or more samples are taken from each host in an epidemic or outbreak and these are used to build a phylogeny, a transmission tree is equivalent to a partition of the set of nodes of this phylogeny, such that each partition element is a set of nodes that is connected in the full tree and contains all the tips corresponding to samples taken from one and only one host. We then implement a Markov chain Monte Carlo (MCMC) procedure for simultaneous sampling from the spaces of both trees, utilising a newly designed set of phylogenetic tree proposals that also respect node partitions. We calculate the posterior probability of these partitioned trees based on a model that acknowledges the population structure of an epidemic by employing an individual-based disease transmission model and a coalescent process taking place within each host. We demonstrate our method, first using simulated data, and then with sequences taken from the H7N7 avian influenza outbreak that occurred in the Netherlands in 2003. We show that it is superior to established coalescent methods for reconstructing the topology and node heights of the phylogeny and performs well for transmission tree reconstruction when the phylogeny is well resolved by the genetic data, but caution that this will often not be the case in practice and that existing genetic and epidemiological data should be used to configure such analyses whenever possible. This method is available for use by the research community as part of BEAST, one of the most widely used packages for reconstruction of dated phylogenies.
Author Summary
With sequence data becoming available in increasingly high volumes and at decreasing cost, there has been substantial recent interest in the possibility of using pathogen genome sequences as a means to retrace the spread of disease amongst the infected hosts in an epidemic. While several such methods exist, many of them are not fully compatible with phylogenetic inference, which is the most commonly used methodology for exploring the ancestry of the isolates represented by a set of sequences. Procedures using phylogenetics as a basis have either taken a single, fixed phylogenetic tree as input, or have been quite narrow in scope and not available in any current package for general use. For their part, standard phylogenetic methods usually assume a model of the pathogen population that is overly simplistic for the situation in an epidemic. Here, we bridge the gap by introducing a new, highly flexible method, implemented in the publicly available BEAST package, which simultaneously reconstructs the transmission history of an epidemic and the phylogeny for samples taken from it. We apply the procedure to simulated data and to sequences from the 2003 H7N7 avian influenza outbreak in the Netherlands.
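The abstract's central observation is that a transmission tree can be encoded as a partition of the phylogeny's nodes. A minimal sketch of checking the two stated conditions (each partition element is connected in the full tree, and it holds all the tips from one and only one host) follows. The parent-map tree representation, the function name `is_valid_transmission_partition` and the four-tip toy tree are assumptions for illustration; this is not the BEAST implementation and does not sample partitions.

```python
from collections import defaultdict, deque


def is_valid_transmission_partition(parent, tip_host, block_of):
    """Check the node-partition conditions stated in the abstract.

    parent:   child node -> parent node (the root does not appear as a key)
    tip_host: tip node -> host the sample was taken from
    block_of: every node -> label of the partition element it belongs to
    """
    nodes = set(parent) | set(parent.values())
    if set(block_of) != nodes:
        return False  # every node must fall in exactly one element

    # Undirected adjacency of the full phylogeny.
    adjacent = defaultdict(set)
    for child, par in parent.items():
        adjacent[child].add(par)
        adjacent[par].add(child)

    elements = defaultdict(set)
    for node, label in block_of.items():
        elements[label].add(node)

    tips_of_host = defaultdict(set)
    for tip, host in tip_host.items():
        tips_of_host[host].add(tip)

    for members in elements.values():
        # Condition 1: the element is connected within the full tree.
        start = next(iter(members))
        seen, queue = {start}, deque([start])
        while queue:
            for neighbour in adjacent[queue.popleft()] & members:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if seen != members:
            return False

        # Condition 2: it holds all the tips of one and only one host.
        hosts = {tip_host[n] for n in members if n in tip_host}
        if len(hosts) != 1:
            return False
        (host,) = hosts
        if not tips_of_host[host] <= members:
            return False
    return True


# Toy phylogeny ((a1,a2)n1,(b1,c1)n2)r; tips sampled from hosts A, B and C.
parent = {"a1": "n1", "a2": "n1", "b1": "n2", "c1": "n2", "n1": "r", "n2": "r"}
tip_host = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

valid = {"a1": "A", "a2": "A", "n1": "A", "r": "A", "b1": "B", "n2": "B", "c1": "C"}
# n2 is not adjacent to n1, so this "A" element is disconnected.
invalid = {"a1": "A", "a2": "A", "n1": "A", "n2": "A", "r": "B", "b1": "B", "c1": "C"}

print(is_valid_transmission_partition(parent, tip_host, valid))
print(is_valid_transmission_partition(parent, tip_host, invalid))
```

In the valid partition the root sits in host A's element, which reads as A being the index case; moving `r` into B's element instead would still satisfy both conditions and would describe a different transmission history on the same phylogeny.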
Affiliation(s)
- Matthew Hall
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Mark Woolhouse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
14
Hey J, Chung Y, Sethuraman A. On the occurrence of false positives in tests of migration under an isolation-with-migration model. Mol Ecol 2015; 24:5078-83. [PMID: 26456794 DOI: 10.1111/mec.13381] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 02/17/2015] [Revised: 07/02/2015] [Accepted: 07/15/2015] [Indexed: 12/24/2022]
Abstract
The population genetic study of divergence is often carried out using a Bayesian genealogy sampler, such as those implemented in ima2 and related programs, and these analyses frequently include a likelihood ratio test of the null hypothesis of no migration between populations. Cruickshank and Hahn (2014, Molecular Ecology, 23, 3133-3157) recently reported a high rate of false-positive test results with ima2 for data simulated with small numbers of loci under models with no migration and recent splitting times. We confirm these findings and discover that they are caused by a failure of the assumptions underlying likelihood ratio tests that arises when using marginal likelihoods for a subset of model parameters. We also show that for small data sets, with little divergence between samples from two populations, an excellent fit can often be found both by a model with a low migration rate and a recent splitting time and by a model with a high migration rate and a deep splitting time.
Affiliation(s)
- Jody Hey
- Center for Computational Genetics and Genomics, Temple University, 1900 N. 12th Street, Philadelphia, PA, 19122, USA
- Yujin Chung
- Center for Computational Genetics and Genomics, Temple University, 1900 N. 12th Street, Philadelphia, PA, 19122, USA
- Arun Sethuraman
- Center for Computational Genetics and Genomics, Temple University, 1900 N. 12th Street, Philadelphia, PA, 19122, USA
15
Triki-Fendri S, Sánchez-Diz P, Rey-González D, Alfadhli S, Ayadi I, Ben Marzoug R, Carracedo Á, Rebai A. Genetic structure of the Kuwaiti population revealed by paternal lineages. Am J Hum Biol 2015; 28:203-12. [PMID: 26293354 DOI: 10.1002/ajhb.22773] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/13/2014] [Revised: 06/18/2015] [Accepted: 07/25/2015] [Indexed: 12/27/2022]
Abstract
OBJECTIVE We analyzed Y-chromosome haplogroup diversity in the Kuwaiti population to gain a more complete overview of its genetic landscape. METHOD A sample of 117 males from the Kuwaiti population was studied through the analysis of 22 Y-SNPs. The results were then interpreted in conjunction with those of other populations from the Middle East, South Asia, North and East Africa, and Eastern Europe. RESULTS The analyzed markers discriminated 19 different haplogroups, with a haplogroup diversity of 0.7713. J-M304 was the most frequent haplogroup in the Kuwaiti population (55.5%), followed by E-M96 (18%). The results revealed genetic homogeneity between the Kuwaiti population and those of the Middle East (FST = 6.1%, P-value < 0.0001), although a significant correlation between genetic and geographic distances was found (r = 0.41, P-value = 0.009). Moreover, the nonsignificant pairwise FST genetic distances between the Kuwaiti population on the one hand and the Arabs of Iran and of Sudan on the other corroborate the hypothesis of bidirectional gene flow between Arabia and both Iran and Sudan. CONCLUSION Overall, we have shown that the Kuwaiti population has experienced significant gene flow from neighboring populations such as Saudi Arabia, Iran, and East Africa, confirming that it is genetically coextensive with the populations of the Middle East.
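The abstract reports a haplogroup diversity of 0.7713 without giving the formula. Assuming the usual estimator for this statistic, Nei's unbiased gene diversity, a minimal sketch follows; the haplogroup calls below are hypothetical and are not the study's 117 samples.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplogroup_diversity(labels: Sequence[Hashable]) -> float:
    """Nei's unbiased gene diversity, h = n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(labels)
    if n < 2:
        raise ValueError("at least two samples are needed")
    sum_sq = sum((count / n) ** 2 for count in Counter(labels).values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical haplogroup calls for four men (not the study's data).
calls = ["J-M304", "J-M304", "J-M304", "E-M96"]
print(haplogroup_diversity(calls))
```

The n/(n-1) factor corrects the downward bias of the plug-in estimate 1 - Σp², which matters most for small samples.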
Affiliation(s)
- Soumaya Triki-Fendri
- Laboratory of Molecular and Cellular Screening Processes, Centre of Biotechnology of Sfax, BP1177 Route Sidi Mansour Km 6, Sfax, Tunisia
- Paula Sánchez-Diz
- Forensic Genetics Unit, Institute of Forensic Science, University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Danel Rey-González
- Forensic Genetics Unit, Institute of Forensic Science, University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Suad Alfadhli
- Department of Medical Laboratory Sciences, Faculty of Allied Health Sciences, Kuwait University, Kuwait
- Imen Ayadi
- Laboratory of Molecular and Cellular Screening Processes, Centre of Biotechnology of Sfax, BP1177 Route Sidi Mansour Km 6, Sfax, Tunisia
- Riadh Ben Marzoug
- Laboratory of Molecular and Cellular Screening Processes, Centre of Biotechnology of Sfax, BP1177 Route Sidi Mansour Km 6, Sfax, Tunisia
- Ángel Carracedo
- Forensic Genetics Unit, Institute of Forensic Science, University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain; Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Ahmed Rebai
- Laboratory of Molecular and Cellular Screening Processes, Centre of Biotechnology of Sfax, BP1177 Route Sidi Mansour Km 6, Sfax, Tunisia
16
Merker M, Blin C, Mona S, Duforet-Frebourg N, Lecher S, Willery E, Blum MGB, Rüsch-Gerdes S, Mokrousov I, Aleksic E, Allix-Béguec C, Antierens A, Augustynowicz-Kopeć E, Ballif M, Barletta F, Beck HP, Barry CE, Bonnet M, Borroni E, Campos-Herrero I, Cirillo D, Cox H, Crowe S, Crudu V, Diel R, Drobniewski F, Fauville-Dufaux M, Gagneux S, Ghebremichael S, Hanekom M, Hoffner S, Jiao WW, Kalon S, Kohl TA, Kontsevaya I, Lillebæk T, Maeda S, Nikolayevskyy V, Rasmussen M, Rastogi N, Samper S, Sanchez-Padilla E, Savic B, Shamputa IC, Shen A, Sng LH, Stakenas P, Toit K, Varaine F, Vukovic D, Wahl C, Warren R, Supply P, Niemann S, Wirth T. Evolutionary history and global spread of the Mycobacterium tuberculosis Beijing lineage. Nat Genet 2015; 47:242-9. [PMID: 25599400 PMCID: PMC11044984 DOI: 10.1038/ng.3195] [Citation(s) in RCA: 354] [Impact Index Per Article: 39.3] [Received: 07/21/2014] [Accepted: 12/19/2014] [Indexed: 01/18/2023]
Abstract
Mycobacterium tuberculosis strains of the Beijing lineage are globally distributed and are associated with the massive spread of multidrug-resistant (MDR) tuberculosis in Eurasia. Here we reconstructed the biogeographical structure and evolutionary history of this lineage by genetic analysis of 4,987 isolates from 99 countries and whole-genome sequencing of 110 representative isolates. We show that this lineage initially originated in the Far East, from where it radiated worldwide in several waves. We detected successive increases in population size for this pathogen over the last 200 years, practically coinciding with the Industrial Revolution, the First World War and HIV epidemics. Two MDR clones of this lineage started to spread throughout central Asia and Russia concomitantly with the collapse of the public health system in the former Soviet Union. Mutations identified in genes putatively under positive selection and associated with virulence might have favored the expansion of the most successful branches of the lineage.
Affiliation(s)
- Matthias Merker
- Molecular Mycobacteriology, Research Center Borstel, Borstel, Germany
| | - Camille Blin
- 1] Laboratoire Biologie Intégrative des Population, Evolution Moléculaire, Ecole Pratique des Hautes Etudes, Paris, France. [2] Institut de Systématique, Evolution, Biodiversité, UMR-CNRS 7205, Muséum National d'Histoire Naturelle, Université Pierre et Marie Curie, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
| | - Stefano Mona
- 1] Laboratoire Biologie Intégrative des Population, Evolution Moléculaire, Ecole Pratique des Hautes Etudes, Paris, France. [2] Institut de Systématique, Evolution, Biodiversité, UMR-CNRS 7205, Muséum National d'Histoire Naturelle, Université Pierre et Marie Curie, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
| | - Nicolas Duforet-Frebourg
- Université Joseph Fourier, Centre National de la Recherche Scientifique, Laboratoire Techniques de l'Ingénierie Médicale et de la Complexité-Informatique, Mathématiques et Applications, Grenoble, France
| | - Sophie Lecher
- 1] INSERM U1019, Center for Infection and Immunity of Lille, Lille, France. [2] Centre National de la Recherche Scientifique, UMR 8204, Lille, France. [3] Université Lille Nord, Center for Infection and Immunity of Lille, Lille, France. [4] Institut Pasteur de Lille, Center for Infection and Immunity of Lille, Lille, France
| | - Eve Willery
- 1] INSERM U1019, Center for Infection and Immunity of Lille, Lille, France. [2] Centre National de la Recherche Scientifique, UMR 8204, Lille, France. [3] Université Lille Nord, Center for Infection and Immunity of Lille, Lille, France. [4] Institut Pasteur de Lille, Center for Infection and Immunity of Lille, Lille, France
| | - Michael G B Blum
- Université Joseph Fourier, Centre National de la Recherche Scientifique, Laboratoire Techniques de l'Ingénierie Médicale et de la Complexité-Informatique, Mathématiques et Applications, Grenoble, France
- Sabine Rüsch-Gerdes
- National Reference Center for Mycobacteria, Research Center Borstel, Borstel, Germany
- Igor Mokrousov
- Laboratory of Molecular Microbiology, St. Petersburg Pasteur Institute, St. Petersburg, Russia
- Eman Aleksic
- Centre for Biomedical Research, Burnet Institute, Melbourne, Victoria, Australia
- Annick Antierens
- Medical Department, Médecins sans Frontières Switzerland, Geneva, Switzerland
- Ewa Augustynowicz-Kopeć
- Department of Microbiology, National Tuberculosis and Lung Diseases Research Institute, Warsaw, Poland
- Marie Ballif
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Francesca Barletta
- Instituto de Medicina Tropical Alexander von Humboldt, Molecular Epidemiology Unit-Tuberculosis, Universidad Peruana Cayetano Heredia, Lima, Peru
- Hans Peter Beck
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, Basel, Switzerland
- Clifton E Barry
- Tuberculosis Research Section, National Institute of Allergy and Infectious Diseases, US National Institutes of Health, Bethesda, Maryland, USA
- Emanuele Borroni
- Emerging Bacterial Pathogens Unit, San Raffaele Scientific Institute, Milan, Italy
- Isolina Campos-Herrero
- Department of Microbiology, Hospital Universitario de Gran Canaria Dr. Negrín, Las Palmas de Gran Canaria, Spain
- Daniela Cirillo
- Emerging Bacterial Pathogens Unit, San Raffaele Scientific Institute, Milan, Italy
- Helen Cox
- Division of Medical Microbiology, University of Cape Town, Cape Town, South Africa
- Suzanne Crowe
- [1] Centre for Biomedical Research, Burnet Institute, Melbourne, Victoria, Australia. [2] Department of Infectious Diseases, Alfred Hospital, Melbourne, Victoria, Australia. [3] Central Clinical School, Monash University, Melbourne, Victoria, Australia
- Valeriu Crudu
- National Tuberculosis Reference Laboratory, Phthisiopneumology Institute, Chisinau, Republic of Moldova
- Roland Diel
- Institute for Epidemiology, Schleswig-Holstein University Hospital, Kiel, Germany
- Francis Drobniewski
- [1] Public Health England National Mycobacterial Reference Laboratory and Clinical Tuberculosis and Human Immunodeficiency Virus Group, Queen Mary's School of Medicine and Dentistry, London, UK. [2] Department of Infectious Diseases, Imperial College, London, UK
- Sébastien Gagneux
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, Basel, Switzerland
- Madeleine Hanekom
- Department of Science and Technology/National Research Foundation, Centre of Excellence for Biomedical Tuberculosis Research/Medical Research Council, Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Sven Hoffner
- Department of Diagnostics and Vaccinology, Swedish Institute for Communicable Disease Control, Solna, Sweden
- Wei-wei Jiao
- Key Laboratory of Major Diseases in Children and National Key Discipline of Pediatrics (Capital Medical University), Ministry of Education, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, Beijing, China
- Stobdan Kalon
- US Agency for International Development Quality Health Care Project, Bishkek, Kyrgyzstan
- Thomas A Kohl
- Molecular Mycobacteriology, Research Center Borstel, Borstel, Germany
- Troels Lillebæk
- Statens Serum Institute, International Reference Laboratory of Mycobacteriology, Copenhagen, Denmark
- Shinji Maeda
- Research Institute of Tuberculosis, Japan Anti-Tuberculosis Association, Tokyo, Japan
- Vladyslav Nikolayevskyy
- [1] Public Health England National Mycobacterial Reference Laboratory and Clinical Tuberculosis and Human Immunodeficiency Virus Group, Queen Mary's School of Medicine and Dentistry, London, UK. [2] Department of Infectious Diseases, Imperial College, London, UK
- Michael Rasmussen
- Statens Serum Institute, International Reference Laboratory of Mycobacteriology, Copenhagen, Denmark
- Nalin Rastogi
- World Health Organization Supranational Tuberculosis Reference Laboratory, Institut Pasteur de la Guadeloupe, Abymes, France
- Sofia Samper
- Instituto de Investigación Sanitaria Aragón, Hospital Universitario Miguel Servet, Zaragoza, Spain
- Branislava Savic
- Institute of Microbiology and Immunology, Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Isdore Chola Shamputa
- Tuberculosis Research Section, National Institute of Allergy and Infectious Diseases, US National Institutes of Health, Bethesda, Maryland, USA
- Adong Shen
- Key Laboratory of Major Diseases in Children and National Key Discipline of Pediatrics (Capital Medical University), Ministry of Education, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, Beijing, China
- Li-Hwei Sng
- Central Tuberculosis Laboratory, Department of Pathology, Singapore General Hospital, Singapore
- Petras Stakenas
- Department of Immunology and Cell Biology, Institute of Biotechnology, Vilnius University, Vilnius, Lithuania
- Kadri Toit
- Tartu University Hospital United Laboratories, Mycobacteriology, Tartu, Estonia
- Dragana Vukovic
- Institute of Microbiology and Immunology, Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Robin Warren
- Department of Science and Technology/National Research Foundation, Centre of Excellence for Biomedical Tuberculosis Research/Medical Research Council, Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Philip Supply
- [1] INSERM U1019, Center for Infection and Immunity of Lille, Lille, France. [2] Centre National de la Recherche Scientifique, UMR 8204, Lille, France. [3] Université Lille Nord, Center for Infection and Immunity of Lille, Lille, France. [4] Institut Pasteur de Lille, Center for Infection and Immunity of Lille, Lille, France. [5] Genoscreen, Lille, France
- Stefan Niemann
- [1] Molecular Mycobacteriology, Research Center Borstel, Borstel, Germany. [2] German Center for Infection Research, Borstel Site, Borstel, Germany
- Thierry Wirth
- [1] Laboratoire Biologie Intégrative des Populations, Evolution Moléculaire, Ecole Pratique des Hautes Etudes, Paris, France. [2] Institut de Systématique, Evolution, Biodiversité, UMR-CNRS 7205, Muséum National d'Histoire Naturelle, Université Pierre et Marie Curie, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
17
Palstra FP, Heyer E, Austerlitz F. Statistical inference on genetic data reveals the complex demographic history of human populations in central Asia. Mol Biol Evol 2015; 32:1411-24. [PMID: 25678589 DOI: 10.1093/molbev/msv030] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022]
Abstract
The demographic history of modern humans constitutes a combination of expansions, colonizations, contractions, and remigrations. The advent of large scale genetic data combined with statistically refined methods facilitates inference of this complex history. Here we study the demographic history of two genetically admixed ethnic groups in Central Asia, an area characterized by high levels of genetic diversity and a history of recurrent immigration. Using Approximate Bayesian Computation, we infer that the timing of admixture markedly differs between the two groups. Admixture in the traditionally agricultural Tajiks could be dated back to the onset of the Neolithic transition in the region, whereas admixture in Kyrgyz is more recent, and may have involved the westward movement of Turkic peoples. These results are confirmed by a coalescent method that fits an isolation-with-migration model to the genetic data, with both Central Asian groups having received gene flow from the extremities of Eurasia. Interestingly, our analyses also uncover signatures of gene flow from Eastern to Western Eurasia during Paleolithic times. In conclusion, the high genetic diversity currently observed in these two Central Asian peoples most likely reflects the effects of recurrent immigration that likely started before historical times. Conversely, conquests during historical times may have had a relatively limited genetic impact. These results emphasize the need for a better understanding of the genetic consequences of transmission of culture and technological innovations, as well as those of invasions and conquests.
Affiliation(s)
- Friso P Palstra
- Laboratoire d'Eco-Anthropologie et Ethnobiologie, UMR 7206, Muséum National d'Histoire Naturelle-Centre National de la Recherche Scientifique-Université Paris 7 Diderot, Paris, France
- Evelyne Heyer
- Laboratoire d'Eco-Anthropologie et Ethnobiologie, UMR 7206, Muséum National d'Histoire Naturelle-Centre National de la Recherche Scientifique-Université Paris 7 Diderot, Paris, France
- Frédéric Austerlitz
- Laboratoire d'Eco-Anthropologie et Ethnobiologie, UMR 7206, Muséum National d'Histoire Naturelle-Centre National de la Recherche Scientifique-Université Paris 7 Diderot, Paris, France
18
Balaresque P, Poulet N, Cussat-Blanc S, Gerard P, Quintana-Murci L, Heyer E, Jobling MA. Y-chromosome descent clusters and male differential reproductive success: young lineage expansions dominate Asian pastoral nomadic populations. Eur J Hum Genet 2015; 23:1413-22. [PMID: 25585703 DOI: 10.1038/ejhg.2014.285] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Received: 07/31/2014] [Revised: 11/25/2014] [Accepted: 11/28/2014] [Indexed: 11/09/2022]
Abstract
High-frequency microsatellite haplotypes of the male-specific Y-chromosome can signal past episodes of high reproductive success of particular men and their patrilineal descendants. Previously, two examples of such successful Y-lineages have been described in Asia, both associated with Altaic-speaking pastoral nomadic societies, and putatively linked to dynasties descending, respectively, from Genghis Khan and Giocangga. Here we surveyed a total of 5321 Y-chromosomes from 127 Asian populations, including novel Y-SNP and microsatellite data on 461 Central Asian males, to ask whether additional lineage expansions could be identified. Based on the most frequent eight-microsatellite haplotypes, we objectively defined 11 descent clusters (DCs), each within a specific haplogroup, that represent likely past instances of high male reproductive success, including the two previously identified cases. Analysis of the geographical patterns and ages of these DCs and their associated cultural characteristics showed that the most successful lineages are found both among sedentary agriculturalists and pastoral nomads, and expanded between 2100 BCE and 1100 CE. However, those with recent origins in the historical period are almost exclusively found in Altaic-speaking pastoral nomadic populations, which may reflect a shift in political organisation in pastoralist economies and a greater ease of transmission of Y-chromosomes through time and space facilitated by the use of horses.
Affiliation(s)
- Patricia Balaresque
- UMR 5288, Faculté de Médecine Purpan, Laboratoire d'Anthropologie Moléculaire et Imagerie de Synthèse (AMIS), CNRS/Université Paul Sabatier, Toulouse, France; Department of Genetics, University of Leicester, Leicester, UK
- Nicolas Poulet
- Onema, Direction de l'Action Scientifique et Technique, Toulouse, France
- Patrice Gerard
- UMR 5288, Faculté de Médecine Purpan, Laboratoire d'Anthropologie Moléculaire et Imagerie de Synthèse (AMIS), CNRS/Université Paul Sabatier, Toulouse, France
- Lluis Quintana-Murci
- CNRS URA3012, Unit of Human Evolutionary Genetics, Institut Pasteur, Paris, France
- Evelyne Heyer
- Eco-Anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Université Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, Paris, France
- Mark A Jobling
- Department of Genetics, University of Leicester, Leicester, UK
19
Sincero TCM, Stoco PH, Steindel M, Vallejo GA, Grisard EC. Trypanosoma rangeli displays a clonal population structure, revealing a subdivision of KP1(-) strains and the ancestry of the Amazonian group. Int J Parasitol 2015; 45:225-35. [PMID: 25592964 DOI: 10.1016/j.ijpara.2014.11.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/03/2014] [Revised: 11/12/2014] [Accepted: 11/24/2014] [Indexed: 12/13/2022]
Abstract
Assessment of the genetic variability and population structure of Trypanosoma rangeli, a non-pathogenic American trypanosome, was carried out through microsatellite and single-nucleotide polymorphism analyses. Two approaches were used for microsatellite typing: data mining in expressed sequence tag/open reading frame expressed sequence tag libraries and PCR-based Isolation of Microsatellite Arrays from genomic libraries. All microsatellites found were evaluated for their abundance, frequency and usefulness as markers. Genotyping of T. rangeli strains and clones was performed for 18 loci amplified by PCR from expressed sequence tag/open reading frame expressed sequence tag libraries. The presence of single-nucleotide polymorphisms in the nuclear, multi-copy, spliced leader gene was assessed in 18 T. rangeli strains, and the results show that T. rangeli has a predominantly clonal population structure, allowing a robust phylogenetic analysis. Microsatellite typing revealed a subdivision of the KP1(-) genetic group, which may be influenced by geographical location and/or by the co-evolution of parasite and vectors occurring within the same geographical areas. The hypothesis of parasite-vector co-evolution was corroborated by single-nucleotide polymorphism analysis of the spliced leader gene. Taken together, the results suggest three T. rangeli groups: (i) the T. rangeli Amazonian group; (ii) the T. rangeli KP1(-) group; and (iii) the T. rangeli KP1(+) group. The latter two groups possibly evolved from the Amazonian group to produce KP1(+) and KP1(-) strains.
Affiliation(s)
- Thaís Cristine Marques Sincero
- Universidade Federal de Santa Catarina (UFSC), Centro de Ciências da Saúde (CCS), Departamento de Análises Clínicas (ACL), Setor E, Bloco K, Florianópolis, SC 88.040-970, Brazil.
- Patricia Hermes Stoco
- Universidade Federal de Santa Catarina (UFSC), Centro de Ciências Biológicas (CCB), Departamento de Microbiologia, Imunologia e Parasitologia (MIP), Setor F, Bloco A, Florianópolis, SC 88.040-970, Brazil
- Mário Steindel
- Universidade Federal de Santa Catarina (UFSC), Centro de Ciências Biológicas (CCB), Departamento de Microbiologia, Imunologia e Parasitologia (MIP), Setor F, Bloco A, Florianópolis, SC 88.040-970, Brazil
- Gustavo Adolfo Vallejo
- Laboratorio de Investigaciones en Parasitología Tropical, Universidad del Tolima, Altos de Santa Helena, A.A. 546, Ibagué, Colombia
- Edmundo Carlos Grisard
- Universidade Federal de Santa Catarina (UFSC), Centro de Ciências Biológicas (CCB), Departamento de Microbiologia, Imunologia e Parasitologia (MIP), Setor F, Bloco A, Florianópolis, SC 88.040-970, Brazil.
20
Nakagome S. On the use of kernel approximate Bayesian computation to infer population history. Genes Genet Syst 2015; 90:153-62. [DOI: 10.1266/ggs.90.153] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
Affiliation(s)
- Shigeki Nakagome
- Department of Human Genetics, University of Chicago
- School of Statistical Thinking, The Institute of Statistical Mathematics
21
Noormohammadi Z, Sheidai M, Foroutan M, Alishah O. Networking and Bayesian analyses of genetic affinity in cotton germplasm. The Nucleus 2014. [DOI: 10.1007/s13237-014-0123-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
22
Balaresque P, King TE, Parkin EJ, Heyer E, Carvalho-Silva D, Kraaijenbrink T, de Knijff P, Tyler-Smith C, Jobling MA. Gene conversion violates the stepwise mutation model for microsatellites in Y-chromosomal palindromic repeats. Hum Mutat 2014; 35:609-17. [PMID: 24610746 PMCID: PMC4233959 DOI: 10.1002/humu.22542] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 12/13/2013] [Accepted: 02/25/2014] [Indexed: 01/19/2023]
Abstract
The male-specific region of the human Y chromosome (MSY) contains eight large inverted repeats (palindromes), in which high-sequence similarity between repeat arms is maintained by gene conversion. These palindromes also harbor microsatellites, considered to evolve via a stepwise mutation model (SMM). Here, we ask whether gene conversion between palindrome microsatellites contributes to their mutational dynamics. First, we study the duplicated tetranucleotide microsatellite DYS385a,b lying in palindrome P4. We show, by comparing observed data with simulated data under an SMM within haplogroups, that observed heteroallelic combinations in which the modal repeat-number difference between copies was large can give rise to homoallelic combinations with zero-repeat difference, equivalent to many single-step mutations. These are unlikely to be generated under a strict SMM, suggesting the action of gene conversion. Second, we show that the intercopy repeat number difference for a large set of duplicated microsatellites in all palindromes in the MSY reference sequence is significantly reduced compared with that for nonpalindrome-duplicated microsatellites, suggesting that the former are characterized by unusual evolutionary dynamics. These observations indicate that gene conversion violates the SMM for microsatellites in palindromes, homogenizing copies within individual Y chromosomes, but increasing overall haplotype diversity among chromosomes within related groups.
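Under a strict SMM, each mutation adds or removes exactly one repeat on one copy, so the intercopy difference at a duplicated locus such as DYS385a,b can shrink by at most one repeat per mutation event. The Python sketch below is a hypothetical illustration, not the simulation used in the paper: it treats the intercopy difference as a symmetric ±1 random walk and estimates how often a starting difference of `start_diff` repeats reaches zero within `n_events` mutations. The function name, trial count, and seed are assumptions chosen for illustration.

```python
import random


def prob_reach_zero(start_diff: int, n_events: int,
                    n_trials: int = 20000, seed: int = 1) -> float:
    """Estimate P(intercopy repeat difference hits 0 within n_events mutations).

    Hypothetical sketch of a strict stepwise mutation model (SMM): every
    mutation changes one copy by +1 or -1 repeat, so the difference between
    the two copies moves by +1 or -1 with equal probability.
    """
    rng = random.Random(seed)
    hits = 0
    for _trial in range(n_trials):
        diff = start_diff
        for _step in range(n_events):
            diff += rng.choice((-1, 1))  # one single-step mutation
            if diff == 0:  # copies have become homoallelic
                hits += 1
                break
    return hits / n_trials
```

For instance, `prob_reach_zero(2, 10)` should come out near one half, whereas `prob_reach_zero(8, 10)` should be on the order of 1%, and a starting difference larger than the number of mutation events always gives exactly zero. This is the intuition behind treating large-to-zero transitions as unlikely under a strict SMM.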
Affiliation(s)
- Patricia Balaresque
- UMR5288 CNRS/UPS-AMIS-Université Paul Sabatier, Toulouse, France; Department of Genetics, University of Leicester, Leicester, UK
23
Gavryushkina A, Welch D, Stadler T, Drummond AJ. Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration. PLoS Comput Biol 2014; 10:e1003919. [PMID: 25474353 PMCID: PMC4263412 DOI: 10.1371/journal.pcbi.1003919] [Citation(s) in RCA: 183] [Impact Index Per Article: 18.3] [Received: 06/15/2014] [Accepted: 09/08/2014] [Indexed: 12/22/2022]
Abstract
Phylogenetic analyses which include fossils or molecular sequences that are sampled through time require models that allow one sample to be a direct ancestor of another sample. As previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after sampling, in particular we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also show that sampled ancestor birth-death models where every sample comes from a different time point are non-identifiable and thus require one parameter to be known in order to infer other parameters. We apply our phylogenetic inference accounting for sampled ancestors to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/CompEvol/sampled-ancestors). 
A central goal of phylogenetic analysis is to estimate evolutionary relationships and the dynamical parameters underlying the evolutionary branching process (e.g. macroevolutionary or epidemiological parameters) from molecular data. The statistical methods used in these analyses require that the underlying tree branching process is specified. Standard models for the branching process which were originally designed to describe the evolutionary past of present day species do not allow one sampled taxon to be the ancestor of another. However the probability of sampling a direct ancestor is not negligible for many types of data. For example, when fossil and living species are analysed together to infer species divergence times, fossil species may or may not be direct ancestors of living species. In epidemiology, a sampled individual (a host from which a pathogen sequence was obtained) can infect other individuals after sampling, which then go on to be sampled themselves. The models that account for direct ancestors produce phylogenetic trees with a different structure from classic phylogenetic trees and so using these models in inference requires new computational methods. Here we developed a method for phylogenetic analysis that accounts for the possibility of direct ancestors.
Affiliation(s)
- Alexandra Gavryushkina
- Department of Computer Science, University of Auckland, Auckland, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
- David Welch
- Department of Computer Science, University of Auckland, Auckland, New Zealand
- Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Switzerland
- Alexei J. Drummond
- Department of Computer Science, University of Auckland, Auckland, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
24
Nikolic N, Chevalet C. Detecting past changes of effective population size. Evol Appl 2014; 7:663-81. [PMID: 25067949 PMCID: PMC4105917 DOI: 10.1111/eva.12170] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 04/02/2013] [Accepted: 04/21/2014] [Indexed: 12/03/2022]
Abstract
Understanding and predicting population abundance is a major challenge confronting scientists. Several genetic models have been developed using microsatellite markers to estimate the present and ancestral effective population sizes. However, to get an overview on the evolution of population requires that past fluctuation of population size be traceable. To address the question, we developed a new model estimating the past changes of effective population size from microsatellite by resolving coalescence theory and using approximate likelihoods in a Monte Carlo Markov Chain approach. The efficiency of the model and its sensitivity to gene flow and to assumptions on the mutational process were checked using simulated data and analysis. The model was found especially useful to provide evidence of transient changes of population size in the past. The times at which some past demographic events cannot be detected because they are too ancient and the risk that gene flow may suggest the false detection of a bottleneck are discussed considering the distribution of coalescence times. The method was applied on real data sets from several Atlantic salmon populations. The method called VarEff (Variation of Effective size) was implemented in the R package VarEff and is made available at https://qgsp.jouy.inra.fr and at http://cran.r-project.org/web/packages/VarEff.
Affiliation(s)
- Claude Chevalet
- INRA Génétique, Physiologie et Systèmes d'Elevage, Castanet-Tolosan, France
- Université de Toulouse INP, ENSAT, Génétique, Physiologie et Systèmes d'Elevage, Castanet-Tolosan, France
- Université de Toulouse INP, ENVT, Génétique, Physiologie et Systèmes d'Elevage, Toulouse, France
25
Gerbault P, Allaby RG, Boivin N, Rudzinski A, Grimaldi IM, Pires JC, Climer Vigueira C, Dobney K, Gremillion KJ, Barton L, Arroyo-Kalin M, Purugganan MD, Rubio de Casas R, Bollongino R, Burger J, Fuller DQ, Bradley DG, Balding DJ, Richerson PJ, Gilbert MTP, Larson G, Thomas MG. Storytelling and story testing in domestication. Proc Natl Acad Sci U S A 2014; 111:6159-64. [PMID: 24753572 PMCID: PMC4035932 DOI: 10.1073/pnas.1400425111] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Indexed: 11/18/2022]
Abstract
The domestication of plants and animals marks one of the most significant transitions in human, and indeed global, history. Traditionally, study of the domestication process was the exclusive domain of archaeologists and agricultural scientists; today it is an increasingly multidisciplinary enterprise that has come to involve the skills of evolutionary biologists and geneticists. Although the application of new information sources and methodologies has dramatically transformed our ability to study and understand domestication, it has also generated increasingly large and complex datasets, the interpretation of which is not straightforward. In particular, challenges of equifinality, evolutionary variance, and emergence of unexpected or counter-intuitive patterns all face researchers attempting to infer past processes directly from patterns in data. We argue that explicit modeling approaches, drawing upon emerging methodologies in statistics and population genetics, provide a powerful means of addressing these limitations. Modeling also offers an approach to analyzing datasets that avoids conclusions steered by implicit biases, and makes possible the formal integration of different data types. Here we outline some of the modeling approaches most relevant to current problems in domestication research, and demonstrate the ways in which simulation modeling is beginning to reshape our understanding of the domestication process.
Affiliation(s)
- Robin G. Allaby
- School of Life Sciences, Gibbet Hill Campus, University of Warwick, Coventry CV4 7AL, United Kingdom
- Nicole Boivin
- Research Laboratory for Archaeology and the History of Art, School of Archaeology, Oxford OX1 3QY, United Kingdom
- Anna Rudzinski
- Research Department of Genetics, Evolution, and Environment and
- Ilaria M. Grimaldi
- Research Laboratory for Archaeology and the History of Art, School of Archaeology, Oxford OX1 3QY, United Kingdom
- J. Chris Pires
- Division of Biological Sciences, University of Missouri, Columbia, MO 65211
- Keith Dobney
- Department of Archaeology, University of Aberdeen, Aberdeen AB24 3UF, United Kingdom
- Loukas Barton
- Department of Anthropology, Center for Comparative Archaeology, University of Pittsburgh, Pittsburgh, PA 15260
- Manuel Arroyo-Kalin
- Institute of Archaeology, University College London, London WC1H 0PY, United Kingdom
- Michael D. Purugganan
- Department of Biology, New York University, New York, NY 10003-6688
- Center for Genomics and Systems Biology, New York University Abu Dhabi Research Institute, Abu Dhabi, United Arab Emirates
- Ruth Bollongino
- Institute of Anthropology, Johannes Gutenberg University, D-55099 Mainz, Germany
- Joachim Burger
- Institute of Anthropology, Johannes Gutenberg University, D-55099 Mainz, Germany
- Dorian Q. Fuller
- Institute of Archaeology, University College London, London WC1H 0PY, United Kingdom
- David J. Balding
- University College London Genetics Institute, University College London, London WC1E 6BT, United Kingdom
- Peter J. Richerson
- Department of Environmental Science and Policy, University of California, Davis, CA 95616
- M. Thomas P. Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen, Denmark; and
- Greger Larson
- Durham Evolution and Ancient DNA, Department of Archaeology, Durham University, Durham DH1 3LE, United Kingdom
- Mark G. Thomas
- Research Department of Genetics, Evolution, and Environment and
26
Vaughan TG, Kühnert D, Popinga A, Welch D, Drummond AJ. Efficient Bayesian inference under the structured coalescent. Bioinformatics 2014; 30:2272-9. [PMID: 24753484 PMCID: PMC4207426 DOI: 10.1093/bioinformatics/btu201] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Indexed: 12/28/2022]
Abstract
Motivation: Population structure significantly affects evolutionary dynamics. Such structure may be due to spatial segregation, but may also reflect any other gene-flow-limiting aspect of a model. In combination with the structured coalescent, this fact can be used to inform phylogenetic tree reconstruction, as well as to infer parameters such as migration rates and subpopulation sizes from annotated sequence data. However, conducting Bayesian inference under the structured coalescent is impeded by the difficulty of constructing Markov Chain Monte Carlo (MCMC) sampling algorithms (samplers) capable of efficiently exploring the state space. Results: In this article, we present a new MCMC sampler capable of sampling from posterior distributions over structured trees: timed phylogenetic trees in which lineages are associated with the distinct subpopulation in which they lie. The sampler includes a set of MCMC proposal functions that offer significant mixing improvements over a previously published method. Furthermore, its implementation as a BEAST 2 package ensures maximum flexibility with respect to model and prior specification. We demonstrate the usefulness of this new sampler by using it to infer migration rates and effective population sizes of H3N2 influenza between New Zealand, New York and Hong Kong from publicly available hemagglutinin (HA) gene sequences under the structured coalescent. Availability and implementation: The sampler has been implemented as a publicly available BEAST 2 package that is distributed under version 3 of the GNU General Public License at http://compevol.github.io/MultiTypeTree. Contact: tgvaughan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Timothy G Vaughan
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- Denise Kühnert
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- Alex Popinga
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- David Welch
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- Alexei J Drummond
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
27
28
Andersen MM, Caliebe A, Jochens A, Willuweit S, Krawczak M. Estimating trace-suspect match probabilities for singleton Y-STR haplotypes using coalescent theory. Forensic Sci Int Genet 2013; 7:264-71. [PMID: 23270696 DOI: 10.1016/j.fsigen.2012.11.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/20/2012] [Revised: 11/07/2012] [Accepted: 11/24/2012] [Indexed: 11/27/2022]
29
Abstract
Approximate Bayesian computation has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choices of the approximate Bayesian computation parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The Bayesian computation with empirical likelihood algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models.
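The core of the approach can be sketched for the simplest case, a scalar mean defined by the estimating equation E[X − θ] = 0. Parameters are drawn from the prior, each draw is weighted by its profile empirical likelihood ratio, and the effective sample size (Σw)²/Σw² summarizes how evenly the weight is spread. This is a minimal sketch under stated assumptions: the uniform prior, the bisection solver for the Lagrange multiplier and the function names are illustrative choices, not the authors' published implementation.

```python
import math
import random
from typing import List, Tuple


def log_el_ratio_mean(x: List[float], theta: float) -> float:
    """Log empirical-likelihood ratio log(prod n*p_i) for E[X - theta] = 0.

    Returns -inf when theta is outside the open interval (min(x), max(x)).
    """
    d = [xi - theta for xi in x]
    d_max, d_min = max(d), min(d)
    if d_max <= 0 or d_min >= 0:
        return -math.inf
    # The Lagrange multiplier must keep every 1 + lam * d_i positive.
    lo = (-1.0 / d_max) * (1 - 1e-12)
    hi = (-1.0 / d_min) * (1 - 1e-12)
    for _ in range(100):  # bisection; g(lam) is strictly decreasing
        lam = 0.5 * (lo + hi)
        g = sum(di / (1.0 + lam * di) for di in d)
        if g > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    # Optimal weights are p_i = 1 / (n * (1 + lam * d_i)).
    return -sum(math.log1p(lam * di) for di in d)


def bcel_mean(
    x: List[float],
    n_draws: int = 2000,
    prior: Tuple[float, float] = (-10.0, 10.0),
    seed: int = 0,
) -> Tuple[float, float]:
    """Sample theta from a uniform prior and weight each draw by its EL ratio.

    Returns (weighted posterior mean of theta, effective sample size).
    """
    rng = random.Random(seed)
    thetas = [rng.uniform(*prior) for _ in range(n_draws)]
    log_w = [log_el_ratio_mean(x, th) for th in thetas]
    top = max(log_w)  # assumes at least one draw falls inside the data range
    w = [math.exp(lw - top) for lw in log_w]  # rescaled for numerical stability
    total = sum(w)
    post_mean = sum(wi * th for wi, th in zip(w, thetas)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)
    return post_mean, ess
```

No data are simulated from the model at any point: the only inputs are the observed sample and the estimating equation, which is the source of the time savings the abstract describes.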
Affiliation(s)
- Kerrie L. Mengersen
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD 4001, Australia
- Pierre Pudlo
- Centre de Biologie pour la Gestion des Populations, Institut National de la Recherche Agronomique, 34988 Montferrier-sur-Lez Cedex, France
- Université Montpellier 2, Institut de Mathématiques et de Modélisation de Montpellier, 34095 Montpellier Cedex 5, France
- Institut de Biologie Computationnelle, Montpellier, France
- Christian P. Robert
- Université Paris Dauphine, Centre de Recherche en Mathématiques de la Décision, 75775 Paris Cedex 16, France
- Institut Universitaire de France, Paris, France
- Centre de Recherche en Statistique et Economie, 92245 Malakoff Cedex, France
30
Fernandes AM, Gonzalez J, Wink M, Aleixo A. Multilocus phylogeography of the Wedge-billed Woodcreeper Glyphorynchus spirurus (Aves, Furnariidae) in lowland Amazonia: Widespread cryptic diversity and paraphyly reveal a complex diversification pattern. Mol Phylogenet Evol 2013; 66:270-82. [DOI: 10.1016/j.ympev.2012.09.033] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 03/01/2012] [Revised: 09/25/2012] [Accepted: 09/27/2012] [Indexed: 11/26/2022]
31
Wu CH, Suchard MA, Drummond AJ. Bayesian selection of nucleotide substitution models and their site assignments. Mol Biol Evol 2012; 30:669-88. [PMID: 23233462 PMCID: PMC3563969 DOI: 10.1093/molbev/mss258] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
Abstract
Probabilistic inference of a phylogenetic tree from molecular sequence data is predicated on a substitution model describing the relative rates of change between character states along the tree for each site in the multiple sequence alignment. Commonly, one assumes that the substitution model is homogeneous across sites within large partitions of the alignment, assigns these partitions a priori, and then fixes their underlying substitution model to the best-fitting model from a hierarchy of named models. Here, we introduce an automatic model selection and model averaging approach within a Bayesian framework that simultaneously estimates the number of partitions, the assignment of sites to partitions, the substitution model for each partition, and the uncertainty in these selections. This new approach is implemented as an add-on to the BEAST 2 software platform. We find that this approach dramatically improves the fit of the nucleotide substitution model compared with existing approaches, and we show, using a number of example data sets, that as many as nine partitions are required to explain the heterogeneity in nucleotide substitution process across sites in a single gene analysis. In some instances, this improved modeling of the substitution process can have a measurable effect on downstream inference, including the estimated phylogeny, relative divergence times, and effective population size histories.
Affiliation(s)
- Chieh-Hsi Wu
- Department of Computer Science, University of Auckland, Auckland, New Zealand
32
Höhna S, Drummond AJ. Guided Tree Topology Proposals for Bayesian Phylogenetic Inference. Syst Biol 2011; 61:1-11. [DOI: 10.1093/sysbio/syr074] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.2] [Indexed: 11/14/2022]
Affiliation(s)
- Sebastian Höhna
- Department of Mathematics, Stockholm University, SE-106 91 Stockholm, Sweden
- Alexei J. Drummond
- Department of Computer Science, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
33
Szpiech ZA, Rosenberg NA. On the size distribution of private microsatellite alleles. Theor Popul Biol 2011; 80:100-13. [PMID: 21514313 DOI: 10.1016/j.tpb.2011.03.006] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 12/17/2010] [Revised: 03/29/2011] [Accepted: 03/30/2011] [Indexed: 10/18/2022]
Abstract
Private microsatellite alleles tend to be found in the tails rather than in the interior of the allele size distribution. To explain this phenomenon, we have investigated the size distribution of private alleles in a coalescent model of two populations, assuming the symmetric stepwise mutation model as the mode of microsatellite mutation. For the case in which four alleles are sampled, two from each population, we condition on the configuration in which three distinct allele sizes are present, one of which is common to both populations, one of which is private to one population, and the third of which is private to the other population. Conditional on this configuration, we calculate the probability that the two private alleles occupy the two tails of the size distribution. This probability, which increases as a function of mutation rate and divergence time between the two populations, is seen to be greater than the value that would be predicted if there were no relationship between privacy and location in the allele size distribution. In accordance with the prediction of the model, we find that in pairs of human populations, the frequency with which private microsatellite alleles occur in the tails of the allele size distribution increases as a function of genetic differentiation between populations.
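The conditional probability described above can also be approximated by Monte Carlo. The sketch below is an illustration under its own assumptions, not the authors' analytical calculation: two populations split at time `split_time` (in coalescent units, each pair of lineages coalescing at rate 1 within a population), mutations occur at rate `theta / 2` per lineage, and each mutation is a ±1 step. Rejection sampling keeps only replicates showing the three-allele configuration, and the estimate is the fraction in which the shared allele lies strictly between the two private ones. If allele size carried no information about privacy, each of the three sizes would be equally likely to be the middle one, so the baseline for comparison is 1/3.

```python
import random
from typing import Dict, List, Tuple


def _mutate(size: int, length: float, theta: float, rng: random.Random) -> int:
    """Add symmetric +/-1 steps at rate theta / 2 along a branch of given length."""
    rate = theta / 2.0
    if rate <= 0:
        return size
    t = rng.expovariate(rate)
    while t < length:
        size += rng.choice((-1, 1))
        t += rng.expovariate(rate)
    return size


def _simulate_sample(theta: float, split_time: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return allele sizes for two lineages from population A and two from B."""
    parent: Dict[int, int] = {}
    node_time: Dict[int, float] = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}
    pops = [[0, 1], [2, 3]]
    next_id = 4

    def merge(lineages: List[int], t: float) -> None:
        nonlocal next_id
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        parent[a] = parent[b] = next_id
        node_time[next_id] = t
        lineages.append(next_id)
        next_id += 1

    # Before the split (looking back in time), lineages coalesce within their population.
    t = 0.0
    while True:
        rates = [len(p) * (len(p) - 1) / 2 for p in pops]
        total = sum(rates)
        if total == 0:
            break
        t_next = t + rng.expovariate(total)
        if t_next >= split_time:
            break  # memorylessness lets us discard the overshooting event
        t = t_next
        merge(pops[0] if rng.random() * total < rates[0] else pops[1], t)
    # Beyond the split, all remaining lineages share one ancestral population.
    pooled = pops[0] + pops[1]
    t = split_time
    while len(pooled) > 1:
        k = len(pooled)
        t += rng.expovariate(k * (k - 1) / 2)
        merge(pooled, t)
    root = pooled[0]
    # Drop mutations from the root (size 0) to the leaves; parents are older than children.
    size = {root: 0}
    for node in sorted(node_time, key=node_time.get, reverse=True):
        if node != root:
            p = parent[node]
            size[node] = _mutate(size[p], node_time[p] - node_time[node], theta, rng)
    return [size[0], size[1]], [size[2], size[3]]


def prob_private_in_tails(theta: float, split_time: float, n_kept: int = 2000,
                          max_sims: int = 10**6, seed: int = 0) -> float:
    """Estimate P(both private alleles are the extreme sizes | configuration)."""
    rng = random.Random(seed)
    kept = tails = 0
    for _ in range(max_sims):
        a, b = _simulate_sample(theta, split_time, rng)
        sa, sb = set(a), set(b)
        shared, priv_a, priv_b = sa & sb, sa - sb, sb - sa
        # Keep only: three distinct sizes, one shared, one private to each population.
        if len(sa | sb) == 3 and len(shared) == len(priv_a) == len(priv_b) == 1:
            kept += 1
            (s,), (p,), (q,) = shared, priv_a, priv_b
            if min(p, q) < s < max(p, q):
                tails += 1
            if kept == n_kept:
                break
    if kept == 0:
        raise ValueError("configuration never observed; try larger theta or max_sims")
    return tails / kept
```

Repeating the estimate over a grid of `theta` and `split_time` values is one way to explore how the probability varies with mutation rate and divergence time.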
Affiliation(s)
- Zachary A Szpiech
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA.
34
Kim SH, Kim KC, Shin DJ, Jin HJ, Kwak KD, Han MS, Song JM, Kim W, Kim W. High frequencies of Y-chromosome haplogroup O2b-SRY465 lineages in Korea: a genetic perspective on the peopling of Korea. Investig Genet 2011; 2:10. [PMID: 21463511 PMCID: PMC3087676 DOI: 10.1186/2041-2223-2-10] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 01/13/2011] [Accepted: 04/04/2011] [Indexed: 11/11/2022]
Abstract
Background Koreans are generally considered a Northeast Asian group, thought to be related to Altaic-language-speaking populations. However, recent findings have indicated that the peopling of Korea might have been more complex, involving dual origins from both southern and northern parts of East Asia. To understand the male lineage history of Korea, more data from informative genetic markers from Korea and its surrounding regions are necessary. In this study, 25 Y-chromosome single nucleotide polymorphism markers and 17 Y-chromosome short tandem repeat (Y-STR) loci were genotyped in 1,108 males from several populations in East Asia. Results In general, we found East Asian populations to be characterized by male haplogroup homogeneity, showing major Y-chromosomal expansions of haplogroup O-M175 lineages. Interestingly, a high frequency (31.4%) of haplogroup O2b-SRY465 (and its sublineage) is characteristic of male Koreans, whereas the haplogroup distribution elsewhere in East Asian populations is patchy. The ages of the haplogroup O2b-SRY465 lineages (~9,900 years) and the pattern of variation within the lineages suggested an ancient origin in a nearby part of northeastern Asia, followed by an expansion in the vicinity of the Korean Peninsula. In addition, the coalescence time (~4,400 years) for the age of haplogroup O2b1-47z, and its Y-STR diversity, suggest that this lineage probably originated in Korea. Further studies with sufficiently large sample sizes to cover the vast East Asian region and using genomewide genotyping should provide further insights. Conclusions These findings are consistent with linguistic, archaeological and historical evidence, which suggest that the direct ancestors of Koreans were proto-Koreans who inhabited the northeastern region of China and the Korean Peninsula during the Neolithic (8,000-1,000 BC) and Bronze (1,500-400 BC) Ages.
Affiliation(s)
- Soon-Hee Kim
- School of Biological Sciences, Seoul National University, Seoul 151-747, Korea
- Eastern District Office, National Forensic Service, Gangwon-do 220-805, Korea
- Ki-Cheol Kim
- Department of Biological Sciences, Dankook University, Cheonan 330-714, Korea
- Dong-Jik Shin
- Cardiovascular Genome Center, Yonsei University College of Medicine, Seoul 120-749, Korea
- Han-Jun Jin
- Center for Genome Science, Korea National Institute of Health, Seoul 122-701, Korea
- Kyoung-Don Kwak
- DNA Analysis Division, National Forensic Service, Seoul 158-707, Korea
- Myun-Soo Han
- DNA Analysis Division, National Forensic Service, Seoul 158-707, Korea
- Joon-Myong Song
- Research Institute of Pharmaceutical Sciences and College of Pharmacy, Seoul National University, Seoul 151-742, Korea
- Won Kim
- School of Biological Sciences, Seoul National University, Seoul 151-747, Korea
- Wook Kim
- Department of Biological Sciences, Dankook University, Cheonan 330-714, Korea
35
Jasra A, De Iorio M, Chadeau-Hyam M. The time machine: a simulation approach for stochastic trees. Proc Math Phys Eng Sci 2011. [DOI: 10.1098/rspa.2010.0497] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
In this paper, we consider a simulation technique for stochastic trees. One of the most important areas in computational genetics is the calculation and subsequent maximization of the likelihood function associated with such models. This typically involves importance sampling and sequential Monte Carlo techniques. The approach proceeds by simulating the tree backward in time from the observed data to a most recent common ancestor. However, the computational time and the variance of the estimators are often too high for standard approaches to be useful. In this paper, we propose stopping the simulation early, which yields biased estimates of the likelihood surface. The bias is investigated from a theoretical point of view. Results from simulation studies are also given to investigate the balance between loss of accuracy, saving in computing time and variance reduction.
Affiliation(s)
- Ajay Jasra
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Maria De Iorio
- School of Public Health, Imperial College London, London W2 1PG, UK
36
Inferring population decline and expansion from microsatellite data: a simulation-based evaluation of the Msvar method. Genetics 2011; 188:165-79. [PMID: 21385729 DOI: 10.1534/genetics.110.121764] [Citation(s) in RCA: 117] [Impact Index Per Article: 9.0] [Indexed: 11/18/2022]
Abstract
Reconstructing the demographic history of populations is a central issue in evolutionary biology. Using likelihood-based methods coupled with Monte Carlo simulations, it is now possible to reconstruct past changes in population size from genetic data. Using simulated data sets under various demographic scenarios, we evaluate the statistical performance of Msvar, a full-likelihood Bayesian method that infers past demographic change from microsatellite data. Our simulation tests show that Msvar is very efficient at detecting population declines and expansions, provided the event is neither too weak nor too recent. We further show that Msvar outperforms two moment-based methods (the M-ratio test and Bottleneck) for detecting population size changes, whatever the time and the severity of the event. The same trend emerges from a compilation of empirical studies. The latest version of Msvar provides estimates of the current and the ancestral population size and the time since the population started changing in size. We show that, in the absence of prior knowledge, Msvar provides little information on the mutation rate, which results in biased estimates and/or wide credibility intervals for each of the demographic parameters. However, scaling the population size parameters with the mutation rate and scaling the time with current population size, as coalescent theory requires, significantly improves the quality of the estimates for contraction but not for expansion scenarios. Finally, our results suggest that Msvar is robust to moderate departures from a strict stepwise mutation model.
37
Joint inference of microsatellite mutation models, population history and genealogies using transdimensional Markov Chain Monte Carlo. Genetics 2011; 188:151-64. [PMID: 21385725 PMCID: PMC3120151 DOI: 10.1534/genetics.110.125260] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 01/30/2023]
Abstract
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of θ estimation and also identification of the true mutation model. Finally, we apply our method to a red colobus monkey data set as an example.
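To show how the three properties listed above can combine in a single mutation step, the sketch below uses a hypothetical parameterization: a mutation probability that grows linearly with repeat count (`base_rate`, `slope`), an expansion probability `p_expand`, and a geometric number of repeat units changed (`p_one_step`). These names and functional forms are illustrative assumptions, not the models implemented in BEAST.

```python
import random


def mutate_microsatellite(
    length: int,
    rng: random.Random,
    base_rate: float = 1e-3,
    slope: float = 1e-4,
    p_expand: float = 0.5,
    p_one_step: float = 0.9,
    min_length: int = 1,
) -> int:
    """Return the repeat count after one generation under a hypothetical model.

    - length dependency: mutation probability grows linearly with repeat count;
    - mutational bias: a mutation is an expansion with probability p_expand;
    - multi-step changes: the number of units changed is geometric
      (1 with probability p_one_step, 2 with p_one_step * (1 - p_one_step), ...).
    """
    rate = max(0.0, min(1.0, base_rate + slope * (length - min_length)))
    if rng.random() >= rate:
        return length  # no mutation this generation
    step = 1
    while rng.random() >= p_one_step:
        step += 1
    direction = 1 if rng.random() < p_expand else -1
    # Contractions are truncated at the minimum allowed repeat count.
    return max(min_length, length + direction * step)
```

Setting `slope = 0`, `p_expand = 0.5` and `p_one_step = 1` reduces this sketch to the symmetric one-step stepwise mutation model, which is the kind of nesting that makes averaging over mutation models possible.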
38
Nater A, Nietlisbach P, Arora N, van Schaik CP, van Noordwijk MA, Willems EP, Singleton I, Wich SA, Goossens B, Warren KS, Verschoor EJ, Perwitasari-Farajallah D, Pamungkas J, Krützen M. Sex-Biased Dispersal and Volcanic Activities Shaped Phylogeographic Patterns of Extant Orangutans (genus: Pongo). Mol Biol Evol 2011; 28:2275-88. [DOI: 10.1093/molbev/msr042] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Indexed: 01/17/2023]
39
Agudo R, Rico C, Vilà C, Hiraldo F, Donázar JA. The role of humans in the diversification of a threatened island raptor. BMC Evol Biol 2010; 10:384. [PMID: 21144015 PMCID: PMC3009672 DOI: 10.1186/1471-2148-10-384] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 05/26/2010] [Accepted: 12/13/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Anthropogenic habitat modifications have led to the extinction of many species and have favoured the expansion of others. Nonetheless, the possible role of humans as a diversifying force in vertebrate evolution has rarely been considered, especially for species with long generation times. We examine the influence that humans have had on the colonization and phenotypic and genetic differentiation of an insular population of a long-lived raptor species, the Egyptian vulture (Neophron percnopterus). RESULTS The morphological comparison between the Canarian Egyptian vultures and the main and closest population in Western Europe (Iberia) indicated that insular vultures are significantly heavier (16%) and larger (about 3%) than those from Iberia. Bayesian and standard genetic analyses also showed differentiation (FST = 0.11, p < 0.01). The inference of changes in the effective size of the Canarian deme, using two likelihood-based Bayesian approaches, suggested that the establishment of this insular population took place some 2500 years ago, matching the date of human colonization. This is consistent with the lack of earlier fossils. CONCLUSIONS Archaeological remains show that first colonizers were Berber people from northern Africa who imported goats. This new and abundant food source could have allowed vultures to colonize, expand and adapt to the island environment. Our results suggest that anthropogenic environmental change can induce diversification and that this process may take place on an ecological time scale (less than 200 generations), even in the case of a long-lived species.
Affiliation(s)
- Rosa Agudo
- Department of Conservation Biology, Doñana Biological Station (CSIC), Av. Américo Vespucio s/n, E-41092 Seville, Spain
- Ciro Rico
- Department of Wetland Ecology, Doñana Biological Station (CSIC), Av. Américo Vespucio s/n, E-41092 Seville, Spain
- Carles Vilà
- Department of Integrative Ecology, Doñana Biological Station (CSIC), Av. Américo Vespucio s/n, E-41092 Seville, Spain
- Fernando Hiraldo
- Department of Conservation Biology, Doñana Biological Station (CSIC), Av. Américo Vespucio s/n, E-41092 Seville, Spain
- José Antonio Donázar
- Department of Conservation Biology, Doñana Biological Station (CSIC), Av. Américo Vespucio s/n, E-41092 Seville, Spain
40
Pinho C, Hey J. Divergence with Gene Flow: Models and Data. Annu Rev Ecol Evol Syst 2010. [DOI: 10.1146/annurev-ecolsys-102209-144644] [Citation(s) in RCA: 285] [Impact Index Per Article: 20.4] [Indexed: 11/09/2022]
Affiliation(s)
- Catarina Pinho
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, 4485-661 Vairão, Portugal
- Jody Hey
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
41
Sargsyan O. Topologies of the conditional ancestral trees and full-likelihood-based inference in the general coalescent tree framework. Genetics 2010; 185:1355-68. [PMID: 20479148 PMCID: PMC2927762 DOI: 10.1534/genetics.109.112847] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/07/2009] [Accepted: 05/07/2010] [Indexed: 11/18/2022]
Abstract
The general coalescent tree framework is a family of models for determining ancestries among random samples of DNA sequences at a nonrecombining locus. The ancestral models included in this framework can be derived under various evolutionary scenarios. Here, a computationally tractable full-likelihood-based inference method for neutral polymorphisms is presented, using the general coalescent tree framework and the infinite-sites model for mutations in DNA sequences. First, an exact sampling scheme is developed to determine the topologies of conditional ancestral trees. However, this scheme has some computational limitations and to overcome these limitations a second scheme based on importance sampling is provided. Next, these schemes are combined with Monte Carlo integrations to estimate the likelihood of full polymorphism data, the ages of mutations in the sample, and the time of the most recent common ancestor. In addition, this article shows how to apply this method for estimating the likelihood of neutral polymorphism data in a sample of DNA sequences completely linked to a mutant allele of interest. This method is illustrated using the data in a sample of DNA sequences at the APOE gene locus.
Affiliation(s)
- Ori Sargsyan
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
42
Haasl RJ, Payseur BA. The number of alleles at a microsatellite defines the allele frequency spectrum and facilitates fast accurate estimation of theta. Mol Biol Evol 2010; 27:2702-15. [PMID: 20605970 DOI: 10.1093/molbev/msq164] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Theoretical work focused on microsatellite variation has produced a number of important results, including the expected distribution of repeat sizes and the expected squared difference in repeat size between two randomly selected samples. However, closed-form expressions for the sampling distribution and frequency spectrum of microsatellite variation have not been identified. Here, we use coalescent simulations of the stepwise mutation model to develop gamma and exponential approximations of the microsatellite allele frequency spectrum, a distribution central to the description of microsatellite variation across the genome. For both approximations, the parameter of biological relevance is the number of alleles at a locus, which we express as a function of θ, the population-scaled mutation rate, based on simulated data. Discovered relationships between θ, the number of alleles, and the frequency spectrum support the development of three new estimators of microsatellite θ. The three estimators exhibit roughly similar mean squared errors (MSEs) and all are biased. However, across a broad range of sample sizes and θ values, the MSEs of these estimators are frequently lower than all other estimators tested. The new estimators are also reasonably robust to mutation that includes step sizes greater than one. Finally, our approximation to the microsatellite allele frequency spectrum provides a null distribution of microsatellite variation. In this context, a preliminary analysis of the effects of demographic change on the frequency spectrum is performed. We suggest that simulations of the microsatellite frequency spectrum under evolutionary scenarios of interest may guide investigators to the use of relevant and sometimes novel summary statistics.
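For comparison with the new estimators, two classical moment estimators of θ under the one-step stepwise mutation model can be derived from the results mentioned at the start of this abstract. Assuming θ = 4Nμ, the expected squared difference in repeat count between two sampled alleles equals θ, so twice the sample variance of repeat counts estimates θ; inverting the expected homozygosity F = 1/√(1 + 2θ) gives a second estimator. The sketch below implements these two standard estimators, not the number-of-alleles estimators introduced in the paper.

```python
from collections import Counter
from typing import Sequence


def theta_from_variance(repeats: Sequence[int]) -> float:
    """theta_V = 2 * sample variance, since E[(X_i - X_j)^2] = theta under the one-step SMM."""
    n = len(repeats)
    mean = sum(repeats) / n
    s2 = sum((x - mean) ** 2 for x in repeats) / (n - 1)
    return 2.0 * s2


def theta_from_homozygosity(repeats: Sequence[int]) -> float:
    """Invert E[F] = 1 / sqrt(1 + 2 * theta) using an unbiased homozygosity estimate."""
    n = len(repeats)
    sum_p2 = sum((c / n) ** 2 for c in Counter(repeats).values())
    f_hat = (n * sum_p2 - 1) / (n - 1)  # unbiased estimate of homozygosity
    if f_hat <= 0:
        return float("inf")
    return (1.0 / f_hat ** 2 - 1.0) / 2.0
```

For the sample `[10, 10, 12, 12]`, the variance estimator gives 8/3 and the homozygosity estimator gives 4.0; disagreement of this kind between estimators on small samples is part of what motivates comparing their mean squared errors.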
Affiliation(s)
- Ryan J Haasl
- Laboratory of Genetics, University of Wisconsin, USA
43
Reconstruction of the late Pleistocene human skull from Hofmeyr, South Africa. J Hum Evol 2010; 59:1-15. [DOI: 10.1016/j.jhevol.2010.02.007] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 04/22/2009] [Revised: 11/16/2009] [Accepted: 12/23/2009] [Indexed: 11/23/2022]
44
An examination of genetic diversity and effective population size in Atlantic salmon populations. Genet Res (Camb) 2010; 91:395-412. [PMID: 20122296 DOI: 10.1017/s0016672309990346] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022]
Abstract
Effective population size (Ne) is an important parameter in the conservation of genetic diversity. Comparative studies of empirical data that gauge the relative accuracy of Ne methods are limited, and a better understanding of the limitations and potential of Ne estimators is needed. This paper investigates genetic diversity and Ne in four populations of wild anadromous Atlantic salmon (Salmo salar L.) in Europe, from the Rivers Oir and Scorff (France) and Spey and Shin (Scotland). We aimed to understand present diversity and historical processes influencing current population structure. Our results showed high genetic diversity for all populations studied, despite their wide range of current effective sizes. To improve understanding of high genetic diversity observed in the populations with low effective size, we developed a model predicting present diversity as a function of past demographic history. This suggested that high genetic diversity could be explained by a bottleneck occurring within recent centuries rather than by gene flow. Previous studies have demonstrated the efficiency of coalescent models for estimating Ne. Using nine subsets from 37 microsatellite DNA markers from the four salmon populations, we compared three coalescent estimators based on single and dual samples. Comparing Ne estimates confirmed the benefit of increasing the number and variability of microsatellite markers. This effect was more pronounced for the smaller populations. Analysis with low numbers of neutral markers revealed uneven distributions of allelic frequencies and overestimated short-term Ne. In addition, we found evidence of artificial stock enhancement using fish of native and non-native origin. We propose estimates of Ne for the four populations, and their applications for salmon conservation and management are discussed.
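Why a recent bottleneck can leave diversity high is easiest to see with the textbook drift-only expectation, under which expected heterozygosity shrinks by a factor of 1 − 1/(2Ne) each generation. The sketch below applies that recursion to a piecewise-constant Ne history. It ignores mutation and gene flow and is an illustration of the reasoning, not the model developed in the paper; the function name and the `(Ne, generations)` epoch format are assumptions.

```python
from typing import Sequence, Tuple


def expected_heterozygosity(h0: float, history: Sequence[Tuple[float, int]]) -> float:
    """Expected heterozygosity after drift through a piecewise-constant Ne history.

    history lists (Ne, generations) epochs, oldest first. Each generation
    multiplies H by (1 - 1 / (2 * Ne)); mutation and gene flow are ignored.
    """
    h = h0
    for ne, generations in history:
        h *= (1.0 - 1.0 / (2.0 * ne)) ** generations
    return h
```

For example, starting from H = 0.8, ten generations at Ne = 50 leave an expected H of about 0.72: a population whose current Ne is small can still carry much of the diversity it held before a short, recent decline.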
45
Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, Knowles L, Estoup A, Panchal M, Corander J, Hickerson M, Sisson SA, Fagundes N, Chikhi L, Beerli P, Vitalis R, Cornuet JM, Huelsenbeck J, Foll M, Yang Z, Rousset F, Balding D, Excoffier L. In defence of model-based inference in phylogeography. Mol Ecol 2010; 19:436-446. [PMID: 29284924 DOI: 10.1111/j.1365-294x.2009.04515.x] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Indexed: 11/28/2022]
Abstract
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, either when used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics.
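As a concrete reference point for this debate, the simplest ABC algorithm is rejection sampling: draw a parameter from the prior, simulate a data set, and keep the draw when a summary statistic of the simulated data falls within a tolerance of the observed one. The toy sketch below estimates a normal mean with known standard deviation, using the sample mean as the summary; the model, uniform prior and tolerance are illustrative assumptions rather than a phylogeographic analysis.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def abc_rejection_mean(
    observed: Sequence[float],
    n_draws: int = 20000,
    prior: Tuple[float, float] = (-10.0, 10.0),
    tolerance: float = 0.1,
    sigma: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Rejection ABC for the mean of a Normal(mu, sigma) model with known sigma.

    Returns the accepted draws, an approximate sample from the posterior of mu.
    """
    rng = random.Random(seed)
    n = len(observed)
    s_obs = statistics.fmean(observed)  # summary statistic: sample mean
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(*prior)  # draw from the prior
        simulated = [rng.gauss(mu, sigma) for _ in range(n)]  # simulate a data set
        if abs(statistics.fmean(simulated) - s_obs) <= tolerance:
            accepted.append(mu)
    return accepted
```

Here the sample mean is sufficient for mu, so the accepted draws approach the exact posterior as the tolerance shrinks; with insufficient summary statistics, as is typical in population genetics, the ABC posterior is an approximation whose quality depends on the choice of summaries.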
Affiliation(s)
- Mark A Beaumont
- School of Animal and Microbial Sciences, University of Reading, Whiteknights, PO Box 228, Reading, RG6 6AJ, UK
- Rasmus Nielsen
- Integrative Biology, UC Berkeley, 3060 Valley Life Sciences Bldg #3140, Berkeley, CA 94720-3140, USA
- Jody Hey
- Department of Genetics, Rutgers University, 604 Allison Road, Piscataway, NJ 08854, USA
- Oscar Gaggiotti
- Laboratoire d'Ecologie Alpine, UMR CNRS 5553, Université Joseph Fourier, BP 53, 38041 Grenoble, France
- Lacey Knowles
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA
- Arnaud Estoup
- INRA UMR Centre de Biologie et de Gestion des Populations (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, Montferrier-sur-Lez, France
- Mahesh Panchal
- Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
- Jukka Corander
- Department of Mathematics and Statistics, University of Helsinki, Finland
- Mike Hickerson
- Biology Department, Queens College, City University of New York, 65-30 Kissena Boulevard, Flushing, NY 11367-1597, USA
- Scott A Sisson
- School of Mathematics and Statistics, University of New South Wales, Sydney, Australia
- Nelson Fagundes
- Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Lounès Chikhi
- Université Paul Sabatier-UMR EDB 5174 118, 31062 Toulouse Cedex 09, France
- Peter Beerli
- Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
- Renaud Vitalis
- CNRS-INRA, CBGP, Campus International de Baillarguet, CS 30016, 34988 Montferrier-sur-Lez, France
- Jean-Marie Cornuet
- INRA UMR Centre de Biologie et de Gestion des Populations (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, Montferrier-sur-Lez, France
- John Huelsenbeck
- Integrative Biology, UC Berkeley, 3060 Valley Life Sciences Bldg #3140, Berkeley, CA 94720-3140, USA
- Matthieu Foll
- CMPG, Institute of Ecology and Evolution, University of Berne, 3012 Berne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Ziheng Yang
- Department of Biology, University College London, Gower Street, London WC1E 6BT, UK
- Francois Rousset
- Institut des Sciences de l'Évolution, Université Montpellier 2, CNRS, Place Eugène Bataillon, CC065, Montpellier, Cedex 5, France
- David Balding
- Institute of Genetics, University College London, 2nd Floor, Kathleen Lonsdale Building, 5 Gower Place, London WC1E 6BT, UK
- Laurent Excoffier
- CMPG, Institute of Ecology and Evolution, University of Berne, 3012 Berne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
46
The use of approximate Bayesian computation in conservation genetics and its application in a case study on yellow-eyed penguins. Conserv Genet 2009. [DOI: 10.1007/s10592-009-0032-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]
47
Abstract
A method for studying the divergence of multiple closely related populations is described and assessed. The approach of Hey and Nielsen (2007, Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc Natl Acad Sci USA. 104:2785-2790) for fitting an isolation-with-migration model was extended to the case of multiple populations with a known phylogeny. Analysis of simulated data sets reveals the kinds of history that are accessible with a multipopulation analysis. Necessarily, processes associated with older time periods in a phylogeny are more difficult to estimate, and histories with high levels of gene flow are particularly difficult to estimate when more than two populations are involved. However, for histories with modest levels of gene flow, or for very large data sets, it is possible to study large, complex divergence problems that involve multiple closely related populations or species.
Affiliation(s)
- Jody Hey
- Department of Genetics, Rutgers University, USA.
48
Abstract
Most methods for studying divergence with gene flow rely upon data from many individuals at few loci. Such data can be useful for inferring recent population history but they are unlikely to contain sufficient information about older events. However, the growing availability of genome sequences suggests a different kind of sampling scheme, one that may be more suited to studying relatively ancient divergence. Data sets extracted from whole-genome alignments may represent very few individuals but contain a very large number of loci. To take advantage of such data we developed a new maximum-likelihood method for genomic data under the isolation-with-migration model. Unlike many coalescent-based likelihood methods, our method does not rely on Monte Carlo sampling of genealogies, but rather provides a precise calculation of the likelihood by numerical integration over all genealogies. We demonstrate that the method works well on simulated data sets. We also consider two models for accommodating mutation rate variation among loci and find that the model that treats mutation rates as random variables leads to better estimates. We applied the method to the divergence of Drosophila melanogaster and D. simulans and detected a low, but statistically significant, signal of gene flow from D. simulans to D. melanogaster.
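The abstract's central idea is to compute the likelihood by integrating numerically over genealogies instead of sampling them by Monte Carlo. A minimal sketch of that idea for the simplest possible case follows. It is an assumption and far simpler than the isolation-with-migration model of the paper: one panmictic population, two sequences per locus, an unknown coalescence time T ~ Exp(1) in coalescent units, and Poisson(θT) pairwise differences. The helper names and data are hypothetical. The integral over T uses a composite Simpson rule, and the result can be checked against the known two-sequence closed form θ^k/(1+θ)^(k+1).

```python
import math


def locus_likelihood(k, theta, t_max=60.0, n=2000):
    """P(k pairwise differences | theta), integrating numerically over coalescence time T.

    Integrand: Exp(1) density of T times the Poisson(theta * T) probability of k.
    Composite Simpson's rule on [0, t_max]; n must be even.
    """
    def integrand(t):
        lam = theta * t
        return math.exp(-t) * math.exp(-lam) * lam**k / math.factorial(k)

    h = t_max / n
    total = integrand(0.0) + integrand(t_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def log_likelihood(data, theta):
    """Sum of per-locus log-likelihoods, assuming loci are independent."""
    return sum(math.log(locus_likelihood(k, theta)) for k in data)


def closed_form(k, theta):
    """Exact two-sequence result for comparison: theta^k / (1 + theta)^(k + 1)."""
    return theta**k / (1.0 + theta) ** (k + 1)


# Hypothetical pairwise-difference counts at 5 loci (illustrative, not real data).
data = [2, 0, 1, 3, 1]
grid = [0.1 * i for i in range(1, 51)]  # theta values 0.1 ... 5.0
theta_hat = max(grid, key=lambda th: log_likelihood(data, th))
print(round(theta_hat, 1))  # 1.4, matching the exact MLE mean(data) = 7/5
```

Here the "genealogy" of two sequences is fully described by T, so a one-dimensional integral replaces sampling. Under the isolation-with-migration model the integral runs over a richer set of genealogies.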
49
Caciagli L, Bulayeva K, Bulayev O, Bertoncini S, Taglioli L, Pagani L, Paoli G, Tofanelli S. The key role of patrilineal inheritance in shaping the genetic variation of Dagestan highlanders. J Hum Genet 2009; 54:689-94. [PMID: 19911015 DOI: 10.1038/jhg.2009.94] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Abstract
The Caucasus region is a complex cultural and ethnic mosaic, comprising populations that speak Caucasian, Indo-European and Altaic languages. Isolated mountain villages (auls) in Dagestan still preserve a high level of genetic and cultural diversity and have patriarchal societies with a long history of isolation. The aim of this study was to understand the genetic history of five Dagestan highland auls with distinct ethnic affiliation (Avars, Chechens-Akkins, Kubachians, Laks, Tabasarans) using markers on the male-specific region of the Y chromosome. The groups analyzed here are all Muslim but speak different languages, all belonging to the Nakh-Dagestanian linguistic family. The results show that the Dagestan ethnic groups share a common Y-genetic background, with deep-rooted genealogies and rare alleles, dating back to an early phase in the post-glacial recolonization of Europe. Geography and stochastic factors, such as founder effect and long-term genetic drift, driven by the rigid structuring of societies in groups of patrilineal descent, most likely acted as mutually reinforcing key factors in determining the high degree of Y-genetic divergence among these ethnic groups.
Affiliation(s)
- Laura Caciagli
- Dipartimento di Biologia, Università di Pisa, Pisa, Italy
50
Abstract
Until recently, it has been common practice for a phylogenetic analysis to use a single gene sequence from a single individual organism as a proxy for an entire species. With technological advances, it is now becoming more common to collect data sets containing multiple gene loci and multiple individuals per species. These data sets often reveal the need to directly model intraspecies polymorphism and incomplete lineage sorting in phylogenetic estimation procedures. For a single species, coalescent theory is widely used in contemporary population genetics to model intraspecific gene trees. Here, we present a Bayesian Markov chain Monte Carlo method for the multispecies coalescent. Our method coestimates multiple gene trees embedded in a shared species tree along with the effective population size of both extant and ancestral species. The inference is made possible by multilocus data from multiple individuals per species. Using a multiindividual data set and a series of simulations of rapid species radiations, we demonstrate the efficacy of our new method. These simulations give some insight into the behavior of the method as a function of sampled individuals, sampled loci, and sequence length. Finally, we compare our new method to both an existing method (BEST 2.2) with similar goals and the supermatrix (concatenation) method. We demonstrate that both BEST and our method have much better estimation accuracy for species tree topology than concatenation, and our method outperforms BEST in divergence time and population size estimation.
Affiliation(s)
- Joseph Heled
- Department of Computer Science, University of Auckland, New Zealand.