4
Perez MF, Bonatelli IAS, Romeiro-Brito M, Franco FF, Taylor NP, Zappi DC, Moraes EM. Coalescent-based species delimitation meets deep learning: Insights from a highly fragmented cactus system. Mol Ecol Resour 2021; 22:1016-1028. [PMID: 34669256 DOI: 10.1111/1755-0998.13534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/03/2021] [Revised: 09/16/2021] [Accepted: 10/12/2021] [Indexed: 11/26/2022]
Abstract
Delimiting species boundaries is a major goal in evolutionary biology. An increasing volume of literature has focused on the challenges of investigating cryptic diversity within complex evolutionary scenarios of speciation, including gene flow and demographic fluctuations. New methods based on model selection, such as approximate Bayesian computation, approximate likelihoods, and machine learning are promising tools arising in this field. Here, we introduce a framework for species delimitation using the multispecies coalescent model coupled with a deep learning algorithm based on convolutional neural networks (CNNs). We compared this strategy with a similar ABC approach. We applied both methods to test species boundary hypotheses based on current and previous taxonomic delimitations as well as genetic data (sequences from 41 loci) in Pilosocereus aurisetus, a cactus species complex with a sky-island distribution and taxonomic uncertainty. To validate our method, we also applied the same strategy on data from widely accepted species from the genus Drosophila. The results show that our CNN approach has a high capacity to distinguish among the simulated species delimitation scenarios, with higher accuracy than ABC. For the cactus data set, a splitter hypothesis without gene flow showed the highest probability in both CNN and ABC approaches, a result agreeing with previous taxonomic classifications and in line with the sky-island distribution and low dispersal of P. aurisetus. Our results highlight the cryptic diversity within the P. aurisetus complex and show that CNNs are a promising approach for distinguishing complex evolutionary histories, even outperforming the accuracy of other model-based approaches such as ABC.
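The ABC side of such a model comparison can be illustrated on a toy problem. The sketch below is not the authors' pipeline: the "lumper" and "splitter" simulators, the single summary statistic (the gap between two sample means), the priors, and the names `simulate_gap` and `abc_model_choice` are all hypothetical stand-ins chosen for brevity. It only shows the general idea of rejection ABC for model choice, where the posterior probability of a model is estimated by its share among the accepted simulations.

```python
import random
import statistics


def simulate_gap(split, n_per_group, rng):
    """Simulate two samples and return |difference of their sample means|.

    split=False ("lumper"): both samples share one mean.
    split=True ("splitter"): each sample has its own mean.
    Priors and noise model are illustrative assumptions.
    """
    mean_a = rng.uniform(-2.0, 2.0)
    mean_b = rng.uniform(-2.0, 2.0) if split else mean_a
    sample_a = [rng.gauss(mean_a, 1.0) for _ in range(n_per_group)]
    sample_b = [rng.gauss(mean_b, 1.0) for _ in range(n_per_group)]
    return abs(statistics.fmean(sample_a) - statistics.fmean(sample_b))


def abc_model_choice(observed_gap, n_sims=5000, tol=0.1, n_per_group=20, seed=1):
    """Rejection ABC over two models with equal prior weight.

    Returns the estimated posterior probability of the splitter model,
    i.e. its share among simulations whose summary lies within `tol` of
    the observed summary, or None if no simulation was accepted.
    """
    rng = random.Random(seed)
    accepted = {False: 0, True: 0}
    for _ in range(n_sims):
        split = rng.random() < 0.5  # draw the model index from its prior
        if abs(simulate_gap(split, n_per_group, rng) - observed_gap) <= tol:
            accepted[split] += 1
    total = accepted[False] + accepted[True]
    return accepted[True] / total if total else None


# A large observed gap should favour the splitter model.
p_split = abc_model_choice(observed_gap=1.5)
print(p_split)
```

In a real analysis the simulator would be a coalescent model and the summaries would be multi-dimensional; the acceptance-share logic is the part this sketch is meant to convey.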
Affiliation(s)
- Manolo F Perez
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil; Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil
- Isabel A S Bonatelli
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil; Departamento de Ecologia e Biologia Evolutiva, Universidade Federal de São Paulo, Diadema, Brazil
- Fernando F Franco
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil
- Daniela C Zappi
- Programa de Pós Graduação em Botânica, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Brazil
- Evandro M Moraes
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil
5
Clarté G, Robert CP, Ryder RJ, Stoehr J. Componentwise approximate Bayesian computation via Gibbs-like steps. Biometrika 2020. [DOI: 10.1093/biomet/asaa090] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
Summary
Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are, however, sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty we explore a Gibbs version of the approximate Bayesian computation approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution, and some hierarchical versions of the proposed mechanism enjoy a closed-form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
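The component-wise idea can be sketched on a small hierarchical normal model. This is an illustration under stated assumptions, not the paper's algorithm as published: the toy model (hyperparameter `alpha` with a uniform prior, group means `mu[j]` drawn around `alpha`, unit-variance observations), the function names, and the rule of keeping the closest of a fixed number of proposals instead of using an explicit tolerance are all choices made here for brevity. Each conditional update uses its own low-dimensional summary: the mean of the current `mu` for `alpha`, and the group's sample mean for `mu[j]`.

```python
import random
import statistics


def closest_draw(propose, summarize, target, n_tries, rng):
    """One ABC step: keep the proposal whose simulated summary is nearest the target."""
    best, best_dist = None, float("inf")
    for _ in range(n_tries):
        candidate = propose(rng)
        dist = abs(summarize(candidate, rng) - target)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def mean_of_normals(center, count, rng):
    """Simulate `count` unit-variance normal draws and return their mean."""
    return statistics.fmean([rng.gauss(center, 1.0) for _ in range(count)])


def abc_gibbs(groups, n_iter=200, n_tries=30, seed=0):
    """Alternate ABC updates of alpha | mu and of each mu[j] | alpha, data."""
    rng = random.Random(seed)
    obs_means = [statistics.fmean(g) for g in groups]
    mu = list(obs_means)  # start the group means at the observed means
    alphas = []
    for _ in range(n_iter):
        # Update alpha given mu: propose from the prior, simulate group
        # means, and compare their average with that of the current mu.
        alpha = closest_draw(
            propose=lambda r: r.uniform(-4.0, 4.0),
            summarize=lambda a, r: mean_of_normals(a, len(mu), r),
            target=statistics.fmean(mu),
            n_tries=n_tries,
            rng=rng,
        )
        # Update each mu[j] given alpha: propose from N(alpha, 1), simulate
        # a data set of the same size, and compare sample means.
        for j, group in enumerate(groups):
            mu[j] = closest_draw(
                propose=lambda r: r.gauss(alpha, 1.0),
                summarize=lambda m, r: mean_of_normals(m, len(group), r),
                target=obs_means[j],
                n_tries=n_tries,
                rng=rng,
            )
        alphas.append(alpha)
    return alphas


# Five small groups whose means are centred on 2.0.
groups = [[c - 0.5, c, c + 0.5] for c in (1.0, 1.5, 2.0, 2.5, 3.0)]
alphas = abc_gibbs(groups)
estimate = statistics.fmean(alphas[100:])  # discard the first half as burn-in
print(estimate)
```

Because every step compares a single scalar summary, the number of proposals needed per update stays small even as the number of groups grows, which is the efficiency argument the abstract makes for the Gibbs version.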
Affiliation(s)
- Grégoire Clarté
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
- Christian P Robert
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
- Robin J Ryder
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
- Julien Stoehr
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
6
Filipe JA, Kyriazakis I. Bayesian, Likelihood-Free Modelling of Phenotypic Plasticity and Variability in Individuals and Populations. Front Genet 2019; 10:727. [PMID: 31616460 PMCID: PMC6764410 DOI: 10.3389/fgene.2019.00727] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/15/2018] [Accepted: 07/11/2019] [Indexed: 12/17/2022] Open
Abstract
There is a paradigm shift from the traditional focus on the "average" individual towards the definition and analysis of trait variation within individual life-history and among individuals in populations. This is a result of increasing availability of individual phenotypic data. The shift allows the use of genetic and environment-driven variations to assess robustness to challenge, gain greater understanding of organismal biological processes, or deliver individual-targeted treatments or genetic selection. These consequences apply, in particular, to variation in ontogenetic growth. We propose an approach to parameterise mathematical models of individual traits (e.g., reaction norms, growth curves) that address two challenges: 1) Estimation of individual traits while making minimal assumptions about data distribution and correlation, addressed via Approximate Bayesian Computation (a form of nonparametric inference). We are motivated by the fact that available information on distribution of biological data is often less precise than assumed by conventional likelihood functions. 2) Scaling-up to population phenotype distributions while facilitating unbiased use of individual data; this is addressed via a probabilistic framework where population distributions build on separately-inferred individual distributions and individual-trait interpretability is preserved. The approach is tested against Bayesian likelihood-based inference, by fitting weight and energy intake growth models to animal data and normal- and skewed-distributed simulated data. i) Individual inferences were accurate and robust to changes in data distribution and sample size; in particular, median-based predictions were more robust than maximum-likelihood-based curves. These results suggest that the approach gives reliable inferences using few observations and monitoring resources. ii) At the population level, each individual contributed via a specific data distribution, and population phenotype estimates were not disproportionally influenced by outlier individuals. Indices measuring population phenotype variation can be derived for study comparisons. The approach offers an alternative for estimating trait variability in biological systems that may be reliable for various applications, for example, in genetics, health, and individualised nutrition, while using fewer assumptions and fewer empirical observations. In livestock breeding, the potentially greater accuracy of trait estimation (without specification of multitrait variance-covariance parameters) could lead to improved selection and to more decisive estimates of trait heritability.
Affiliation(s)
- Joao A.N. Filipe
- Agriculture, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
7
Laurin-Lemay S, Rodrigue N, Lartillot N, Philippe H. Conditional Approximate Bayesian Computation: A New Approach for Across-Site Dependency in High-Dimensional Mutation-Selection Models. Mol Biol Evol 2019; 35:2819-2834. [PMID: 30203003 DOI: 10.1093/molbev/msy173] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022] Open
Abstract
A key question in molecular evolutionary biology concerns the relative roles of mutation and selection in shaping genomic data. Moreover, features of mutation and selection are heterogeneous along the genome and over time. Mechanistic codon substitution models based on the mutation-selection framework are promising approaches to separating these effects. In practice, however, several complications arise, since accounting for such heterogeneities often implies handling models of high dimensionality (e.g., amino acid preferences), or leads to across-site dependence (e.g., CpG hypermutability), making the likelihood function intractable. Approximate Bayesian Computation (ABC) could address this latter issue. Here, we propose a new approach, named Conditional ABC (CABC), which combines the sampling efficiency of MCMC and the flexibility of ABC. To illustrate the potential of the CABC approach, we apply it to the study of mammalian CpG hypermutability based on a new mutation-level parameter implying dependence across adjacent sites, combined with site-specific purifying selection on amino-acids captured by a Dirichlet process. Our proof-of-concept of the CABC methodology opens new modeling perspectives. Our application of the method reveals a high level of heterogeneity of CpG hypermutability across loci and mild heterogeneity across taxonomic groups; and finally, we show that CpG hypermutability is an important evolutionary factor in rendering relative synonymous codon usage. All source code is available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
Affiliation(s)
- Simon Laurin-Lemay
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Nicolas Rodrigue
- Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
- Nicolas Lartillot
- Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS 5558, Université Lyon 1, Lyon, France
- Hervé Philippe
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada; Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Écologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, France
8
Scheib CL, Li H, Desai T, Link V, Kendall C, Dewar G, Griffith PW, Mörseburg A, Johnson JR, Potter A, Kerr SL, Endicott P, Lindo J, Haber M, Xue Y, Tyler-Smith C, Sandhu MS, Lorenz JG, Randall TD, Faltyskova Z, Pagani L, Danecek P, O'Connell TC, Martz P, Boraas AS, Byrd BF, Leventhal A, Cambra R, Williamson R, Lesage L, Holguin B, Ygnacio-De Soto E, Rosas J, Metspalu M, Stock JT, Manica A, Scally A, Wegmann D, Malhi RS, Kivisild T. Ancient human parallel lineages within North America contributed to a coastal expansion. Science 2018; 360:1024-1027. [PMID: 29853687 DOI: 10.1126/science.aar6851] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 12/10/2017] [Accepted: 04/20/2018] [Indexed: 12/12/2022]
Abstract
Little is known regarding the first people to enter the Americas and their genetic legacy. Genomic analysis of the oldest human remains from the Americas showed a direct relationship between a Clovis-related ancestral population and all modern Central and South Americans as well as a deep split separating them from North Americans in Canada. We present 91 ancient human genomes from California and Southwestern Ontario and demonstrate the existence of two distinct ancestries in North America, which possibly split south of the ice sheets. A contribution from both of these ancestral populations is found in all modern Central and South Americans. The proportions of these two ancestries in ancient and modern populations are consistent with a coastal dispersal and multiple admixture events.
Affiliation(s)
- C L Scheib
- Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK; Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Hongjie Li
- Department of Anthropology and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tariq Desai
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Vivian Link
- Department of Biology, Université de Fribourg, Fribourg, Switzerland
- Christopher Kendall
- Department of Anthropology, University of Toronto, Toronto, Ontario M5S 2S2, Canada
- Genevieve Dewar
- Department of Anthropology, University of Toronto, Toronto, Ontario M5S 2S2, Canada
- John R Johnson
- Santa Barbara Museum of Natural History, Santa Barbara, CA 93105, USA
- Amiee Potter
- Department of Anthropology, Portland State University, Portland, OR 97232, USA; Knight Diagnostics Laboratory, Oregon Health & Science University, Portland, OR 97239, USA
- Susan L Kerr
- Department of Anthropology, Modesto Junior College, Modesto, CA 95350, USA
- Phillip Endicott
- Department Hommes Natures Societies, Musée de l'Homme, Paris 75016, France
- John Lindo
- Department of Anthropology, Emory University, Atlanta, GA 30322, USA
- Marc Haber
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Yali Xue
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Chris Tyler-Smith
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Joseph G Lorenz
- Department of Anthropology and Museum Studies, Central Washington University, Ellensburg, WA 98926, USA
- Tori D Randall
- Department of Anthropology, San Diego City College, San Diego, CA 92101, USA
- Zuzana Faltyskova
- Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK
- Luca Pagani
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; APE Lab, Department of Biology, University of Padova, Padova, Italy
- Petr Danecek
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Tamsin C O'Connell
- Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK
- Patricia Martz
- Department of Anthropology, California State University, Los Angeles, CA 90032, USA
- Brian F Byrd
- Far Western Anthropological Research Group Inc., Davis, CA 95618, USA
- Alan Leventhal
- Muwekma Ohlone Tribe of the San Francisco Bay Area, P.O. Box 360791, Milpitas, CA 95036, USA; Department of Anthropology, San Jose State University, San Jose, CA 95192, USA
- Rosemary Cambra
- Muwekma Ohlone Tribe of the San Francisco Bay Area, P.O. Box 360791, Milpitas, CA 95036, USA
- Brian Holguin
- Department of Anthropology, University of California, Los Angeles, CA 90095, USA
- Ernestine Ygnacio-De Soto
- Barbareño Chumash, California Indian Advisory Committee, Santa Barbara Museum of Natural History, Santa Barbara, CA 93105, USA
- Mait Metspalu
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Jay T Stock
- Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK; Department of Anthropology, University of Western Ontario, London, Ontario N6A 3K7, Canada
- Andrea Manica
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Aylwyn Scally
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Daniel Wegmann
- Department of Biology, Université de Fribourg, Fribourg, Switzerland
- Ripan S Malhi
- Department of Anthropology and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Toomas Kivisild
- Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK; Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
10
Aeschbacher S, Selby JP, Willis JH, Coop G. Population-genomic inference of the strength and timing of selection against gene flow. Proc Natl Acad Sci U S A 2017; 114:7061-7066. [PMID: 28634295 PMCID: PMC5502586 DOI: 10.1073/pnas.1616755114] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Indexed: 12/21/2022] Open
Abstract
The interplay of divergent selection and gene flow is key to understanding how populations adapt to local environments and how new species form. Here, we use DNA polymorphism data and genome-wide variation in recombination rate to jointly infer the strength and timing of selection, as well as the baseline level of gene flow under various demographic scenarios. We model how divergent selection leads to a genome-wide negative correlation between recombination rate and genetic differentiation among populations. Our theory shows that the selection density (i.e., the selection coefficient per base pair) is a key parameter underlying this relationship. We then develop a procedure for parameter estimation that accounts for the confounding effect of background selection. Applying this method to two datasets from Mimulus guttatus, we infer a strong signal of adaptive divergence in the face of gene flow between populations growing on and off phytotoxic serpentine soils. However, the genome-wide intensity of this selection is not exceptional compared with what M. guttatus populations may typically experience when adapting to local conditions. We also find that selection against genome-wide introgression from the selfing sister species M. nasutus has acted to maintain a barrier between these two species over at least the last 250 ky. Our study provides a theoretical framework for linking genome-wide patterns of divergence and recombination with the underlying evolutionary mechanisms that drive this differentiation.
Affiliation(s)
- Simon Aeschbacher
- Department of Evolution and Ecology, University of California, Davis, CA 95616
- Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- John H Willis
- Department of Biology, Duke University, Durham, NC 27708
- Graham Coop
- Department of Evolution and Ecology, University of California, Davis, CA 95616