1. Carvalho J, Morales HE, Faria R, Butlin RK, Sousa VC. Integrating Pool-seq uncertainties into demographic inference. Mol Ecol Resour 2023; 23:1737-1755. [PMID: 37475177 DOI: 10.1111/1755-0998.13834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 06/16/2023] [Accepted: 06/30/2023] [Indexed: 07/22/2023]
Abstract
Next-generation sequencing of pooled samples (Pool-seq) is a popular method to assess genome-wide diversity patterns in natural and experimental populations. However, Pool-seq is associated with specific sources of noise, such as unequal individual contributions. Consequently, using Pool-seq for the reconstruction of evolutionary history has remained underexplored. Here we describe a novel Approximate Bayesian Computation (ABC) method to infer demographic history, explicitly modelling Pool-seq sources of error. By jointly modelling Pool-seq data, demographic history and the effects of selection due to barrier loci, we obtain estimates of demographic history parameters accounting for technical errors associated with Pool-seq. Our ABC approach is computationally efficient as it relies on simulating subsets of loci (rather than the whole-genome) and on using relative summary statistics and relative model parameters. Our simulation study results indicate Pool-seq data allows distinction between general scenarios of ecotype formation (single versus parallel origin) and to infer relevant demographic parameters (e.g. effective sizes and split times). We exemplify the application of our method to Pool-seq data from the rocky-shore gastropod Littorina saxatilis, sampled on a narrow geographical scale at two Swedish locations where two ecotypes (Wave and Crab) are found. Our model choice and parameter estimates show that ecotypes formed before colonization of the two locations (i.e. single origin) and are maintained despite gene flow. These results indicate that demographic modelling and inference can be successful based on pool-sequencing using ABC, contributing to the development of suitable null models that allow for a better understanding of the genetic basis of divergent adaptation.
Affiliation(s)
- João Carvalho
  - cE3c - Centre for Ecology, Evolution and Environmental Changes & CHANGE - Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Portugal
- Hernán E Morales
  - Section for Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Rui Faria
  - CIBIO - Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO, Laboratório Associado, Universidade do Porto, Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Roger K Butlin
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield, UK
  - Department of Marine Sciences, University of Gothenburg, Gothenburg, Sweden
- Vítor C Sousa
  - cE3c - Centre for Ecology, Evolution and Environmental Changes & CHANGE - Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Portugal
2. Dittberner H, Tellier A, de Meaux J. Approximate Bayesian computation untangles signatures of contemporary and historical hybridization between two endangered species. Mol Biol Evol 2022; 39:6516021. [PMID: 35084503 PMCID: PMC8826969 DOI: 10.1093/molbev/msac015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Contemporary gene flow, when resumed after a period of isolation, can have crucial consequences for endangered species, as it can both increase the supply of adaptive alleles and erode local adaptation. Determining the history of gene flow and thus the importance of contemporary hybridization, however, is notoriously difficult. Here, we focus on two endangered plant species, Arabis nemorensis and A. sagittata, which hybridize naturally in a sympatric population located on the banks of the Rhine. Using reduced genome sequencing, we determined the phylogeography of the two taxa but report only a unique sympatric population. Molecular variation in chloroplast DNA indicated that A. sagittata is the principal receiver of gene flow. Applying classical D-statistics and its derivatives to whole-genome data of 35 accessions, we detect gene flow not only in the sympatric population but also among allopatric populations. Using an Approximate Bayesian computation approach, we identify the model that best describes the history of gene flow between these taxa. This model shows that low levels of gene flow have persisted long after speciation. Around 10 000 years ago, gene flow stopped and a period of complete isolation began. Eventually, a hotspot of contemporary hybridization was formed in the unique sympatric population. Occasional sympatry may have helped protect these lineages from extinction in spite of their extremely low diversity.
Affiliation(s)
- Hannes Dittberner
  - Institute of Plant Sciences, University of Cologne, Zülpicher str. 47b, Germany
- Aurelien Tellier
  - Department of Life Science Systems, Technical University of Munich, Freising, Germany
- Juliette de Meaux
  - Institute of Plant Sciences, University of Cologne, Zülpicher str. 47b, Germany
3. Montinaro F, Pankratov V, Yelmen B, Pagani L, Mondal M. Revisiting the out of Africa event with a deep-learning approach. Am J Hum Genet 2021; 108:2037-2051. [PMID: 34626535 PMCID: PMC8595897 DOI: 10.1016/j.ajhg.2021.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022] Open
Abstract
Anatomically modern humans evolved around 300 thousand years ago in Africa. They started to appear in the fossil record outside of Africa as early as 100 thousand years ago, although other hominins existed throughout Eurasia much earlier. Recently, several studies argued in favor of a single out of Africa event for modern humans on the basis of whole-genome sequence analyses. However, the single out of Africa model is in contrast with some of the findings from fossil records, which support two out of Africa events, and uniparental data, which propose a back to Africa movement. Here, we used a deep-learning approach coupled with approximate Bayesian computation and sequential Monte Carlo to revisit these hypotheses from the whole-genome sequence perspective. Our results support the back to Africa model over other alternatives. We estimated that there are two sequential separations between Africa and out of African populations happening around 60-90 thousand years ago and separated by 13-15 thousand years. One of the populations resulting from the more recent split has replaced the older West African population to a large extent, while the other one has founded the out of Africa populations.
Affiliation(s)
- Francesco Montinaro
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Department of Biology-Genetics, University of Bari, Bari 70124, Italy
- Vasili Pankratov
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Burak Yelmen
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Institute of Molecular and Cell Biology, University of Tartu, Tartu 51010, Estonia; Université Paris-Saclay, CNRS UMR 9015, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
- Luca Pagani
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Department of Biology, University of Padova, Padova 35121, Italy
- Mayukh Mondal
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia
4. Freund F, Siri-Jégousse A. The impact of genetic diversity statistics on model selection between coalescents. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
5. Fortes-Lima CA, Laurent R, Thouzeau V, Toupance B, Verdu P. Complex genetic admixture histories reconstructed with Approximate Bayesian Computation. Mol Ecol Resour 2021; 21:1098-1117. [PMID: 33452723 PMCID: PMC8247995 DOI: 10.1111/1755-0998.13325] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/04/2019] [Revised: 12/11/2020] [Accepted: 01/07/2021] [Indexed: 01/19/2023]
Abstract
Admixture is a fundamental evolutionary process that has influenced genetic patterns in numerous species. Maximum‐likelihood approaches based on allele frequencies and linkage‐disequilibrium have been extensively used to infer admixture processes from genome‐wide data sets, mostly in human populations. Nevertheless, complex admixture histories, beyond one or two pulses of admixture, remain methodologically challenging to reconstruct. We developed an Approximate Bayesian Computation (ABC) framework to reconstruct highly complex admixture histories from independent genetic markers. We built the software package MetHis to simulate independent SNPs or microsatellites in a two‐way admixed population for scenarios with multiple admixture pulses, monotonically decreasing or increasing recurring admixture, or combinations of these scenarios. MetHis allows users to draw model‐parameter values from prior distributions set by the user, and, for each simulation, MetHis can calculate numerous summary statistics describing genetic diversity patterns and moments of the distribution of individual admixture fractions. We coupled MetHis with existing machine‐learning ABC algorithms and investigated the admixture history of admixed populations. Results showed that random forest ABC scenario‐choice could accurately distinguish among most complex admixture scenarios, and errors were mainly found in regions of the parameter space where scenarios were highly nested, and, thus, biologically similar. We focused on African American and Barbadian populations as two study‐cases. We found that neural network ABC posterior parameter estimation was accurate and reasonably conservative under complex admixture scenarios. For both admixed populations, we found that monotonically decreasing contributions over time, from Europe and Africa, explained the observed data more accurately than multiple admixture pulses. This approach will allow for reconstructing detailed admixture histories when maximum‐likelihood methods are intractable.
Affiliation(s)
- Cesar A Fortes-Lima
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
  - Sub-department of Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Romain Laurent
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
- Valentin Thouzeau
  - UMR7534 Centre de Recherche en Mathématiques de la Décision, CNRS, Université Paris-Dauphine, PSL University, Paris, France
  - Laboratoire de Sciences Cognitives et Psycholinguistique, Département d'Etudes Cognitives, ENS, PSL University, EHESS, CNRS, Paris, France
- Bruno Toupance
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
- Paul Verdu
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
6. Momigliano P, Florin AB, Merilä J. Biases in Demographic Modeling Affect Our Understanding of Recent Divergence. Mol Biol Evol 2021; 38:2967-2985. [PMID: 33624816 PMCID: PMC8233503 DOI: 10.1093/molbev/msab047] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/14/2022] Open
Abstract
Testing among competing demographic models of divergence has become an important component of evolutionary research in model and non-model organisms. However, the effect of unaccounted demographic events on model choice and parameter estimation remains largely unexplored. Using extensive simulations, we demonstrate that under realistic divergence scenarios, failure to account for population size (Ne) changes in daughter and ancestral populations leads to strong biases in divergence time estimates as well as model choice. We illustrate these issues reconstructing the recent demographic history of North Sea and Baltic Sea turbots (Scophthalmus maximus) by testing 16 isolation with migration (IM) and 16 secondary contact (SC) scenarios, modeling changes in Ne as well as the effects of linked selection and barrier loci. Failure to account for changes in Ne resulted in selecting SC models with long periods of strict isolation and divergence times preceding the formation of the Baltic Sea. In contrast, models accounting for Ne changes suggest recent (<6 kya) divergence with constant gene flow. We further show how interpreting genomic landscapes of differentiation can help discerning among competing models. For example, in the turbot data, islands of differentiation show signatures of recent selective sweeps, rather than old divergence resisting secondary introgression. The results have broad implications for the study of population divergence by highlighting the potential effects of unmodeled changes in Ne on demographic inference. Tested models should aim at representing realistic divergence scenarios for the target taxa, and extreme caution should always be exercised when interpreting results of demographic modeling.
Affiliation(s)
- Paolo Momigliano
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
- Ann-Britt Florin
  - Department of Aquatic Resources, Institute of Coastal Research, Swedish University of Agricultural Sciences, Öregrund, Sweden
- Juha Merilä
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
  - Division of Ecology and Biodiversity, Faculty of Science, The University of Hong Kong, Hong Kong SAR
7. Nye J, Mondal M, Bertranpetit J, Laayouni H. A fully integrated machine learning scan of selection in the chimpanzee genome. NAR Genom Bioinform 2021; 2:lqaa061. [PMID: 33575612 PMCID: PMC7671310 DOI: 10.1093/nargab/lqaa061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2019] [Revised: 06/11/2020] [Accepted: 07/31/2020] [Indexed: 11/13/2022] Open
Abstract
After diverging, each chimpanzee subspecies has been the target of unique selective pressures. Here, we employ a machine learning approach to classify regions as under positive selection or neutrality genome-wide. The regions determined to be under selection reflect the unique demographic and adaptive history of each subspecies. The results indicate that effective population size is important for determining the proportion of the genome under positive selection. The chimpanzee subspecies share signals of selection in genes associated with immunity and gene regulation. With these results, we have created a selection map for each population that can be displayed in a genome browser (www.hsb.upf.edu/chimp_browser). This study is the first to use a detailed demographic history and machine learning to map selection genome-wide in chimpanzee. The chimpanzee selection map will improve our understanding of the impact of selection on closely related subspecies and will empower future studies of chimpanzee.
Affiliation(s)
- Jessica Nye
  - Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Mayukh Mondal
  - Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Jaume Bertranpetit
  - Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Hafid Laayouni
  - Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
8. Malinsky M, Matschiner M, Svardal H. Dsuite - Fast D-statistics and related admixture evidence from VCF files. Mol Ecol Resour 2021; 21:584-595. [PMID: 33012121 PMCID: PMC7116594 DOI: 10.1111/1755-0998.13265] [Citation(s) in RCA: 347] [Impact Index Per Article: 86.8] [Received: 03/19/2020] [Accepted: 09/14/2020] [Indexed: 12/30/2022]
Abstract
Patterson's D, also known as the ABBA-BABA statistic, and related statistics such as the f4-ratio, are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are impractical to evaluate all gene flow hypotheses across data sets with many populations or species due to computational inefficiencies. Here, we present a new software package Dsuite, an efficient implementation allowing genome scale calculations of the D and f4-ratio statistics across all combinations of tens or hundreds of populations or species directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci, and it can also aid in interpretation of a system of f4-ratio results with the use of the "f-branch" method. Dsuite is available at https://github.com/millanek/Dsuite, is straightforward to use, substantially more computationally efficient than comparable programs, and provides a convenient suite of tools and statistics, including some not previously available in any software package. Thus, Dsuite facilitates the assessment of evidence for gene flow, especially across larger genomic data sets.
Affiliation(s)
- Milan Malinsky
  - Zoological Institute, University of Basel, Basel, Switzerland
- Michael Matschiner
  - Department of Paleontology and Museum, University of Zurich, Zurich, Switzerland
  - Department of Biosciences, University of Oslo, Oslo, Norway
- Hannes Svardal
  - Department of Biology, University of Antwerp, Antwerp, Belgium
  - Naturalis Biodiversity Center, Leiden, The Netherlands
9. Sanchez T, Cury J, Charpiat G, Jay F. Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation. Mol Ecol Resour 2020; 21:2645-2660. [DOI: 10.1111/1755-0998.13224] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/17/2020] [Revised: 06/19/2020] [Accepted: 07/02/2020] [Indexed: 12/28/2022]
Affiliation(s)
- Théophile Sanchez
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Jean Cury
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Guillaume Charpiat
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Flora Jay
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
10. Smith CCR, Flaxman SM. Leveraging whole genome sequencing data for demographic inference with approximate Bayesian computation. Mol Ecol Resour 2019; 20:125-139. [DOI: 10.1111/1755-0998.13092] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/13/2019] [Revised: 08/30/2019] [Accepted: 09/06/2019] [Indexed: 01/16/2023]
Affiliation(s)
- Chris C. R. Smith
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
- Samuel M. Flaxman
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA