1
Potter S, Moritz C, Piggott MP, Bragg JG, Afonso Silva AC, Bi K, McDonald-Spicer C, Turakulov R, Eldridge MDB. Museum Skins Enable Identification of Introgression Associated with Cytonuclear Discordance. Syst Biol 2024; 73:579-593. [PMID: 38577768 PMCID: PMC11377193 DOI: 10.1093/sysbio/syae016] [Received: 03/10/2022] [Revised: 03/14/2024] [Accepted: 04/03/2024] [Indexed: 04/06/2024]
Abstract
Increased sampling of genomes and populations across closely related species has revealed that levels of genetic exchange during and after speciation are higher than previously thought. One obvious manifestation of such exchange is strong cytonuclear discordance, where the divergence in mitochondrial DNA (mtDNA) differs from that for nuclear genes more (or less) than expected from differences between mtDNA and nuclear DNA (nDNA) in population size and mutation rate. Given genome-scale data sets and coalescent modeling, we can now confidently identify cases of strong discordance and test specifically for historical or recent introgression as the cause. Using population sampling and combining exon capture data from historical museum specimens and recently collected tissues, we showcase how genomic tools can resolve complex evolutionary histories in the brachyotis group of rock-wallabies (Petrogale). In particular, by applying population and phylogenomic approaches, we can assess the role of demographic processes in driving complex evolutionary patterns and the role of ancient introgression and hybridization. We find that described species are well supported as monophyletic taxa for nDNA genes, but not for mtDNA, with cytonuclear discordance involving at least 4 operational taxonomic units across 4 species that diverged 183-278 kya. Approximate Bayesian computation (ABC) modeling of nDNA gene trees supports introgression during or after speciation for some taxon pairs with cytonuclear discordance. Given substantial differences in body size between the species involved, this evidence for gene flow is surprising. Heterogeneous patterns of introgression were identified but do not appear to be associated with chromosome differences between species. These and previous results suggest that dynamic past climates across the monsoonal tropics could have promoted reticulation among related species.
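The ABC step can be illustrated with the simplest form of ABC model choice, rejection sampling: draw a model from its prior, simulate a summary statistic, keep the draw if it lands within a tolerance of the observed value, and read posterior model probabilities off as the accepted proportions. This is a toy sketch under loudly stated assumptions: the two models ("isolation" vs. "migration"), the single summary statistic (a fraction of discordant gene trees) and the Gaussian simulator are hypothetical placeholders, not the coalescent models or statistics used in the study.

```python
import random

MODELS = ("isolation", "migration")


def simulate_stat(model, rng):
    """Hypothetical toy simulator returning a 'fraction of discordant gene trees'.

    Placeholder assumption: discordance from incomplete lineage sorting alone
    ('isolation') averages 0.20, while added introgression ('migration')
    raises it to 0.35. A real analysis would run a coalescent simulator here.
    """
    mean = 0.20 if model == "isolation" else 0.35
    return min(1.0, max(0.0, rng.gauss(mean, 0.05)))  # clamp to a valid fraction


def abc_model_choice(observed, n_sims=20000, tolerance=0.02, seed=1):
    """Rejection ABC: posterior model probabilities as accepted proportions."""
    rng = random.Random(seed)
    accepted = {m: 0 for m in MODELS}
    for _ in range(n_sims):
        model = rng.choice(MODELS)  # uniform prior over the two models
        if abs(simulate_stat(model, rng) - observed) <= tolerance:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; raise tolerance or n_sims")
    return {m: accepted[m] / total for m in MODELS}
```

For example, abc_model_choice(0.36) places nearly all posterior mass on "migration". Practical ABC analyses typically use several summary statistics and often refine the accepted draws; this sketch only shows the accept/reject logic.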
Affiliation(s)
- Sally Potter
- School of Natural Sciences, 14 Eastern Road, Macquarie University, Macquarie Park, NSW 2109, Australia
- Division of Ecology and Evolution, Research School of Biology, 134 Linnaeus Way, The Australian National University, Acton, ACT 2601, Australia
- Australian Museum Research Institute, Australian Museum, 1 William St, Sydney, NSW 2010, Australia
- Craig Moritz
- Division of Ecology and Evolution, Research School of Biology, 134 Linnaeus Way, The Australian National University, Acton, ACT 2601, Australia
- Maxine P Piggott
- Division of Ecology and Evolution, Research School of Biology, 134 Linnaeus Way, The Australian National University, Acton, ACT 2601, Australia
- Research Institute for the Environment and Livelihoods, Charles Darwin University, Casuarina, NT 0811, Australia
- Jason G Bragg
- National Herbarium of New South Wales, The Royal Botanical Gardens and Domain Trust, Mrs Macquaries Road, Sydney, NSW 2000, Australia
- Ke Bi
- Museum of Vertebrate Zoology and Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Christiana McDonald-Spicer
- Division of Ecology and Evolution, Research School of Biology, 134 Linnaeus Way, The Australian National University, Acton, ACT 2601, Australia
- Rustamzhon Turakulov
- Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, VIC 3000, Australia
- Earth Sciences, College of Science and Engineering, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia
- Mark D B Eldridge
- Australian Museum Research Institute, Australian Museum, 1 William St, Sydney, NSW 2010, Australia
2
Zou Y, Zhang Z, Zeng Y, Hu H, Hao Y, Huang S, Li B. Common Methods for Phylogenetic Tree Construction and Their Implementation in R. Bioengineering (Basel) 2024; 11:480. [PMID: 38790347 PMCID: PMC11117635 DOI: 10.3390/bioengineering11050480] [Received: 04/04/2024] [Revised: 05/04/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Phylogenetic trees reflect the evolutionary relationships between species or gene families and play a critical role in modern biological research. In this review, we summarize common methods for constructing phylogenetic trees, including distance methods, maximum parsimony, maximum likelihood, Bayesian inference, and tree-integration methods (supermatrix and supertree). We discuss the advantages, shortcomings, and applications of each method and provide code to construct phylogenetic trees from molecular data using packages and algorithms in R. This review aims to provide comprehensive guidance and reference for researchers seeking to construct phylogenetic trees while also promoting further development and innovation in this field. By offering a clear and concise overview of the available methods, we hope to enable researchers to select the most appropriate approach for their specific research questions and datasets.
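As an illustration of the distance-method family, UPGMA repeatedly merges the closest pair of clusters, places their common node at half their distance (so it assumes a molecular clock), and updates distances as size-weighted averages. The review itself works in R; the following is an independent minimal Python sketch, assuming a complete pairwise distance table keyed by frozenset pairs and taxon names that contain no '+' character.

```python
def upgma(names, dist):
    """UPGMA clustering; returns a rooted Newick string with branch lengths.

    names: list of taxon labels (assumed not to contain '+').
    dist: dict mapping frozenset({a, b}) -> pairwise distance for every pair.
    """
    # Each cluster label maps to (newick subtree, number of taxa, node height).
    clusters = {n: (n, 1, 0.0) for n in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        height = d[frozenset((a, b))] / 2.0  # ultrametric node height
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        merged = f"({newick_a}:{height - height_a:g},{newick_b}:{height - height_b:g})"
        label = f"{a}+{b}"
        # Distance from the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((label, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[label] = (merged, size_a + size_b, height)
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"
```

With d(A,B) = 2, d(A,C) = d(B,C) = 6 and every distance to D equal to 10, this returns (((A:1,B:1):2,C:3):2,D:5);. Neighbor joining drops the clock assumption and is usually preferred when rates vary among lineages.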
Affiliation(s)
- Yue Zou
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Zixuan Zhang
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Yujie Zeng
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Hanyue Hu
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Sheng Huang
- Animal Nutrition Institute, Chongqing Academy of Animal Science, Chongqing 402460, China
- Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
3
Rincón-Barrado M, Villaverde T, Perez MF, Sanmartín I, Riina R. The sweet tabaiba or there and back again: phylogeographical history of the Macaronesian Euphorbia balsamifera. Ann Bot 2024; 133:883-904. [PMID: 38197716 PMCID: PMC11082519 DOI: 10.1093/aob/mcae001] [Received: 10/09/2023] [Accepted: 03/01/2024] [Indexed: 01/11/2024]
Abstract
BACKGROUND AND AIMS Biogeographical relationships between the Canary Islands and north-west Africa are often explained by oceanic dispersal and geographical proximity. Sister-group relationships between Canarian and eastern African/Arabian taxa, the 'Rand Flora' pattern, are rare among plants and have been attributed to the extinction of north-western African populations. Euphorbia balsamifera is the only representative species of this pattern that is distributed in the Canary Islands and north-west Africa; it is also one of few species present in all seven islands. Previous studies placed African populations of E. balsamifera as sister to the Canarian populations, but this relationship was based on herbarium samples with highly degraded DNA. Here, we test the extinction hypothesis by sampling new continental populations; we also expand the Canarian sampling to examine the dynamics of island colonization and diversification. METHODS Using target enrichment with genome skimming, we reconstructed phylogenetic relationships within E. balsamifera and between this species and its disjunct relatives. A single nucleotide polymorphism dataset obtained from the target sequences was used to infer population genetic diversity patterns. We used convolutional neural networks to discriminate among alternative Canary Islands colonization scenarios. KEY RESULTS The results confirmed the Rand Flora sister-group relationship between western E. balsamifera and Euphorbia adenensis in the Eritreo-Arabian region and recovered an eastern-western geographical structure among E. balsamifera Canarian populations. Convolutional neural networks supported a scenario of east-to-west island colonization, followed by population extinctions in Lanzarote and Fuerteventura and recolonization from Tenerife and Gran Canaria; a signal of admixture between the eastern island and north-west African populations was recovered. 
CONCLUSIONS Our findings support the Surfing Syngameon Hypothesis for the colonization of the Canary Islands by E. balsamifera, but also a recent back-colonization to the continent. Populations of E. balsamifera from northwest Africa are not the remnants of an ancestral continental stock, but originated from migration events from Lanzarote and Fuerteventura. This is further evidence that oceanic archipelagos are not a sink for biodiversity, but may be a source of new genetic variability.
Affiliation(s)
- Mario Rincón-Barrado
- Real Jardín Botánico (RJB), CSIC, Madrid, 28014, Spain
- Centro Nacional de Biotecnología (CNB), CSIC, Madrid, 28049, Spain
- Tamara Villaverde
- Universidad Rey Juan Carlos (URJC), Área de Biodiversidad y Conservación, Móstoles, 28933, Spain
- Manolo F Perez
- Institut de Systématique, Evolution, Biodiversité (ISYEB – UMR 7205 CNRS), Muséum National d’Histoire Naturelle, SU, EPHE & UA, Paris, France
- Ricarda Riina
- Real Jardín Botánico (RJB), CSIC, Madrid, 28014, Spain
4
Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet 2023; 24:834-850. [PMID: 37369847 DOI: 10.1038/s41576-023-00620-x] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence - the inference of conflicting evolutionary histories - remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.
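A common way to quantify topological incongruence between two gene trees is the Robinson-Foulds (RF) distance: the number of bipartitions (splits) present in one tree but not the other. A minimal sketch, assuming trees are given as nested Python tuples of taxon names on the same taxon set (an illustrative format, not any particular library's) and treating them as unrooted:

```python
def leaf_set(tree):
    """Taxa below a node; a leaf is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side NOT containing a fixed reference taxon,
    so the same bipartition gets the same key however the tree is rooted.
    """
    taxa = leaf_set(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if ref in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # skip empty and trivial splits
            found.add(side)
        return clade

    walk(tree)
    return found


def rf_distance(tree1, tree2):
    """Unnormalized Robinson-Foulds distance between two trees on one taxon set."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must have the same taxon set")
    return len(splits(tree1) ^ splits(tree2))
```

For example, (((A,B),C),(D,E)) and (((A,C),B),(D,E)) differ by one split each, giving an RF distance of 2, while rerooting a tree leaves the distance at 0. RF counts conflict but does not say which biological or analytical factor caused it.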
Affiliation(s)
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xiaofan Zhou
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Xing-Xing Shen
- Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
5
Tabatabaee Y, Roch S, Warnow T. QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent. J Comput Biol 2023; 30:1146-1181. [PMID: 37902986 DOI: 10.1089/cmb.2023.0185] [Indexed: 11/01/2023]
Abstract
We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a recently proposed polynomial-time algorithm for this problem, based on the theory developed by Allman, Degnan, and Rhodes proving that rooted 5-taxon trees are identifiable from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on GitHub.
Affiliation(s)
- Yasamin Tabatabaee
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Sebastien Roch
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
6
Ramírez-Reyes T, Armendáriz-Toledano F, Rodríguez LGC. Rearranging and completing the puzzle: Phylogenomic analysis of bark beetles Dendroctonus reveals new hypotheses about genus diversification. Mol Phylogenet Evol 2023; 187:107885. [PMID: 37467902 DOI: 10.1016/j.ympev.2023.107885] [Received: 05/19/2023] [Revised: 07/07/2023] [Accepted: 07/16/2023] [Indexed: 07/21/2023]
Abstract
Studies carried out on bark beetles within Dendroctonus have been extensive and revealed diverse information in different areas of their natural history, taxonomy, evolution, and interactions, among others. Despite these efforts, phylogenetic hypotheses have remained unresolved, mainly because of the limited information (taxonomic sampling, gene sampling, or both) analyzed in studies focused on obtaining evolutionary hypotheses for this genus. With the aim of filling these gaps in the evolutionary history of Dendroctonus, we analyzed ∼1800 loci mapped to a reference genome, obtained for 20 of the 21 species recognized to date, minimizing the impact of missing information and improving the assumption of orthology in a phylogenomic framework. We obtained congruent phylogenetic topologies from two phylogenomic inference strategies: loci concatenation (ML framework) and a multispecies coalescent model (MSC) through the analysis of site pattern frequencies (SNPs). Dendroctonus is composed of two major clades (A and B), containing five and four subclades, respectively. According to our divergence dating analysis, the MRCA of Dendroctonus dates back to the early Eocene, while the MRCA of each major clade dates to the mid-Eocene. Interestingly, most of the speciation events of extant species occurred during the Miocene, which could be correlated with the diversification of pine trees (Pinus). The MRCA of Dendroctonus inhabited large regions of North America, with all ancestors and descendants of clade A having diversified within this region. The Mexican Transition Zone was important in the diversification of the majority of clade A species. For clade B, we identified two important colonization events to the Old World from America: the first in the early Oligocene from the Arctic to Asia (via Beringia), and the second during the Miocene from the Arctic-Western-Alleghany region to Europe and Siberia (also via Beringia).
Our genomic analyses also supported the existence of hidden structured lineages within the frontalis complex and indicated that D. beckeri represents a lineage independent of D. valens, as previously suggested. The information presented here updates the knowledge concerning the diversification of a genus of remarkable ecological and economic importance.
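Loci concatenation builds a supermatrix by joining per-locus alignments taxon by taxon, filling taxa absent from a locus with missing-data characters, and recording partition boundaries so that a partitioned ML analysis can model each locus separately. A minimal sketch of that bookkeeping step only, assuming each locus is a dict of already-aligned sequences; the function name and the "?" missing-data symbol are illustrative choices, not the authors' pipeline.

```python
def concatenate(loci, missing="?"):
    """Build a supermatrix from per-locus alignments.

    loci: list of dicts mapping taxon -> aligned sequence (equal length within a locus).
    Returns (supermatrix dict, list of 1-based inclusive (start, end) partitions).
    """
    taxa = sorted(set().union(*(locus.keys() for locus in loci)))
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be a non-empty alignment of equal-length sequences")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this locus are padded with missing data.
            rows[taxon].append(locus.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions
```

The partition list is what a partitioned ML run would use to assign a separate substitution model to each locus; how much missing data to tolerate per taxon is a separate filtering decision.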
Affiliation(s)
- Tonatiuh Ramírez-Reyes
- Instituto de Biología, Departamento de Zoología, Colección Nacional de Insectos, Universidad Nacional Autónoma de México, Circuito Zona Deportiva S/N, C.U., Coyoacán, 04510 Ciudad de México, Mexico
- Facultad de Ciencias Forestales, Universidad Autónoma de Nuevo León, Carretera Nacional 85, Km. 145, 67700 Linares, Nuevo León, Mexico
- Francisco Armendáriz-Toledano
- Instituto de Biología, Departamento de Zoología, Colección Nacional de Insectos, Universidad Nacional Autónoma de México, Circuito Zona Deportiva S/N, C.U., Coyoacán, 04510 Ciudad de México, Mexico
- Luis Gerardo Cuéllar Rodríguez
- Facultad de Ciencias Forestales, Universidad Autónoma de Nuevo León, Carretera Nacional 85, Km. 145, 67700 Linares, Nuevo León, Mexico
7
Arantes ÍC, Vasconcellos MM, Smith ML, Garrick RC, Colli GR, Noonan BP. Species limits and diversification of the Dendropsophus rubicundulus subgroup (Anura, Hylidae) in Neotropical savannas. Mol Phylogenet Evol 2023:107843. [PMID: 37286064 DOI: 10.1016/j.ympev.2023.107843] [Received: 05/23/2022] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/09/2023]
Abstract
Understanding the processes that generate and maintain biodiversity at and below the species level is a central goal of evolutionary biology. Here we explore the spatial and temporal drivers of diversification of the treefrog subgroup Dendropsophus rubicundulus, a subgroup of the D. microcephalus species group, over periods of pronounced geological and climatic changes in the Neotropical savannas that they inhabit. This subgroup currently comprises 11 recognized species distributed across the Brazilian and Bolivian savannas, but the taxonomy has been in a state of flux, necessitating reexamination. Using newly generated single nucleotide polymorphism (SNP) data from restriction-site associated DNA sequencing (RADseq) and mitochondrial 16S sequence data for ∼150 specimens, we inferred phylogenetic relationships, tested species limits using a model-based approach, and estimated divergence times to gain insights into the geographic and climatic events that affected the diversification of this subgroup. Our results recognized at least nine species: D. anataliasiasi, D. araguaya, D. cerradensis, D. elianeae, D. jimi, D. rubicundulus, D. tritaeniatus, D. rozenmani, and D. sanborni. Although we did not collect SNP data for the latter two species, they are likely distinct based on mitochondrial data. In addition, we found genetic structure within the widespread species D. rubicundulus, which comprises three allopatric lineages connected by gene flow upon secondary contact. We also found evidence of population structure and perhaps undescribed diversity in D. elianeae, which warrants further study. The D. rubicundulus subgroup is estimated to have originated in the Late Miocene (∼5.45 million years ago), with diversification continuing through the Pliocene and Early Pleistocene, followed by the most recent divergence of D. rubicundulus lineages in the Middle Pleistocene. 
The epeirogenic uplift followed by erosion and denudation of the central Brazilian plateau throughout the Pliocene and Pleistocene, in combination with the increasing frequency and amplitude of climatic fluctuations during the Pleistocene, was important for generating and structuring diversity at or below the species level in the D. rubicundulus subgroup.
Affiliation(s)
- Ísis C Arantes
- Department of Biology, University of Mississippi, Oxford, MS 38677, USA
- Mariana M Vasconcellos
- Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, 05508-090, Brazil
- Megan L Smith
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Ryan C Garrick
- Department of Biology, University of Mississippi, Oxford, MS 38677, USA
- Guarino R Colli
- Departamento de Zoologia, Universidade de Brasília, 70910-900 Brasília, Distrito Federal, Brazil
- Brice P Noonan
- Department of Biology, University of Mississippi, Oxford, MS 38677, USA
8
Peng J, Swofford DL, Kubatko L. Estimation of speciation times under the multispecies coalescent. Bioinformatics 2022; 38:5182-5190. [PMID: 36227122 DOI: 10.1093/bioinformatics/btac679] [Received: 12/31/2020] [Revised: 06/02/2022] [Accepted: 10/10/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION The multispecies coalescent model is now widely accepted as an effective model for incorporating variation in the evolutionary histories of individual genes into methods for phylogenetic inference from genome-scale data. However, because model-based analysis under the coalescent can be computationally expensive for large datasets, a variety of inferential frameworks and corresponding algorithms have been proposed for estimation of species-level phylogenies and associated parameters, including speciation times and effective population sizes. RESULTS We consider the problem of estimating the timing of speciation events along a phylogeny in a coalescent framework. We propose a maximum a posteriori estimator based on composite likelihood (MAPCL) for inferring these speciation times under a model of DNA sequence evolution for which exact site-pattern probabilities can be computed under the assumption of a constant θ throughout the species tree. We demonstrate that the MAPCL estimates are statistically consistent and asymptotically normally distributed, and we show how this result can be used to estimate their asymptotic variance. We also provide a more computationally efficient estimator of the asymptotic variance based on the non-parametric bootstrap. We evaluate the performance of our method using simulation and by application to an empirical dataset for gibbons. AVAILABILITY AND IMPLEMENTATION The method has been implemented in the PAUP* program, freely available at https://paup.phylosolutions.com for Macintosh, Windows and Linux operating systems. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
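The nonparametric bootstrap mentioned above resamples alignment sites with replacement, recomputes the estimator on each pseudo-replicate, and uses the spread of those estimates as a variance estimate. MAPCL itself is implemented in PAUP*; the sketch below only illustrates the generic bootstrap-over-sites idea, using a Jukes-Cantor pairwise distance as a deliberately simple stand-in estimator (an assumption, not the paper's speciation-time estimator).

```python
import math
import random
import statistics


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned, gap-free DNA sequences."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_variance(seq1, seq2, estimator, n_reps=1000, seed=0):
    """Nonparametric bootstrap: resample sites with replacement, re-estimate,
    and return the sample variance of the replicate estimates."""
    if n_reps < 2:
        raise ValueError("need at least two bootstrap replicates")
    rng = random.Random(seed)
    n_sites = len(seq1)
    estimates = []
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        estimates.append(
            estimator("".join(seq1[i] for i in cols), "".join(seq2[i] for i in cols))
        )
    return statistics.variance(estimates)
```

Any estimator that takes the resampled columns can be passed in place of jc_distance; the resampling unit (single sites here) should match the independence assumptions of the estimator being assessed.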
Affiliation(s)
- Jing Peng
- Division of Biostatistics, The Ohio State University, Columbus, OH 43210, USA
- David L Swofford
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Laura Kubatko
- Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
- Mathematical Biosciences Institute, The Ohio State University, Columbus, OH 43210, USA
9
Baumel A, Nieto Feliner G, Médail F, La Malfa S, Di Guardo M, Bou Dagher Kharrat M, Lakhal-Mirleau F, Frelon V, Ouahmane L, Diadema K, Sanguin H, Viruel J. Genome-wide footprints in the carob tree (Ceratonia siliqua) unveil a new domestication pattern of a fruit tree in the Mediterranean. Mol Ecol 2022; 31:4095-4111. [PMID: 35691023 PMCID: PMC9541536 DOI: 10.1111/mec.16563] [Received: 09/21/2021] [Revised: 05/13/2022] [Accepted: 06/08/2022] [Indexed: 12/22/2022]
Abstract
Intense research efforts over the last two decades have renewed our understanding of plant phylogeography and domestication in the Mediterranean basin. Here we aim to investigate the evolutionary history and the origin of domestication of the carob tree (Ceratonia siliqua), which has been cultivated for millennia for food and fodder. We used >1000 microsatellite genotypes to delimit seven carob evolutionary units (CEUs). We investigated genome‐wide diversity and evolutionary patterns of the CEUs with 3557 single nucleotide polymorphisms generated by restriction‐site associated DNA sequencing (RADseq). To address the complex wild vs. cultivated status of sampled trees, we classified 56 sampled populations across the Mediterranean basin as wild, seminatural or cultivated. Nuclear and cytoplasmic loci were identified from RADseq data and analyzed separately. Phylogenetic analyses of these genome‐wide data allowed us to resolve west‐to‐east expansions from a single long‐term refugium probably located in the foothills of the High Atlas Mountains near the Atlantic coast. Our findings support multiple origins of domestication with a low impact on genetic diversity at the range‐wide level. The carob was mostly domesticated from locally selected wild genotypes, with scattered long‐distance westward dispersals of domesticated varieties by humans, concomitant with major historical migrations by Romans, Greeks and Arabs. Ex situ efforts to preserve carob genetic resources should prioritize accessions from both western and eastern populations, with emphasis on the most differentiated CEUs situated in southwest Morocco, southern Spain and the eastern Mediterranean. Our study highlights the relevance of wild and seminatural habitats in the conservation of genetic resources for cultivated trees.
Affiliation(s)
- Alex Baumel
- Aix Marseille University, Avignon University, CNRS, IRD, IMBE, Institut Méditerranéen de Biodiversité et d'Ecologie marine et continentale, Faculté des Sciences et Techniques St-Jérôme, Marseille, France
- Frédéric Médail
- Aix Marseille University, Avignon University, CNRS, IRD, IMBE, Institut Méditerranéen de Biodiversité et d'Ecologie marine et continentale, Faculté des Sciences et Techniques St-Jérôme, Marseille, France
- Stefano La Malfa
- Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy
- Mario Di Guardo
- Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy
- Magda Bou Dagher Kharrat
- Laboratoire Biodiversité et Génomique Fonctionnelle, Faculté des Sciences, Université Saint-Joseph, Campus Sciences et Technologies, Beirut, Lebanon
- Fatma Lakhal-Mirleau
- Aix Marseille University, Avignon University, CNRS, IRD, IMBE, Institut Méditerranéen de Biodiversité et d'Ecologie marine et continentale, Faculté des Sciences et Techniques St-Jérôme, Marseille, France
- Valentine Frelon
- Aix Marseille University, Avignon University, CNRS, IRD, IMBE, Institut Méditerranéen de Biodiversité et d'Ecologie marine et continentale, Faculté des Sciences et Techniques St-Jérôme, Marseille, France
- Lahcen Ouahmane
- Faculté des Sciences Semlalia, Laboratoire de Biotechnologies Microbiennes Agrosciences et Environnement, Université Cadi Ayyad Marrakech, Marrakech, Morocco
- Katia Diadema
- Conservatoire Botanique National Méditerranéen de Porquerolles (CBNMed), Hyères, France
- Hervé Sanguin
- CIRAD, UMR PHIM, Montpellier, France
- PHIM, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
10
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annu Rev Ecol Evol Syst 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
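A standard three-taxon result shows how incomplete lineage sorting alone makes gene trees disagree with the species tree under the multispecies coalescent. For a rooted species tree ((A,B),C) whose internal branch has length t in coalescent units, the matching gene tree has probability 1 - (2/3)e^(-t) and each of the two alternative topologies has probability (1/3)e^(-t). A minimal sketch:

```python
import math


def triplet_gene_tree_probs(t):
    """Rooted gene tree probabilities for species tree ((A,B),C) under the MSC.

    t: internal branch length in coalescent units (t >= 0).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0  # each of the two mismatching topologies
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }
```

At t = 0 all three topologies are equally likely, and at t = 1 about 0.245 of gene trees still disagree with the species tree. For three taxa the matching topology is always the most probable one; with more taxa, very short branches can produce the anomaly zone, where this no longer holds.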
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
11
Molloy EK, Gatesy J, Springer MS. Theoretical and practical considerations when using retroelement insertions to estimate species trees in the anomaly zone. Syst Biol 2021; 71:721-740. [PMID: 34677617 DOI: 10.1093/sysbio/syab086] [Received: 10/02/2020] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e. local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. 
This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study, provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100 000 parsimony-informative RIs. We found that, when given just 1 000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e. clades separated by branches > 0.3 CUs) with high support and identified rapid radiations (i.e. shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios.
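For comparison with the retroelement-corrected estimator mentioned above, the standard relation used for binary gene trees can be stated compactly: under the multispecies coalescent, a gene tree displays the species-tree quartet with probability 1 - (2/3)e^(-t) for an internal branch of t coalescent units, so the maximum-likelihood estimate from an observed major-quartet frequency f is t = -ln(3(1 - f)/2). The Python sketch below (the function name is ours) implements only this gene-tree case; it is not the RI formula derived in the article, whose form the abstract does not give.

```python
import math


def quartet_branch_length(f_major: float) -> float:
    """ML internal branch length (coalescent units) for one quartet from gene trees.

    Assumes the multispecies coalescent with binary gene trees: the major
    quartet topology has probability 1 - (2/3) * exp(-t). This is NOT the
    retroelement-corrected estimator described in the abstract.
    """
    if not 0.0 <= f_major <= 1.0:
        raise ValueError("f_major must be a frequency in [0, 1]")
    if f_major <= 1.0 / 3.0:
        # No support above the 1/3 baseline expected when t = 0.
        return 0.0
    if f_major == 1.0:
        # Every gene tree agrees: the likelihood keeps increasing with t.
        return math.inf
    # Invert f = 1 - (2/3) * exp(-t) at the observed frequency.
    return -math.log(1.5 * (1.0 - f_major))
```

As a sense of scale, a 0.3 CU branch (the threshold quoted in the abstract) corresponds to an expected major-quartet frequency of 1 - (2/3)e^(-0.3), roughly 0.51, so about half of gene-tree quartets around such a branch are still expected to be discordant.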
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, College Park, 20742, USA
- John Gatesy
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, 10024, USA
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, Riverside, 92521, USA
12
Richards A, Kubatko L. Bayesian-Weighted Triplet and Quartet Methods for Species Tree Inference. Bull Math Biol 2021; 83:93. [PMID: 34297209 DOI: 10.1007/s11538-021-00918-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 06/03/2021] [Indexed: 11/26/2022]
Abstract
Inference of the evolutionary histories of species, commonly represented by a species tree, is complicated by the divergent evolutionary history of different parts of the genome. Different loci on the genome can have different histories from the underlying species tree (and each other) due to processes such as incomplete lineage sorting (ILS), gene duplication and loss, and horizontal gene transfer. The multispecies coalescent is a commonly used model for performing inference on species and gene trees in the presence of ILS. This paper introduces Lily-T and Lily-Q, two new methods for species tree inference under the multispecies coalescent. We then compare them to two frequently used methods, SVDQuartets and ASTRAL, using simulated and empirical data. Both methods generally showed improvement over SVDQuartets, and Lily-Q was superior to Lily-T for most simulation settings. The comparison to ASTRAL was more mixed: Lily-Q tended to be better than ASTRAL when the length of recombination-free loci was short, when the coalescent population parameter was small, or when the internal branch lengths were longer.
Affiliation(s)
- Andrew Richards
- Department of Statistics, The Ohio State University, Columbus, USA
- Laura Kubatko
- Department of Statistics, The Ohio State University, Columbus, USA
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, USA
13
Long C, Kubatko L. Hypothesis Testing With Rank Conditions in Phylogenetics. Front Genet 2021; 12:664357. [PMID: 34276772 PMCID: PMC8283673 DOI: 10.3389/fgene.2021.664357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2021] [Accepted: 05/26/2021] [Indexed: 12/02/2022]
Abstract
A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. For a four-taxon model, one can arrange the entries of these probability distributions into three flattening matrices that correspond to the three different unrooted leaf-labeled four-leaf trees, or quartet trees. The flattening matrix corresponding to the tree parameter of the model is known to satisfy certain rank conditions. Methods such as ErikSVD and SVDQuartets take advantage of this observation by applying singular value decomposition to flattening matrices consisting of empirical data. Each possible quartet is assigned an "SVD score" based on how close the flattening is to the set of matrices of the predicted rank. When choosing among possible quartets, the one with the lowest score is inferred to be the phylogeny of the four taxa under consideration. Since an n-leaf phylogenetic tree is determined by its quartets, this approach can be generalized to infer larger phylogenies. In this article, we explore using the SVD score as a test statistic to test whether phylogenetic data were generated by a particular quartet tree. To do so, we use several results to approximate the distribution of the SVD score and to give upper bounds on the p-value of the associated hypothesis tests. We also apply these hypothesis tests to simulated phylogenetic data and discuss the implications for interpreting SVD scores in rank-based inference methods.
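The quartet-scoring step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the ErikSVD or SVDQuartets code: `counts` is a hypothetical 4×4×4×4 array of site-pattern counts for four taxa, all helper names are ours, and the score is taken as the Frobenius distance from the normalized flattening to the nearest matrix of rank at most `rank` (the root-sum-of-squares of the trailing singular values); other methods may normalize differently. The default `rank=10` follows the coalescent setting of SVDQuartets; under a single-tree Markov model the predicted rank for the true split is 4.

```python
import numpy as np

# A split {i, j} | {k, l} of four taxa labelled 0..3.
Split = tuple[tuple[int, int], tuple[int, int]]

# The three unrooted quartet trees on taxa 0..3, written as splits.
SPLITS: list[Split] = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def flatten(patterns: np.ndarray, split: Split) -> np.ndarray:
    """Arrange a 4x4x4x4 site-pattern array as the 16x16 flattening for `split`.

    Rows are indexed by the nucleotide pair at taxa (i, j) and columns by the
    nucleotide pair at taxa (k, l).
    """
    (i, j), (k, l) = split
    return np.transpose(patterns, (i, j, k, l)).reshape(16, 16)


def svd_score(flat: np.ndarray, rank: int = 10) -> float:
    """Frobenius distance from `flat` to the nearest matrix of rank <= `rank`."""
    # NumPy returns singular values in descending order.
    sv = np.linalg.svd(flat, compute_uv=False)
    # By Eckart-Young, the discarded singular values give the approximation error.
    return float(np.sqrt(np.sum(sv[rank:] ** 2)))


def best_split(counts: np.ndarray, rank: int = 10) -> tuple[Split, dict[Split, float]]:
    """Score all three quartet trees and return the lowest-scoring one."""
    freqs = counts / counts.sum()  # empirical site-pattern frequencies
    scores = {s: svd_score(flatten(freqs, s), rank) for s in SPLITS}
    return min(scores, key=scores.get), scores
```

Because the three scores are computed from the same normalized data, only their ordering matters for choosing a quartet; the distributional approximations and p-value bounds discussed in the article are not reproduced here.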
Affiliation(s)
- Colby Long
- Department of Mathematical and Computational Sciences, College of Wooster, Wooster, OH, United States
- Laura Kubatko
- Department of Statistics and Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, United States
14
Yourdkhani S, Allman ES, Rhodes JA. Parameter Identifiability for a Profile Mixture Model of Protein Evolution. J Comput Biol 2021; 28:570-586. [PMID: 33960831 DOI: 10.1089/cmb.2020.0315] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
A profile mixture (PM) model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend, in part, on different amino acid distributions, or profiles, varying over sites in aligned sequences. A fundamental question for any stochastic model, which must be answered positively to justify model-based inference, is whether the parameters are identifiable from the probability distribution they determine. Here, using algebraic methods, we show that a PM model has identifiable parameters under circumstances in which it is likely to be used for empirical analyses. In particular, for a tree relating 9 or more taxa, both the tree topology and all numerical parameters are generically identifiable when the number of profiles is less than 74.
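One hedged way to write the distribution whose parameters must be identifiable is as a weighted mixture over m profiles. The symbols here are our notation: x is an amino-acid site pattern across the taxa, T the tree topology, ℓ its edge lengths, and w_i the mixture weights. Tying each rate matrix Q_i to its profile π_i through a shared exchangeability matrix S (off-diagonal entries shown; diagonal entries make each row sum to zero) is an assumption in the style of CAT-like models, not a detail stated in the abstract.

$$
P(\mathbf{x} \mid T) = \sum_{i=1}^{m} w_i \, P\!\left(\mathbf{x} \mid T, \boldsymbol{\ell}, Q_i\right), \qquad (Q_i)_{ab} = S_{ab}\,(\pi_i)_b \ (a \neq b), \qquad w_i \ge 0,\ \sum_{i=1}^{m} w_i = 1
$$

Identifiability then asks whether T and the numerical parameters (ℓ, w_i, π_i, and any shared components) are determined by P; the abstract's result is that they generically are for trees on 9 or more taxa when m < 74.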
Affiliation(s)
- Samaneh Yourdkhani
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, Alaska, USA
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, Alaska, USA
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, Alaska, USA
15
New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021; 37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]