1. Simões TR, Vernygora OV, de Medeiros BAS, Wright AM. Handling Logical Character Dependency in Phylogenetic Inference: Extensive Performance Testing of Assumptions and Solutions Using Simulated and Empirical Data. Syst Biol 2023; 72:662-680. [PMID: 36773019 PMCID: PMC10276625 DOI: 10.1093/sysbio/syad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 12/08/2022] [Accepted: 02/09/2023] [Indexed: 02/12/2023]
Abstract
Logical character dependency is a major conceptual and methodological problem in phylogenetic inference of morphological data sets, as it violates the assumption of character independence that is common to all phylogenetic methods. It is more frequently observed in higher-level phylogenies or in data sets characterizing major evolutionary transitions, as these represent parts of the tree of life where (primary) anatomical characters either originate or disappear entirely. As a result, secondary traits related to these primary characters become "inapplicable" across all sampled taxa in which that character is absent. Various solutions have been explored over the last three decades to handle character dependency, such as alternative character coding schemes and, more recently, new algorithmic implementations. However, the accuracy of the proposed solutions, or the impact of character dependency across distinct optimality criteria, has never been directly tested using standard performance measures. Here, we utilize simple and complex simulated morphological data sets analyzed under different maximum parsimony optimization procedures and Bayesian inference to test the accuracy of various coding and algorithmic solutions to character dependency. This is complemented by empirical analyses using a recoded data set on palaeognathid birds. We find that in small, simulated data sets, absent coding performs better than other popular coding strategies available (contingent and multistate), whereas in more complex simulations (larger data sets controlled for different tree structure and character distribution models) contingent coding is favored more frequently. Under contingent coding, a recently proposed weighting algorithm produces the most accurate results for maximum parsimony. 
However, Bayesian inference outperforms all parsimony-based solutions to handle character dependency due to fundamental differences in their optimization procedures, a simple alternative that has long been overlooked. Yet, we show that the more primary characters bearing secondary (dependent) traits there are in a data set, the harder it is to estimate the true phylogenetic tree, regardless of the optimality criterion, owing to a considerable expansion of the tree parameter space.
Keywords: Bayesian inference, character dependency, character coding, distance metrics, morphological phylogenetics, maximum parsimony, performance, phylogenetic accuracy.
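To make the three coding strategies named in the abstract above more concrete, the following minimal sketch encodes one hypothetical primary character (tail present or absent) and one dependent trait (tail color) for three made-up taxa. The taxa, the character, the state numbering, and the function names are all assumptions for illustration and are not taken from the paper; the scheme definitions follow their common usage in the coding literature and may differ in detail from the authors' implementation.

```python
# Hypothetical toy data: a primary character (tail) and a dependent trait (color).
taxa = {
    "taxon_A": {"tail": False, "color": None},
    "taxon_B": {"tail": True, "color": "red"},
    "taxon_C": {"tail": True, "color": "blue"},
}
COLORS = ["red", "blue"]  # assumed state order for the secondary character


def contingent(t):
    """Two characters; the secondary one is inapplicable ('-') without a tail."""
    primary = "1" if t["tail"] else "0"
    secondary = str(COLORS.index(t["color"])) if t["tail"] else "-"
    return primary + secondary


def absent(t):
    """Two characters; the secondary one gains an explicit 'absent' state (0)."""
    primary = "1" if t["tail"] else "0"
    secondary = str(COLORS.index(t["color"]) + 1) if t["tail"] else "0"
    return primary + secondary


def multistate(t):
    """One composite character: 0 = no tail, 1 = red tail, 2 = blue tail."""
    return str(COLORS.index(t["color"]) + 1) if t["tail"] else "0"


if __name__ == "__main__":
    # Print one matrix row per taxon under each coding scheme.
    for name, traits in taxa.items():
        print(name, contingent(traits), absent(traits), multistate(traits))
```

Under contingent coding only the taxon without a tail receives the inapplicable symbol, whereas absent and multistate coding replace that symbol with an ordinary state that an analysis treats like any other observation.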
Affiliation(s)
- Tiago R Simões
  - Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA
- Oksana V Vernygora
  - Department of Entomology, University of Kentucky, Lexington, Kentucky, USA
- April M Wright
  - Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, USA
2. Simões TR, Caldwell MW, Pierce SE. Sphenodontian phylogeny and the impact of model choice in Bayesian morphological clock estimates of divergence times and evolutionary rates. BMC Biol 2020; 18:191. [PMID: 33287835 PMCID: PMC7720557 DOI: 10.1186/s12915-020-00901-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/24/2020] [Accepted: 10/16/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND: The vast majority of all life that ever existed on earth is now extinct and several aspects of their evolutionary history can only be assessed by using morphological data from the fossil record. Sphenodontian reptiles are a classic example, having an evolutionary history of at least 230 million years, but currently represented by a single living species (Sphenodon punctatus). Hence, it is imperative to improve the development and implementation of probabilistic models to estimate evolutionary trees from morphological data (e.g., morphological clocks), which has direct benefits to understanding relationships and evolutionary patterns for both fossil and living species. However, the impact of model choice on morphology-only datasets has been poorly explored.
RESULTS: Here, we investigate the impact of a wide array of model choices on the inference of evolutionary trees and macroevolutionary parameters (divergence times and evolutionary rates) using a new data matrix on sphenodontian reptiles. Specifically, we tested different clock models, clock partitioning, taxon sampling strategies, sampling for ancestors, and variations on the fossilized birth-death (FBD) tree model parameters through time. We find a strong impact on divergence times and background evolutionary rates when applying widely utilized approaches, such as allowing for ancestors in the tree and the inappropriate assumption of diversification parameters being constant through time. We compare those results with previous studies on the impact of model choice to molecular data analysis and provide suggestions for improving the implementation of morphological clocks. Optimal model combinations find the radiation of most major lineages of sphenodontians to be in the Triassic and a gradual but continuous drop in morphological rates of evolution across distinct regions of the phenotype throughout the history of the group.
CONCLUSIONS: We provide a new hypothesis of sphenodontian classification, along with detailed macroevolutionary patterns in the evolutionary history of the group. Importantly, we provide suggestions to avoid overestimated divergence times and biased parameter estimates using morphological clocks. Partitioning relaxed clocks offers methodological limitations, but those can be at least partially circumvented to reveal a detailed assessment of rates of evolution across the phenotype and tests of evolutionary mosaicism.
Affiliation(s)
- Tiago R Simões
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Michael W Caldwell
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9, Canada
  - Department of Earth and Atmospheric Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9, Canada
- Stephanie E Pierce
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
3. Vernygora OV, Simões TR, Campbell EO. Evaluating the Performance of Probabilistic Algorithms for Phylogenetic Analysis of Big Morphological Datasets: A Simulation Study. Syst Biol 2020; 69:1088-1105. [PMID: 32191335 DOI: 10.1093/sysbio/syaa020] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/19/2019] [Revised: 02/26/2020] [Accepted: 03/15/2020] [Indexed: 01/31/2023]
Abstract
Reconstructing the tree of life is an essential task in evolutionary biology. It demands accurate phylogenetic inference for both extant and extinct organisms, the latter being almost entirely dependent on morphological data. While parsimony methods have traditionally dominated the field of morphological phylogenetics, a rapidly growing number of studies are now employing probabilistic methods (maximum likelihood and Bayesian inference). The present-day toolkit of probabilistic methods offers varied software with distinct algorithms and assumptions for reaching global optimality. However, benchmark performance assessments of different software packages for the analyses of morphological data, particularly in the era of big data, are still lacking. Here, we test the performance of four major probabilistic software packages under variable taxonomic sampling and missing data conditions: the Bayesian inference-based programs MrBayes and RevBayes, and the maximum likelihood-based IQ-TREE and RAxML. We evaluated software performance by calculating the distance between inferred and true trees using a variety of metrics, including Robinson-Foulds (RF), Matching Splits (MS), and Kuhner-Felsenstein (KF) distances. Our results show that increased taxonomic sampling improves accuracy, precision, and resolution of reconstructed topologies across all tested probabilistic software applications and all levels of missing data. Under the RF metric, Bayesian inference applications were the most consistent, accurate, and robust to variation in taxonomic sampling in all tested conditions, especially at high levels of missing data, with little difference in performance between the two tested programs. The MS metric favored more resolved topologies that were generally produced by IQ-TREE. Adding more taxa dramatically reduced performance disparities between programs.
Importantly, our results suggest that the RF metric penalizes incorrectly resolved nodes (false positives) more severely than the MS metric, which instead tends to penalize polytomies. If false positives are to be avoided in systematics, Bayesian inference should be preferred over maximum likelihood for the analysis of morphological data.
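The Robinson-Foulds comparison between an inferred tree and the true tree, mentioned in the abstract above, can be sketched as counting the bipartitions (splits) found in one tree but not in the other. The example below is a minimal illustration only: trees are written as nested tuples of taxon names, a representation chosen here for brevity, and the function names and the five-taxon trees are hypothetical rather than taken from the study.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)  # each split is stored as the side lacking this taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = clade if ref not in clade else all_leaves - clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unnormalized Robinson-Foulds distance: size of the symmetric difference."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-taxon example.
true_tree = ((("A", "B"), "C"), ("D", "E"))
wrong_tree = ((("A", "C"), "B"), ("D", "E"))  # one incorrectly resolved node
polytomy = (("A", "B", "C"), ("D", "E"))      # same node left unresolved

if __name__ == "__main__":
    print(rf_distance(true_tree, wrong_tree))
    print(rf_distance(true_tree, polytomy))
```

In this toy case the incorrectly resolved tree is charged twice (one true split missing plus one false split present), while the polytomy is charged once, which matches the abstract's point that RF penalizes false positives more heavily than unresolved nodes.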
Affiliation(s)
- Oksana V Vernygora
  - Department of Biological Sciences, University of Alberta, 11455 Saskatchewan Drive, Edmonton, Alberta T6G 2E9, Canada
- Tiago R Simões
  - Department of Biological Sciences, University of Alberta, 11455 Saskatchewan Drive, Edmonton, Alberta T6G 2E9, Canada
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Erin O Campbell
  - Department of Biological Sciences, University of Alberta, 11455 Saskatchewan Drive, Edmonton, Alberta T6G 2E9, Canada
4. How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
5. Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
6. Systematics Association Special Volumes. Cladistics 2020. [DOI: 10.1017/9781139047678.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
7. Relationship Diagrams. Cladistics 2020. [DOI: 10.1017/9781139047678.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
8. The Separation of Classification and Phylogenetics. Cladistics 2020. [DOI: 10.1017/9781139047678.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
9. Beyond Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
10. The Interrelationships of Organisms. Cladistics 2020. [DOI: 10.1017/9781139047678.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
11. How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
12. Modern Artificial Methods and Raw Data. Cladistics 2020. [DOI: 10.1017/9781139047678.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
13. Further Myths and More Misunderstandings. Cladistics 2020. [DOI: 10.1017/9781139047678.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
14. Afterword. Cladistics 2020. [DOI: 10.1017/9781139047678.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
15. Systematics: Exposing Myths. Cladistics 2020. [DOI: 10.1017/9781139047678.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
16. Essentialism and Typology. Cladistics 2020. [DOI: 10.1017/9781139047678.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
17. Beyond Classification: How to Study Phylogeny. Cladistics 2020. [DOI: 10.1017/9781139047678.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
18. How to Study Classification: ‘Total Evidence’ vs. ‘Consensus’, Character Congruence vs. Taxonomic Congruence, Simultaneous Analysis vs. Partitioned Data. Cladistics 2020. [DOI: 10.1017/9781139047678.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
19. What This Book Is About. Cladistics 2020. [DOI: 10.1017/9781139047678.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
20. How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
21. The Cladistic Programme. Cladistics 2020. [DOI: 10.1017/9781139047678.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
22. Index. Cladistics 2020. [DOI: 10.1017/9781139047678.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
23. Parameters of Classification: Ordo Ab Chao. Cladistics 2020. [DOI: 10.1017/9781139047678.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
24. Monothetic and Polythetic Taxa. Cladistics 2020. [DOI: 10.1017/9781139047678.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
25. How to Study Classification: Consensus Techniques and General Classifications. Cladistics 2020. [DOI: 10.1017/9781139047678.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
26. Non-taxa or the Absence of –Phyly: Paraphyly and Aphyly. Cladistics 2020. [DOI: 10.1017/9781139047678.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
27. Introduction: Carving Nature at Its Joints, or Why Birds Are Not Dinosaurs and Men Are Not Apes. Cladistics 2020. [DOI: 10.1017/9781139047678.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
28. Preface. Cladistics 2020. [DOI: 10.1017/9781139047678.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
29. Sobral G, Simões TR, Schoch RR. A tiny new Middle Triassic stem-lepidosauromorph from Germany: implications for the early evolution of lepidosauromorphs and the Vellberg fauna. Sci Rep 2020; 10:2273. [PMID: 32080209 PMCID: PMC7033234 DOI: 10.1038/s41598-020-58883-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/27/2019] [Accepted: 01/22/2020] [Indexed: 11/10/2022]
Abstract
The Middle Triassic was a time of major changes in tetrapod faunas worldwide, but the fossil record for this interval is largely obscure for terrestrial faunas. This poses a severe limitation to our understanding of the earliest stages of diversification of lineages representing some of the most diverse faunas in the world today, such as lepidosauromorphs (e.g., lizards and tuataras). Here, we report a tiny new lepidosauromorph from the Middle Triassic of Vellberg (Germany), which combines a mosaic of features from both early evolving squamates and rhynchocephalians, such as the simultaneous occurrence of a splenial bone and partial development of acrodonty. Phylogenetic analyses applying different optimality criteria, and combined morphological and molecular data, consistently recover the new taxon as a stem-lepidosauromorph, implying that stem-lepidosauromorph species coinhabited areas comprising today's central Europe at the same time as the earliest known rhynchocephalians and squamates. It further demonstrates a more complex evolutionary scenario for dental evolution in early lepidosauromorphs, with independent acquisitions of acrodonty early in their evolutionary history. The small size of most terrestrial vertebrates from Vellberg is conspicuous, contrasting with younger Triassic deposits worldwide, but comparable to Early Triassic faunas, suggesting a potential long-lasting Lilliput effect in this fauna.
Affiliation(s)
- Gabriela Sobral
  - Staatliches Museum für Naturkunde Stuttgart, Rosenstein 1, D-70191, Stuttgart, Germany
- Tiago R Simões
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, 02138, USA
- Rainer R Schoch
  - Staatliches Museum für Naturkunde Stuttgart, Rosenstein 1, D-70191, Stuttgart, Germany