Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Seo TK, Thorne JL. Information Criteria for Comparing Partition Schemes. Syst Biol 2018;67:616-632. [PMID: 29309694 DOI: 10.1093/sysbio/syx097] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Accepted: 12/17/2017] [Indexed: 01/10/2023] Open

For:	Seo TK, Thorne JL. Information Criteria for Comparing Partition Schemes. Syst Biol 2018;67:616-632. [PMID: 29309694 DOI: 10.1093/sysbio/syx097] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Accepted: 12/17/2017] [Indexed: 01/10/2023] Open

Number

Cited by Other Article(s)

Baños H, Susko E, Roger AJ. Is Over-parameterization a Problem for Profile Mixture Models? Syst Biol 2024;73:53-75. [PMID: 37843172 PMCID: PMC11129589 DOI: 10.1093/sysbio/syad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2022] [Revised: 09/12/2023] [Accepted: 10/13/2023] [Indexed: 10/17/2023] Open

Abstract

Biochemical constraints on the admissible amino acids at specific sites in proteins lead to heterogeneity of the amino acid substitution process over sites in alignments. It is well known that phylogenetic models of protein sequence evolution that do not account for site heterogeneity are prone to long-branch attraction (LBA) artifacts. Profile mixture models were developed to model heterogeneity of preferred amino acids at sites via a finite distribution of site classes each with a distinct set of equilibrium amino acid frequencies. However, it is unknown whether the large number of parameters in such models associated with the many amino acid frequency vectors can adversely affect tree topology estimates because of over-parameterization. Here, we demonstrate theoretically that for long sequences, over-parameterization does not create problems for estimation with profile mixture models. Under mild conditions, tree, amino acid frequencies, and other model parameters converge to true values as sequence length increases, even when there are large numbers of components in the frequency profile distributions. Because large sample theory does not necessarily imply good behavior for shorter alignments we explore the performance of these models with short alignments simulated with tree topologies that are prone to LBA artifacts. We find that over-parameterization is not a problem for complex profile mixture models even when there are many amino acid frequency vectors. In fact, simple models with few site classes behave poorly. Interestingly, we also found that misspecification of the amino acid frequency vectors does not lead to increased LBA artifacts as long as the estimated cumulative distribution function of the amino acid frequencies at sites adequately approximates the true one. In contrast, misspecification of the amino acid exchangeability rates can severely negatively affect parameter estimation. Finally, we explore the effects of including in the profile mixture model an additional "F-class" representing the overall frequencies of amino acids in the data set. Surprisingly, the F-class does not help parameter estimation significantly and can decrease the probability of correct tree estimation, depending on the scenario, even though it tends to improve likelihood scores.

Collapse

Burgstaller-Muehlbacher S, Crotty SM, Schmidt HA, Reden F, Drucks T, von Haeseler A. ModelRevelator: Fast phylogenetic model estimation via deep learning. Mol Phylogenet Evol 2023;188:107905. [PMID: 37595933 DOI: 10.1016/j.ympev.2023.107905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Accepted: 08/12/2023] [Indexed: 08/20/2023]

Lartillot N. Identifying the Best Approximating Model in Bayesian Phylogenetics: Bayes Factors, Cross-Validation or wAIC? Syst Biol 2023;72:616-638. [PMID: 36810802 PMCID: PMC10276628 DOI: 10.1093/sysbio/syad004] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2022] [Revised: 01/20/2023] [Accepted: 02/17/2023] [Indexed: 02/23/2023] Open

Abstract

There is still no consensus as to how to select models in Bayesian phylogenetics, and more generally in applied Bayesian statistics. Bayes factors are often presented as the method of choice, yet other approaches have been proposed, such as cross-validation or information criteria. Each of these paradigms raises specific computational challenges, but they also differ in their statistical meaning, being motivated by different objectives: either testing hypotheses or finding the best-approximating model. These alternative goals entail different compromises, and as a result, Bayes factors, cross-validation, and information criteria may be valid for addressing different questions. Here, the question of Bayesian model selection is revisited, with a focus on the problem of finding the best-approximating model. Several model selection approaches were re-implemented, numerically assessed and compared: Bayes factors, cross-validation (CV), in its different forms (k-fold or leave-one-out), and the widely applicable information criterion (wAIC), which is asymptotically equivalent to leave-one-out cross-validation (LOO-CV). Using a combination of analytical results and empirical and simulation analyses, it is shown that Bayes factors are unduly conservative. In contrast, CV represents a more adequate formalism for selecting the model returning the best approximation of the data-generating process and the most accurate estimates of the parameters of interest. Among alternative CV schemes, LOO-CV and its asymptotic equivalent represented by the wAIC, stand out as the best choices, conceptually and computationally, given that both can be simultaneously computed based on standard Markov chain Monte Carlo runs under the posterior distribution. [Bayes factor; cross-validation; marginal likelihood; model comparison; wAIC.].

Collapse

Selection Criteria for Overlapping Binary Models—A Simulation Study. MATHEMATICS 2022. [DOI: 10.3390/math10030478] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]

Crotty SM, Holland BR. Comparing partitioned models to mixture models: Do information criteria apply? Syst Biol 2022;71:1541-1548. [PMID: 35041002 PMCID: PMC9558833 DOI: 10.1093/sysbio/syac003] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Revised: 12/15/2021] [Accepted: 01/10/2022] [Indexed: 12/01/2022] Open

Seo TK, Gascuel O, Thorne JL. Measuring Phylogenetic Information of Incomplete Sequence Data. Syst Biol 2021;71:630-648. [PMID: 34469581 DOI: 10.1093/sysbio/syab073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 11/13/2022] Open

A unified resource and configurable model of the synapse proteome and its role in disease. Sci Rep 2021;11:9967. [PMID: 33976238 PMCID: PMC8113277 DOI: 10.1038/s41598-021-88945-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Accepted: 04/15/2021] [Indexed: 02/03/2023] Open

Evidence for sponges as sister to all other animals from partitioned phylogenomics with mixture models and recoding. Nat Commun 2021;12:1783. [PMID: 33741994 PMCID: PMC7979703 DOI: 10.1038/s41467-021-22074-7] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Accepted: 02/24/2021] [Indexed: 11/08/2022] Open

Simões TR, Caldwell MW, Pierce SE. Sphenodontian phylogeny and the impact of model choice in Bayesian morphological clock estimates of divergence times and evolutionary rates. BMC Biol 2020;18:191. [PMID: 33287835 PMCID: PMC7720557 DOI: 10.1186/s12915-020-00901-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2020] [Accepted: 10/16/2020] [Indexed: 12/17/2022] Open

Abstract

BACKGROUND

The vast majority of all life that ever existed on earth is now extinct and several aspects of their evolutionary history can only be assessed by using morphological data from the fossil record. Sphenodontian reptiles are a classic example, having an evolutionary history of at least 230 million years, but currently represented by a single living species (Sphenodon punctatus). Hence, it is imperative to improve the development and implementation of probabilistic models to estimate evolutionary trees from morphological data (e.g., morphological clocks), which has direct benefits to understanding relationships and evolutionary patterns for both fossil and living species. However, the impact of model choice on morphology-only datasets has been poorly explored.

RESULTS

Here, we investigate the impact of a wide array of model choices on the inference of evolutionary trees and macroevolutionary parameters (divergence times and evolutionary rates) using a new data matrix on sphenodontian reptiles. Specifically, we tested different clock models, clock partitioning, taxon sampling strategies, sampling for ancestors, and variations on the fossilized birth-death (FBD) tree model parameters through time. We find a strong impact on divergence times and background evolutionary rates when applying widely utilized approaches, such as allowing for ancestors in the tree and the inappropriate assumption of diversification parameters being constant through time. We compare those results with previous studies on the impact of model choice to molecular data analysis and provide suggestions for improving the implementation of morphological clocks. Optimal model combinations find the radiation of most major lineages of sphenodontians to be in the Triassic and a gradual but continuous drop in morphological rates of evolution across distinct regions of the phenotype throughout the history of the group.

CONCLUSIONS

We provide a new hypothesis of sphenodontian classification, along with detailed macroevolutionary patterns in the evolutionary history of the group. Importantly, we provide suggestions to avoid overestimated divergence times and biased parameter estimates using morphological clocks. Partitioning relaxed clocks offers methodological limitations, but those can be at least partially circumvented to reveal a detailed assessment of rates of evolution across the phenotype and tests of evolutionary mosaicism.

Collapse

Susko E, Roger AJ. On the Use of Information Criteria for Model Selection in Phylogenetics. Mol Biol Evol 2020;37:549-562. [PMID: 31688943 DOI: 10.1093/molbev/msz228] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

Wang HC, Susko E, Roger AJ. The Relative Importance of Modeling Site Pattern Heterogeneity Versus Partition-Wise Heterotachy in Phylogenomic Inference. Syst Biol 2020;68:1003-1019. [PMID: 31140564 DOI: 10.1093/sysbio/syz021] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Revised: 02/04/2019] [Accepted: 04/09/2019] [Indexed: 12/18/2022] Open

Abstract

Large taxa-rich genome-scale data sets are often necessary for resolving ancient phylogenetic relationships. But accurate phylogenetic inference requires that they are analyzed with realistic models that account for the heterogeneity in substitution patterns amongst the sites, genes and lineages. Two kinds of adjustments are frequently used: models that account for heterogeneity in amino acid frequencies at sites in proteins, and partitioned models that accommodate the heterogeneity in rates (branch lengths) among different proteins in different lineages (protein-wise heterotachy). Although partitioned and site-heterogeneous models are both widely used in isolation, their relative importance to the inference of correct phylogenies has not been carefully evaluated. We conducted several empirical analyses and a large set of simulations to compare the relative performances of partitioned models, site-heterogeneous models, and combined partitioned site heterogeneous models. In general, site-homogeneous models (partitioned or not) performed worse than site heterogeneous, except in simulations with extreme protein-wise heterotachy. Furthermore, simulations using empirically-derived realistic parameter settings showed a marked long-branch attraction (LBA) problem for analyses employing protein-wise partitioning even when the generating model included partitioning. This LBA problem results from a small sample bias compounded over many single protein alignments. In some cases, this problem was ameliorated by clustering similarly-evolving proteins together into larger partitions using the PartitionFinder method. Similar results were obtained under simulations with larger numbers of taxa or heterogeneity in simulating topologies over genes. For an empirical Microsporidia test data set, all but one tested site-heterogeneous models (with or without partitioning) obtain the correct Microsporidia+Fungi grouping, whereas site-homogenous models (with or without partitioning) did not. The single exception was the fully partitioned site-heterogeneous analysis that succumbed to the compounded small sample LBA bias. In general unless protein-wise heterotachy effects are extreme, it is more important to model site-heterogeneity than protein-wise heterotachy in phylogenomic analyses. Complete protein-wise partitioning should be avoided as it can lead to a serious LBA bias. In cases of extreme protein-wise heterotachy, approaches that cluster similarly-evolving proteins together and coupled with site-heterogeneous models work well for phylogenetic estimation.

Collapse

Smith SA, Walker-Hale N, Walker JF, Brown JW. Phylogenetic Conflicts, Combinability, and Deep Phylogenomics in Plants. Syst Biol 2019;69:579-592. [DOI: 10.1093/sysbio/syz078] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Revised: 10/16/2019] [Accepted: 11/18/2019] [Indexed: 11/13/2022] Open