Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Del Amparo R, Arenas M. HIV Protease and Integrase Empirical Substitution Models of Evolution: Protein-Specific Models Outperform Generalist Models. Genes (Basel) 2021;13:61. [PMID: 35052404 PMCID: PMC8774313 DOI: 10.3390/genes13010061] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Revised: 12/22/2021] [Accepted: 12/22/2021] [Indexed: 12/24/2022] Open

For:	Del Amparo R, Arenas M. HIV Protease and Integrase Empirical Substitution Models of Evolution: Protein-Specific Models Outperform Generalist Models. Genes (Basel) 2021;13:61. [PMID: 35052404 PMCID: PMC8774313 DOI: 10.3390/genes13010061] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Revised: 12/22/2021] [Accepted: 12/22/2021] [Indexed: 12/24/2022] Open

Number

Cited by Other Article(s)

Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024;41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024] Open

Brazão JM, Foster PG, Cox CJ. Data-specific substitution models improve protein-based phylogenetics. PeerJ 2023;11:e15716. [PMID: 37576497 PMCID: PMC10416777 DOI: 10.7717/peerj.15716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 06/16/2023] [Indexed: 08/15/2023] Open

Abstract

Calculating amino-acid substitution models that are specific for individual protein data sets is often difficult due to the computational burden of estimating large numbers of rate parameters. In this study, we tested the computational efficiency and accuracy of five methods used to estimate substitution models, namely Codeml, FastMG, IQ-TREE, P4 (maximum likelihood), and P4 (Bayesian inference). Data-specific substitution models were estimated from simulated alignments (with different lengths) that were generated from a known simulation model and simulation tree. Each of the resulting data-specific substitution models was used to calculate the maximum likelihood score of the simulation tree and simulated data that was used to calculate the model, and compared with the maximum likelihood scores of the known simulation model and simulation tree on the same simulated data. Additionally, the commonly-used empirical models, cpREV and WAG, were assessed similarly. Data-specific models performed better than the empirical models, which under-fitted the simulated alignments, had the highest difference to the simulation model maximum-likelihood score, clustered further from the simulation model in principal component analysis ordination, and inferred less accurate trees. Data-specific models and the simulation model shared statistically indistinguishable maximum-likelihood scores, indicating that the five methods were reasonably accurate at estimating substitution models by this measure. Nevertheless, tree statistics showed differences between optimal maximum likelihood trees. Unlike other model estimating methods, trees inferred using data-specific models generated with IQ-TREE and P4 (maximum likelihood) were not significantly different from the trees derived from the simulation model in each analysis, indicating that these two methods alone were the most accurate at estimating data-specific models. To show the benefits of using data-specific protein models several published data sets were reanalysed using IQ-TREE-estimated models. These newly estimated models were a better fit to the data than the empirical models that were used by the original authors, often inferred longer trees, and resulted in different tree topologies in more than half of the re-analysed data sets. The results of this study show that software availability and high computation burden are not limitations to generating better-fitting data-specific amino-acid substitution models for phylogenetic analyses.

Collapse

Del Amparo R, Arenas M. Influence of substitution model selection on protein phylogenetic tree reconstruction. Gene 2023;865:147336. [PMID: 36871672 DOI: 10.1016/j.gene.2023.147336] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2023] [Revised: 02/22/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]

The Structure of Evolutionary Model Space for Proteins across the Tree of Life. BIOLOGY 2023;12:biology12020282. [PMID: 36829559 PMCID: PMC9952988 DOI: 10.3390/biology12020282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Revised: 02/04/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]

Abstract

The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the "model space" for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.

Collapse

Del Amparo R, Arenas M. Consequences of Substitution Model Selection on Protein Ancestral Sequence Reconstruction. Mol Biol Evol 2022;39:6628884. [PMID: 35789388 PMCID: PMC9254009 DOI: 10.1093/molbev/msac144] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open