Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Herrada A, Eguíluz VM, Hernández-García E, Duarte CM. Scaling properties of protein family phylogenies. BMC Evol Biol 2011;11:155. [PMID: 21645345 PMCID: PMC3277297 DOI: 10.1186/1471-2148-11-155] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/14/2011] [Accepted: 06/06/2011] [Indexed: 11/25/2022] Open

Cited by Other Article(s)

Janzen T, Etienne RS. Phylogenetic tree statistics: A systematic overview using the new R package 'treestats'. Mol Phylogenet Evol 2024;200:108168. [PMID: 39117295 DOI: 10.1016/j.ympev.2024.108168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 07/19/2024] [Accepted: 08/04/2024] [Indexed: 08/10/2024]

Abstract

Phylogenetic trees are believed to contain a wealth of information on diversification processes. However, comparing phylogenetic trees is not straightforward due to their high dimensionality. Researchers have therefore defined a wide range of low-dimensional summary statistics. Currently, it remains unexplored to what extent these summary statistics cover the same underlying information and what summary statistics best explain observed variation across phylogenies. Furthermore, a large subset of available summary statistics focusses on measuring the topological features of a phylogenetic tree, but are often only explored at the extreme edge cases of the fully balanced or imbalanced tree and not for trees of intermediate balance. Here, we introduce a new R package called 'treestats', that provides speed optimized code to compute 70 summary statistics. We study correlations between summary statistics on empirical trees and on trees simulated using several diversification models. Furthermore, we introduce an algorithm to create intermediately balanced trees in a well-defined manner, in order to explore variation in summary statistics across a balance gradient. We find that almost all summary statistics are correlated with tree size, and find that it is difficult, if not impossible, to correct for tree size, unless the tree generating model is known. Furthermore, we find that across empirical and simulated trees, at least three large clusters of correlated summary statistics can be found, where statistics group together based on information used (topology or branching times). However, the finer grained correlation structure appears to depend strongly on either the taxonomic group studied (in empirical studies) or the tree generating model (in simulation studies). Amongst statistics describing the (im)balance of a tree, we find that almost all statistics vary non-linearly, and sometimes even non-monotonically, with our generated balance gradient. 
This indicates that balance is perhaps a more complex property of a tree than previously thought. Furthermore, using our new imbalancing algorithm, we devise a numerical test to identify balance statistics, and identify several statistics as balance statistics that were not previously considered as such. Lastly, our results lead to several recommendations on which statistics to select when analyzing and comparing phylogenetic trees.
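As an illustration of the kind of balance statistic the abstract above refers to, the following is a minimal Python sketch of the Colless imbalance index, evaluated at the two edge cases the abstract mentions (a fully balanced and a fully imbalanced tree). It is not the 'treestats' R package or its API. The nested-tuple tree encoding and the name `colless_index` are assumptions made for this example only.

```python
def colless_index(tree) -> int:
    """Colless index: sum over internal nodes of |leaves(left) - leaves(right)|.

    Assumed encoding (hypothetical, for illustration): an internal node is a
    2-tuple (left, right); anything that is not a tuple is a leaf.
    """
    def walk(node):
        # Returns (number of leaves, Colless sum) for the subtree at `node`.
        if not isinstance(node, tuple):
            return 1, 0
        left, right = node
        n_left, c_left = walk(left)
        n_right, c_right = walk(right)
        return n_left + n_right, c_left + c_right + abs(n_left - n_right)

    return walk(tree)[1]


# Two four-leaf toy trees: fully balanced and fully imbalanced (caterpillar).
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

if __name__ == "__main__":
    print("balanced:", colless_index(balanced))
    print("caterpillar:", colless_index(caterpillar))
```

For a fixed number of leaves the index is lowest for the balanced shape and highest for the caterpillar shape. Trees of intermediate balance, which the abstract focuses on, fall between these two values.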

Tenorio-Salgado S, Villalpando-Aguilar JL, Hernandez-Guerrero R, Poot-Hernández AC, Perez-Rueda E. Exploring the enzymatic repertoires of Bacteria and Archaea and their associations with metabolic maps. Braz J Microbiol 2024:10.1007/s42770-024-01462-3. [PMID: 39052173 DOI: 10.1007/s42770-024-01462-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 07/11/2024] [Indexed: 07/27/2024] Open

Khurana MP, Scheidwasser-Clow N, Penn MJ, Bhatt S, Duchêne DA. The Limits of the Constant-rate Birth-Death Prior for Phylogenetic Tree Topology Inference. Syst Biol 2024;73:235-246. [PMID: 38153910 PMCID: PMC11129600 DOI: 10.1093/sysbio/syad075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 12/20/2023] [Accepted: 12/27/2023] [Indexed: 12/30/2023] Open

Abstract

Birth-death models are stochastic processes describing speciation and extinction through time and across taxa and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example, with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used Bayesian methods and crBD priors with those that used other non-crBD priors and non-Bayesian methods (i.e., maximum likelihood methods), we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. 
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.

Duarte CM, Ketcheson DI, Eguíluz VM, Agustí S, Fernández-Gracia J, Jamil T, Laiolo E, Gojobori T, Alam I. Rapid evolution of SARS-CoV-2 challenges human defenses. Sci Rep 2022;12:6457. [PMID: 35440671 PMCID: PMC9017738 DOI: 10.1038/s41598-022-10097-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/14/2021] [Accepted: 03/23/2022] [Indexed: 12/25/2022] Open

Scale-invariant topology and bursty branching of evolutionary trees emerge from niche construction. Proc Natl Acad Sci U S A 2020;117:7879-7887. [PMID: 32209672 DOI: 10.1073/pnas.1915088117] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/03/2023] Open

Chakraborty C, Sharma AR, Sharma G, Bhattacharya M, Lee SS. Insight into Evolution and Conservation Patterns of B1-Subfamily Members of GPCR. Int J Pept Res Ther 2020;26:2505-2517. [PMID: 32421105 PMCID: PMC7223794 DOI: 10.1007/s10989-020-10043-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 01/30/2020] [Indexed: 11/25/2022]

Keller-Schmidt S, Tuğrul M, Eguíluz VM, Hernández-García E, Klemm K. Anomalous scaling in an age-dependent branching model. Phys Rev E Stat Nonlin Soft Matter Phys 2015;91:022803. [PMID: 25768548 DOI: 10.1103/physreve.91.022803] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/04/2013] [Indexed: 06/04/2023]

Li W, Freudenberg J, Miramontes P. Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome. BMC Bioinformatics 2014;15:2. [PMID: 24386976 PMCID: PMC3927684 DOI: 10.1186/1471-2105-15-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 08/25/2013] [Accepted: 12/17/2013] [Indexed: 11/10/2022] Open

Abstract

Background

The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high throughput-sequencing data. Although a longer read is more likely to be uniquely mapped to the reference genome, a quantitative analysis of the influence of read lengths on mappability has been lacking. To address this question, we evaluate the k-mer distribution of the human reference genome. The k-mer frequency is determined for k ranging from 20 bp to 1000 bp.

Results

We observe that the proportion of non-singletons k-mers decreases slowly with increasing k, and can be fitted by piecewise power-law functions with different exponents at different ranges of k. A slower decay at greater values for k indicates more limited gains in mappability for read lengths between 200 bp and 1000 bp. The frequency distributions of k-mers exhibit long tails with a power-law-like trend, and rank frequency plots exhibit a concave Zipf’s curve. The most frequent 1000-mers comprise 172 regions, which include four large stretches on chromosomes 1 and X, containing genes of biomedical relevance. Comparison with other databases indicates that the 172 regions can be broadly classified into two types: those containing LINE transposable elements and those containing segmental duplications.

Conclusion

Read mappability as measured by the proportion of singletons increases steadily up to the length scale around 200 bp. When read length increases above 200 bp, smaller gains in mappability are expected. Moreover, the proportion of non-singletons decreases with read lengths much slower than linear. Even a read length of 1000 bp would not allow the unique alignment of reads for many coding regions of human genes. A mix of techniques will be needed for efficiently producing high-quality data that cover the complete human genome.
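The quantity discussed in this abstract, the proportion of non-singleton k-mers as a function of k, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline. The proportion is taken over k-mer start positions, reverse complements are ignored, and the function name `non_singleton_fraction` and the short test sequence are hypothetical.

```python
from collections import Counter


def non_singleton_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs more than once.

    Assumption for this sketch: positions are counted, not distinct k-mers,
    and only the forward strand is considered.
    """
    n_positions = len(seq) - k + 1
    if k <= 0 or n_positions <= 0:
        raise ValueError("k must be between 1 and len(seq)")
    # Count how often each k-mer string appears along the sequence.
    counts = Counter(seq[i:i + k] for i in range(n_positions))
    # Every occurrence of a repeated k-mer is a non-uniquely mappable position.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n_positions


if __name__ == "__main__":
    toy = "ACGTACGTTTGACCA"  # made-up sequence, not genomic data
    for k in (2, 4, 8):
        print(k, round(non_singleton_fraction(toy, k), 3))
```

Under this definition, one minus the returned value is the singleton proportion that the abstract uses as its measure of mappability at read length k.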

Pompei S, Loreto V, Tria F. Phylogenetic properties of RNA viruses. PLoS One 2012;7:e44849. [PMID: 23028645 PMCID: PMC3447819 DOI: 10.1371/journal.pone.0044849] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 07/11/2012] [Accepted: 08/07/2012] [Indexed: 11/19/2022] Open

Abstract

A new word, phylodynamics, was coined to emphasize the interconnection between phylogenetic properties, as observed for instance in a phylogenetic tree, and the epidemic dynamics of viruses, where selection, mediated by the host immune response, and transmission play a crucial role. The challenges faced when investigating the evolution of RNA viruses call for a virtuous loop of data collection, data analysis and modeling. This already resulted both in the collection of massive sequences databases and in the formulation of hypotheses on the main mechanisms driving qualitative differences observed in the (reconstructed) evolutionary patterns of different RNA viruses. Qualitatively, it has been observed that selection driven by the host immune response induces an uneven survival ability among co-existing strains. As a consequence, the imbalance level of the phylogenetic tree is manifestly more pronounced if compared to the case when the interaction with the host immune system does not play a central role in the evolutive dynamics. While many imbalance metrics have been introduced, reliable methods to discriminate in a quantitative way different level of imbalance are still lacking. In our work, we reconstruct and analyze the phylogenetic trees of six RNA viruses, with a special emphasis on the human Influenza A virus, due to its relevance for vaccine preparation as well as for the theoretical challenges it poses due to its peculiar evolutionary dynamics. We focus in particular on topological properties. We point out the limitation featured by standard imbalance metrics, and we introduce a new methodology with which we assign the correct imbalance level of the phylogenetic trees, in agreement with the phylodynamics of the viruses. 
Our thorough quantitative analysis allows for a deeper understanding of the evolutionary dynamics of the considered RNA viruses, which is crucial in order to provide a valuable framework for a quantitative assessment of theoretical predictions.

Caetano-Anollés G, Nasir A. Benefits of using molecular structure and abundance in phylogenomic analysis. Front Genet 2012;3:172. [PMID: 22973296 PMCID: PMC3434437 DOI: 10.3389/fgene.2012.00172] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 07/25/2012] [Accepted: 08/18/2012] [Indexed: 12/25/2022] Open