Reference Citation Analysis

For: Holmes S. Statistics for phylogenetic trees. Theor Popul Biol 2003;63:17-32. [PMID: 12464492 DOI: 10.1016/s0040-5809(02)00005-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]

Cited by Other Article(s)

Berling L, Collienne L, Gavryushkin A. Estimating the mean in the space of ranked phylogenetic trees. Bioinformatics 2024;40:btae514. [PMID: 39177090 PMCID: PMC11364146 DOI: 10.1093/bioinformatics/btae514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 05/16/2024] [Accepted: 08/21/2024] [Indexed: 08/24/2024] Open

Abstract

MOTIVATION

Reconstructing evolutionary histories of biological entities, such as genes, cells, organisms, populations, and species, from phenotypic and molecular sequencing data is central to many biological, palaeontological, and biomedical disciplines. Typically, due to uncertainties and incompleteness in data, the true evolutionary history (phylogeny) is challenging to estimate. Statistical modelling approaches address this problem by introducing and studying probability distributions over all possible evolutionary histories, but can also introduce uncertainties due to misspecification. In practice, computational methods are deployed to learn those distributions, typically by sampling them. This approach, however, is fundamentally challenging, as it requires designing and implementing various statistical methods over a space of phylogenetic trees (or treespace). Although the problem of developing statistics over a treespace has received substantial attention in the literature and numerous breakthroughs have been made, it remains largely unsolved. The challenge is twofold: a treespace has a nontrivial, often counter-intuitive geometry, implying that much of classical Euclidean statistics does not immediately apply; and many parametrizations of treespace with promising statistical properties are computationally hard, so they cannot be used in data analyses. As a result, there is no single conventional method for estimating even the most fundamental statistics over any treespace, such as mean and variance, and various heuristics are used in practice. Despite the existence of numerous tree summary methods to approximate means of probability distributions over a treespace based on its geometry, and the theoretical promise of this idea, none of these attempts has resulted in a practical method for summarizing tree samples.

RESULTS

In this paper, we present a tree summary method along with useful properties of our chosen treespace, focusing on its impact on phylogenetic analyses of real datasets. We perform an extensive benchmark study and demonstrate that our method outperforms the currently most popular methods with respect to a number of important 'quality' statistics. Further, we apply our method to three empirical datasets ranging from cancer evolution to linguistics and find novel insights into the corresponding evolutionary problems in all of them. We hence conclude that this treespace is a promising candidate to serve as a foundation for developing statistics over phylogenetic trees analytically, as well as new computational tools for evolutionary data analyses.

AVAILABILITY AND IMPLEMENTATION

An implementation is available at https://github.com/bioDS/Centroid-Code.

Liu J, Lindstrom AJ, Gong X. Towards the plastome evolution and phylogeny of Cycas L. (Cycadaceae): molecular-morphology discordance and gene tree space analysis. BMC Plant Biology 2022;22:116. [PMID: 35291941 PMCID: PMC8922756 DOI: 10.1186/s12870-022-03491-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/10/2021] [Accepted: 02/22/2022] [Indexed: 05/20/2023]

Abstract

BACKGROUND

Plastid genomes (plastomes) show great potential for resolving multiscale phylogenetic relationships, but few studies have focused on how genetic characteristics of plastid genes, such as genetic variation and phylogenetic discordance, influence the resolution of the phylogeny within a lineage. Here we examine plastome characteristics of Cycas L., the most diverse genus among extant cycads, and investigate the deep phylogenetic relationships within Cycas by sampling 47 plastomes representing all major clades from six sections.

RESULTS

All Cycas plastomes shared consistent gene content and structure, with only one gene loss detected in the Philippine species C. wadei. Three novel plastome regions (psbA-matK, trnN-ndhF, chlL-trnN) were identified as containing the highest nucleotide variability. Molecular evolutionary analysis showed that most of the plastid protein-coding genes have been under purifying selection, except ndhB. Phylogenomic analyses using concatenated and coalescent methods both identified four clades, but with conflicting topologies at shallow nodes. Specifically, we found that three species-rich Cycas sections, namely Stangerioides, Indosinenses and Cycas, were not supported, or only weakly supported, as monophyletic by the plastomic phylogeny. Tree space analyses based on different tree-inference methods both revealed three gene clusters, of which the cluster with moderate genetic properties showed the best congruence with the favored phylogeny.

CONCLUSIONS

Our exploration of plastomic data for Cycas supports the idea that plastid protein-coding genes may exhibit discordance in phylogenetic signals. The incongruence between molecular phylogeny and morphological classification reported here may largely be attributed to the uniparental inheritance of plastids, which cannot offer sufficient information to resolve the phylogeny. In contrast to a previous consensus that genes with longer sequences and a higher proportion of variable sites are superior for phylogeny reconstruction, our result implies that the most effective phylogenetic signals could come from loci with moderate variation, GC content, and sequence length that have undergone modest selection.

Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022;2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]

Kim J, Rosenberg NA, Palacios JA. Distance metrics for ranked evolutionary trees. Proc Natl Acad Sci U S A 2020;117:28876-28886. [PMID: 33139566 PMCID: PMC7682335 DOI: 10.1073/pnas.1922851117] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/24/2022] Open

Willis A. Confidence Sets for Phylogenetic Trees. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1395342] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022]

Jombart T, Kendall M, Almagro‐Garcia J, Colijn C. treespace: Statistical exploration of landscapes of phylogenetic trees. Mol Ecol Resour 2017;17:1385-1392. [PMID: 28374552 PMCID: PMC5724650 DOI: 10.1111/1755-0998.12676] [Citation(s) in RCA: 92] [Impact Index Per Article: 13.1] [Received: 11/22/2016] [Revised: 03/17/2017] [Accepted: 03/21/2017] [Indexed: 01/01/2023]

Barden D, Le H, Owen M. Limiting behaviour of Fréchet means in the space of phylogenetic trees. Ann Inst Stat Math 2016. [DOI: 10.1007/s10463-016-0582-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]

Amendola A, Pisciotta M, Aleo L, Ferraioli V, Angeletti C, Capobianchi MR. Evaluation of the Aptima® HIV-1 Quant Dx assay for HIV-1 RNA viral load detection and quantitation in plasma of HIV-1-infected individuals: A comparison with Abbott RealTime HIV-1 assay. J Med Virol 2016;88:1535-44. [PMID: 26864171 PMCID: PMC6585778 DOI: 10.1002/jmv.24493] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Accepted: 01/28/2016] [Indexed: 11/16/2022]

Abstract

The Hologic Aptima® HIV-1 Quant Dx assay (Aptima HIV) is a real-time transcription-mediated amplification method that is CE-approved for use in diagnosis and monitoring of HIV-1 infection. The analytical performance of this new assay was compared to that of the FDA-approved Abbott RealTime HIV-1 assay (RealTime). The evaluation was performed using 220 clinical plasma samples, the WHO 3rd HIV-1 International Standard, and the QCMD HIV-1 RNA EQA. Concordance of qualitative results, correlation between quantitative results, accuracy, and reproducibility of viral load data were analyzed. The ability to measure HIV-1 subtypes was assessed on the second WHO International Reference Preparation Panel for HIV-1 Subtypes. With clinical samples, inter-assay agreement for qualitative results was high (91.8%), with Cohen's kappa statistic equal to 0.836. For samples with quantitative results in both assays (n = 93), Lin's concordance correlation coefficient was 0.980 (P < 0.0001) and the mean difference between measurements, computed according to the Bland–Altman method, was low (0.115 log₁₀ copies/ml). The Aptima HIV quantified the WHO 3rd HIV-1 International Standard diluted from 2000 to 31 cp/ml (5,700–88 IU/ml) at expected values with excellent linearity (R² > 0.970) and showed higher sensitivity than RealTime, detecting HIV-1 RNA in 10 out of 10 replicates containing down to 7 cp/ml (20 IU/ml). Reproducibility was very high, even at low HIV-1 RNA values. The Aptima HIV was able to detect and accurately quantify all the main HIV-1 subtypes in both reference panels and clinical samples. Besides excellent performance, Aptima HIV offers full automation, ease of use, and improved workflow compared to RealTime.

Huckemann S, Mattingly J, Miller E, Nolen J. Sticky central limit theorems at isolated hyperbolic planar singularities. Electron J Probab 2015. [DOI: 10.1214/ejp.v20-3887] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]

Lu X, Marron JS, Haaland P. Object-Oriented Data Analysis of Cell Images. J Am Stat Assoc 2014. [DOI: 10.1080/01621459.2014.884503] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]

Holland BR. The Rise of Statistical Phylogenetics. Aust N Z J Stat 2013. [DOI: 10.1111/anzs.12035] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]

Skutkova H, Vitek M, Babula P, Kizek R, Provaznik I. Classification of genomic signals using dynamic time warping. BMC Bioinformatics 2013;14 Suppl 10:S1. [PMID: 24267034 PMCID: PMC3750471 DOI: 10.1186/1471-2105-14-s10-s1] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

DNA classification methods most commonly compare differences in symbolic DNA records, which requires global multiple sequence alignment. This solution is often inappropriate: it causes a number of imprecisions and requires additional user intervention for exact alignment of similar segments. Similar segments in DNA represented as a signal are characterized by a similar curve shape. Alignment of genomic signals can adjust whole sections, not only individual symbols. Dynamic time warping (DTW) is suitable for this purpose and can replace multiple alignment of symbolic sequences in applications such as phylogenetic analysis.

METHODS

The proposed method is composed of three main parts. The first part converts the symbolic representation of DNA sequences, a string of A, C, G, T symbols, into a signal representation in the form of the cumulated phase of complex components defined for each symbol. The next part adjusts signal size using standard signal preprocessing methods: median filtering, detrending, and resampling. The final part, necessary for comparing genomic signals, is position and length alignment of the signals by dynamic time warping (DTW).
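The first and last parts of the pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the symbol-to-complex mapping in `SYMBOL_TO_COMPLEX` is an assumption (the abstract does not state which mapping is used), the function names are hypothetical, and the preprocessing stage (median filtering, detrending, resampling) is omitted entirely.

```python
import cmath
import math

# Assumed mapping of nucleotides to complex values; the abstract does not
# specify one, so this choice is purely illustrative.
SYMBOL_TO_COMPLEX = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def cumulated_phase(seq):
    """Return the running sum of the phase of each symbol's complex value."""
    total = 0.0
    signal = []
    for symbol in seq.upper():
        total += cmath.phase(SYMBOL_TO_COMPLEX[symbol])
        signal.append(total)
    return signal


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] matched again (stretch b)
                cost[i][j - 1],      # b[j-1] matched again (stretch a)
                cost[i - 1][j - 1],  # advance both signals
            )
    return cost[n][m]
```

A pairwise matrix of `dtw_distance(cumulated_phase(s), cumulated_phase(t))` values over a set of sequences could then feed a standard hierarchical clustering routine to build a dendrogram, which is the kind of evaluation the abstract reports.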

RESULTS

The application of DTW to a set of genomic signals was evaluated in dendrogram construction using cluster analysis. The resulting tree was compared with a classical phylogenetic tree reconstructed using multiple alignment. The classification of genomic signals using DTW is evolutionarily closer to the phylogeny of organisms. This method is more resistant to errors in the sequences and less dependent on the number of input sequences.

CONCLUSIONS

Classification of genomic signals using dynamic time warping is an adequate alternative to phylogenetic analysis based on alignment of symbolic DNA sequences; in addition, it is a robust, quick, and more precise technique.

Fang DA, Wang Y, Wang J, Liu LH, Wang Q. Characterization of Cherax quadricarinatus prohibitin and its potential role in spermatogenesis. Gene 2013;519:318-25. [PMID: 23485620 DOI: 10.1016/j.gene.2013.02.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/09/2012] [Revised: 02/01/2013] [Accepted: 02/06/2013] [Indexed: 01/10/2023]

A simple k-word interval method for phylogenetic analysis of DNA sequences. J Theor Biol 2013;317:192-9. [DOI: 10.1016/j.jtbi.2012.10.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 06/19/2012] [Revised: 10/02/2012] [Accepted: 10/06/2012] [Indexed: 11/18/2022]

Chakerian J, Holmes S. Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees. J Comput Graph Stat 2012;21:581-599. [PMID: 32982128 PMCID: PMC7518125 DOI: 10.1080/10618600.2012.640901] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 10/28/2022]

Owen M, Provan JS. A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans Comput Biol Bioinform 2011;8:2-13. [PMID: 21071792 DOI: 10.1109/tcbb.2010.3] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Indexed: 05/30/2023]

Altaba CR. Universal artifacts affect the branching of phylogenetic trees, not universal scaling laws. PLoS One 2009;4:e4611. [PMID: 19242549 PMCID: PMC2644784 DOI: 10.1371/journal.pone.0004611] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/17/2008] [Accepted: 01/21/2009] [Indexed: 11/29/2022] Open

Abstract

BACKGROUND

The superficial resemblance of phylogenetic trees to other branching structures allows searching for macroevolutionary patterns. However, such trees are just statistical inferences of particular historical events. Recent meta-analyses report finding regularities in the branching pattern of phylogenetic trees. But is this supported by evidence, or are such regularities just methodological artifacts? If so, is there any signal in a phylogeny?

METHODOLOGY

In order to evaluate the impact of polytomies and imbalance on tree shape, the distribution of all binary and polytomic trees of up to 7 taxa was assessed in tree-shape space. The relationship between the proportion of outgroups and the amount of imbalance introduced with them was assessed applying four different tree-building methods to 100 combinations from a set of 10 ingroup and 9 outgroup species, and performing covariance analyses. The relevance of this analysis was explored taking 61 published phylogenies, based on nucleic acid sequences and involving various taxa, taxonomic levels, and tree-building methods.

PRINCIPAL FINDINGS

All methods of phylogenetic inference are quite sensitive to the artifacts introduced by outgroups. However, published phylogenies appear to be subject to a rather effective, albeit rather intuitive, control against such artifacts. The data and methods used to build phylogenetic trees are varied, so any meta-analysis is subject to pitfalls due to their uneven intrinsic merits, which translate into artifacts in tree shape. The binary branching pattern is an imposition of methods, and seldom reflects true relationships in intraspecific analyses, yielding artifactual polytomies in short trees. Above the species level, the departure of real trees from simplistic random models is caused by at least two natural factors (uneven speciation and extinction rates) and by artifacts such as the choice of taxa included in the analysis and the imbalance introduced by outgroups and basal paraphyletic taxa. This artifactual imbalance accounts for tree shape convergence of large trees.

SIGNIFICANCE

There is no evidence for any universal scaling in the tree of life. Instead, there is a need for improved methods of tree analysis that can be used to discriminate the noise due to outgroups from the phylogenetic signal within the taxon of interest, and to evaluate realistic models of evolution, correcting the retrospective perspective and explicitly recognizing extinction as a driving force. Artifacts are pervasive, and can only be overcome through understanding the structure and biological meaning of phylogenetic trees. Catalan Abstract in Translation S1.

Steel M, Rodrigo A. Maximum likelihood supertrees. Syst Biol 2008;57:243-50. [PMID: 18398769 DOI: 10.1080/10635150802033014] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 10/22/2022] Open

Wróbel B. Statistical measures of uncertainty for branches in phylogenetic trees inferred from molecular sequences by using model-based methods. J Appl Genet 2008;49:49-67. [DOI: 10.1007/bf03195249] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]

Geuten K, Smets E, Schols P, Yuan YM, Janssens S, Küpfer P, Pyck N. Conflicting phylogenies of balsaminoid families and the polytomy in Ericales: combining data in a Bayesian framework. Mol Phylogenet Evol 2004;31:711-29. [PMID: 15062805 DOI: 10.1016/j.ympev.2003.09.014] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 06/15/2003] [Revised: 09/17/2003] [Indexed: 11/29/2022]

Cotton JA, Page RDM. Gene tree parsimony vs. uninode coding for phylogenetic reconstruction. Mol Phylogenet Evol 2003;29:298-308. [PMID: 13678685 DOI: 10.1016/s1055-7903(03)00109-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]