1
Berling L, Collienne L, Gavryushkin A. Estimating the mean in the space of ranked phylogenetic trees. Bioinformatics 2024; 40:btae514. [PMID: 39177090 PMCID: PMC11364146 DOI: 10.1093/bioinformatics/btae514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 05/16/2024] [Accepted: 08/21/2024] [Indexed: 08/24/2024]
Abstract
MOTIVATION Reconstructing evolutionary histories of biological entities, such as genes, cells, organisms, populations, and species, from phenotypic and molecular sequencing data is central to many biological, palaeontological, and biomedical disciplines. Typically, due to uncertainties and incompleteness in data, the true evolutionary history (phylogeny) is challenging to estimate. Statistical modelling approaches address this problem by introducing and studying probability distributions over all possible evolutionary histories, but can also introduce uncertainties due to misspecification. In practice, computational methods are deployed to learn those distributions typically by sampling them. This approach, however, is fundamentally challenging as it requires designing and implementing various statistical methods over a space of phylogenetic trees (or treespace). Although the problem of developing statistics over a treespace has received substantial attention in the literature and numerous breakthroughs have been made, it remains largely unsolved. The challenge of solving this problem is 2-fold: a treespace has nontrivial often counter-intuitive geometry implying that much of classical Euclidean statistics does not immediately apply; many parametrizations of treespace with promising statistical properties are computationally hard, so they cannot be used in data analyses. As a result, there is no single conventional method for estimating even the most fundamental statistics over any treespace, such as mean and variance, and various heuristics are used in practice. Despite the existence of numerous tree summary methods to approximate means of probability distributions over a treespace based on its geometry, and the theoretical promise of this idea, none of the attempts resulted in a practical method for summarizing tree samples. 
RESULTS In this paper, we present a tree summary method along with useful properties of our chosen treespace while focusing on its impact on phylogenetic analyses of real datasets. We perform an extensive benchmark study and demonstrate that our method outperforms the currently most popular methods with respect to a number of important 'quality' statistics. Further, we apply our method to three empirical datasets ranging from cancer evolution to linguistics and find novel insights into the corresponding evolutionary problems in all of them. We hence conclude that this treespace is a promising candidate to serve as a foundation for developing statistics over phylogenetic trees analytically, as well as new computational tools for evolutionary data analyses. AVAILABILITY AND IMPLEMENTATION An implementation is available at https://github.com/bioDS/Centroid-Code.
Affiliation(s)
- Lars Berling
- Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
- Lena Collienne
- Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
- Alex Gavryushkin
- Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
2
Liu J, Lindstrom AJ, Gong X. Towards the plastome evolution and phylogeny of Cycas L. (Cycadaceae): molecular-morphology discordance and gene tree space analysis. BMC Plant Biology 2022; 22:116. [PMID: 35291941 PMCID: PMC8922756 DOI: 10.1186/s12870-022-03491-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/10/2021] [Accepted: 02/22/2022] [Indexed: 05/20/2023]
Abstract
BACKGROUND Plastid genomes (plastomes) present great potential in resolving multiscale phylogenetic relationships, but few studies have focused on the influence of genetic characteristics of plastid genes, such as genetic variation and phylogenetic discordance, in resolving the phylogeny within a lineage. Here we examine plastome characteristics of Cycas L., the most diverse genus among extant cycads, and investigate the deep phylogenetic relationships within Cycas by sampling 47 plastomes representing all major clades from six sections. RESULTS All Cycas plastomes shared consistent gene content and structure, with only one gene loss detected in the Philippine species C. wadei. Three novel plastome regions (psbA-matK, trnN-ndhF, chlL-trnN) were identified as containing the highest nucleotide variability. Molecular evolutionary analysis showed that most of the plastid protein-coding genes have been under purifying selection, except ndhB. Phylogenomic analyses using concatenated and coalescent methods both identified four clades but with conflicting topologies at shallow nodes. Specifically, we found that three species-rich Cycas sections, namely Stangerioides, Indosinenses and Cycas, were not or only weakly supported as monophyletic based on the plastomic phylogeny. Tree space analyses based on different tree-inference methods both revealed three gene clusters, of which the cluster with moderate genetic properties showed the best congruence with the favored phylogeny. CONCLUSIONS Our exploration of plastomic data for Cycas supports the idea that plastid protein-coding genes may exhibit discordance in phylogenetic signals. The incongruence between molecular phylogeny and morphological classification reported here may largely be attributed to the uniparental nature of the plastid, which cannot offer sufficient information to resolve the phylogeny. In contrast to a previous consensus that genes with longer sequences and a higher proportion of variation are superior for phylogeny reconstruction, our result implies that the most effective phylogenetic signals could come from loci that have moderate variation, GC content, and sequence length, and that have undergone modest selection.
Affiliation(s)
- Jian Liu
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, 650201, Kunming, Yunnan, China
- Department of Economic Plants and Biotechnology, Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, 650201, Kunming, China
- Anders J Lindstrom
- Global Biodiversity Conservancy, 144/124 Moo3, Soi Bua Thong, 20250, Bangsalae, Sattahip, Chonburi, Thailand
- Xun Gong
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, 650201, Kunming, Yunnan, China
- Department of Economic Plants and Biotechnology, Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, 650201, Kunming, China
- University of Chinese Academy of Sciences, 100049, Beijing, China
3
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discussed multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
4
Abstract
Genealogical tree modeling is essential for estimating evolutionary parameters in population genetics and phylogenetics. Recent mathematical results concerning ranked genealogies without leaf labels unlock opportunities in the analysis of evolutionary trees. In particular, comparisons between ranked genealogies facilitate the study of evolutionary processes of different organisms sampled at multiple time periods. We propose metrics on ranked tree shapes and ranked genealogies for lineages isochronously and heterochronously sampled. Our proposed tree metrics make it possible to conduct statistical analyses of ranked tree shapes and timed ranked tree shapes or ranked genealogies. Such analyses allow us to assess differences in tree distributions, quantify estimation uncertainty, and summarize tree distributions. We show the utility of our metrics via simulations and an application in infectious diseases.
Affiliation(s)
- Jaehee Kim
- Department of Biology, Stanford University, Stanford, CA 94305
- Julia A Palacios
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA 94305
5
Affiliation(s)
- Amy Willis
- Department of Biostatistics, University of Washington, Seattle, WA
6
Jombart T, Kendall M, Almagro‐Garcia J, Colijn C. treespace: Statistical exploration of landscapes of phylogenetic trees. Mol Ecol Resour 2017; 17:1385-1392. [PMID: 28374552 PMCID: PMC5724650 DOI: 10.1111/1755-0998.12676] [Citation(s) in RCA: 92] [Impact Index Per Article: 13.1] [Received: 11/22/2016] [Revised: 03/17/2017] [Accepted: 03/21/2017] [Indexed: 01/01/2023]
Abstract
The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low-dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results.
Affiliation(s)
- Thibaut Jombart
- Department of Infectious Disease Epidemiology, MRC Centre for Outbreak Analysis and Modelling, School of Public Health, Imperial College London, London, UK
7
Barden D, Le H, Owen M. Limiting behaviour of Fréchet means in the space of phylogenetic trees. Ann Inst Stat Math 2016. [DOI: 10.1007/s10463-016-0582-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
8
Amendola A, Pisciotta M, Aleo L, Ferraioli V, Angeletti C, Capobianchi MR. Evaluation of the Aptima® HIV-1 Quant Dx assay for HIV-1 RNA viral load detection and quantitation in plasma of HIV-1-infected individuals: A comparison with Abbott RealTime HIV-1 assay. J Med Virol 2016; 88:1535-44. [PMID: 26864171 PMCID: PMC6585778 DOI: 10.1002/jmv.24493] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Accepted: 01/28/2016] [Indexed: 11/16/2022]
Abstract
The Hologic Aptima® HIV‐1 Quant Dx assay (Aptima HIV) is a real‐time transcription‐mediated amplification method CE‐approved for use in diagnosis and monitoring of HIV‐1 infection. The analytical performance of this new assay was compared to the FDA‐approved Abbott RealTime HIV‐1 (RealTime). The evaluation was performed using 220 clinical plasma samples, the WHO 3rd HIV‐1 International Standard, and the QCMD HIV‐1 RNA EQA. Concordance on qualitative results, correlation between quantitative results, accuracy, and reproducibility of viral load data were analyzed. The ability to measure HIV‐1 subtypes was assessed on the second WHO International Reference Preparation Panel for HIV‐1 Subtypes. With clinical samples, inter‐assay agreement for qualitative results was high (91.8%) with Cohen's kappa statistic equal to 0.836. For samples with quantitative results in both assays (n = 93), Lin's concordance correlation coefficient was 0.980 (P < 0.0001) and the mean difference of measurement, computed according to the Bland–Altman method, was low (0.115 log10 copies/ml). The Aptima HIV quantified the WHO 3rd HIV‐1 International Standard diluted from 2000 to 31 cp/ml (5,700–88 IU/ml) at expected values with excellent linearity (R² > 0.970) and showed higher sensitivity compared to RealTime, being able to detect HIV‐1 RNA in 10 out of 10 replicates containing down to 7 cp/ml (20 IU/ml). Reproducibility was very high, even at low HIV‐1 RNA values. The Aptima HIV was able to detect and accurately quantify all the main HIV‐1 subtypes in both reference panels and clinical samples. Besides excellent performance, Aptima HIV shows full automation, ease of use, and improved workflow compared to RealTime.
Affiliation(s)
- Alessandra Amendola
- Laboratory of Virology, National Institute for Infectious Diseases “Lazzaro Spallanzani”, Rome, Italy
- Maria Pisciotta
- Laboratory of Virology, National Institute for Infectious Diseases “Lazzaro Spallanzani”, Rome, Italy
- Loredana Aleo
- Laboratory of Virology, National Institute for Infectious Diseases “Lazzaro Spallanzani”, Rome, Italy
- Valeria Ferraioli
- Laboratory of Virology, National Institute for Infectious Diseases “Lazzaro Spallanzani”, Rome, Italy
- Claudio Angeletti
- Department of Epidemiology, National Institute for Infectious Diseases “Lazzaro Spallanzani”, Rome, Italy
9
Huckemann S, Mattingly J, Miller E, Nolen J. Sticky central limit theorems at isolated hyperbolic planar singularities. Electron J Probab 2015. [DOI: 10.1214/ejp.v20-3887] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
10
11
12
Skutkova H, Vitek M, Babula P, Kizek R, Provaznik I. Classification of genomic signals using dynamic time warping. BMC Bioinformatics 2013; 14 Suppl 10:S1. [PMID: 24267034 PMCID: PMC3750471 DOI: 10.1186/1471-2105-14-s10-s1] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND Classification methods for DNA most commonly compare differences in symbolic DNA records, which requires a global multiple sequence alignment. This solution is often inappropriate: it causes a number of imprecisions and requires additional user intervention for exact alignment of similar segments. Similar segments in DNA represented as a signal are characterized by a similar shape of the curve. Alignment of genomic signals can adjust whole sections, not only individual symbols. Dynamic time warping (DTW) is suitable for this purpose and can replace the multiple alignment of symbolic sequences in applications such as phylogenetic analysis. METHODS The proposed method is composed of three main parts. The first part is the conversion of the symbolic representation of DNA sequences, in the form of a string of A, C, G, T symbols, to a signal representation in the form of the cumulated phase of complex components defined for each symbol. The next part is signal size adjustment by standard signal preprocessing methods: median filtering, detrending and resampling. The final part, necessary for comparing genomic signals, is position and length alignment of the signals by DTW. RESULTS The application of DTW to a set of genomic signals was evaluated in dendrogram construction using cluster analysis. The resulting tree was compared with a classical phylogenetic tree reconstructed using multiple alignment. The classification of genomic signals using DTW is evolutionarily closer to the phylogeny of organisms. This method is more resistant to errors in the sequences and less dependent on the number of input sequences. CONCLUSIONS Classification of genomic signals using dynamic time warping is an adequate alternative to phylogenetic analysis using symbolic DNA sequence alignment; in addition, it is a robust, quick and more precise technique.
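The first and last steps described in this abstract (symbol-to-signal conversion and DTW alignment) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the complex value assigned to each nucleotide in `PHASE_MAP` is a hypothetical choice, since the abstract does not state the mapping, and the preprocessing steps (median filtering, detrending, resampling) are omitted. The DTW routine is the textbook dynamic-programming form with an absolute-difference local cost.

```python
import cmath

# Hypothetical symbol-to-complex mapping (an assumption; the abstract
# does not specify which complex component is used for each symbol).
PHASE_MAP = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def cumulated_phase(seq):
    """Turn a DNA string into a signal: the running sum of symbol phases."""
    signal, total = [], 0.0
    for base in seq.upper():
        total += cmath.phase(PHASE_MAP[base])
        signal.append(total)
    return signal


def dtw_distance(x, y):
    """Classic dynamic time warping cost between two numeric sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y[j-1] is reused
                cost[i][j - 1],      # y advances, x[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A pairwise matrix of `dtw_distance` values over the converted signals could then be passed to a hierarchical clustering routine to build a dendrogram, which is the kind of evaluation the abstract reports.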
13
Fang DA, Wang Y, Wang J, Liu LH, Wang Q. Characterization of Cherax quadricarinatus prohibitin and its potential role in spermatogenesis. Gene 2013; 519:318-25. [PMID: 23485620 DOI: 10.1016/j.gene.2013.02.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/09/2012] [Revised: 02/01/2013] [Accepted: 02/06/2013] [Indexed: 01/10/2023]
Abstract
Prohibitin (PHB) proteins have diverse functions, such as cellular signaling, transcriptional control and mitochondrial biogenesis. In this study, we characterized the PHB gene and its protein expression in Cherax quadricarinatus. PHB cDNA comprises 1472 nucleotides with an open reading frame of 828 bp, which encodes 275 amino acid residues. The highest transcript levels were found during the spermatogonial developmental phase, with the lowest levels detected during the resting phase in the reproductive cycle. Western blot analysis revealed that PHB is an approximately 30 kDa protein, and occurs in a number of unexpected isoforms, ranging from 30 kDa to greater than 180 kDa in the testes of different developmental phases, which may be the ubiquitinated substrates. The strongest immunolabeling signal was found in spermatogonia, with lower levels of staining in secondary spermatocytes, and weak or absent expression in mature sperm. Immunogold electron microscopy results confirmed the localization of PHB in the inner mitochondrial membranes. The results showed that PHB is a substrate protein for spermatogenesis, with a potential reproductive function involving sperm ubiquitination in invertebrates.
Affiliation(s)
- Di-An Fang
- Scientific Observing and Experimental Station of Fishery Resources and Environment in the Changjiang River, Freshwater Fisheries Research Center, Wuxi, Shanshui Road 9, 214081, China
14
A simple k-word interval method for phylogenetic analysis of DNA sequences. J Theor Biol 2013; 317:192-9. [DOI: 10.1016/j.jtbi.2012.10.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 06/19/2012] [Revised: 10/02/2012] [Accepted: 10/06/2012] [Indexed: 11/18/2022]
15
Chakerian J, Holmes S. Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees. J Comput Graph Stat 2012; 21:581-599. [PMID: 32982128 PMCID: PMC7518125 DOI: 10.1080/10618600.2012.640901] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 10/28/2022]
Abstract
Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics, and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and "generalizability" of these summaries. This article describes the implementation of the geometric distance between trees developed by Billera, Holmes, and Vogtmann (2001) equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in evaluating tree estimates. In particular, since Billera et al. (2001) have shown that the space of trees is negatively curved (called a CAT(0) space), a collection of trees can naturally be represented as a tree. We compare this representation to the Euclidean approximations of treespace made available through both a classical multidimensional scaling and a Kernel multidimensional scaling of the matrix of the distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence of both certain columns (positions, variables, or genes) and certain rows (species, observations, or arrays) on the construction of such trees. It also can provide a way of detecting heterogeneous mixtures in the input data. Supplementary materials for this article are available online.
16
Owen M, Provan JS. A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:2-13. [PMID: 21071792 DOI: 10.1109/tcbb.2010.3] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Indexed: 05/30/2023]
Abstract
Comparing and computing distances between phylogenetic trees are important biological problems, especially for models where edge lengths play an important role. The geodesic distance measure between two phylogenetic trees with edge lengths is the length of the shortest path between them in the continuous tree space introduced by Billera, Holmes, and Vogtmann. This tree space provides a powerful tool for studying and comparing phylogenetic trees, both in exhibiting a natural distance measure and in providing a euclidean-like structure for solving optimization problems on trees. An important open problem is to find a polynomial time algorithm for finding geodesics in tree space. This paper gives such an algorithm, which starts with a simple initial path and moves through a series of successively shorter paths until the geodesic is attained.
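Two easy special cases help make the geometry concrete. The Python sketch below is not the polynomial-time algorithm from the paper. It assumes each tree is given as a dictionary from a split label to a positive interior edge length (a hypothetical encoding) and ignores pendant edges. When two trees share the same set of splits they lie in one Euclidean orthant, so the geodesic is a straight line. For any pair of trees, the "cone path" that shrinks every interior edge to zero and then grows the edges of the second tree is a valid path, so its length is an upper bound on the geodesic distance.

```python
import math


def same_orthant_distance(a, b):
    """Geodesic distance when both trees have exactly the same splits."""
    if a.keys() != b.keys():
        raise ValueError("trees must have identical split sets")
    return math.sqrt(sum((a[s] - b[s]) ** 2 for s in a))


def cone_path_length(a, b):
    """Length of the path through the star tree (the origin of tree space).

    This is always an upper bound on the geodesic distance between the
    interior-edge vectors of the two trees.
    """
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return norm_a + norm_b


# Four-leaf example: each tree has a single interior edge.
t1 = {"AB|CD": 3.0}
t2 = {"AC|BD": 4.0}
print(cone_path_length(t1, t2))
```

For four leaves the three orthants meet only at the origin, so in this example the cone path is itself the geodesic. With more leaves the geodesic can cut through intermediate orthants and be strictly shorter than the cone path.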
Affiliation(s)
- Megan Owen
- Department of Mathematics, University of California, Berkeley, MC 3840, Berkeley, CA 94720-0432, USA.
17
Altaba CR. Universal artifacts affect the branching of phylogenetic trees, not universal scaling laws. PLoS One 2009; 4:e4611. [PMID: 19242549 PMCID: PMC2644784 DOI: 10.1371/journal.pone.0004611] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/17/2008] [Accepted: 01/21/2009] [Indexed: 11/29/2022]
Abstract
BACKGROUND The superficial resemblance of phylogenetic trees to other branching structures allows searching for macroevolutionary patterns. However, such trees are just statistical inferences of particular historical events. Recent meta-analyses report finding regularities in the branching pattern of phylogenetic trees. But is this supported by evidence, or are such regularities just methodological artifacts? If so, is there any signal in a phylogeny? METHODOLOGY In order to evaluate the impact of polytomies and imbalance on tree shape, the distribution of all binary and polytomic trees of up to 7 taxa was assessed in tree-shape space. The relationship between the proportion of outgroups and the amount of imbalance introduced with them was assessed applying four different tree-building methods to 100 combinations from a set of 10 ingroup and 9 outgroup species, and performing covariance analyses. The relevance of this analysis was explored taking 61 published phylogenies, based on nucleic acid sequences and involving various taxa, taxonomic levels, and tree-building methods. PRINCIPAL FINDINGS All methods of phylogenetic inference are quite sensitive to the artifacts introduced by outgroups. However, published phylogenies appear to be subject to a rather effective, albeit rather intuitive control against such artifacts. The data and methods used to build phylogenetic trees are varied, so any meta-analysis is subject to pitfalls due to their uneven intrinsic merits, which translate into artifacts in tree shape. The binary branching pattern is an imposition of methods, and seldom reflects true relationships in intraspecific analyses, yielding artifactual polytomies in short trees. 
Above the species level, the departure of real trees from simplistic random models is caused at least by two natural factors--uneven speciation and extinction rates; and artifacts such as choice of taxa included in the analysis, and imbalance introduced by outgroups and basal paraphyletic taxa. This artifactual imbalance accounts for tree shape convergence of large trees. SIGNIFICANCE There is no evidence for any universal scaling in the tree of life. Instead, there is a need for improved methods of tree analysis that can be used to discriminate the noise due to outgroups from the phylogenetic signal within the taxon of interest, and to evaluate realistic models of evolution, correcting the retrospective perspective and explicitly recognizing extinction as a driving force. Artifacts are pervasive, and can only be overcome through understanding the structure and biological meaning of phylogenetic trees. Catalan Abstract in Translation S1.
Affiliation(s)
- Cristian R Altaba
- Laboratory of Human Systematics, University of the Balearic Islands, Balearic Islands, Spain.
18
Abstract
We analyze a maximum likelihood approach for combining phylogenetic trees into a larger "supertree." This is based on a simple exponential model of phylogenetic error, which ensures that ML supertrees have a simple combinatorial description (as a median tree, minimizing a weighted sum of distances to the input trees). We show that this approach to ML supertree reconstruction is statistically consistent (it converges on the true species supertree as more input trees are combined), in contrast to the widely used MRP method, which we show can be statistically inconsistent under the exponential error model. We also show that this statistical consistency extends to an ML approach for constructing species supertrees from gene trees. In this setting, incomplete lineage sorting (due to coalescence rates of homologous genes being lower than speciation rates) has been shown to lead to gene trees that are frequently different from species trees, and this can confound efforts to reconstruct the species phylogeny correctly.
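The "median tree" description can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions, not the method analyzed in the paper: every tree is encoded as a frozenset of its clades, the tree distance is taken to be the Robinson-Foulds distance (size of the symmetric difference of clade sets), all trees share the same leaf set, and the search is restricted to an explicit candidate list instead of all possible supertrees. The function names are hypothetical.

```python
def rf_distance(t1, t2):
    """Robinson-Foulds distance: number of clades found in exactly one tree."""
    return len(t1 ^ t2)


def median_tree(candidates, inputs, weights=None):
    """Return the candidate minimizing the weighted sum of distances to inputs."""
    if weights is None:
        weights = [1.0] * len(inputs)

    def score(tree):
        return sum(w * rf_distance(tree, s) for w, s in zip(weights, inputs))

    return min(candidates, key=score)


# Two rooted topologies on leaves A, B, C, D, encoded by their clades.
t_ab_cd = frozenset({frozenset("AB"), frozenset("CD")})
t_ac_bd = frozenset({frozenset("AC"), frozenset("BD")})

# Two of the three input trees support the ((A,B),(C,D)) topology.
inputs = [t_ab_cd, t_ab_cd, t_ac_bd]
best = median_tree([t_ab_cd, t_ac_bd], inputs)
print(best == t_ab_cd)
```

Here the first candidate has total distance 4 to the inputs and the second has 8, so the majority topology is selected. Passing non-uniform `weights` shifts the choice toward the more heavily weighted input trees.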
Affiliation(s)
- Mike Steel
- Allan Wilson Centre for Molecular Ecology and Evolution, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand.
19
Wróbel B. Statistical measures of uncertainty for branches in phylogenetic trees inferred from molecular sequences by using model-based methods. J Appl Genet 2008; 49:49-67. [DOI: 10.1007/bf03195249] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
20
Geuten K, Smets E, Schols P, Yuan YM, Janssens S, Küpfer P, Pyck N. Conflicting phylogenies of balsaminoid families and the polytomy in Ericales: combining data in a Bayesian framework. Mol Phylogenet Evol 2004; 31:711-29. [PMID: 15062805 DOI: 10.1016/j.ympev.2003.09.014] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 06/15/2003] [Revised: 09/17/2003] [Indexed: 11/29/2022]
Abstract
The balsaminoid Ericales, namely Balsaminaceae, Marcgraviaceae, Tetrameristaceae, and Pellicieraceae have been confidently placed at the base of Ericales, but the relations among these families have been resolved differently in recent analyses. Sister to this basal group is a large polytomy comprising all other families of Ericales, which is associated with short internodes. Because there are more than 13 kb of sequences for a large sampling of representatives, a thorough examination of the available data with novel methods seemed in place. Because of its computational speed, Bayesian phylogenetics allows for the use of parameter-rich models that can accommodate differences in the evolutionary process between partitions in a simultaneous analysis. In addition, there are recently proposed Bayesian strategies of assessing incongruence between partitions. We have applied these methods to the current problems in Ericales phylogeny, taking into account reported pitfalls in Bayesian analysis such as model selection uncertainty. Based on our results we infer several, previously unresolved relationships in the order Ericales. In balsaminoid families, we find that the closest relatives of Balsaminaceae are Marcgraviaceae. In the Ericales polytomy, we find strong support for Pentaphylacaceae sensu APG II as the sister group of Maesaceae. In addition, Symplocaceae receive a position as sister to Theaceae and these families form a monophyletic group together with Styracaceae-Diapensiaceae. At the base of this clade are Actinidiaceae and Clethraceae. The positions of Ebenaceae and Lecythidaceae remain uncertain.
Affiliation(s)
- K Geuten
- Laboratory of Plant Systematics, Institute of Botany and Microbiology, K.U.Leuven, Kasteelpark Arenberg 31, B-3001 Leuven, Belgium.
21
Abstract
have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. During this discussion we introduce a number of recent developments of gene tree parsimony methods overlooked by Simmons and Freudenstein. Finally, we present a re-analysis of data from that produces a more reasonable phylogeny than that found by Simmons and Freudenstein, suggesting that gene tree parsimony outperforms uninode coding, at least on these data.
Affiliation(s)
- James A Cotton
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK.