1
2
Chen D, Eulenstein O, Fernández-Baca D, Burleigh JG. Improved Heuristics for Minimum-Flip Supertree Construction. Evol Bioinform Online 2017. [DOI: 10.1177/117693430600200003] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 11/17/2022]
Abstract
The utility of the matrix representation with flipping (MRF) supertree method has been limited by the speed of its heuristic algorithms. We describe a new heuristic algorithm for MRF supertree construction that improves upon the speed of the previous heuristic by a factor of n (the number of taxa in the supertree). This new heuristic makes MRF tractable for large-scale supertree analyses and allows the first comparisons of MRF with other supertree methods using large empirical data sets. Analyses of three published supertree data sets with between 267 and 571 taxa indicate that MRF supertrees are, on average, at least as similar to the input trees as matrix representation with parsimony (MRP) and modified mincut supertrees are. The results also show that large differences may exist between MRF and MRP supertrees, and they demonstrate that the MRF supertree method is a practical and potentially more accurate alternative to the nearly ubiquitous MRP supertree method.
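Both MRP and MRF start from the same matrix representation of the input trees: one column per nontrivial cluster of each rooted input tree, with 1 for taxa inside the cluster, 0 for taxa of that tree outside it, and ? for taxa the tree lacks. The Python sketch below builds that matrix. It assumes each input tree is already given as a set of clusters (frozensets of taxon names), and `matrix_representation` is a hypothetical helper, not code from the paper. MRF would then search for the fewest 0↔1 flips that make the matrix tree-compatible; that heuristic search is not shown.

```python
from typing import Dict, FrozenSet, List, Set

Cluster = FrozenSet[str]


def matrix_representation(trees: List[Set[Cluster]],
                          leaf_sets: List[Set[str]]) -> Dict[str, List[str]]:
    """Code the clusters of every input tree as columns of a 0/1/? matrix.

    trees[i] holds the nontrivial clusters of input tree i and leaf_sets[i]
    holds that tree's taxa. A taxon absent from tree i gets '?' in every
    column contributed by tree i.
    """
    all_taxa = sorted(set().union(*leaf_sets))
    matrix: Dict[str, List[str]] = {taxon: [] for taxon in all_taxa}
    for clusters, leaves in zip(trees, leaf_sets):
        # Sort the clusters so the column order is reproducible.
        for cluster in sorted(clusters, key=lambda c: sorted(c)):
            for taxon in all_taxa:
                if taxon not in leaves:
                    matrix[taxon].append("?")
                elif taxon in cluster:
                    matrix[taxon].append("1")
                else:
                    matrix[taxon].append("0")
    return matrix


# Hypothetical input: ((A,B),C),D and (C,D),E.
trees = [{frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
         {frozenset({"C", "D"})}]
leaf_sets = [{"A", "B", "C", "D"}, {"C", "D", "E"}]
for taxon, row in matrix_representation(trees, leaf_sets).items():
    print(taxon, "".join(row))
```

In this toy input, taxon E receives ? in both columns contributed by the first tree, because that tree says nothing about where E belongs.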
Affiliation(s)
- Duhong Chen
- Department of Computer Science, Iowa State University, Ames, IA 50011, U.S.A.
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, IA 50011, U.S.A.
- J. Gordon Burleigh
- Section of Evolution and Ecology, University of California, Davis, CA 95616, U.S.A.; NESCent, Durham, NC 27705, U.S.A.
3
Sigwart JD, Lindberg DR. Consensus and confusion in molluscan trees: evaluating morphological and molecular phylogenies. Syst Biol 2015; 64:384-95. [PMID: 25472575 PMCID: PMC4395843 DOI: 10.1093/sysbio/syu105] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 07/16/2013] [Accepted: 11/21/2014] [Indexed: 11/18/2022]
Abstract
Mollusks are the most morphologically disparate living animal phylum; they have diversified into all habitats and have a deep fossil record. The monophyly and identity of their eight living classes are undisputed, but relationships among these groups and the patterns of their early radiation have remained elusive. Arguments about traditional morphological phylogeny focus on a small number of topological concepts, often without regard to the proximity of the individual classes. In contrast, molecular studies have proposed a number of radically different, inherently contradictory, and controversial sister relationships. Here, we assembled a data set of 42 unique published trees describing molluscan interrelationships. We used these data to ask several questions about the state of resolution of molluscan phylogeny compared with a null model of the variation possible in random trees constructed from a monophyletic assemblage of eight terminals. Although 27 unique trees have been proposed from morphological inference, the majority of these are not statistically different from each other. Among the available molecular topologies, only four studies to date have included the deep-sea class Monoplacophora, yet 36.4% of all trees are not significantly different. We also present supertrees derived from two data partitions and three methods, including all available molecular molluscan phylogenies, which will form the basis for future hypothesis testing. The supertrees presented here were not constructed to provide yet another hypothesis of molluscan relationships, but rather to algorithmically evaluate the relationships present in the disparate published topologies. Based on the totality of available evidence, certain patterns of relatedness among constituent taxa become clear. The internodal distance is consistently short between a few taxon pairs, particularly supporting the relatedness of Monoplacophora and the chitons, Polyplacophora.
Other taxon pairs, such as the vermiform Caudofoveata and Bivalvia, are rarely or never found in close proximity. Our results have specific utility for guiding constructive research planning to better test relationships in Mollusca as well as in other problematic groups. Taxa with consistently proximate relationships should be the focus of a combined approach in a concerted assessment of potential genetic and anatomical homology, whereas unequivocally distant taxa will make the most constructive choices for exemplar selection in higher-level phylogenomic analyses.
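Internodal distance between two terminals can be read directly off a rooted tree. The sketch below adopts one common definition, the number of internal nodes on the path joining the two terminals, which may differ from the exact measure used in the study. The tree is a hypothetical toy topology stored as a child-to-parent map, and the function names are illustrative only.

```python
from typing import Dict, List


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent, grandparent, ..., root] from a child -> parent map."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def internodal_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Number of internal nodes on the path between terminals a and b."""
    if a == b:
        return 0
    up_a = ancestors(a, parent)
    up_b = ancestors(b, parent)
    on_b = set(up_b)
    # Lowest common ancestor: the first ancestor of a that also lies above b.
    lca = next(n for n in up_a if n in on_b)
    edges = up_a.index(lca) + up_b.index(lca)
    return edges - 1  # nodes strictly between the two terminals


# Hypothetical toy topology (((A,B),C),D); n1, n2 and root are internal nodes.
parent = {"A": "n2", "B": "n2", "n2": "n1", "C": "n1", "n1": "root", "D": "root"}
print(internodal_distance("A", "B", parent))  # 1 (sister taxa)
print(internodal_distance("A", "D", parent))  # 3 (distant pair)
```

Averaging such distances over many published trees is one way to see which pairs are "consistently short" and which are rarely close.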
Affiliation(s)
- Julia D Sigwart
- Marine Laboratory, Queen's University Belfast, BT22 1PF, Northern Ireland, UK; and Department of Integrative Biology, Museum of Paleontology and Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- David R Lindberg
- Marine Laboratory, Queen's University Belfast, BT22 1PF, Northern Ireland, UK; and Department of Integrative Biology, Museum of Paleontology and Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
4
Vidal MA, Ortiz JC, Marín JC, Poulin E, Moreno PI. Comparative phylogeography of two co-distributed species of lizards of the genus Liolaemus (Squamata: Tropiduridae) from Southern Chile. Amphibia-Reptilia 2012. [DOI: 10.1163/156853811x622039] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
Abstract
Comparative phylogeography describes the patterns of evolutionary divergence in co-distributed populations of different taxa and whether or not those patterns are congruent. If the populations of these taxa have been co-distributed for a prolonged time, and if the intervals between perturbation or vicariance events have been more or less stable, patterns of divergence are expected to be congruent in closely related species, for example because of similar biological and demographic characteristics. Liolaemus pictus and L. cyanogaster are widely co-distributed lizard species in southern Chile, occurring in a region with a complex topography. We analyzed the phylogeographic structure of the two lizard species using cytochrome b DNA sequences to estimate their genetic structure in response to historical events. Our results suggest an evolutionary pattern of genetic diversity for each species that is consistent with the geomorphological history of the region, indicating a complex phylogeographic history in Liolaemus species. Also, the high levels of divergence among haplotypes in several populations suggest that their origin might predate the middle Pleistocene in both species. Finally, our results are consistent with our hypothesis that the two species have responded to historical events in parallel, with historical processes having been sufficient to influence their phylogeographic structure (0.80 congruence between topologies).
Affiliation(s)
- Marcela A. Vidal
- Laboratorio de Genómica y Biodiversidad, Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad del Bío-Bío, Casilla 447, Chillán, Chile
- Juan Carlos Ortiz
- Departamento de Zoología, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, Casilla 160-C, Concepción, Chile
- Juan Carlos Marín
- Laboratorio de Genómica y Biodiversidad, Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad del Bío-Bío, Casilla 447, Chillán, Chile
- Elie Poulin
- Instituto de Ecología y Biodiversidad, Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Casilla 653, Santiago, Chile
- Patricio I. Moreno
- Instituto de Ecología y Biodiversidad, Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Casilla 653, Santiago, Chile
5
Kupczok A, Schmidt HA, von Haeseler A. Accuracy of phylogeny reconstruction methods combining overlapping gene data sets. Algorithms Mol Biol 2010; 5:37. [PMID: 21134245 PMCID: PMC3022592 DOI: 10.1186/1748-7188-5-37] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 03/09/2010] [Accepted: 12/06/2010] [Indexed: 11/17/2022]
Abstract
Background: The availability of many gene alignments with overlapping taxon sets raises the question of which strategy is best for inferring species phylogenies from multiple genes. Methods and programs abound that use the gene alignments in different ways to reconstruct the species tree. In particular, different methods combine the original data at different points along the way from the underlying sequences to the final tree. Accordingly, they are classified into superalignment, supertree, and medium-level approaches. Here, we present a simulation study to compare different methods from each of these three approaches. Results: We observe that superalignment methods usually outperform the other approaches over a wide range of parameters, including sparse data and gene-specific evolutionary parameters. In the presence of high incongruence among gene trees, however, other combination methods perform better than the superalignment approach. Surprisingly, some supertree and medium-level methods exhibit, on average, worse results than a single-gene phylogeny with complete taxon information. Conclusions: For some methods, using the reconstructed gene tree as an estimate of the species tree is superior to combining incomplete information. Superalignment usually performs best because it is less susceptible to stochastic error. Supertree methods can outperform superalignment in the presence of gene-tree conflict.
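The superalignment strategy combines data at the sequence level by concatenating the gene alignments into one supermatrix and padding taxa absent from a gene with missing-data symbols. The following is a minimal Python sketch of that concatenation step only, assuming each gene is a dict from taxon name to an already aligned sequence and using ? as the missing-data symbol (both assumptions); tree inference on the resulting supermatrix is not shown, and `superalignment` is a hypothetical helper.

```python
from typing import Dict, List


def superalignment(genes: List[Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Concatenate per-gene alignments into one supermatrix.

    Each gene maps taxon -> aligned sequence; all sequences within one gene
    must have the same length. Taxa missing from a gene are padded with
    `missing` across that gene's full width.
    """
    taxa = sorted(set().union(*(gene.keys() for gene in genes)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must be aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(gene.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Two hypothetical genes with partially overlapping taxon sets.
gene1 = {"A": "ACGT", "B": "ACGA"}
gene2 = {"B": "TT", "C": "TA"}
for taxon, seq in superalignment([gene1, gene2]).items():
    print(taxon, seq)
```

The padding is what makes sparse data visible to the analysis: taxon C contributes nothing to the first gene's columns but still sits in one combined matrix.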
6
Buerki S, Forest F, Salamin N, Alvarez N. Comparative performance of supertree algorithms in large data sets using the soapberry family (Sapindaceae) as a case study. Syst Biol 2010; 60:32-44. [PMID: 21068445 DOI: 10.1093/sysbio/syq057] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
For the last two decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially relative to the widely used supermatrix approach). In this study, seven of the most commonly used supertree methods are investigated using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees to the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree, and computational time required by the algorithm. Additional analyses were conducted on a reduced data set to test whether the performance levels were affected by the heuristic searches rather than by the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on the one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods performed more poorly or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, the most recent approach tested here, were promising, with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set size and missing data.
Results also showed that MRP analyses could reach a high level of quality even with a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would be particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
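One common way to score how similar a supertree is to an input tree is a Robinson-Foulds-style count of the clusters found in one tree but not the other, after restricting both trees to the taxa they share. The sketch below assumes rooted trees given as sets of clusters (frozensets of taxon names); it illustrates the general idea and is not necessarily the similarity measure used in the study. The helper names are hypothetical.

```python
from typing import FrozenSet, Set

Cluster = FrozenSet[str]


def restrict(clusters: Set[Cluster], taxa: Set[str]) -> Set[Cluster]:
    """Project clusters onto `taxa`, keeping only nontrivial ones."""
    out: Set[Cluster] = set()
    for cluster in clusters:
        r = frozenset(cluster & taxa)
        # Drop singletons and the full shared taxon set; they carry no grouping.
        if 1 < len(r) < len(taxa):
            out.add(r)
    return out


def rf_distance(sup: Set[Cluster], sup_taxa: Set[str],
                src: Set[Cluster], src_taxa: Set[str]) -> int:
    """Rooted Robinson-Foulds distance computed on the shared taxa only."""
    shared = sup_taxa & src_taxa
    return len(restrict(sup, shared) ^ restrict(src, shared))


# Hypothetical supertree ((A,B),C),(D,E) and source tree ((A,C),B),D.
supertree = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}
source = {frozenset("AC"), frozenset("ABC")}
print(rf_distance(supertree, set("ABCDE"), source, set("ABCD")))
```

Averaging this distance over all input trees gives a single supertree-to-input score; a lower average means the supertree conflicts less with its sources.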
Affiliation(s)
- Sven Buerki
- Real Jardín Botánico, Department of Biodiversity and Conservation, CSIC, Plaza de Murillo 2, 28014 Madrid, Spain.
7
Zhou X, Xu S, Zhang P, Yang G. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour 2010; 11:134-40. [DOI: 10.1111/j.1755-0998.2010.02903.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Affiliation(s)
- Xuming Zhou
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Shixia Xu
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Pan Zhang
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Guang Yang
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
8
Mating system drives negative associations between morphological features in Schistosomatidae. BMC Evol Biol 2010; 10:245. [PMID: 20698972 PMCID: PMC2928788 DOI: 10.1186/1471-2148-10-245] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/15/2010] [Accepted: 08/10/2010] [Indexed: 11/10/2022]
Abstract
Background: Sexual morphological features are known to be associated with the mating systems of several animal groups. However, it has been suggested that morphological features other than sexual characteristics could also be constrained by the mating system as a consequence of negative associations. Schistosomatidae are parasitic organisms that vary in mating system and can thus be used to explore links between the mating system and negative associations with morphological features. Results: A comparative analysis of Schistosomatidae morphological features revealed an association between the mating system (monogamous versus polygynandrous) and morphological characteristics of reproduction, nutrition, and locomotion. Conclusions: The mating system drives negative associations between somatic and sexual morphological features. In monogamous species, males display a lower investment in sexual tissues and a higher commitment of resources to tissues involved in female transport, protection, and feeding assistance. In contrast, males of polygynandrous species invest to a greater extent in sexual tissues at the cost of reduced commitment to female care.
9
Intrabreed Stratification Related to Divergent Selection Regimes in Purebred Dogs May Affect the Interpretation of Genetic Association Studies. J Hered 2009; 100:S28-S36. [PMCID: PMC4176315 DOI: 10.1093/jhered/esp012] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 11/23/2008] [Revised: 02/13/2009] [Accepted: 02/25/2009] [Indexed: 12/22/2023]
Abstract
Until recently, canine genetic research had not focused on population structure within breeds, which may confound the results of case–control studies by introducing spurious correlations between phenotype and genotype that reflect population history. Intrabreed structure may exist when geographical origin or divergent selection regimes influence the choice of potential mates for breeding dogs. We present evidence for intrabreed stratification from a genome-wide marker survey in a sample of unrelated dogs. We genotyped 76 Border Collies, 49 Australian Shepherds, 17 German Shepherd Dogs, and 17 Portuguese Water Dogs for our primary analyses using Affymetrix Canine v2.0 single-nucleotide polymorphism (SNP) arrays. Subsets of autosomal markers were examined using clustering algorithms to assign individuals to populations and to estimate the number of populations represented in the sample. SNPs passing stringent quality control filters were employed in explicitly phylogenetic analyses reconstructing relationships between individuals using maximum parsimony and Bayesian methods. We used simulation studies to explore the possible effects of intrabreed stratification on genome-wide association studies. These analyses demonstrate significant stratification in at least one of our primary breeds of interest, the Border Collie. Demographic and pedigree data suggest that this population substructure may result from geographic isolation or from divergent selection regimes practiced by breeders with different breeding program goals. Simulation studies indicate that such stratification could produce false discovery rates high enough to confound genome-wide association analyses. Intrabreed stratification should be accounted for when designing and interpreting case–control association studies using purebred dogs.
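The spurious genotype–phenotype correlation that stratification can introduce is easy to reproduce in a small case–control table. In the sketch below the counts are invented for illustration and do not come from the study: within each of two hypothetical subpopulations, carrier status is unrelated to case status (odds ratio 1), but the subpopulations differ in both carrier frequency and case proportion, so pooling them yields a spurious odds ratio of about 3.19. A stratified Mantel-Haenszel estimate, one standard adjustment and not necessarily what the authors used, recovers 1.

```python
from typing import List, Tuple

# One 2x2 table per subpopulation:
# (case carriers, case non-carriers, control carriers, control non-carriers)
Table = Tuple[int, int, int, int]


def odds_ratio(t: Table) -> float:
    a, b, c, d = t
    return (a * d) / (b * c)


def pooled(tables: List[Table]) -> Table:
    """Ignore population structure by summing the tables cell by cell."""
    a, b, c, d = (sum(cells) for cells in zip(*tables))
    return (a, b, c, d)


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Stratified (Mantel-Haenszel) odds ratio across subpopulations."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Invented counts: subpopulation 1 is case-rich with carrier frequency 0.8,
# subpopulation 2 is control-rich with carrier frequency 0.2.
strata = [(80, 20, 40, 10), (5, 20, 20, 80)]
print([odds_ratio(t) for t in strata])  # [1.0, 1.0]
print(odds_ratio(pooled(strata)))       # 3.1875
print(mantel_haenszel_or(strata))       # 1.0
```

The pooled association arises purely because cases are over-sampled from the subpopulation in which carriers are common, which is the mechanism the abstract warns about.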
10
Jousselin E, Van Noort S, Berry V, Rasplus JY, Rønsted N, Erasmus JC, Greeff JM. One fig to bind them all: host conservatism in a fig wasp community unraveled by cospeciation analyses among pollinating and nonpollinating fig wasps. Evolution 2008; 62:1777-1797. [PMID: 18419750 DOI: 10.1111/j.1558-5646.2008.00406.x] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Indexed: 01/18/2023]
Affiliation(s)
- Emmanuelle Jousselin
- Institut National de la Recherche Agronomique, Centre de Biologie et de Gestion des Populations, Campus International de Baillarguet, CS-30 016, 34 988 Montferrier sur Lez, France
- Simon Van Noort
- Natural History Division, South African Museum, Iziko Museums of Cape Town, PO Box 61, Cape Town 8000, South Africa
- Vincent Berry
- Département Informatique, LIRMM-CNRS, 161 rue Ada, 34392 Montpellier Cedex 5, France
- Jean-Yves Rasplus
- Institut National de la Recherche Agronomique, Centre de Biologie et de Gestion des Populations, Campus International de Baillarguet, CS-30 016, 34 988 Montferrier sur Lez, France
- Nina Rønsted
- Jodrell Laboratory, Royal Botanic Gardens, Kew, TW9 3DS Richmond, Surrey, United Kingdom
- Jaco M Greeff
- Department of Genetics, University of Pretoria, Pretoria 0002, South Africa
11
Wilkinson M, Cotton JA, Lapointe FJ, Pisani D. Properties of supertree methods in the consensus setting. Syst Biol 2007; 56:330-7. [PMID: 17464887 DOI: 10.1080/10635150701245370] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 10/23/2022]
Affiliation(s)
- Mark Wilkinson
- Department of Zoology, The Natural History Museum, London, SW7 5BD, UK.
12
Abstract
Phylogenetic analysis has changed greatly in the last decade, and the most important themes in that change are reviewed here. Sequence data have become the most common source of phylogenetic information. As a result, explicit models of evolutionary processes have been developed in a likelihood context, allowing more realistic data analyses. These models are becoming increasingly complex, for both nucleotide and amino acid sequences, so all such models need to be quantitatively assessed for each data set to find the most appropriate one for any particular tree-building analysis. Bayesian analysis has been developed for tree-building and is greatly increasing in popularity, because a good heuristic strategy exists that allows large data sets to be analyzed with complex evolutionary models in a practical time. Perhaps the most disappointing aspect of tree interpretation is the ongoing confusion between rooted and unrooted trees, while the effect of taxon and character sampling is often overlooked when constructing a phylogeny (especially in parasitology). The review finishes with a detailed consideration of the analysis of a multi-gene data set for several dozen taxa of Cryptosporidium (Apicomplexa), illustrating many of the theoretical and practical points highlighted in the review.
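The rooted/unrooted distinction also changes the size of tree space. Standard counting results give (2n−5)!! unrooted and (2n−3)!! rooted binary trees on n labeled taxa, because each unrooted tree can be rooted on any of its 2n−3 edges. A small Python sketch of those formulas (function names are illustrative):

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n: int) -> int:
    """Distinct unrooted binary trees on n >= 3 labeled taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


def rooted_trees(n: int) -> int:
    """Distinct rooted binary trees on n >= 2 labeled taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


for n in (4, 8, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 8 taxa, 10,395 unrooted trees correspond to 135,135 rooted ones, so a single unrooted result is compatible with 13 different rooted hypotheses; reading an unrooted tree as if it were rooted silently picks one of them.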
Affiliation(s)
- David A Morrison
- Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden
13
Moore BR, Smith SA, Donoghue MJ. Increasing data transparency and estimating phylogenetic uncertainty in supertrees: Approaches using nonparametric bootstrapping. Syst Biol 2006; 55:662-76. [PMID: 16969942 DOI: 10.1080/10635150600920693] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 10/24/2022]
Abstract
The estimation of ever larger phylogenies requires consideration of alternative inference strategies, including divide-and-conquer approaches that decompose the global inference problem into a set of smaller, more manageable component problems. A prominent locus of research in this area is the development of supertree methods, which estimate a composite tree by combining a set of partially overlapping component topologies. Although promising, the use of component tree topologies as the primary data dissociates supertrees from complexities within the underlying character data and complicates the evaluation of phylogenetic uncertainty. We address these issues by exploring three approaches that variously incorporate nonparametric bootstrapping into a common supertree estimation algorithm (matrix representation with parsimony, although any algorithm might be used): bootstrap-weighting, source-tree bootstrapping, and hierarchical bootstrapping. We illustrate these procedures by means of hypothetical and empirical examples. Our preliminary experiments suggest that these methods have the potential to improve the correspondence of supertree estimates to those derived from simultaneous analysis of the combined data and to allow uncertainty in supertree topologies to be quantified. The ability to increase the transparency of supertrees to the underlying character data has several practical implications and sheds new light on an old debate. These methods have been implemented in the freely available program tREeBOOT.
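As a generic illustration of layering nonparametric bootstrapping on top of a supertree method, the sketch below resamples the source trees with replacement, builds a supertree from each pseudoreplicate, and reports how often each cluster recurs. It is not an exact reproduction of any of the three procedures named above. The supertree builder is passed in as a function, and `majority_clusters` is a hypothetical stand-in (a majority-rule cluster count that is only meaningful when all source trees share the same taxa), not MRP.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Set

Cluster = FrozenSet[str]
Tree = Set[Cluster]


def bootstrap_support(sources: List[Tree],
                      build_supertree: Callable[[List[Tree]], Tree],
                      replicates: int = 100,
                      seed: int = 0) -> Dict[Cluster, float]:
    """Resample source trees with replacement; return the fraction of
    replicate supertrees that contain each cluster."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(replicates):
        sample = [rng.choice(sources) for _ in sources]
        counts.update(build_supertree(sample))
    return {cluster: k / replicates for cluster, k in counts.items()}


def majority_clusters(trees: List[Tree]) -> Tree:
    """Stand-in builder (NOT MRP): clusters found in more than half the trees."""
    tally = Counter(cluster for tree in trees for cluster in tree)
    return {cluster for cluster, k in tally.items() if k > len(trees) / 2}


# Hypothetical source trees on taxa A, B, C, D.
t1 = {frozenset("AB"), frozenset("ABC")}
t2 = {frozenset("AC"), frozenset("ABC")}
support = bootstrap_support([t1, t2, t2], majority_clusters, replicates=200, seed=1)
for cluster, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(cluster)), round(value, 2))
```

Swapping `majority_clusters` for a real MRP or MRF search would turn the loop into a support estimate for that method, at the cost of one full supertree search per replicate.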
Affiliation(s)
- Brian R Moore
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520, USA.
14
Chen D, Eulenstein O, Fernández-Baca D, Sanderson M. Minimum-flip supertrees: complexity and algorithms. IEEE/ACM Trans Comput Biol Bioinform 2006; 3:165-73. [PMID: 17048402 DOI: 10.1109/tcbb.2006.26] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 05/12/2023]
Abstract
The input to a supertree problem is a collection of phylogenetic trees that intersect pairwise in their leaf sets; the goal is to construct a single tree that retains as much as possible of the information in the input. This task is complicated by inconsistencies due to errors. We consider the case where the input trees are rooted and are represented by the clusters they exhibit. The problem is to find the minimum number of flips needed to resolve all inconsistencies, where each flip moves a taxon into or out of a cluster. We prove that the minimum-flip problem is NP-complete, but show that it is fixed-parameter tractable and give approximation algorithms for special cases.
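Because the minimum-flip problem is NP-complete, exact solutions are feasible only for tiny inputs, but an exhaustive search makes the definition concrete. The sketch below treats the input as a taxa-by-clusters matrix of '0', '1', and '?'. A flip changes a determined cell, '?' cells may be filled freely, and a result is accepted when no pair of columns contains the rows 11, 10, and 01 together (the usual condition for rooted clusters to be pairwise nested or disjoint). The function names are hypothetical, and this brute force is not the paper's fixed-parameter or approximation algorithm.

```python
from itertools import product
from typing import List, Optional, Tuple


def compatible(m: List[List[int]]) -> bool:
    """True if no pair of columns shows all three rows 11, 10 and 01,
    i.e. every pair of clusters is nested or disjoint."""
    cols = list(zip(*m))
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            pairs = set(zip(cols[i], cols[j]))
            if {(1, 1), (1, 0), (0, 1)} <= pairs:
                return False
    return True


def min_flip_bruteforce(m: List[List[str]]) -> Tuple[int, List[List[int]]]:
    """Try every 0/1 completion; '0'/'1' cells cost 1 to flip, '?' is free."""
    rows, cols = len(m), len(m[0])
    cells = [m[r][c] for r in range(rows) for c in range(cols)]
    best: Optional[Tuple[int, List[List[int]]]] = None
    for bits in product((0, 1), repeat=len(cells)):
        cost = sum(1 for v, b in zip(cells, bits) if v != "?" and int(v) != b)
        if best is not None and cost >= best[0]:
            continue  # cannot beat the best solution found so far
        grid = [list(bits[r * cols:(r + 1) * cols]) for r in range(rows)]
        if compatible(grid):
            best = (cost, grid)
    assert best is not None  # the all-zero matrix is always compatible
    return best


# Clusters {A,B} and {B,C} overlap without nesting: one flip is needed.
cost, grid = min_flip_bruteforce([["1", "0"],   # taxon A
                                  ["1", "1"],   # taxon B
                                  ["0", "1"]])  # taxon C
print(cost, grid)
```

The search visits 2^(taxa × clusters) completions, which is exactly why heuristics are needed for real supertree data sets.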
Affiliation(s)
- Duhong Chen
- Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA.
15
Affiliation(s)
- Olaf R P Bininda-Emonds
- Lehrstuhl für Tierzucht, Technical University of Munich, Hochfeldweg 1, 85354 Freising-Weihenstephan, Germany.
16
Abstract
The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.
Affiliation(s)
- Kevin Chen
- Lior Pachter