1
Bryant D, Huson DH. NeighborNet: improved algorithms and implementation. Frontiers in Bioinformatics 2023; 3:1178600. [PMID: 37799982 PMCID: PMC10548196 DOI: 10.3389/fbinf.2023.1178600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 08/04/2023] [Indexed: 10/07/2023] Open
Abstract
NeighborNet constructs phylogenetic networks to visualize distance data. It is a popular method used in a wide range of applications. While several studies have investigated its mathematical features, here we focus on computational aspects. The algorithm operates in three steps. We present a new simplified formulation of the first step, which aims at computing a circular ordering. We provide the first technical description of the second step, the estimation of split weights. We review the third step by constructing and drawing the network. Finally, we discuss how the networks might best be interpreted, review related approaches, and present some open questions.
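As a rough illustration of why the circular ordering from the first step matters: once an ordering is fixed, the candidate splits whose weights the second step estimates are the ones that cut the circle into two contiguous arcs. The sketch below only enumerates those splits for a given ordering. The function name `circular_splits` is a hypothetical helper, not part of any NeighborNet implementation, and the ordering itself is assumed to be already known.

```python
def circular_splits(order):
    """Enumerate the splits compatible with a circular ordering.

    Each split is represented by the side that does NOT contain order[0];
    that side is always a contiguous run order[i..j] with 1 <= i <= j.
    For n taxa this yields n * (n - 1) / 2 splits, trivial ones included.
    (Illustrative helper only; not taken from the paper.)
    """
    n = len(order)
    splits = []
    for i in range(1, n):
        for j in range(i, n):
            splits.append(frozenset(order[i:j + 1]))
    return splits


if __name__ == "__main__":
    for side in circular_splits(["a", "b", "c", "d"]):
        print(sorted(side))
```

For four taxa this gives six splits: the four trivial ones plus `{b, c}` and `{c, d}` (each paired with its complement). A split such as `{b, d}` is not contiguous on the circle and is therefore never produced.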
Affiliation(s)
- David Bryant
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
- Daniel H Huson
- Algorithms in Bioinformatics, University of Tübingen, Tübingen, Germany
- Cluster of Excellence: Controlling Microbes to Fight Infection, University of Tübingen, Tübingen, Germany
2
Forcey S, Scalzo D. Phylogenetic Networks as Circuits With Resistance Distance. Front Genet 2020; 11:586664. [PMID: 33193721 PMCID: PMC7593533 DOI: 10.3389/fgene.2020.586664] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/23/2020] [Accepted: 09/07/2020] [Indexed: 11/18/2022] Open
Abstract
Phylogenetic networks are notoriously difficult to reconstruct. Here we suggest that it can be useful to view unknown genetic distance along edges in phylogenetic networks as analogous to unknown resistance in electric circuits. This resistance distance, well-known in graph theory, turns out to have nice mathematical properties which allow the precise reconstruction of networks. Specifically we show that the resistance distance for a weighted 1-nested network is Kalmanson, and that the unique associated circular split network fully represents the splits of the original phylogenetic network (or circuit). In fact, this full representation corresponds to a face of the balanced minimal evolution polytope for level-1 networks. Thus, the unweighted class of the original network can be reconstructed by either the greedy algorithm neighbor-net or by linear programming over a balanced minimal evolution polytope. We begin study of 2-nested networks with both minimum path and resistance distance, and include some counting results for 2-nested networks.
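The resistance distance mentioned in this abstract can be computed from the weighted graph Laplacian. The following is a minimal sketch using the standard circuit formulation (ground one node, inject unit current at the other, read off the potential); it is not the authors' code. The function name `resistance_distance`, the edge-list format, and the assumption that the graph is connected are choices made here for illustration.

```python
from fractions import Fraction


def resistance_distance(n, edges, s, t):
    """Effective resistance between nodes s and t of a connected graph.

    edges: list of (u, v, resistance) with nodes numbered 0..n-1.
    Assumes the graph is connected (illustrative sketch only).
    """
    if s == t:
        return Fraction(0)
    # Weighted Laplacian built from conductances (1 / resistance).
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v, r in edges:
        c = 1 / Fraction(r)
        lap[u][u] += c
        lap[v][v] += c
        lap[u][v] -= c
        lap[v][u] -= c
    # Ground node t: drop its row and column, inject unit current at s.
    keep = [x for x in range(n) if x != t]
    m = len(keep)
    aug = [[lap[i][j] for j in keep] + [Fraction(1 if i == s else 0)]
           for i in keep]
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The potential at s equals the effective resistance between s and t.
    return aug[keep.index(s)][m]


if __name__ == "__main__":
    triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
    print(resistance_distance(3, triangle, 0, 1))
```

On a triangle with unit resistances, two nodes are joined by a direct edge in parallel with a two-edge path, so the result is 2/3 rather than the shortest-path distance of 1.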
Affiliation(s)
- Stefan Forcey
- Department of Mathematics, The University of Akron, Akron, OH, United States
3
Georges-Filteau J, Hamelin RC, Blanchette M. Mycorrhiza: genotype assignment using phylogenetic networks. Bioinformatics 2020; 36:212-220. [PMID: 31197316 DOI: 10.1093/bioinformatics/btz476] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/23/2018] [Revised: 05/03/2019] [Accepted: 06/06/2019] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION The genotype assignment problem consists of predicting, from the genotype of an individual, which of a known set of populations it originated from. The problem arises in a variety of contexts, including wildlife forensics, invasive species detection and biodiversity monitoring. Existing approaches perform well under ideal conditions but are sensitive to a variety of common violations of the assumptions they rely on. RESULTS In this article, we introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Those features are then used as input to a Random Forests classifier. The classification accuracy was assessed on multiple published empirical SNP, microsatellite or consensus sequence datasets with wide ranges of size, geographical distribution and population structure and on simulated datasets. It compared favorably against widely used assessment tests or mixture analysis methods such as STRUCTURE and Admixture, and against another machine-learning based approach using principal component analysis for dimensionality reduction. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium. Moreover, the phylogenetic network approach estimates mixture proportions with good accuracy. AVAILABILITY AND IMPLEMENTATION Mycorrhiza is released as an easy to use open-source python package at github.com/jgeofil/mycorrhiza. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Richard C Hamelin
- Department of Forest and Conservation Sciences, The University of British Columbia, Vancouver, BC, Canada; Département des sciences du bois et de la forêt, Université Laval, Québec, Canada
4
Bannantine JP, Conde C, Bayles DO, Branger M, Biet F. Genetic Diversity Among Mycobacterium avium Subspecies Revealed by Analysis of Complete Genome Sequences. Front Microbiol 2020; 11:1701. [PMID: 32849358 PMCID: PMC7426613 DOI: 10.3389/fmicb.2020.01701] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/13/2020] [Accepted: 06/29/2020] [Indexed: 11/13/2022] Open
Abstract
Mycobacterium avium comprises four subspecies that contain both human and veterinary pathogens. At the inception of this study, twenty-eight M. avium genomes had been annotated as RefSeq genomes, facilitating direct comparisons. These genomes represent strains from around the world and provided a unique opportunity to examine genome dynamics in this species. Each genome was confirmed to be classified correctly based on SNP genotyping, nucleotide identity and presence/absence of repetitive elements or other typing methods. The Mycobacterium avium subspecies paratuberculosis (Map) genome size and organization was remarkably consistent, averaging 4.8 Mb with a variance of only 29.6 kb among the 13 strains. Comparing recombination events along with the larger genome size and variance observed among Mycobacterium avium subspecies avium (Maa) and Mycobacterium avium subspecies hominissuis (Mah) strains (collectively termed non-Map) suggests horizontal gene transfer occurs in non-Map, but not in Map strains. Overall, M. avium subspecies could be divided into two major sub-divisions, with the Map type II (bovine strains) clustering tightly on one end of a phylogenetic spectrum and Mah strains clustering more loosely together on the other end. The most evolutionarily distinct Map strain was an ovine strain, designated Telford, which had >1,000 SNPs and showed large rearrangements compared to the bovine type II strains. The Telford strain clustered with Maa strains as an intermediate between Map type II and Mah. SNP analysis and genome organization analyses repeatedly demonstrated the conserved nature of Map versus the mosaic nature of non-Map M. avium strains. Finally, core and pangenomes were developed for Map and non-Map strains. A total of 80% Map genes belonged to the Map core genome, while only 40% of non-Map genes belonged to the non-Map core genome. 
These genomes provide a more complete and detailed comparison of these subspecies strains as well as a blueprint for how genetic diversity originated.
Affiliation(s)
- John P Bannantine
- USDA-Agricultural Research Service, National Animal Disease Center, Ames, IA, United States
- Cyril Conde
- INRAE, Université de Tours, ISP, Nouzilly, France
- Darrell O Bayles
- USDA-Agricultural Research Service, National Animal Disease Center, Ames, IA, United States
- Franck Biet
- INRAE, Université de Tours, ISP, Nouzilly, France
5
Durell C, Forcey S. Level-1 phylogenetic networks and their balanced minimum evolution polytopes. J Math Biol 2020; 80:1235-1263. [PMID: 32047981 DOI: 10.1007/s00285-019-01458-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/02/2019] [Revised: 10/29/2019] [Indexed: 11/29/2022]
Abstract
Balanced minimum evolution is a distance-based criterion for the reconstruction of phylogenetic trees. Several algorithms exist to find the optimal tree with respect to this criterion. One approach is to minimize a certain linear functional over an appropriate polytope. Here we present polytopes that allow a similar linear programming approach to finding phylogenetic networks. We investigate a two-parameter family of polytopes that arise from phylogenetic networks, and which specialize to the Balanced Minimum Evolution polytopes as well as the Symmetric Travelling Salesman polytopes. We show that the vertices correspond to certain level-1 phylogenetic networks, and that there are facets or faces for every split. We also describe lower bound faces and a family of faces for every dimension.
Affiliation(s)
- Cassandra Durell
- Department of Mathematics, The University of Akron, Akron, OH, 44325-4002, USA
- Stefan Forcey
- Department of Mathematics, The University of Akron, Akron, OH, 44325-4002, USA.
6
Allman ES, Baños H, Rhodes JA. NANUQ: a method for inferring species networks from gene trees under the coalescent model. Algorithms Mol Biol 2019; 14:24. [PMID: 31827592 PMCID: PMC6896299 DOI: 10.1186/s13015-019-0159-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/04/2019] [Accepted: 11/07/2019] [Indexed: 01/07/2023] Open
Abstract
Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network can have any topology, but arise with frequencies dependent on the network structure and numerical parameters. We propose a new algorithm for statistical inference of a level-1 species network under this model, from data consisting of gene tree topologies, and provide the theoretical justification for it. The algorithm is based on an analysis of quartets displayed on gene trees, combining several statistical hypothesis tests with combinatorial ideas such as a quartet-based intertaxon distance appropriate to networks, the NeighborNet algorithm for circular split systems, and the Circular Network algorithm for constructing a splits graph.
7
Rigaud S, Manen C, García-Martínez de Lagrán I. Symbols in motion: Flexible cultural boundaries and the fast spread of the Neolithic in the western Mediterranean. PLoS One 2018; 13:e0196488. [PMID: 29715284 PMCID: PMC5929525 DOI: 10.1371/journal.pone.0196488] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/10/2017] [Accepted: 04/13/2018] [Indexed: 11/19/2022] Open
Abstract
The rapid diffusion of farming technologies in the western Mediterranean raises questions about the mechanisms that drove the development of intensive contact networks and circulation routes between incoming Neolithic communities. Using a statistical method to analyze a brand-new set of cultural and chronological data, we document the large-scale processes that led to variations between Mediterranean archaeological cultures, and micro-scale processes responsible for the transmission of cultural practices within farming communities. The analysis of two symbolic productions, pottery decorations and personal ornaments, shed light on the complex interactions developed by Early Neolithic farmers in the western Mediterranean area. Pottery decoration diversity correlates with local processes of circulation and exchange, resulting in the emergence and the persistence of stylistic and symbolic boundaries between groups, while personal ornaments reflect extensive networks and the high level of mobility of Early Neolithic farmers. The two symbolic productions express different degrees of cultural interaction that may have facilitated the successful and rapid expansion of early farming societies in the western Mediterranean.
Affiliation(s)
- Solange Rigaud
- CNRS, UMR 5199 – PACEA, Université de Bordeaux, Bâtiment, Allée Geoffroy Saint Hilaire, Pessac, France
- Claire Manen
- CNRS, UMR 5608 – TRACES, Université Toulouse–Jean Jaurès, Maison de la Recherche, 5, allées Antonio-Machado, Toulouse Cedex 9, France
8
Prohaska SJ, Berkemer SJ, Gärtner F, Gatter T, Retzlaff N, Höner Zu Siederdissen C, Stadler PF. Expansion of gene clusters, circular orders, and the shortest Hamiltonian path problem. J Math Biol 2017; 77:313-341. [PMID: 29260295 PMCID: PMC6060901 DOI: 10.1007/s00285-017-1197-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/24/2017] [Revised: 12/02/2017] [Indexed: 11/30/2022]
Abstract
Clusters of paralogous genes such as the famous HOX cluster of developmental transcription factors tend to evolve by stepwise duplication of its members, often involving unequal crossing over. Gene conversion and possibly other mechanisms of concerted evolution further obfuscate the phylogenetic relationships. As a consequence, it is very difficult or even impossible to disentangle the detailed history of gene duplications in gene clusters. In this contribution we show that the expansion of gene clusters by unequal crossing over as proposed by Walter Gehring leads to distinctive patterns of genetic distances, namely a subclass of circular split systems. Furthermore, when the gene cluster was left undisturbed by genome rearrangements, the shortest Hamiltonian paths with respect to genetic distances coincide with the genomic order. This observation can be used to detect ancient genomic rearrangements of gene clusters and to distinguish gene clusters whose evolution was dominated by unequal crossing over within genes from those that expanded through other mechanisms.
Affiliation(s)
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Sarah J Berkemer
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany; Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Fabian Gärtner
- Competence Center for Scalable Data Services and Solutions Dresden/Leipzig and Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Thomas Gatter
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Nancy Retzlaff
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany; Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Christian Höner Zu Siederdissen
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Peter F Stadler
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany; Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany; RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, 04103, Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090, Wien, Austria; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.
9
Santos-Júnior CD, Veríssimo A, Costa J. The recombination dynamics of Staphylococcus aureus inferred from spA gene. BMC Microbiol 2016; 16:143. [PMID: 27400707 PMCID: PMC4940709 DOI: 10.1186/s12866-016-0757-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/10/2015] [Accepted: 07/01/2016] [Indexed: 11/10/2022] Open
Abstract
Background Given the role of spA as a pivotal virulence factor decisive for Staphylococcus aureus ability to escape from innate and adaptive immune responses, one can consider it as an object subject to adaptive evolution and that variations in spA may uncover pathogenicity variations. Results The population genetic structure was deduced from the extracellular domains of SpA gene sequence (domains A-E and the X-region) and compared to the MLST-analysis of 41 genetically diverse methicillin-resistant (MRSA) and methicillin-susceptible (MSSA) S. aureus strains. Incongruence between tree topologies was noticeable and in the inferred spA tree most MSSA isolates were clustered in a distinct group. Conversely, the distribution of strains according to their spA-type was not always congruent with the tree inferred from the complete spA gene foreseeing that spA is a mosaic gene composed of different segments exhibiting different evolutionary histories. Evidences of a network-like organization were identified through several conflicting phylogenetic signals and indeed several intragenic recombination events (within subdomains of the gene) were detected within and between CC’s of MRSA strains. The alignment of SpA sequences enabled the clustering of several isoforms as a result of non-randomly distributed amino acid variations, located in two clusters of polymorphic sites in domains D to B and Xr (a). Nevertheless, evidences of cluster specific structural arrangements were detected reflecting alterations on specific residues with potential impact on S. aureus pathogenicity. Conclusions The detection of positive selection operating on spA combined with frequent non-synonymous mutations, domain duplication and frequent intragenic recombination events represent important mechanisms acting in the evolutionary adaptive mechanism promoting spA genetic plasticity. 
These findings argue that crucial allelic forms correlated with pathogenicity can be identified by sequences analysis enabling the design of more robust schemes. Electronic supplementary material The online version of this article (doi:10.1186/s12866-016-0757-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Célio D Santos-Júnior
- Department of Molecular Biology and Evolutionary Genetics, Federal University of São Carlos (UFSCar), São Paulo, Brazil
- António Veríssimo
- CNC - Center for Neuroscience and Cell Biology, University of Coimbra - Rua Larga, Faculdade de Medicina, Pólo I, 1° andar, 3004-504, Coimbra, Portugal; Department of Life Sciences, University of Coimbra - Calçada Martim de Freitas, 3000-456, Coimbra, Portugal
- Joana Costa
- CNC - Center for Neuroscience and Cell Biology, University of Coimbra - Rua Larga, Faculdade de Medicina, Pólo I, 1° andar, 3004-504, Coimbra, Portugal; Department of Life Sciences, University of Coimbra - Calçada Martim de Freitas, 3000-456, Coimbra, Portugal.
10
Progressive alignment of genomic signals by multiple dynamic time warping. J Theor Biol 2015; 385:20-30. [PMID: 26300069 DOI: 10.1016/j.jtbi.2015.08.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 11/17/2014] [Revised: 07/21/2015] [Accepted: 08/03/2015] [Indexed: 11/22/2022]
Abstract
This paper applies the progressive alignment principle to the positional adjustment of a set of genomic signals of different lengths. The new method for multiple alignment of signals, based on dynamic time warping, is tested for evaluating the similarity of genes of different lengths in phylogenetic studies. Two sets of phylogenetic markers were used to demonstrate its effectiveness in evaluating intraspecies and interspecies genetic variability. Part of the proposed method is a modification of the pairwise alignment of two signals by dynamic time warping that uses correlation in a sliding window. Correlation-based dynamic time warping allows more accurate alignment, dependent on local homologies in the sequences, without the need for a scoring matrix or evolutionary models, because the mutual similarities of residues are included in the numerical code of the signals.
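For readers unfamiliar with dynamic time warping, the pairwise step can be sketched as a small dynamic program. The version below is the textbook form with an absolute-difference local cost, chosen here for simplicity; it is not the correlation-in-a-sliding-window variant the abstract describes, and the function name `dtw_distance` is an assumption made for this example.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric signals.

    Local cost is |a[i] - b[j]| (an assumption for this sketch; the paper
    uses a correlation-based cost instead).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest warping of a[:i] onto b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because warping lets one sample align to several, signals of different lengths such as `[1, 2, 3]` and `[1, 2, 2, 3]` can still match with zero cost.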
11
Balvočūtė M, Spillner A, Moulton V. FlatNJ: A Novel Network-Based Approach to Visualize Evolutionary and Biogeographical Relationships. Syst Biol 2014; 63:383-96. [DOI: 10.1093/sysbio/syu001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
12
Parks DH, Beiko RG. Measuring Community Similarity with Phylogenetic Networks. Mol Biol Evol 2012; 29:3947-58. [DOI: 10.1093/molbev/mss200] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023] Open
13
Thuillard M, Moulton V. Identifying and reconstructing lateral transfers from distance matrices by combining the Minimum Contradiction method and Neighbor-Net. J Bioinform Comput Biol 2011; 9:453-70. [DOI: 10.1142/s0219720011005409] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/06/2010] [Revised: 02/01/2011] [Accepted: 02/13/2011] [Indexed: 11/18/2022]
Abstract
Identifying lateral gene transfers is an important problem in evolutionary biology. Under a simple model of evolution, the expected values of an evolutionary distance matrix describing a phylogenetic tree fulfill the so-called Kalmanson inequalities. The Minimum Contradiction method for identifying lateral gene transfers exploits the fact that lateral transfers may generate large deviations from the Kalmanson inequalities. Here a new approach is presented to deal with such cases that combines the Neighbor-Net algorithm for computing phylogenetic networks with the Minimum Contradiction method. A subset of taxa, prescribed using Neighbor-Net, is obtained by measuring how closely the Kalmanson inequalities are fulfilled by each taxon. A criterion is then used to identify the taxa, possibly involved in a lateral transfer between nonconsecutive taxa. We illustrate the utility of the new approach by applying it to a distance matrix for Archaea, Bacteria, and Eukaryota.
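The Kalmanson inequalities referred to here can be checked directly for a given circular ordering: for any four taxa i, j, k, l appearing in that order, both d(i,j) + d(k,l) and d(i,l) + d(j,k) must not exceed d(i,k) + d(j,l). The sketch below tests this condition by brute force over all quadruples. It only illustrates the inequalities themselves, not the Minimum Contradiction method or its per-taxon scoring, and the name `is_kalmanson` and the tolerance parameter are assumptions of this example.

```python
from itertools import combinations


def is_kalmanson(d, order, tol=1e-9):
    """Check the Kalmanson inequalities for matrix d under a circular order.

    d: symmetric distance matrix (list of lists), order: list of indices.
    Brute force over all quadruples, so O(n^4); illustrative only.
    """
    for a, b, c, e in combinations(range(len(order)), 4):
        i, j, k, l = order[a], order[b], order[c], order[e]
        crossing = d[i][k] + d[j][l]
        if d[i][j] + d[k][l] > crossing + tol:
            return False
        if d[i][l] + d[j][k] > crossing + tol:
            return False
    return True


if __name__ == "__main__":
    # Shortest-path distances on a 4-cycle 0-1-2-3-0 with unit edges.
    d = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]
    print(is_kalmanson(d, [0, 1, 2, 3]), is_kalmanson(d, [0, 2, 1, 3]))
```

In the 4-cycle example the natural ordering satisfies the inequalities, while swapping two taxa in the ordering violates them, which is the kind of deviation a method based on these inequalities would flag.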
Affiliation(s)
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
14
Scornavacca C, Zickmann F, Huson DH. Tanglegrams for rooted phylogenetic trees and networks. Bioinformatics 2011; 27:i248-56. [PMID: 21685078 PMCID: PMC3117342 DOI: 10.1093/bioinformatics/btr210] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 11/13/2022] Open
Abstract
Motivation: In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is an increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events, such as horizontal gene transfer, hybridization or reassortment. Thus, the question arises how to define and compute a tanglegram for such networks. Results: In this article, we present the first formal definition of a tanglegram for rooted phylogenetic networks and present a heuristic approach for computing one, called the NN-tanglegram method. We compare the performance of our method with existing tree tanglegram algorithms and also show a typical application to real biological datasets. For maximum usability, the algorithm does not require that the trees or networks are bifurcating or bicombining, or that they are on identical taxon sets. Availability: The algorithm is implemented in our program Dendroscope 3, which is freely available from www.dendroscope.org. Contact:scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de
Affiliation(s)
- Celine Scornavacca
- Center for Bioinformatics (ZBIT), Tübingen University, Sand 14, 72076 Tübingen, Germany.
15
Beauregard-Racine J, Bicep C, Schliep K, Lopez P, Lapointe FJ, Bapteste E. Of woods and webs: possible alternatives to the tree of life for studying genomic fluidity in E. coli. Biol Direct 2011; 6:39; discussion 39. [PMID: 21774799 PMCID: PMC3160433 DOI: 10.1186/1745-6150-6-39] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 04/19/2011] [Accepted: 07/20/2011] [Indexed: 12/26/2022] Open
Abstract
Background We introduce several forest-based and network-based methods for exploring microbial evolution, and apply them to the study of thousands of genes from 30 strains of E. coli. This case study illustrates how additional analyses could offer fast heuristic alternatives to standard tree of life (TOL) approaches. Results We use gene networks to identify genes with atypical modes of evolution, and genome networks to characterize the evolution of genetic partnerships between E. coli and mobile genetic elements. We develop a novel polychromatic quartet method to capture patterns of recombination within E. coli, to update the clanistic toolkit, and to search for the impact of lateral gene transfer and of pathogenicity on gene evolution in two large forests of trees bearing E. coli. We unravel high rates of lateral gene transfer involving E. coli (about 40% of the trees under study), and show that both core genes and shell genes of E. coli are affected by non-tree-like evolutionary processes. We show that pathogenic lifestyle impacted the structure of 30% of the gene trees, and that pathogenic strains are more likely to transfer genes with one another than with non-pathogenic strains. In addition, we propose five groups of genes as candidate mobile modules of pathogenicity. We also present strong evidence for recent lateral gene transfer between E. coli and mobile genetic elements. Conclusions Depending on which evolutionary questions biologists want to address (i.e. the identification of modules, genetic partnerships, recombination, lateral gene transfer, or genes with atypical evolutionary modes, etc.), forest-based and network-based methods are preferable to the reconstruction of a single tree, because they provide insights and produce hypotheses about the dynamics of genome evolution, rather than the relative branching order of species and lineages. 
Such a methodological pluralism - the use of woods and webs - is to be encouraged to analyse the evolutionary processes at play in microbial evolution. This manuscript was reviewed by: Ford Doolittle, Tal Pupko, Richard Burian, James McInerney, Didier Raoult, and Yan Boucher
16
Improved phylogenetic analyses corroborate a plausible position of Martialis heureka in the ant tree of life. PLoS One 2011; 6:e21031. [PMID: 21731644 PMCID: PMC3123331 DOI: 10.1371/journal.pone.0021031] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 01/19/2011] [Accepted: 05/17/2011] [Indexed: 12/18/2022] Open
Abstract
Martialinae are pale, eyeless and probably hypogaeic predatory ants. Morphological character sets suggest a close relationship to the ant subfamily Leptanillinae. Recent analyses based on molecular sequence data suggest that Martialinae are the sister group to all extant ants. However, by comparing molecular studies and different reconstruction methods, the position of Martialinae remains ambiguous. While this sister group relationship was well supported by Bayesian partitioned analyses, Maximum Likelihood approaches could not unequivocally resolve the position of Martialinae. By re-analysing a previous published molecular data set, we show that the Maximum Likelihood approach is highly appropriate to resolve deep ant relationships, especially between Leptanillinae, Martialinae and the remaining ant subfamilies. Based on improved alignments, alignment masking, and tree reconstructions with a sufficient number of bootstrap replicates, our results strongly reject a placement of Martialinae at the first split within the ant tree of life. Instead, we suggest that Leptanillinae are a sister group to all other extant ant subfamilies, whereas Martialinae branch off as a second lineage. This assumption is backed by approximately unbiased (AU) tests, additional Bayesian analyses and split networks. Our results demonstrate clear effects of improved alignment approaches, alignment masking and data partitioning. We hope that our study illustrates the importance of thorough, comprehensible phylogenetic analyses using the example of ant relationships.
17
Complete mitochondrial genome of a Pleistocene jawbone unveils the origin of polar bear. Proc Natl Acad Sci U S A 2010; 107:5053-7. [PMID: 20194737 DOI: 10.1073/pnas.0914266107] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 11/18/2022] Open
Abstract
The polar bear has become the flagship species in the climate-change discussion. However, little is known about how past climate impacted its evolution and persistence, given an extremely poor fossil record. Although it is undisputed from analyses of mitochondrial (mt) DNA that polar bears constitute a lineage within the genetic diversity of brown bears, timing estimates of their divergence have differed considerably. Using next-generation sequencing technology, we have generated a complete, high-quality mt genome from a stratigraphically validated 130,000- to 110,000-year-old polar bear jawbone. In addition, six mt genomes were generated of extant polar bears from Alaska and brown bears from the Admiralty and Baranof islands of the Alexander Archipelago of southeastern Alaska and Kodiak Island. We show that the phylogenetic position of the ancient polar bear lies almost directly at the branching point between polar bears and brown bears, elucidating a unique morphologically and molecularly documented fossil link between living mammal species. Molecular dating and stable isotope analyses also show that by very early in their evolutionary history, polar bears were already inhabitants of the Arctic sea ice and had adapted very rapidly to their current and unique ecology at the top of the Arctic marine food chain. As such, polar bears provide an excellent example of evolutionary opportunism within a widespread mammalian lineage.
|
18
|
Dress AWM, Flamm C, Fritzsch G, Grünewald S, Kruspe M, Prohaska SJ, Stadler PF. Noisy: identification of problematic columns in multiple sequence alignments. Algorithms Mol Biol 2008; 3:7. [PMID: 18577231 PMCID: PMC2464588 DOI: 10.1186/1748-7188-3-7] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.4] [Received: 04/08/2008] [Accepted: 06/24/2008] [Indexed: 11/10/2022]
Abstract
MOTIVATION: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate, resulting in sometimes misleading signals in phylogenetic reconstruction. RESULTS: We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. SOFTWARE: The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction capability with quite a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set - at least, say, 12 to 15 taxa. The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.
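The idea of assessing character states along a cyclic ordering of the taxa can be illustrated with a minimal sketch. It is not the statistic used by the noisy program, which the abstract does not specify. It assumes a simple proxy: count how often the state changes between cyclically adjacent taxa, and flag columns whose count exceeds a threshold. The function names, the toy alignment, the ordering and the threshold `max_changes` are all hypothetical.

```python
from typing import List, Sequence


def cyclic_state_changes(column: Sequence[str], order: Sequence[int]) -> int:
    """Count state changes between taxa that are adjacent in the cyclic ordering."""
    n = len(order)
    return sum(
        column[order[i]] != column[order[(i + 1) % n]]  # wraps around the cycle
        for i in range(n)
    )


def flag_noisy_columns(
    alignment: Sequence[str], order: Sequence[int], max_changes: int
) -> List[int]:
    """Return indices of columns with more than max_changes cyclic state changes.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    flagged = []
    for j in range(len(alignment[0])):
        column = [row[j] for row in alignment]
        if cyclic_state_changes(column, order) > max_changes:
            flagged.append(j)
    return flagged


# Toy alignment: six taxa (rows), three columns.
alignment = ["AAC", "AAG", "ATC", "TTG", "TTC", "TAG"]
order = [0, 1, 2, 3, 4, 5]  # assumed cyclic ordering of the taxa

print(flag_noisy_columns(alignment, order, max_changes=2))
```

In this toy case the first two columns each split the cycle into two contiguous blocks (two changes), whereas the third column alternates between states at every step and is the only one flagged.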
Affiliation(s)
- Andreas WM Dress
- Department of Combinatorics and Geometry (DCG), MPG/CAS Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences (SIBS), Shanghai, PR China
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22 -26, D 04103 Leipzig, Germany
| | - Christoph Flamm
- Institut für Theoretische Chemie und Molekulare Strukturbiologie Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
| | - Guido Fritzsch
- Institute of Biology II: Zoologie, Molekulare Evolution und Systematik der Tiere, University of Leipzig, Talstrasse 33, D-04103 Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Stefan Grünewald
- Department of Combinatorics and Geometry (DCG), MPG/CAS Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences (SIBS), Shanghai, PR China
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22 -26, D 04103 Leipzig, Germany
| | - Matthias Kruspe
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Sonja J Prohaska
- Institut für Theoretische Chemie und Molekulare Strukturbiologie Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501, USA
- Biomedical Informatics, Arizona State University, PO-Box 878809, Tempe, AZ 85287, USA
| | - Peter F Stadler
- Institut für Theoretische Chemie und Molekulare Strukturbiologie Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501, USA
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology (IZI), Perlickstraße 1, D-04103 Leipzig, Germany
| |
|
19
|
Progressive multiple sequence alignments from triplets. BMC Bioinformatics 2007; 8:254. [PMID: 17631683 PMCID: PMC1948021 DOI: 10.1186/1471-2105-8-254] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 11/05/2006] [Accepted: 07/15/2007] [Indexed: 11/27/2022]
Abstract
Background: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research: Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by replacing triples by pairs in a stepwise manner instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context dependent (mis)match scores.
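To make the notion of exact dynamic programming over sequence triples concrete, here is a minimal sketch that computes the optimal three-way alignment score. It is not the aln3nn implementation. It assumes a sum-of-pairs objective with a linear gap cost, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and function names are hypothetical choices for illustration. The Neighbor-Net style aggregation and the subdivision into partial alignments are not shown.

```python
from itertools import product


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of symbols in a column; a gap paired with a gap costs nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def three_way_score(x: str, y: str, z: str) -> float:
    """Optimal sum-of-pairs score of aligning x, y and z (assumed scoring scheme)."""
    neg_inf = float("-inf")
    nx, ny, nz = len(x), len(y), len(z)
    dp = [[[neg_inf] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    dp[0][0][0] = 0
    # Each column consumes a symbol from at least one sequence: 7 possible moves.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(nx + 1):
        for j in range(ny + 1):
            for k in range(nz + 1):
                if i == 0 and j == 0 and k == 0:
                    continue
                best = neg_inf
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    a = x[pi] if di else "-"
                    b = y[pj] if dj else "-"
                    c = z[pk] if dk else "-"
                    column = pair_score(a, b) + pair_score(a, c) + pair_score(b, c)
                    best = max(best, dp[pi][pj][pk] + column)
                dp[i][j][k] = best
    return dp[nx][ny][nz]


print(three_way_score("ACGT", "AGT", "ACT"))
```

The table has (|x|+1)(|y|+1)(|z|+1) cells with seven predecessors each, so this exact approach is cubic in sequence length, which is the price paid for avoiding the ordering ambiguities of pairwise steps.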
|