1
Allman ES, Baños H, Garrote-Lopez M, Rhodes JA. Identifiability of Level-1 Species Networks from Gene Tree Quartets. Bull Math Biol 2024; 86:110. [PMID: 39052074 PMCID: PMC11272829 DOI: 10.1007/s11538-024-01339-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 07/01/2024] [Indexed: 07/27/2024]
Abstract
When hybridization or other forms of lateral gene transfer have occurred, evolutionary relationships of species are better represented by phylogenetic networks than by trees. While inference of such networks remains challenging, several recently proposed methods are based on quartet concordance factors, the probabilities that a tree relating a gene sampled from the species displays the possible 4-taxon relationships. Building on earlier results, we investigate what level-1 network features are identifiable from concordance factors under the network multispecies coalescent model. We obtain results on both topological features of the network and numerical parameters, uncovering a number of failures of identifiability related to 3-cycles in the network. Addressing these identifiability issues is essential for designing statistically consistent inference methods.
Affiliation(s)
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA.
- Hector Baños
- Department of Mathematics, California State University San Bernardino, San Bernardino, CA, USA
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
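For a given set of four taxa, the quartet concordance factors described in this abstract are the probabilities of the three resolved gene-tree quartet topologies. Below is a minimal sketch of how they are usually estimated from data. It assumes each gene tree has already been reduced to the quartet it displays on those four taxa. The input format and function names are illustrative and are not taken from the paper.

```python
from collections import Counter


def quartet_topologies(taxa):
    """Return the three resolved unrooted topologies on four taxa.

    Each topology is a frozenset holding the two 2-element sides of its
    split, so ab|cd and cd|ab compare equal.
    """
    a, b, c, d = sorted(taxa)
    return [
        frozenset({frozenset({a, b}), frozenset({c, d})}),  # ab|cd
        frozenset({frozenset({a, c}), frozenset({b, d})}),  # ac|bd
        frozenset({frozenset({a, d}), frozenset({b, c})}),  # ad|bc
    ]


def empirical_concordance_factors(taxa, gene_cherries):
    """Estimate quartet concordance factors for one 4-taxon set.

    `gene_cherries` has one entry per gene tree: a pair of taxa forming one
    side of the quartet that gene tree displays, e.g. ("a", "b") for ab|cd.
    Unresolved gene-tree quartets are assumed to have been dropped already.
    """
    taxa = frozenset(taxa)
    counts = Counter()
    for cherry in gene_cherries:
        side = frozenset(cherry)
        if len(side) != 2 or not side <= taxa:
            raise ValueError(f"{cherry!r} is not a pair of taxa from {sorted(taxa)}")
        counts[frozenset({side, taxa - side})] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gene trees given")
    # Relative frequency of each of the three topologies (zero if never seen).
    return {topo: counts[topo] / total for topo in quartet_topologies(taxa)}
```

Suppose six gene trees show ab|cd, two show ac|bd, and two are given as the cherry ("b", "d"), which is also ac|bd. The estimates are then 0.6, 0.4 and 0.0.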
2
Myers EA, Rautsaw RM, Borja M, Jones J, Grünwald CI, Holding ML, Grazziotin F, Parkinson CL. Phylogenomic discordance is driven by wide-spread introgression and incomplete lineage sorting during rapid species diversification within rattlesnakes (Viperidae: Crotalus and Sistrurus). Syst Biol 2024:syae018. [PMID: 38695290 DOI: 10.1093/sysbio/syae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Indexed: 08/11/2024]
Abstract
Phylogenomics allows us to uncover the historical signal of evolutionary processes through time and estimate phylogenetic networks accounting for these signals. Insight from genome-wide data further allows us to pinpoint the contributions to phylogenetic signal from hybridization, introgression, and ancestral polymorphism across the genome. Here we focus on how these processes have contributed to phylogenetic discordance among rattlesnakes (genera Crotalus and Sistrurus), a group for which there are numerous conflicting phylogenetic hypotheses based on a diverse array of molecular datasets and analytical methods. We address the instability of the rattlesnake phylogeny using genomic data generated from transcriptomes sampled from nearly all known species. These genomic data, analyzed with coalescent and network-based approaches, reveal numerous instances of rapid speciation where individual gene trees conflict with the species tree. Moreover, the evolutionary history of rattlesnakes is dominated by incomplete speciation and frequent hybridization, both of which have likely influenced past interpretations of phylogeny. We present a new framework in which the evolutionary relationships of this group can only be understood in light of genome-wide data and network-based analytical methods. Our data suggest that network radiations, like that seen within the rattlesnakes, can only be understood in a phylogenomic context, necessitating similar approaches in our attempts to understand evolutionary history in other rapidly radiating species.
Affiliation(s)
- Edward A Myers
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
- Department of Herpetology, California Academy of Sciences, San Francisco, CA 94118, USA
- Rhett M Rautsaw
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
- Miguel Borja
- Facultad de Ciencias Biológicas, Universidad Juárez del Estado de Durango, Av. Universidad s/n, Fracc. Filadelfia, Gómez Palacio, Durango, Mexico
- Jason Jones
- Herp.mx A.C. C.P. 28989, Villa de Álvarez, Colima, Mexico
- Christoph I Grünwald
- Herp.mx A.C. C.P. 28989, Villa de Álvarez, Colima, Mexico
- Biodiversa A.C., Avenida de la Ribera #203, C.P. 45900, Chapala, Jalisco, Mexico
- Matthew L Holding
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
- Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA
- Felipe Grazziotin
- Laboratório Especial de Coleções Zoológicas, Instituto Butantan, Avenida Vital Brasil, São Paulo, 05503-900, Brazil
3
Huber KT, Moulton V, Owen M, Spillner A, St. John K. The Space of Equidistant Phylogenetic Cactuses. Ann Comb 2023; 28:1-32. [PMID: 38433929 PMCID: PMC10904525 DOI: 10.1007/s00026-023-00656-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 05/24/2023] [Indexed: 03/05/2024]
Abstract
An equidistant X-cactus is a type of rooted, arc-weighted, directed acyclic graph with leaf set X, that is used in biology to represent the evolutionary history of a set X of species. In this paper, we introduce and investigate the space of equidistant X-cactuses. This space contains, as a subset, the space of ultrametric trees on X that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning ranked rooted X-cactuses. In particular, we show that such graphs can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of X that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted X-trees in terms of partitions of X, which provides an alternative proof that the space of ultrametric trees on X is CAT(0). We expect that our results will provide the basis for novel ways to perform statistical analyses on collections of equidistant X-cactuses, as well as new directions for defining and understanding spaces of more general, arc-weighted phylogenetic networks.
Affiliation(s)
- Katharina T. Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ UK
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ UK
- Megan Owen
- Department of Mathematics, Lehman College, CUNY, New York, NY 10468 USA
- Andreas Spillner
- Merseburg University of Applied Sciences, 06217 Merseburg, Germany
- Katherine St. John
- Department of Computer Science, Hunter College, CUNY, New York, NY 10065 USA
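The abstract links ultrametric trees to partitions of X, and that link can be illustrated generically. Cutting an equidistant tree at each distinct height groups the leaves into blocks. The result is a nested chain of partitions running from all singletons up to X itself. The sketch below assumes the tree is given by its ultrametric leaf-to-leaf distances. It is not the poset-based encoding constructed in the paper, and the function names are hypothetical.

```python
from itertools import combinations


def is_ultrametric(taxa, dist):
    """Three-point condition: in every triple the two largest distances are equal."""
    for x, y, z in combinations(taxa, 3):
        d = sorted((dist[frozenset({x, y})], dist[frozenset({x, z})], dist[frozenset({y, z})]))
        if d[1] != d[2]:
            return False
    return True


def partition_chain(taxa, dist):
    """Cut an ultrametric at each distinct distance value.

    `dist` maps frozenset({x, y}) to the distance between taxa x and y.
    Returns partitions of `taxa`, from finest (all singletons) to coarsest.
    """
    taxa = list(taxa)
    if not is_ultrametric(taxa, dist):
        raise ValueError("distance is not ultrametric")
    heights = sorted({dist[frozenset(p)] for p in combinations(taxa, 2)})
    chain = [frozenset(frozenset({x}) for x in taxa)]
    for h in heights:
        blocks = []
        for x in taxa:
            for block in blocks:
                # One representative suffices: by the ultrametric inequality,
                # x is within h of every member iff it is within h of any member.
                rep = next(iter(block))
                if dist[frozenset({x, rep})] <= h:
                    block.add(x)
                    break
            else:
                blocks.append({x})
        chain.append(frozenset(frozenset(b) for b in blocks))
    return chain
```

The `for ... else` adds a new block only when no existing block accepts x. Exact equality in the three-point test assumes exact (e.g. integer) distances; noisy floating-point input would need a tolerance.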
4
Identifiability of species network topologies from genomic sequences using the logDet distance. J Math Biol 2022; 84:35. [PMID: 35385988 DOI: 10.1007/s00285-022-01734-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/03/2021] [Revised: 01/12/2022] [Accepted: 03/02/2022] [Indexed: 10/18/2022]
Abstract
Inference of network-like evolutionary relationships between species from genomic data must address the interwoven signals from both gene flow and incomplete lineage sorting. The heavy computational demands of standard approaches to this problem severely limit the size of datasets that may be analyzed, in both the number of species and the number of genetic loci. Here we provide a theoretical pointer to more efficient methods, by showing that logDet distances computed from genomic-scale sequences retain sufficient information to recover network relationships in the level-1 ultrametric case. This result is obtained under the Network Multispecies Coalescent model combined with a mixture of General Time-Reversible sequence evolution models across individual gene trees. It applies both to unlinked site data, such as SNPs, and to sequence data in which many contiguous sites may have evolved on a common tree, such as concatenated gene sequences. Thus, under standard stochastic models, statistically justifiable inference of network relationships from sequences can be accomplished without consideration of individual genes or gene trees.
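The logDet distance referred to here needs only one 4×4 matrix of joint base frequencies for each pair of aligned sequences. No gene trees are involved. The sketch below uses one common normalization, the paralinear form, and the exact normalization used in the paper may differ. It relies only on the standard library, so the determinant is computed by hand.

```python
import math

BASES = "ACGT"


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def logdet_distance(seq_x, seq_y):
    """Paralinear/logDet distance between two aligned DNA sequences.

    d = -1/4 * (ln det F - 1/2 * ln(det Pi_x * det Pi_y)), where F is the
    4x4 matrix of joint base frequencies and Pi_x, Pi_y are the diagonal
    matrices of base frequencies. Sites with gaps or ambiguity codes are skipped.
    """
    index = {b: i for i, b in enumerate(BASES)}
    F = [[0.0] * 4 for _ in range(4)]
    n_sites = 0
    for bx, by in zip(seq_x.upper(), seq_y.upper()):
        if bx in index and by in index:
            F[index[bx]][index[by]] += 1.0
            n_sites += 1
    if n_sites == 0:
        raise ValueError("no comparable sites")
    F = [[v / n_sites for v in row] for row in F]
    pi_x = [sum(row) for row in F]                              # row sums
    pi_y = [sum(F[i][j] for i in range(4)) for j in range(4)]   # column sums
    det_f = determinant(F)
    if det_f <= 0.0 or min(pi_x) == 0.0 or min(pi_y) == 0.0:
        raise ValueError("distance undefined (saturated sequences or a missing base)")
    log_pis = math.log(math.prod(pi_x)) + math.log(math.prod(pi_y))
    return -0.25 * (math.log(det_f) - 0.5 * log_pis)
```

Identical sequences give distance 0, and the value is symmetric in its two arguments. Together these properties make pairwise logDet matrices a natural input for the distance-based network reconstruction the abstract has in mind.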
5
Gross E, van Iersel L, Janssen R, Jones M, Long C, Murakami Y. Distinguishing level-1 phylogenetic networks on the basis of data generated by Markov processes. J Math Biol 2021; 83:32. [PMID: 34482446 PMCID: PMC8418599 DOI: 10.1007/s00285-021-01653-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/12/2020] [Revised: 07/07/2021] [Accepted: 08/16/2021] [Indexed: 11/24/2022]
Abstract
Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer. Recently, network-based Markov models of DNA sequence evolution have been introduced along with model-based methods for reconstructing phylogenetic networks. For these methods to be consistent, the network parameter needs to be identifiable from data generated under the model. Here, we show that the semi-directed network parameter of a triangle-free, level-1 network model with any fixed number of reticulation vertices is generically identifiable under the Jukes–Cantor, Kimura 2-parameter, or Kimura 3-parameter constraints.
Affiliation(s)
- Elizabeth Gross
- Department of Mathematics, University of Hawai'i at Mānoa, 2565 McCarthy Mall, Honolulu, HI, 96822, USA
- Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
- Remie Janssen
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
- Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
- Colby Long
- The College of Wooster, 1189 Beall Avenue, Wooster, OH, 44691, USA
- Yukihiro Murakami
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands.
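Of the three substitution constraints named in this abstract, Jukes–Cantor is the simplest. Every base changes to each other base at the same rate, so the transition matrix on an edge has just two distinct entries. The code below is standard model background, not code from the paper. Branch length t is measured in expected substitutions per site.

```python
import math


def jukes_cantor_matrix(t):
    """4x4 Jukes-Cantor transition probability matrix for an edge of length t
    (expected substitutions per site)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e   # probability the base is unchanged at the end of the edge
    diff = 0.25 - 0.25 * e   # probability of ending as one specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def jukes_cantor_distance(p):
    """Invert the model: branch length from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Because the matrix depends on a single parameter per edge, the model-based identifiability arguments deal with the fewest possible parameters in this case. The Kimura 2- and 3-parameter constraints allow two and three distinct off-diagonal rates, respectively.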
6
Allman ES, Baños H, Rhodes JA. NANUQ: a method for inferring species networks from gene trees under the coalescent model. Algorithms Mol Biol 2019; 14:24. [PMID: 31827592 PMCID: PMC6896299 DOI: 10.1186/s13015-019-0159-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 06/04/2019] [Accepted: 11/07/2019] [Indexed: 01/07/2023]
Abstract
Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network can have any topology, but arise with frequencies dependent on the network structure and numerical parameters. We propose a new algorithm for statistical inference of a level-1 species network under this model, from data consisting of gene tree topologies, and provide the theoretical justification for it. The algorithm is based on an analysis of quartets displayed on gene trees, combining several statistical hypothesis tests with combinatorial ideas such as a quartet-based intertaxon distance appropriate to networks, the NeighborNet algorithm for circular split systems, and the Circular Network algorithm for constructing a splits graph.
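A quartet-based intertaxon distance turns per-quartet conclusions into pairwise dissimilarities that NeighborNet can use. The sketch below is a deliberately simplified version. It assumes every 4-taxon set has already been classified as tree-like with a known split, and it omits the hypothesis tests. It is not the exact NANUQ distance, which also handles quartets attributed to cycles. Each pair of taxa is charged once for every quartet split that separates it.

```python
from itertools import combinations


def quartet_separation_distance(taxa, quartet_splits):
    """Count, for each pair of taxa, the resolved quartets that split them apart.

    `quartet_splits` is an iterable of ((a, b), (c, d)) meaning ab|cd.
    Returns a dict mapping frozenset({x, y}) to the count.
    """
    dist = {frozenset(p): 0 for p in combinations(list(taxa), 2)}
    for (a, b), (c, d) in quartet_splits:
        # The four cross pairs are separated by this quartet's split;
        # the two within-side pairs (a, b) and (c, d) are not.
        for x in (a, b):
            for y in (c, d):
                dist[frozenset({x, y})] += 1
    return dist
```

Consider the 5-taxon tree with cherries {a, b} and {d, e}. Its five quartet splits give d(a, b) = 0, d(a, c) = 2 and d(a, d) = 3, so taxa that fall on opposite sides of many quartets end up far apart.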
7
Baños H. Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model. Bull Math Biol 2019; 81:494-534. [PMID: 30094772 PMCID: PMC6344282 DOI: 10.1007/s11538-018-0485-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/27/2017] [Accepted: 07/30/2018] [Indexed: 10/28/2022]
Abstract
We show that many topological features of level-1 species networks are identifiable from the distribution of the gene tree quartets under the network multi-species coalescent model. In particular, every cycle of size at least 4 and every hybrid node in a cycle of size at least 5 are identifiable. This is a step toward justifying the inference of such networks which was recently implemented by Solís-Lemus and Ané. We show additionally how to compute quartet concordance factors for a network in terms of simpler networks, and explore some circumstances in which cycles of size 3 and hybrid nodes in 4-cycles can be detected.
Affiliation(s)
- Hector Baños
- University of Alaska Fairbanks, P.O. Box 756660, Fairbanks, AK, 99775-6660, USA.
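The simplest case covered by such results is a 4-taxon species tree. There, the quartet concordance factors under the multispecies coalescent depend only on the internal branch length t in coalescent units. The species-tree topology has probability 1 - (2/3)exp(-t), and each of the two alternatives has probability (1/3)exp(-t). Below is a minimal sketch of this tree-case formula and its inverse. The network formulas developed in the paper are more involved and are not reproduced here.

```python
import math


def tree_quartet_cfs(t):
    """(major, minor, minor) concordance factors for a 4-taxon species tree
    whose internal branch has length t in coalescent units."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)


def internal_length_from_major_cf(cf_major):
    """Invert the formula: t = -ln(1.5 * (1 - CF_major)), for 1/3 < CF_major < 1."""
    if not (1.0 / 3.0 < cf_major < 1.0):
        raise ValueError("major concordance factor must lie in (1/3, 1)")
    return -math.log(1.5 * (1.0 - cf_major))
```

At t = 0 all three factors equal 1/3, so a quartet with a very short internal branch is hard to resolve from concordance factors alone. Near-ties of this kind are one reason features such as small cycles can be difficult to detect.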
8
Abstract
The need for structures capable of accommodating complex evolutionary signals such as those found in, for example, wheat has fueled research into phylogenetic networks. Such structures generalize the standard model of a phylogenetic tree by also allowing for cycles and have been introduced in rooted and unrooted form. In contrast to phylogenetic trees or their unrooted versions, rooted phylogenetic networks are notoriously difficult to understand. To help alleviate this, recent work on them has also centered on their "uprooted" versions. By focusing on such graphs and the combinatorial concept of a split system which underpins an unrooted phylogenetic network, we show that not only can a so-called (uprooted) 1-nested network N be obtained from the Buneman graph (sometimes also called a median network) associated with the split system [Formula: see text] induced on the set of leaves of N but also that that graph is, in a well-defined sense, optimal. Along the way, we establish the 1-nested analogue of the fundamental "splits equivalence theorem" for phylogenetic trees and characterize maximal circular split systems.
Affiliation(s)
- P. Gambette
- LIGM (UMR 8049), UPEM, CNRS, ESIEE, ENPC, Université Paris-Est, 77454 Marne-la-Vallée, France
- K. T. Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
- G. E. Scholz
- School of Computing Sciences, University of East Anglia, Norwich, UK
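Split compatibility is the notion behind the splits equivalence theorem cited in the abstract. A collection of X-splits is displayed by some X-tree exactly when its splits are pairwise compatible. Circular split systems relax this condition. They require only that every split cut a fixed circular ordering of X into two arcs. The sketch checks both properties. Representing a split as a pair of frozensets is an assumption made for illustration.

```python
def is_compatible(split1, split2):
    """Splits A|B and C|D of the same set are compatible iff at least one of
    A & C, A & D, B & C, B & D is empty."""
    (a, b), (c, d) = split1, split2
    return not (a & c) or not (a & d) or not (b & c) or not (b & d)


def is_compatible_system(splits):
    """True if every pair of splits in the list is compatible."""
    return all(is_compatible(s, t) for i, s in enumerate(splits) for t in splits[i + 1:])


def is_circular(split, order):
    """True if one side of the split is a contiguous arc of the cyclic `order`."""
    side = split[0]
    inside = [x in side for x in order]
    # Count membership changes between cyclically consecutive positions
    # (index -1 wraps around); a single arc produces exactly two.
    boundaries = sum(inside[i] != inside[i - 1] for i in range(len(inside)))
    return boundaries == 2
```

With the order a, b, c, d, e, the splits ab|cde and abc|de are compatible and both circular. The split ac|bde is incompatible with ab|cde and is not circular for this order.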
9
On the challenge of reconstructing level-1 phylogenetic networks from triplets and clusters. J Math Biol 2016; 74:1729-1751. [PMID: 27800561 PMCID: PMC5420025 DOI: 10.1007/s00285-016-1068-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/26/2014] [Revised: 04/05/2016] [Indexed: 11/16/2022]
Abstract
Phylogenetic networks have gained prominence over the years due to their ability to represent complex non-treelike evolutionary events such as recombination or hybridization. Popular combinatorial objects used to construct them are triplet systems and cluster systems, the motivation being that any network N induces a triplet system $\mathcal{R}(N)$ and a softwired cluster system $\mathcal{S}(N)$. Since in real-world studies it cannot be guaranteed that all triplets/softwired clusters induced by a network are available, it is of particular interest to understand whether subsets of $\mathcal{R}(N)$ or $\mathcal{S}(N)$ allow one to uniquely reconstruct the underlying network N. Here we show that even within the highly restricted yet biologically interesting space of level-1 phylogenetic networks it is not always possible to uniquely reconstruct a level-1 network N, even when all triplets in $\mathcal{R}(N)$ or all clusters in $\mathcal{S}(N)$ are available. On the positive side, we introduce a reasonably large subclass of level-1 networks the members of which are uniquely determined by their induced triplet/softwired cluster systems. Along the way, we also establish various enumerative results, both positive and negative, including results which show that certain special subclasses of level-1 networks N can be uniquely reconstructed from proper subsets of $\mathcal{R}(N)$ and $\mathcal{S}(N)$. We anticipate these results to be of use in the design of algorithms for phylogenetic network inference.
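Both induced systems are easy to compute in the special case where N is a tree, since every tree is a level-1 network. The clusters are the leaf sets below each vertex. A triplet xy|z belongs to the triplet system exactly when some cluster contains x and y but not z. Below is a minimal sketch under that assumption, with trees written as nested tuples of string leaves. Networks are not handled; for them one must consider the trees they display.

```python
from itertools import combinations


def subtree_clusters(tree):
    """Return (leaf set, list of clusters) for a rooted tree written as
    nested tuples with string leaves, e.g. (("a", "b"), "c")."""
    if isinstance(tree, str):
        leaf = frozenset({tree})
        return leaf, [leaf]
    leaves, clusters = frozenset(), []
    for child in tree:
        child_leaves, child_clusters = subtree_clusters(child)
        leaves |= child_leaves
        clusters.extend(child_clusters)
    clusters.append(leaves)  # the cluster of this internal vertex
    return leaves, clusters


def induced_triplets(tree):
    """Triplets xy|z displayed by the tree, each as (frozenset({x, y}), z).

    xy|z is displayed iff some cluster contains x and y but not z.
    """
    leaves, clusters = subtree_clusters(tree)
    triplets = set()
    for x, y, z in combinations(sorted(leaves), 3):
        for pair, outgroup in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any(set(pair) <= c and outgroup not in c for c in clusters):
                triplets.add((frozenset(pair), outgroup))
    return triplets
```

A binary tree yields exactly one triplet per 3-leaf set, for example four triplets for (((a, b), c), d). A star tree such as (a, b, c) yields none.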
10
Huber KT, Linz S, Moulton V, Wu T. Spaces of phylogenetic networks from generalized nearest-neighbor interchange operations. J Math Biol 2015; 72:699-725. [PMID: 26037483 DOI: 10.1007/s00285-015-0899-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/05/2014] [Revised: 05/04/2015] [Indexed: 11/29/2022]
Abstract
Phylogenetic networks are a generalization of evolutionary or phylogenetic trees that are used to represent the evolution of species which have undergone reticulate evolution. In this paper we consider spaces of such networks defined by some novel local operations that we introduce for converting one phylogenetic network into another. These operations are modeled on the well-studied nearest-neighbor interchange operations on phylogenetic trees, and lead to natural generalizations of the tree spaces that have been previously associated to such operations. We present several results on spaces of some relatively simple networks, called level-1 networks, including the size of the neighborhood of a fixed network, and bounds on the diameter of the metric defined by taking the smallest number of operations required to convert one network into another. We expect that our results will be useful in the development of methods for systematically searching for optimal phylogenetic networks using, for example, likelihood and Bayesian approaches.
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
- Simone Linz
- Department of Computer Science, University of Auckland, Auckland, New Zealand.
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
- Taoyang Wu
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
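The operations in this paper are modeled on nearest-neighbor interchange (NNI) for unrooted binary trees. In the classical tree case, each internal edge separates four subtrees, two on each side. An NNI swaps one subtree across the edge. Each internal edge therefore yields two neighbors, giving 2(n - 3) neighbors for a tree with n leaves. The sketch below implements only this tree version, using an adjacency-dict representation chosen for illustration. It does not implement the paper's network operations.

```python
def nni_neighbors(adj):
    """All trees one NNI move away from an unrooted binary tree.

    `adj` maps each node to the set of its neighbors; leaves have degree 1
    and internal nodes degree 3. Returns a list of new adjacency dicts.
    """
    neighbors = []
    done = set()
    for u in adj:
        for v in adj[u]:
            edge = frozenset({u, v})
            if edge in done or len(adj[u]) != 3 or len(adj[v]) != 3:
                continue  # already handled, or not an internal edge
            done.add(edge)
            # u carries two subtrees besides v; pick one of them to move.
            _, moved = sorted(adj[u] - {v}, key=str)
            for w in sorted(adj[v] - {u}, key=str):
                # Swap subtree `moved` (attached at u) with subtree w (attached at v).
                new = {n: set(nbrs) for n, nbrs in adj.items()}
                new[u].discard(moved)
                new[moved].discard(u)
                new[v].discard(w)
                new[w].discard(v)
                new[u].add(w)
                new[w].add(u)
                new[v].add(moved)
                new[moved].add(v)
                neighbors.append(new)
    return neighbors
```

Swapping the chosen subtree with each of the two subtrees on the far side produces the two alternative resolutions around the edge. For a 5-leaf tree with two internal edges, this gives four neighbors.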
11
Jansson J, Lingas A. Computing the rooted triplet distance between galled trees by counting triangles. J Discrete Algorithms 2014. [DOI: 10.1016/j.jda.2013.10.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
12
On encodings of phylogenetic networks of bounded level. J Math Biol 2011; 65:157-80. [DOI: 10.1007/s00285-011-0456-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 06/29/2009] [Revised: 06/22/2011] [Indexed: 10/18/2022]
13
Huber KT, van Iersel L, Kelk S, Suchecki R. A practical algorithm for reconstructing level-1 phylogenetic networks. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:635-649. [PMID: 21393651 DOI: 10.1109/tcbb.2010.17] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 05/30/2023]
Abstract
Recently, much attention has been devoted to the construction of phylogenetic networks, which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here, we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks (a type of network slightly more general than a phylogenetic tree) from triplets. Our algorithm has been made publicly available as the program LEV1ATHAN. It combines ideas from several known theoretical algorithms for phylogenetic tree and network reconstruction with two novel subroutines, namely an exponential-time exact algorithm and a greedy algorithm, both of which are of independent theoretical interest. Most importantly, LEV1ATHAN runs in polynomial time and always constructs a level-1 network. If the data are consistent with a phylogenetic tree, then the algorithm constructs such a tree. Moreover, if the input triplet set is dense and, in addition, is fully consistent with some level-1 network, it will find such a network. The potential of LEV1ATHAN is explored by means of an extensive simulation study and a biological data set. One of our conclusions is that LEV1ATHAN is able to construct networks consistent with a high percentage of input triplets, even when these input triplets are affected by a low to moderate level of noise.
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
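The abstract notes that a phylogenetic tree is returned whenever the input triplets are consistent with one. The classical tool for deciding triplet consistency and building that tree is the BUILD algorithm of Aho, Sagiv, Szymanski and Ullman. The sketch below is that textbook algorithm, not LEV1ATHAN's code. Triplets are written ((x, y), z) for xy|z, and the helper `canonical` is a hypothetical convenience for comparing results.

```python
def build(leaves, triplets):
    """BUILD: return a rooted tree (nested tuples) displaying every triplet
    ((x, y), z), read as xy|z, or None if no tree displays them all."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only triplets whose three taxa all lie in the current leaf set.
    relevant = [t for t in triplets if {t[0][0], t[0][1], t[1]} <= leaves]
    # Union-find over the Aho graph: join x and y whenever some triplet is xy|z.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (x, y), _ in relevant:
        parent[find(x)] = find(y)
    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None  # Aho graph is connected: the triplets are incompatible
    children = []
    for comp in components.values():
        subtree = build(comp, relevant)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


def canonical(tree):
    """Order-independent form of a nested-tuple tree, for comparisons."""
    if isinstance(tree, tuple):
        return frozenset(canonical(child) for child in tree)
    return tree
```

Feeding it ab|c, ab|d, ac|d and bc|d recovers (((a, b), c), d), up to the order of children. The cyclic set ab|c, ac|b, bc|a makes the Aho graph connected and yields None.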
14
Cardona G, Llabrés M, Rosselló F, Valiente G. Comparison of galled trees. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:410-427. [PMID: 20660951 DOI: 10.1109/tcbb.2010.60] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
Galled trees, directed acyclic graphs that model evolutionary histories with isolated hybridization events, have become very popular due to both their biological significance and the existence of polynomial-time algorithms for their reconstruction. In this paper, we establish to what extent several distance measures for the comparison of evolutionary networks are metrics for galled trees, and hence when they can be safely used to evaluate galled tree reconstruction methods.
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain.
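One simple measure that can be computed in the same way on trees and on networks such as galled trees is the hardwired cluster distance. It counts the clusters (sets of leaves reachable from some vertex) found in one network but not the other. It is given here only to illustrate the kind of measure involved. Whether a measure of this type is a metric on galled trees is what the paper investigates, and this code makes no claim either way. Networks are given as child lists, a representation assumed here for illustration.

```python
def hardwired_clusters(children):
    """Hardwired clusters of a rooted tree or network.

    `children` maps every node to a list of its children (leaves map to []);
    the cluster of a node is the set of leaves reachable from it.
    """
    memo = {}

    def below(v):
        if v not in memo:
            kids = children[v]
            if not kids:
                memo[v] = frozenset({v})
            else:
                memo[v] = frozenset().union(*(below(c) for c in kids))
        return memo[v]

    return {below(v) for v in children}


def cluster_distance(children1, children2):
    """Size of the symmetric difference of the two hardwired cluster sets."""
    return len(hardwired_clusters(children1) ^ hardwired_clusters(children2))
```

Memoization means a hybrid vertex with two parents is explored only once. Its cluster is then shared by both parents' clusters.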
15
Arenas M, Patricio M, Posada D, Valiente G. Characterization of phylogenetic networks with NetTest. BMC Bioinformatics 2010; 11:268. [PMID: 20487540 PMCID: PMC2880032 DOI: 10.1186/1471-2105-11-268] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/09/2010] [Accepted: 05/20/2010] [Indexed: 11/13/2022]
Abstract
Background: Typical evolutionary events such as recombination, hybridization or gene transfer make the use of phylogenetic networks necessary to properly depict the evolution of DNA and protein sequences. Although several theoretical classes have been proposed to characterize these networks, they make stringent assumptions that will likely not be met by the evolutionary process. We have recently shown that the complexity of simulated networks is a function of the population recombination rate, and that at moderate and large recombination rates the resulting networks cannot be categorized. However, we do not know whether these results extend to networks estimated from real data.
Results: We introduce a web server for the categorization of explicit phylogenetic networks, including the most relevant theoretical classes developed so far. Using this tool, we analyzed statistical parsimony phylogenetic networks estimated from ~5,000 DNA alignments, obtained from the NCBI PopSet and Polymorphix databases. The level of characterization was correlated with nucleotide diversity, and a high proportion of the networks derived from these data sets could be formally characterized.
Conclusions: We have developed a public web server, NetTest (freely available from the software section at http://darwin.uvigo.es), to formally characterize the complexity of phylogenetic networks. Using NetTest we found that most statistical parsimony networks estimated with the program TCS could be assigned to a known network class. The level of network characterization was correlated with nucleotide diversity and dependent upon the intra/interspecific levels, although no significant differences were detected among genes. More research on the properties of phylogenetic networks is clearly needed.
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, E-36310 Vigo, Spain.
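Network classes of the kind NetTest assigns are defined by simple structural conditions. One widely used example is the tree-child class: a network is tree-child if every non-leaf vertex has at least one child with in-degree 1, that is, a tree vertex or a leaf. Below is a minimal check using an assumed child-list representation. It is not NetTest's implementation, and the paper describes which classes the server actually covers.

```python
def is_tree_child(children):
    """True if every non-leaf vertex has a child of in-degree 1.

    `children` maps every node to a list of its children (leaves map to []).
    """
    indegree = {v: 0 for v in children}
    for kids in children.values():
        for c in kids:
            indegree[c] += 1
    # Leaves (empty child lists) are exempt from the condition.
    return all(any(indegree[c] == 1 for c in kids)
               for kids in children.values() if kids)
```

A single gall, in which a hybrid vertex hangs below two tree vertices that each also have a tree child, passes the check. A vertex whose only children are two reticulations fails it.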