1
Wang Z, Sun J, Gao Y, Xue Y, Zhang Y, Li K, Zhang W, Zhang C, Zu J, Zhang L. Fusang: a framework for phylogenetic tree inference via deep learning. Nucleic Acids Res 2023; 51:10909-10923. [PMID: 37819036 PMCID: PMC10639059 DOI: 10.1093/nar/gkad805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 08/17/2023] [Accepted: 09/20/2023] [Indexed: 10/13/2023] Open
Abstract
Phylogenetic tree inference is a classic fundamental task in evolutionary biology that entails inferring the evolutionary relationship of targets based on multiple sequence alignment (MSA). Maximum likelihood (ML) and Bayesian inference (BI) methods have dominated phylogenetic tree inference for many years, but BI is too slow to handle a large number of sequences. Recently, deep learning (DL) has been successfully applied to quartet phylogenetic tree inference and tentatively extended into more sequences with the quartet puzzling algorithm. However, no DL-based tools are immediately available for practical real-world applications. In this paper, we propose Fusang (http://fusang.cibr.ac.cn), a DL-based framework that achieves comparable performance to that of ML-based tools with both simulated and real datasets. More importantly, with continuous optimization, e.g. through the use of customized training datasets for real-world scenarios, Fusang has great potential to outperform ML-based tools.
Affiliation(s)
- Zhicheng Wang
  - Chinese Institute for Brain Research, Beijing 102206, China
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Jinnan Sun
  - School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
- Yuan Gao
  - Chinese Institute for Brain Research, Beijing 102206, China
- Yongwei Xue
  - Chinese Institute for Brain Research, Beijing 102206, China
- Yubo Zhang
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Kuan Li
  - Chinese Institute for Brain Research, Beijing 102206, China
- Wei Zhang
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
- Chi Zhang
  - Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Beijing 100044, China
- Jian Zu
  - School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
- Li Zhang
  - Chinese Institute for Brain Research, Beijing 102206, China
2
Dornburg A, Su Z, Townsend JP. Optimal Rates for Phylogenetic Inference and Experimental Design in the Era of Genome-Scale Data Sets. Syst Biol 2018; 68:145-156. [PMID: 29939341 DOI: 10.1093/sysbio/syy047] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 02/13/2018] [Accepted: 06/13/2018] [Indexed: 02/02/2023] Open
Abstract
With the rise of genome-scale data sets, there has been a call for increased data scrutiny and careful selection of loci that are appropriate to use in an attempt to resolve a phylogenetic problem. Such loci should maximize phylogenetic information content while minimizing the risk of homoplasy. Theory posits the existence of characters that evolve at an optimum rate, and efforts to determine optimal rates of inference have been a cornerstone of phylogenetic experimental design for over two decades. However, both theoretical and empirical investigations of optimal rates have varied dramatically in their conclusions, ranging from no relationship to a tight relationship between the rate of change and phylogenetic utility. Herein, we synthesize these apparently contradictory views, demonstrating both empirical and theoretical conditions under which each is correct. We find that optimal rates of characters, not genes, are generally robust to most experimental design decisions. Moreover, consideration of site rate heterogeneity within a given locus is critical to accurate predictions of utility. Factors such as taxon sampling or the targeted number of characters providing support for a topology are additionally critical to the predictions of phylogenetic utility based on the rate of character change. Further, optimality of rates and predictions of phylogenetic utility are not equivalent, demonstrating the need for further development of comprehensive theory of phylogenetic experimental design. [Divergence time; GC bias; homoplasy; incongruence; information content; internode length; optimal rates; phylogenetic informativeness; phylogenetic theory; phylogenetic utility; phylogenomics; signal and noise; subtending branch length; state space; taxon and character sampling.]
Affiliation(s)
- Alex Dornburg
  - North Carolina Museum of Natural Sciences, Raleigh, 1671 Goldstar Drive, NC 27601, USA
- Zhuo Su
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, 165 Prospect Street, CT 06525, USA
- Jeffrey P Townsend
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, 165 Prospect Street, CT 06525, USA
  - Department of Biostatistics, Yale University, New Haven, 60 College Street, CT 06510, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, 300 George Street, CT 06511, USA
3
Winkler IS, Blaschke JD, Davis DJ, Stireman JO, O'Hara JE, Cerretti P, Moulton JK. Explosive radiation or uninformative genes? Origin and early diversification of tachinid flies (Diptera: Tachinidae). Mol Phylogenet Evol 2015; 88:38-54. [PMID: 25841383 DOI: 10.1016/j.ympev.2015.03.021] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/23/2014] [Revised: 03/20/2015] [Accepted: 03/25/2015] [Indexed: 12/01/2022]
Abstract
Molecular phylogenetic studies at all taxonomic levels often infer rapid radiation events based on short, poorly resolved internodes. While such rapid episodes of diversification are an important and widespread evolutionary phenomenon, much of this poor phylogenetic resolution may be attributed to the continuing widespread use of "traditional" markers (mitochondrial, ribosomal, and some nuclear protein-coding genes) that are often poorly suited to resolve difficult, higher-level phylogenetic problems. Here we reconstruct phylogenetic relationships among a representative set of taxa of the parasitoid fly family Tachinidae and related outgroups of the superfamily Oestroidea. The Tachinidae are one of the most species rich, yet evolutionarily recent families of Diptera, providing an ideal case study for examining the differential performance of loci in resolving phylogenetic relationships and the benefits of adding more loci to phylogenetic analyses. We assess the phylogenetic utility of nine genes including both traditional genes (e.g., CO1 mtDNA, 28S rDNA) and nuclear protein-coding genes newly developed for phylogenetic analysis. Our phylogenetic findings, based on a limited set of taxa, include: a close relationship between Tachinidae and the calliphorid subfamily Polleninae, monophyly of Tachinidae and the subfamilies Exoristinae and Dexiinae, subfamily groupings of Dexiinae+Phasiinae and Tachininae+Exoristinae, and robust phylogenetic placement of the somewhat enigmatic genera Strongygaster, Euthera, and Ceracia. In contrast to poor resolution and phylogenetic incongruence of "traditional genes," we find that a more selective set of highly informative genes is able to more precisely identify regions of the phylogeny that experienced rapid radiation of lineages, while more accurately depicting their phylogenetic context. 
Although much expanded taxon sampling is necessary to effectively assess the monophyly of and relationships among major tachinid lineages and their relatives, we show that a small number of well-chosen nuclear protein-coding genes can successfully resolve even difficult phylogenetic problems.
Affiliation(s)
- Isaac S Winkler
  - Department of Biological Sciences, Wright State University, Dayton, OH 45435, USA; Department of Biology, Linfield College, McMinnville, OR 97128, USA
- Jeremy D Blaschke
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Daniel J Davis
  - Department of Biological Sciences, Wright State University, Dayton, OH 45435, USA
- John O Stireman
  - Department of Biological Sciences, Wright State University, Dayton, OH 45435, USA
- James E O'Hara
  - Canadian National Collection of Insects, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario K1A 0C6, Canada
- Pierfilippo Cerretti
  - DAFNAE-Entomology, Università degli Studi di Padova, Viale dell'Università 16, 35020 Legnaro (Padova), Italy; Dipartimento di Biologia e Biotecnologie 'Charles Darwin', 'Sapienza' Università di Roma, Piazzale A. Moro 5, 00185 Rome, Italy
- John K Moulton
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
4
Criswell KE. The comparative osteology and phylogenetic relationships of African and South American lungfishes (Sarcopterygii: Dipnoi). Zool J Linn Soc 2015. [DOI: 10.1111/zoj.12255] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
5
The phylogenetic utility of acetyltransferase (ARD1) and glutaminyl tRNA synthetase (QtRNA) for reconstructing Cenozoic relationships as exemplified by the large Australian cicada Pauropsalta generic complex. Mol Phylogenet Evol 2015; 83:258-77. [DOI: 10.1016/j.ympev.2014.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/27/2014] [Revised: 06/25/2014] [Accepted: 07/14/2014] [Indexed: 11/19/2022]
6
Strecker AL, Olden JD. Fish species introductions provide novel insights into the patterns and drivers of phylogenetic structure in freshwaters. Proc Biol Sci 2014; 281:20133003. [PMID: 24452027 PMCID: PMC3906946 DOI: 10.1098/rspb.2013.3003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/21/2013] [Accepted: 12/17/2013] [Indexed: 11/12/2022] Open
Abstract
Despite long-standing interest of terrestrial ecologists, freshwater ecosystems are a fertile, yet unappreciated, testing ground for applying community phylogenetics to uncover mechanisms of species assembly. We quantify phylogenetic clustering and overdispersion of native and non-native fishes of a large river basin in the American Southwest to test for the mechanisms (environmental filtering versus competitive exclusion) and spatial scales influencing community structure. Contrary to expectations, non-native species were phylogenetically clustered and related to natural environmental conditions, whereas native species were not phylogenetically structured, likely reflecting human-related changes to the basin. The species that are most invasive (in terms of ecological impacts) tended to be the most phylogenetically divergent from natives across watersheds, but not within watersheds, supporting the hypothesis that Darwin's naturalization conundrum is driven by the spatial scale. Phylogenetic distinctiveness may facilitate non-native establishment at regional scales, but environmental filtering restricts local membership to closely related species with physiological tolerances for current environments. By contrast, native species may have been phylogenetically clustered in historical times, but species loss from contemporary populations by anthropogenic activities has likely shaped the phylogenetic signal. Our study implies that fundamental mechanisms of community assembly have changed, with fundamental consequences for the biogeography of both native and non-native species.
Affiliation(s)
- Angela L. Strecker
  - Department of Environmental Science and Management, Portland State University, Portland, OR 97207, USA
- Julian D. Olden
  - School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98105, USA
7
Response to: The relative utility of sequence divergence and phylogenetic informativeness profiling in phylogenetic study design. Mol Phylogenet Evol 2013; 66:436. [DOI: 10.1016/j.ympev.2012.09.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/17/2012] [Accepted: 09/28/2012] [Indexed: 11/23/2022]
8
Makowsky R, Cox CL, Roelke CE, Chippindale PT. The relative utility of sequence divergence and phylogenetic informativeness profiling in phylogenetic study design. Mol Phylogenet Evol 2013; 66:437. [DOI: 10.1016/j.ympev.2012.10.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/09/2012] [Revised: 10/09/2012] [Accepted: 10/15/2012] [Indexed: 10/27/2022]
9
Huelsken T, Tapken D, Dahlmann T, Wägele H, Riginos C, Hollmann M. Systematics and phylogenetic species delimitation within Polinices s.l. (Caenogastropoda: Naticidae) based on molecular data and shell morphology. Org Divers Evol 2012. [DOI: 10.1007/s13127-012-0111-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
10
Application of the phylogenetic informativeness method to chloroplast markers: a test case of closely related species in tribe Hydrangeeae (Hydrangeaceae). Mol Phylogenet Evol 2012; 66:233-42. [PMID: 23063487 DOI: 10.1016/j.ympev.2012.09.029] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 05/04/2012] [Revised: 09/19/2012] [Accepted: 09/24/2012] [Indexed: 11/21/2022]
Abstract
In evolutionary biology appropriate marker selection for the reconstruction of solid phylogenetic hypotheses is fundamental. One of the most challenging tasks addresses the appropriate choice of genomic regions in studies of closely related species. Robust phylogenetic frameworks are central to studies dealing with questions ranging from evolutionary and conservation biology, biogeography to plant breeding. Phylogenetic informativeness profiles provide a quantitative measure of the phylogenetic signal in markers and therefore a method for locus prioritization. The present work profiles phylogenetic informativeness of mostly non-coding chloroplast regions in an angiosperm lineage of closely related species: the popular ornamental tribe Hydrangeeae (Hydrangeaceae, Cornales, Asterids). A recent phylogenetic study denoted a case of resolution contrast between the two strongly supported clades within tribe Hydrangeeae. We evaluate the phylogenetic signal of 13 highly variable plastid markers for estimating relationships within and among the currently recognized monophyletic groups of this tribe. A selection of combined loci based on their phylogenetic informativeness retrieved more robust phylogenetic hypotheses than simply combining individual markers performing best with respect to resolution, nodal support and accuracy or those presenting the highest number of parsimony informative characters. We propose the rpl32-ndhF intergenic spacer (IGS), trnV-ndhC IGS, trnL-rpl32 IGS, psbT-petB region and ndhA intron as the best candidates for future phylogenetic studies in Hydrangeeae and potentially in other Asterids. We also contrasted the phylogenetic informativeness of coded indels against substitutions concluding that, despite their low phylogenetic informativeness, coded indels provide additional phylogenetic signal that is nearly free of noise. 
Phylogenetic relationships obtained from our total combined analyses showed improved resolution and nodal support with respect to recently published results.
11
Lambret-Frotté J, Perini FA, de Moraes Russo CA. Efficiency of nuclear and mitochondrial markers recovering and supporting known amniote groups. Evol Bioinform Online 2012; 8:463-73. [PMID: 23032608 PMCID: PMC3422098 DOI: 10.4137/ebo.s9656] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/04/2022] Open
Abstract
We have analysed the efficiency of all mitochondrial protein coding genes and six nuclear markers (Adora3, Adrb2, Bdnf, Irbp, Rag2 and Vwf) in reconstructing and statistically supporting known amniote groups (murines, rodents, primates, eutherians, metatherians, therians). The efficiencies of maximum likelihood, Bayesian inference, maximum parsimony, neighbor-joining and UPGMA were also evaluated, by assessing the number of correct and incorrect recovered groupings. In addition, we have compared support values using the conservative bootstrap test and the Bayesian posterior probabilities. First, no correlation was observed between gene size and marker efficiency in recovering or supporting correct nodes. As expected, tree-building methods performed similarly, even UPGMA, which in some cases outperformed other, more extensively used methods. Bayesian posterior probabilities tend to show much higher support values than the conservative bootstrap test, for correct and incorrect nodes. Our results also suggest that nuclear markers do not necessarily show a better performance than mitochondrial genes. The so-called dependency among mitochondrial markers was not observed comparing genome performances. Finally, the amniote groups with lowest recovery rates were therians and rodents, despite the morphological support for their monophyletic status. We suggest that, regardless of the tree-building method, a few carefully selected genes are able to unfold a detailed and robust scenario of phylogenetic hypotheses, particularly if taxon sampling is increased.
Affiliation(s)
- Julia Lambret-Frotté
  - Departamento de Genética, Instituto de Biologia, Universidade Federal do Rio de Janeiro
12
Machida RJ, Kweskin M, Knowlton N. PCR primers for metazoan mitochondrial 12S ribosomal DNA sequences. PLoS One 2012; 7:e35887. [PMID: 22536450 PMCID: PMC3334914 DOI: 10.1371/journal.pone.0035887] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 08/16/2011] [Accepted: 03/27/2012] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Assessment of the biodiversity of communities of small organisms is most readily done using PCR-based analysis of environmental samples consisting of mixtures of individuals. Known as metagenetics, this approach has transformed understanding of microbial communities and is beginning to be applied to metazoans as well. Unlike microbial studies, where analysis of the 16S ribosomal DNA sequence is standard, the best gene for metazoan metagenetics is less clear. In this study we designed a set of PCR primers for the mitochondrial 12S ribosomal DNA sequence based on 64 complete mitochondrial genomes and then tested their efficacy. METHODOLOGY/PRINCIPAL FINDINGS: A total of 64 complete mitochondrial genome sequences representing all metazoan classes available in GenBank were downloaded using the NCBI Taxonomy Browser. Alignment of sequences was performed for the excised mitochondrial 12S ribosomal DNA sequences, and conserved regions were identified for all 64 mitochondrial genomes. These regions were used to design a primer pair that flanks a more variable region in the gene. Then all of the complete metazoan mitochondrial genomes available in NCBI's Organelle Genome Resources database were used to determine the percentage of taxa that would likely be amplified using these primers. Results suggest that these primers will amplify target sequences for many metazoans. CONCLUSIONS/SIGNIFICANCE: Newly designed 12S ribosomal DNA primers have considerable potential for metazoan metagenetic analysis because of their ability to amplify sequences from many metazoans.
Affiliation(s)
- Ryuji J Machida
  - National Museum of Natural History, Smithsonian Institution, Washington, DC, United States of America
13
Bell KL, Philips TK. Molecular systematics and evolution of the Ptinidae (Coleoptera: Bostrichoidea) and related families. Zool J Linn Soc 2012. [DOI: 10.1111/j.1096-3642.2011.00792.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/27/2022]
14
Cox CL, Davis Rabosky AR, Reyes-Velasco J, Ponce-Campos P, Smith EN, Flores-Villela O, Campbell JA. Molecular systematics of the genus Sonora (Squamata: Colubridae) in central and western Mexico. Syst Biodivers 2012. [DOI: 10.1080/14772000.2012.666293] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/28/2022]
15
Moeller AH, Townsend JP. Phylogenetic informativeness profiling of 12 genes for 28 vertebrate taxa without divergence dates. Mol Phylogenet Evol 2011; 60:271-2. [PMID: 21558010 DOI: 10.1016/j.ympev.2011.04.023] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 02/21/2011] [Revised: 04/15/2011] [Accepted: 04/26/2011] [Indexed: 01/15/2023]