1
Haag J, Hübner L, Kozlov AM, Stamatakis A. The Free Lunch is not over yet-systematic exploration of numerical thresholds in maximum likelihood phylogenetic inference. BIOINFORMATICS ADVANCES 2023; 3:vbad124. [PMID: 37750068 PMCID: PMC10518076 DOI: 10.1093/bioadv/vbad124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/04/2023] [Accepted: 09/12/2023] [Indexed: 09/27/2023]
Abstract
Summary: Maximum likelihood (ML) is a widely used phylogenetic inference method. ML implementations heavily rely on numerical optimization routines that use internal numerical thresholds to determine convergence. We systematically analyze the impact of these threshold settings on the log-likelihood and runtimes for ML tree inferences with RAxML-NG, IQ-TREE, and FastTree on empirical datasets. We provide empirical evidence that we can substantially accelerate tree inferences with RAxML-NG and IQ-TREE by changing the default values of two such numerical thresholds. At the same time, altering these settings does not significantly impact the quality of the inferred trees. We further show that increasing both thresholds accelerates the RAxML-NG bootstrap without influencing the resulting support values. For RAxML-NG, increasing the likelihood thresholds ϵ_LnL and ϵ_brlen to 10 and 10^3, respectively, results in an average tree inference speedup of 1.9 ± 0.6 on Data collection 1, 1.8 ± 1.1 on Data collection 2, and 1.9 ± 0.8 on Data collection 2 for the RAxML-NG bootstrap compared to the runtime under the current default setting. Increasing the likelihood threshold ϵ_LnL to 10 in IQ-TREE results in an average tree inference speedup of 1.3 ± 0.4 on Data collection 1 and 1.3 ± 0.9 on Data collection 2.
Availability and implementation: All MSAs we used for our analyses, as well as all results, are available for download at https://cme.h-its.org/exelixis/material/freeLunch_data.tar.gz. Our data generation scripts are available at https://github.com/tschuelia/ml-numerical-analysis.
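The role of a convergence threshold such as the likelihood epsilon described in this abstract can be illustrated with a toy optimizer. The sketch below is not code from RAxML-NG, IQ-TREE, or FastTree: the quadratic `toy_lnl` surface, the fixed-step `hill_climb` routine, the starting point, and the strict baseline of 0.1 are all assumptions chosen for illustration. It only shows the general mechanism, namely that a looser threshold stops the search after fewer iterations at the price of a slightly lower final score.

```python
def toy_lnl(x):
    """Hypothetical concave stand-in for a log-likelihood; maximum of 0 at x = 2."""
    return -50.0 * (x - 2.0) ** 2


def toy_gradient(x):
    """Derivative of toy_lnl with respect to x."""
    return -100.0 * (x - 2.0)


def hill_climb(x0, eps_lnl, step=0.001, max_iter=10_000):
    """Fixed-step gradient ascent that stops once an iteration improves
    the score by less than eps_lnl. Returns (x, score, iterations)."""
    x = x0
    current = toy_lnl(x)
    for iteration in range(1, max_iter + 1):
        x_new = x + step * toy_gradient(x)
        new = toy_lnl(x_new)
        improvement = new - current
        x, current = x_new, new
        if improvement < eps_lnl:
            return x, current, iteration
    return x, current, max_iter


# Same start, two thresholds: an assumed strict baseline and the looser value 10.
strict = hill_climb(12.0, eps_lnl=0.1)
loose = hill_climb(12.0, eps_lnl=10.0)
print("strict: iterations =", strict[2], "score =", strict[1])
print("loose:  iterations =", loose[2], "score =", loose[1])
```

On real tree searches the trade-off depends on the dataset, which is why the abstract reports measured speedups and tree quality rather than a general rule.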
Affiliation(s)
- Julia Haag
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Lukas Hübner
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
- Alexey M Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
  - Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology – Hellas, 70013 Heraklion, Greece
2
Haag J, Höhler D, Bettisworth B, Stamatakis A. From Easy to Hopeless-Predicting the Difficulty of Phylogenetic Analyses. Mol Biol Evol 2022; 39:6832260. [PMID: 36395091 PMCID: PMC9728795 DOI: 10.1093/molbev/msac254] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022]
Abstract
Phylogenetic analyses under the Maximum-Likelihood (ML) model are time- and resource-intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty for analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets.
Affiliation(s)
- Dimitri Höhler
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Ben Bettisworth
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
3
Vasilikopoulos A, Gustafson GT, Balke M, Niehuis O, Beutel RG, Misof B. Resolving the phylogenetic position of Hygrobiidae (Coleoptera: Adephaga) requires objective statistical tests and exhaustive phylogenetic methodology: a response to Cai et al. (2020). Mol Phylogenet Evol 2020; 162:106923. [PMID: 32771549 DOI: 10.1016/j.ympev.2020.106923] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/31/2020] [Accepted: 08/03/2020] [Indexed: 12/20/2022]
Affiliation(s)
- Alexandros Vasilikopoulos
  - Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53121 Bonn, Germany
- Grey T Gustafson
  - Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, 66045 KS, USA
- Michael Balke
  - Department of Entomology, SNSB-Bavarian State Collections of Zoology, 81247 Munich, Germany
- Oliver Niehuis
  - Department of Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University of Freiburg, 79104 Freiburg, Germany
- Rolf G Beutel
  - Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, 07743 Jena, Germany
- Bernhard Misof
  - Zoological Research Museum Alexander Koenig, 53121 Bonn, Germany
4
Paetzold C, Wood KR, Eaton DAR, Wagner WL, Appelhans MS. Phylogeny of Hawaiian Melicope (Rutaceae): RAD-seq Resolves Species Relationships and Reveals Ancient Introgression. FRONTIERS IN PLANT SCIENCE 2019; 10:1074. [PMID: 31608076 PMCID: PMC6758601 DOI: 10.3389/fpls.2019.01074] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/24/2019] [Accepted: 08/07/2019] [Indexed: 05/11/2023]
Abstract
Hawaiian Melicope are one of the major adaptive radiations of the Hawaiian Islands comprising 54 endemic species. The lineage is monophyletic with an estimated crown age predating the rise of the current high islands. Phylogenetic inference based on Sanger sequencing has not been sufficient to resolve species or deeper level relationships. Here, we apply restriction site-associated DNA sequencing (RAD-seq) to the lineage to infer phylogenetic relationships. We employ Quartet Sampling to assess information content and statistical support, and to quantify discordance as well as partitioned ABBA-BABA tests to uncover evidence of introgression. Our new results drastically improved resolution of relationships within Hawaiian Melicope. The lineage is divided into five fully supported main clades, two of which correspond to morphologically circumscribed infrageneric groups. We provide evidence for both ancestral and current hybridization events. We confirm the necessity for a taxonomic revision of the Melicope section Pelea, as well as a re-evaluation of several species complexes by combining genomic and morphological data.
Affiliation(s)
- Claudia Paetzold
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), University of Göttingen, Goettingen, Germany
- Kenneth R. Wood
  - National Tropical Botanical Garden, Kalaheo, HI, United States
- Deren A. R. Eaton
  - Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, NY, United States
- Warren L. Wagner
  - Department of Botany, Smithsonian Institution, Washington, DC, United States
- Marc S. Appelhans
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), University of Göttingen, Goettingen, Germany
  - Department of Botany, Smithsonian Institution, Washington, DC, United States
5
Matos-Maraví P, Duarte Ritter C, Barnes CJ, Nielsen M, Olsson U, Wahlberg N, Marquina D, Sääksjärvi I, Antonelli A. Biodiversity seen through the perspective of insects: 10 simple rules on methodological choices and experimental design for genomic studies. PeerJ 2019; 7:e6727. [PMID: 31106048 PMCID: PMC6499058 DOI: 10.7717/peerj.6727] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/10/2018] [Accepted: 03/06/2019] [Indexed: 12/18/2022]
Abstract
Massively parallel DNA sequencing opens up opportunities for bridging multiple temporal and spatial dimensions in biodiversity research, thanks to its efficiency to recover millions of nucleotide polymorphisms. Here, we identify the current status, discuss the main challenges, and look into future perspectives on biodiversity genomics focusing on insects, which arguably constitute the most diverse and ecologically important group among all animals. We suggest 10 simple rules that provide a succinct step-by-step guide and best-practices to anyone interested in biodiversity research through the study of insect genomics. To this end, we review relevant literature on biodiversity and evolutionary research in the field of entomology. Our compilation is targeted at researchers and students who may not yet be specialists in entomology or molecular biology. We foresee that the genomic revolution and its application to the study of non-model insect lineages will represent a major leap to our understanding of insect diversity.
Affiliation(s)
- Pável Matos-Maraví
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
  - Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
  - Institute of Entomology, Biology Centre CAS, České Budějovice, Czech Republic
- Camila Duarte Ritter
  - Department of Eukaryotic Microbiology, University of Duisburg-Essen, Essen, Germany
- Martin Nielsen
  - Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
  - Section for Evolutionary Genomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Urban Olsson
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
  - Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Daniel Marquina
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
  - Department of Zoology, Stockholm University, Stockholm, Sweden
- Alexandre Antonelli
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
  - Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
  - Royal Botanical Garden, Kew, Richmond, Surrey, UK
6
Evangelista D, Thouzé F, Kohli MK, Lopez P, Legendre F. Topological support and data quality can only be assessed through multiple tests in reviewing Blattodea phylogeny. Mol Phylogenet Evol 2018; 128:112-122. [PMID: 29969656 DOI: 10.1016/j.ympev.2018.05.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/16/2018] [Revised: 05/07/2018] [Accepted: 05/08/2018] [Indexed: 11/18/2022]
Abstract
Assessing support for molecular phylogenies is difficult because the data is heterogeneous in quality and overwhelming in quantity. Traditionally, node support values (bootstrap frequency, Bayesian posterior probability) are used to assess confidence in tree topologies. Other analyses to assess the quality of phylogenetic data (e.g. Lento plots, saturation plots, trait consistency) and the resulting phylogenetic trees (e.g. internode certainty, parameter permutation tests, topological tests) exist but are rarely applied. Here we argue that a single qualitative analysis is insufficient to assess support of a phylogenetic hypothesis and relate data quality to tree quality. We use six molecular markers to infer the phylogeny of Blattodea and apply various tests to assess relationship support, locus quality, and the relationship between the two. We use internode-certainty calculations in conjunction with bootstrap scores, alignment permutations, and an approximately unbiased (AU) test to assess if the molecular data unambiguously support the phylogenetic relationships found. Our results show higher support for the position of Lamproblattidae, high support for the termite phylogeny, and low support for the position of Anaplectidae, Corydioidea and phylogeny of Blaberoidea. We use Lento plots in conjunction with mutation-saturation plots, calculations of locus homoplasy to assess locus quality, identify long branch attraction, and decide if the tree's relationships are the result of data biases. We conclude that multiple tests and metrics need to be taken into account to assess tree support and data robustness.
Affiliation(s)
- Dominic Evangelista
  - Institut de Systématique, Evolution, Biodiversité ISYEB - UMR 7205 - MNHN CNRS UPMC EPHE, Muséum national d'Histoire naturelle, Sorbonne Universités, CP50, 57 rue Cuvier, 75005 Paris, France
- France Thouzé
  - Institut de Systématique, Evolution, Biodiversité ISYEB - UMR 7205 - MNHN CNRS UPMC EPHE, Muséum national d'Histoire naturelle, Sorbonne Universités, CP50, 57 rue Cuvier, 75005 Paris, France
- Manpreet Kaur Kohli
  - Department of Biological Sciences, Rutgers, The State University of New Jersey, 195 University Ave., Newark, NJ 07102, United States
- Philippe Lopez
  - Institut de Systématique, Evolution, Biodiversité ISYEB - UMR 7205 - MNHN CNRS UPMC EPHE, Muséum national d'Histoire naturelle, Sorbonne Universités, CP50, 57 rue Cuvier, 75005 Paris, France
- Frédéric Legendre
  - Institut de Systématique, Evolution, Biodiversité ISYEB - UMR 7205 - MNHN CNRS UPMC EPHE, Muséum national d'Histoire naturelle, Sorbonne Universités, CP50, 57 rue Cuvier, 75005 Paris, France
7
Fonseca LHM, Lohmann LG. Plastome Rearrangements in the "Adenocalymma-Neojobertia" Clade (Bignonieae, Bignoniaceae) and Its Phylogenetic Implications. FRONTIERS IN PLANT SCIENCE 2017; 8:1875. [PMID: 29163600 PMCID: PMC5672021 DOI: 10.3389/fpls.2017.01875] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/17/2017] [Accepted: 10/16/2017] [Indexed: 05/02/2023]
Abstract
The chloroplast is one of the most important organelles of plants. This organelle has a circular DNA with approximately 130 genes. The use of plastid genomic data in phylogenetic and evolutionary studies became possible with high-throughput sequencing methods, which allowed us to rapidly obtain complete genomes at a reasonable cost. Here, we use high-throughput sequencing to study the "Adenocalymma-Neojobertia" clade (Bignonieae, Bignoniaceae). More specifically, we use Hi-Seq Illumina technology to sequence 10 complete plastid genomes. Plastomes were assembled using selected plastid reads and de novo approach with SPAdes. The 10 assembled genomes were analyzed in a phylogenetic context using five different partition schemes: (1) 91 protein-coding genes ("coding"); (2) 76 introns and spacers with alignment manually edited ("non-coding edited"); (3) 76 non-coding regions with poorly aligned regions removed using T-Coffee ("non-coding filtered"); (4) 91 coding regions plus 76 non-coding regions edited ("coding + non-coding edited"); and, (5) 91 protein-coding regions plus the 76 filtered non-coding regions ("coding + non-coding filtered"). Fragmented regions were aligned using Mafft. Phylogenetic analyses were conducted using Maximum Likelihood (ML) and Bayesian Criteria (BC). The analyses of the individual plastomes consistently recovered an expansion of the Inverted Repeated (IRs) regions and a compression of the Small Single Copy (SSC) region. Major genomic translocations were observed at the Large Single Copy (LSC) and IRs. ML phylogenetic analyses of the individual datasets led to the same topology, with the exception of the analysis of the "non-coding filtered" dataset. Overall, relationships were strongly supported, with the highest support values obtained through the analysis of the "coding + non-coding edited" dataset. Four regions at the LSC, SSC, and IR were selected for primer development. 
The "Adenocalymma-Neojobertia" clade shows an unusual pattern of plastid structure variation, including four major genomic translocations. These rearrangements challenge the current view of conserved plastid genome architecture in terms of gene order. It also complicates both genomic assemblies using reference genomes and sequence alignments using whole plastomes. Therefore, strategies that employ de novo assemblies and manual evaluation of sequence alignments are required to prevent assembly and alignment errors.
8
Yeates DK, Meusemann K, Trautwein M, Wiegmann B, Zwick A. Power, resolution and bias: recent advances in insect phylogeny driven by the genomic revolution. CURRENT OPINION IN INSECT SCIENCE 2016; 13:16-23. [PMID: 27436549 DOI: 10.1016/j.cois.2015.10.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 09/02/2015] [Revised: 10/08/2015] [Accepted: 10/18/2015] [Indexed: 06/06/2023]
Abstract
Our understanding of the phylogenetic relationships of insects has been revolutionised in the last decade by the proliferation of next generation sequencing technologies (NGS). NGS has allowed insect systematists to assemble very large molecular datasets that include both model and non-model organisms. Such datasets often include a large proportion of the total number of protein coding sequences available for phylogenetic comparison. We review some early entomological phylogenomic studies that employ a range of different data sampling protocols and analysis strategies, illustrating a fundamental renaissance in our understanding of insect evolution all driven by the genomic revolution. The analysis of phylogenomic datasets is challenging because of their size and complexity, and it is obvious that the increasing size alone does not ensure that phylogenetic signal overcomes systematic biases in the data. Biases can be due to various factors such as the method of data generation and assembly, or intrinsic biological features of the data per se, such as similarities due to saturation or compositional heterogeneity. Such biases often cause violations in the underlying assumptions of phylogenetic models. We review some of the bioinformatics tools available and being developed to detect and minimise systematic biases in phylogenomic datasets. Phylogenomic-scale data coupled with sophisticated analyses will revolutionise our understanding of insect functional genomics. This will illuminate the relationship between the vast range of insect phenotypic diversity and underlying genetic diversity. In combination with rapidly developing methods to estimate divergence times, these analyses will also provide a compelling view of the rates and patterns of lineagenesis (birth of lineages) over the half billion years of insect evolution.
Affiliation(s)
- David K Yeates
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT 2601, Australia
- Karen Meusemann
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT 2601, Australia
- Michelle Trautwein
  - California Academy of Sciences, 55 Music Concourse Drive, San Francisco, CA 94118, USA
- Brian Wiegmann
  - Department of Entomology, North Carolina State University, Raleigh, NC 27695-7613, USA
- Andreas Zwick
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT 2601, Australia
9
Schrödl M, Stöger I. A review on deep molluscan phylogeny: old markers, integrative approaches, persistent problems. J NAT HIST 2014. [DOI: 10.1080/00222933.2014.963184] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 10/24/2022]