1
|
Miao J, Chen T, Misir M, Lin Y. Deep learning for predicting 16S rRNA gene copy number. Sci Rep 2024; 14:14282. [PMID: 38902329 PMCID: PMC11190246 DOI: 10.1038/s41598-024-64658-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 06/11/2024] [Indexed: 06/22/2024]
Abstract
Culture-independent 16S rRNA gene metabarcoding is a commonly used method for microbiome profiling. To achieve more quantitative cell fraction estimates, it is important to account for the 16S rRNA gene copy number (hereafter 16S GCN) of different community members. Currently, there are several bioinformatic tools available to estimate the 16S GCN values, either based on taxonomy assignment or phylogeny. Here we present a novel approach ANNA16, Artificial Neural Network Approximator for 16S rRNA gene copy number, a deep learning-based method that estimates the 16S GCN values directly from the 16S gene sequence strings. Based on 27,579 16S rRNA gene sequences and gene copy number data from the rrnDB database, we show that ANNA16 outperforms the commonly used 16S GCN prediction algorithms. Interestingly, Shapley Additive exPlanations (SHAP) shows that ANNA16 can identify unexpected informative positions in 16S rRNA gene sequences without any prior phylogenetic knowledge, which suggests potential applications beyond 16S GCN prediction.
Affiliation(s)
- Jiazheng Miao
  - Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China
  - Department of Biomedical Informatics, Harvard Medical School, Boston, USA
- Tianlai Chen
  - Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China
  - Department of Biomedical Engineering, Duke University, Durham, USA
- Mustafa Misir
  - Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China
- Yajuan Lin
  - Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China
  - Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, USA
|
2
|
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023]
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
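The contrast this abstract draws between standard RF distances and RF contradictions can be illustrated with a minimal Python sketch. This is not the authors' TNT script (congsort): the function names, and the choice to represent each unrooted tree as a set of bipartitions, are assumptions made here for illustration. OSR is left out because the abstract does not give its formula.

```python
from itertools import product

def normalize(side, taxa):
    """Store a bipartition as the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side

def splits(clades, taxa):
    """Build the set of non-trivial bipartitions of a (possibly polytomous) tree."""
    return {normalize(c, taxa) for c in clades}

def rf_distance(a, b):
    """Standard RF: bipartitions present in exactly one of the two trees."""
    return len(a ^ b)

def compatible(s1, s2, taxa):
    """Two bipartitions fit on one tree iff some pair of their sides is disjoint."""
    return any(not (x & y) for x, y in product((s1, taxa - s1), (s2, taxa - s2)))

def rf_contradictions(a, b, taxa):
    """Count bipartitions contradicted by at least one bipartition of the other tree."""
    def contradicted(own, other):
        return sum(any(not compatible(s, t, taxa) for t in other) for s in own)
    return contradicted(a, b) + contradicted(b, a)

taxa = frozenset("ABCDE")
resolved = splits([{"A", "B"}, {"D", "E"}], taxa)     # ((A,B),C,(D,E))
collapsed = splits([{"A", "B"}], taxa)                # ((A,B),C,D,E): D+E collapsed
conflicting = splits([{"A", "C"}, {"D", "E"}], taxa)  # ((A,C),B,(D,E))

print(rf_distance(resolved, collapsed), rf_contradictions(resolved, collapsed, taxa))      # 1 0
print(rf_distance(resolved, conflicting), rf_contradictions(resolved, conflicting, taxa))  # 2 2
```

In this toy case the collapsed tree is one RF unit away from the resolved tree but contradicts nothing, which is the sense in which a contradiction-only measure does not penalize polytomies.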
Affiliation(s)
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Pablo A Goloboff
  - CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Ben C Stöver
  - Institute for Evolution and Biodiversity, WWU Münster, 48149, Münster, Germany
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
|
3
|
Estupiñán RA, Torres de Farias S, Gonçalves EC, Camargo M, Cruz Schneider MP. Performance of intron 7 of the β-fibrinogen gene for phylogenetic analysis: An example using gladiator frogs, Boana Gray, 1825 (Anura, Hylidae, Cophomantinae). Zookeys 2023; 1149:145-169. [DOI: 10.3897/zookeys.1149.85627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 07/22/2022] [Indexed: 02/24/2023]
Abstract
Boana, the third largest genus of Hylinae, includes morphologically cryptic species. The potential applicability of β-fibrinogen intron 7 (FGBI7) is explored to propose a robust phylogeny of Boana. The phylogenetic potential of FGBI7 was evaluated using maximum parsimony, MrBayes, and maximum likelihood analysis. Comparison of polymorphic sites and topologies obtained with concatenated analysis of FGBI7 and other nuclear genes (CXCR4, RHO, SIAH1, TYR, and 28S) allowed evaluation of the phylogenetic signal of FGBI7. Mean evolutionary rates were calculated using the sequences of the mitochondrial genes ND1 and CYTB available for Boana in GenBank. Dating of Boana and some of its groups was performed using the RelTime method with secondary calibration. FGBI7 analysis revealed high values at informative sites for parsimony. The absolute values of the mean evolutionary rate were higher for mitochondrial genes than for FGBI7. Dating of congruent Boana groups for ND1, CYTB, and FGBI7 revealed closer values between mitochondrial genes and slightly different values from those of FGBI7. Divergence times of basal groups tended to be overestimated when mtDNA was used and were more accurate when nDNA was used. Although there is evidence of phylogenetic potential arising from concatenation of specific genes, FGBI7 provides well-resolved independent gene trees. These results lead to a paradigm for linking data in phylogenomics that focuses on the uniqueness of species histories and ignores the multiplicities of individual gene histories.
|
4
|
Edwards SV, Tonini JFR, Mcinerney N, Welch C, Beerli P. Multilocus phylogeography, population genetics and niche evolution of Australian brown and black-tailed treecreepers (Aves: Climacteris). Biol J Linn Soc Lond 2023. [DOI: 10.1093/biolinnean/blac144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
The Carpentarian barrier across north-eastern Australia is a major biogeographic barrier and a generator of biodiversity within the Australian Monsoonal Tropics. Here we present a continent-wide analysis of mitochondrial (control region) and autosomal (14 anonymous loci) sequence and indel variation and niche modelling of brown and black-tailed treecreepers (Climacteris picumnus and Climacteris melanurus), a clade with a classic distribution on either side of the Carpentarian barrier. mtDNA control region sequences exhibited reciprocal monophyly and strong differentiation (Fst = 0.91), and revealed a signature of a recent selective sweep in C. picumnus. A variety of tests support an isolation-with-migration model of divergence, albeit with low levels of gene flow across the Carpentarian barrier and a divergence time between species of ~1.7–2.8 Mya. Palaeoecological niche models show that both range size as measured by available habitat and estimated historical population sizes of both species declined in the past ~600 kyr and that the area of interspecific range overlap was never historically large, perhaps decreasing opportunities for extensive gene flow. The relatively long divergence time and low opportunity for gene flow may have facilitated speciation more so than in other co-distributed bird taxa across the Australian Monsoonal Tropics.
Affiliation(s)
- Scott V Edwards
  - Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- João F R Tonini
  - Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
  - Department of Biology, University of Richmond, Richmond, VA 23217, USA
- Nancy Mcinerney
  - Smithsonian's National Zoo and Conservation Biology Institute, NW, Washington, DC 20008, USA
- Corey Welch
  - Department of Biology and Burke Museum, University of Washington, Seattle, WA 98195, USA
  - STEM Scholars Program, Student Innovation Center, Iowa State University, Ames, IA 50011, USA
- Peter Beerli
  - Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
|
5
|
Černý D, Natale R. Comprehensive taxon sampling and vetted fossils help clarify the time tree of shorebirds (Aves, Charadriiformes). Mol Phylogenet Evol 2022; 177:107620. [PMID: 36038056 DOI: 10.1016/j.ympev.2022.107620] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/16/2021] [Revised: 06/03/2022] [Accepted: 08/17/2022] [Indexed: 01/20/2023]
Abstract
Shorebirds (Charadriiformes) are a globally distributed clade of modern birds and, due to their ecological and morphological disparity, a frequent subject of comparative studies. While molecular phylogenies have been key to establishing the suprafamilial backbone of the charadriiform tree, a number of relationships at both deep and shallow taxonomic levels remain poorly resolved. The timescale of shorebird evolution also remains uncertain as a result of extensive disagreements among the published divergence dating studies, stemming largely from different choices of fossil calibrations. Here, we present the most comprehensive non-supertree phylogeny of shorebirds to date, based on a total-evidence dataset comprising 353 ingroup taxa (90% of all extant or recently extinct species), 27 loci (15 mitochondrial and 12 nuclear), and 69 morphological characters. We further clarify the timeline of charadriiform evolution by time-scaling this phylogeny using a set of 14 up-to-date and thoroughly vetted fossil calibrations. In addition, we assemble a taxonomically restricted 100-locus dataset specifically designed to resolve outstanding problems in higher-level charadriiform phylogeny. In terms of tree topology, our results are largely congruent with previous studies but indicate that some of the conflicts among earlier analyses reflect a genuine signal of pervasive gene tree discordance. Monophyly of the plovers (Charadriidae), the position of the ibisbill (Ibidorhyncha), and the relationships among the five subfamilies of the gulls (Laridae) could not be resolved even with greatly increased locus and taxon sampling. Moreover, several localized regions of uncertainty persist in shallower parts of the tree, including the interrelationships of the true auks (Alcinae) and anarhynchine plovers. 
Our node-dating and macroevolutionary rate analyses find support for a Paleocene origin of crown-group shorebirds, as well as exceptionally rapid recent radiations of Old World oystercatchers (Haematopodidae) and select genera of gulls. Our study underscores the challenges involved in estimating a comprehensively sampled and carefully calibrated time tree for a diverse avian clade, and highlights areas in need of further research.
Affiliation(s)
- David Černý
  - Department of the Geophysical Sciences, University of Chicago, Chicago 60637, USA
- Rossy Natale
  - Department of Organismal Biology & Anatomy, University of Chicago, Chicago 60637, USA
|
6
|
Birth N, Dencker T, Morgenstern B. Insertions and deletions as phylogenetic signal in an alignment-free context. PLoS Comput Biol 2022; 18:e1010303. [PMID: 35939516 PMCID: PMC9387925 DOI: 10.1371/journal.pcbi.1010303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/18/2021] [Revised: 08/18/2022] [Accepted: 06/14/2022] [Indexed: 11/18/2022]
Abstract
Most methods for phylogenetic tree reconstruction are based on sequence alignments; they infer phylogenies from substitutions that may have occurred at the aligned sequence positions. Gaps in alignments are usually not employed as phylogenetic signal. In this paper, we explore an alignment-free approach that uses insertions and deletions (indels) as an additional source of information for phylogeny inference. For a set of four or more input sequences, we generate so-called quartet blocks of four putative homologous segments each. For pairs of such quartet blocks involving the same four sequences, we compare the distances between the two blocks in these sequences, to obtain hints about indels that may have happened between the blocks since the respective four sequences have evolved from their last common ancestor. A prototype implementation that we call Gap-SpaM is presented to infer phylogenetic trees from these data, using a quartet-tree approach or, alternatively, under the maximum-parsimony paradigm. This approach should not be regarded as an alternative to established methods, but rather as a complementary source of phylogenetic information. Interestingly, however, our software is able to produce phylogenetic trees from putative indels alone that are comparable to trees obtained with existing alignment-free methods. Phylogenetic tree inference based on DNA or protein sequence comparison is a fundamental task in computational biology. Given a multiple alignment of a set of input sequences, most approaches compare aligned sequence positions to each other, to find a suitable tree, based on a model of molecular evolution. Insertions and deletions that may have happened since the input sequences evolved from their last common ancestor are ignored by most phylogeny methods. 
Herein, we show that insertions and deletions can provide an additional source of information for phylogeny inference, and that such information can be obtained with a simple alignment-free approach. We provide an implementation of this idea that we call Gap-SpaM. The proposed approach is complementary to existing phylogeny methods since it is based on a completely different source of information. It is thus not meant to be an alternative to those existing methods, but rather a possible additional source of information for tree inference.
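The comparison of block distances described in this abstract can be sketched in a few lines of Python. This is only one reading of the idea, not the Gap-SpaM implementation: the function name, the input format (a start position per sequence for each block), and the rule that only a two-versus-two pattern of distances is treated as informative are all assumptions made for illustration.

```python
from collections import defaultdict

def indel_hint(first_block, second_block, block_length):
    """Return the 2|2 split suggested by block distances, or None if uninformative.

    first_block and second_block map a sequence name to the start position of
    the block in that sequence (hypothetical input format).
    """
    groups = defaultdict(set)
    for name, start in first_block.items():
        # Distance between the end of the first block and the start of the second.
        distance = second_block[name] - (start + block_length)
        groups[distance].add(name)
    # Two sequences sharing one distance and two sharing another hint at an
    # indel on the branch separating the two pairs.
    if sorted(len(g) for g in groups.values()) == [2, 2]:
        return frozenset(frozenset(g) for g in groups.values())
    return None

first = {"a": 10, "b": 12, "c": 7, "d": 30}
second = {"a": 50, "b": 52, "c": 50, "d": 73}
print(indel_hint(first, second, block_length=20))
```

Here sequences a and b have 20 positions between the blocks while c and d have 23, so the sketch reports the split {a, b} | {c, d}. When all four distances agree, or when only one sequence differs, it returns None.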
Affiliation(s)
- Niklas Birth
  - Department of Bioinformatics, Institute of Microbiology and Genetics, Universität Göttingen, Göttingen, Germany
- Thomas Dencker
  - Department of Bioinformatics, Institute of Microbiology and Genetics, Universität Göttingen, Göttingen, Germany
- Burkhard Morgenstern
  - Department of Bioinformatics, Institute of Microbiology and Genetics, Universität Göttingen, Göttingen, Germany
  - Göttingen Center of Molecular Biosciences (GZMB), Göttingen, Germany
  - Campus-Institute Data Science (CIDAS), Göttingen, Germany
|
7
|
Gatesy J, Springer MS. Phylogenomic Coalescent Analyses of Avian Retroelements Infer Zero-Length Branches at the Base of Neoaves, Emergent Support for Controversial Clades, and Ancient Introgressive Hybridization in Afroaves. Genes (Basel) 2022; 13:genes13071167. [PMID: 35885951 PMCID: PMC9324441 DOI: 10.3390/genes13071167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 06/20/2022] [Accepted: 06/21/2022] [Indexed: 01/25/2023]
Abstract
Retroelement insertions (RIs) are low-homoplasy characters that are ideal data for addressing deep evolutionary radiations, where gene tree reconstruction errors can severely hinder phylogenetic inference with DNA and protein sequence data. Phylogenomic studies of Neoaves, a large clade of birds (>9000 species) that first diversified near the Cretaceous−Paleogene boundary, have yielded an array of robustly supported, contradictory relationships among deep lineages. Here, we reanalyzed a large RI matrix for birds using recently proposed quartet-based coalescent methods that enable inference of large species trees including branch lengths in coalescent units, clade-support, statistical tests for gene flow, and combined analysis with DNA-sequence-based gene trees. Genome-scale coalescent analyses revealed extremely short branches at the base of Neoaves, meager branch support, and limited congruence with previous work at the most challenging nodes. Despite widespread topological conflicts with DNA-sequence-based trees, combined analyses of RIs with thousands of gene trees show emergent support for multiple higher-level clades (Columbea, Passerea, Columbimorphae, Otidimorphae, Phaethoquornithes). RIs express asymmetrical support for deep relationships within the subclade Afroaves that hints at ancient gene flow involving the owl lineage (Strigiformes). Because DNA-sequence data are challenged by gene tree-reconstruction error, analysis of RIs represents one approach for improving gene tree-based methods when divergences are deep, internodes are short, terminal branches are long, and introgressive hybridization further confounds species−tree inference.
Affiliation(s)
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S. Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
|
8
|
Wang N, Braun EL, Liang B, Cracraft J, Smith SA. Categorical edge-based analyses of phylogenomic data reveal conflicting signals for difficult relationships in the avian tree. Mol Phylogenet Evol 2022; 174:107550. [PMID: 35691570 DOI: 10.1016/j.ympev.2022.107550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2020] [Revised: 05/13/2022] [Accepted: 06/02/2022] [Indexed: 11/28/2022]
Abstract
Phylogenetic analyses fail to yield a satisfactory resolution of some relationships in the tree of life even with genome-scale datasets, so the failure is unlikely to reflect limitations in the amount of data. Gene tree conflicts are particularly notable in studies focused on these contentious nodes, and taxon sampling, different analytical methods, and/or data type effects can further confound analyses. Although many efforts have been made to incorporate biological conflicts, few studies have curated individual genes for their efficiency in phylogenomic studies. Here, we conduct an edge-based analysis of Neoavian evolution, examining the phylogenetic efficacy of two recent phylogenomic bird datasets and three datatypes (ultraconserved elements [UCEs], introns, and coding regions). We assess the potential causes for biases in signal-resolution for three difficult nodes: the earliest divergence of Neoaves, the position of the enigmatic Hoatzin (Opisthocomus hoazin), and the position of owls (Strigiformes). We observed extensive conflict among genes for all data types and datasets even after meticulous curation. Edge-based analyses (EBA) increased congruence and provided information about the impact of data type, GC content variation (GCCV), and outlier genes on each of the nodes we examined. First, outlier gene signals appeared to drive different patterns of support for the relationships among the earliest diverging Neoaves. Second, the placement of Hoatzin was highly variable, although our EBA did reveal a previously unappreciated data type effect with an impact on its position. It also revealed that the resolution with the most support here was Hoatzin + shorebirds. Finally, GCCV, rather than data type (i.e., coding vs non-coding) per se, was correlated with a signal that supports monophyly of owls + Accipitriformes (hawks, eagles, and vultures). Eliminating high GCCV loci increased the signal for owls + mousebirds.
Categorical EBA was able to reveal the nature of each edge and provide a way to highlight especially problematic branches that warrant further examination. The current study increases our understanding of the contentious parts of the avian tree, which show even greater conflicts than appreciated previously.
Affiliation(s)
- Ning Wang
  - College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA; Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA
- Edward L Braun
  - Department of Biology, University of Florida, Gainesville, FL 32607, USA
- Bin Liang
  - College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA
- Joel Cracraft
  - Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA
- Stephen A Smith
  - Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA
|
9
|
Liu B, Warnow T. Scalable Species Tree Inference with External Constraints. J Comput Biol 2022; 29:664-678. [PMID: 35196115 DOI: 10.1089/cmb.2021.0543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Species tree inference is a basic step in biological discovery, but discordance between gene trees creates analytical challenges and large data sets create computational challenges. Although there is generally some information available about the species trees that could be used to speed up the estimation, only one species tree estimation method that addresses gene tree discordance (ASTRAL-J, a recent development in the ASTRAL family of methods) is able to use this information. Here we describe two new methods, NJst-J and FASTRAL-J, that can estimate the species tree given partial knowledge of the species tree in the form of a nonbinary unrooted constraint tree. We show that both NJst-J and FASTRAL-J are much faster than ASTRAL-J and we prove that all three methods are statistically consistent under the multispecies coalescent model subject to this constraint. Our extensive simulation study shows that both FASTRAL-J and NJst-J provide advantages over ASTRAL-J: both are faster (and NJst-J is particularly fast), and FASTRAL-J is generally at least as accurate as ASTRAL-J. An analysis of the Avian Phylogenomics Project data set with 48 species and 14,446 genes presents additional evidence of the value of FASTRAL-J over ASTRAL-J (and both over ASTRAL), with dramatic reductions in running time (20 hours for default ASTRAL, and minutes or seconds for ASTRAL-J and FASTRAL-J, respectively).
Affiliation(s)
- Baqiao Liu
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
|
10
|
Boutte J, Fishbein M, Straub SCK. NGS-Indel Coder v2.0: A Streamlined Pipeline to Code Indel Characters in Phylogenomic Data. Methods Mol Biol 2022; 2512:61-72. [PMID: 35817999 DOI: 10.1007/978-1-0716-2429-6_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Hypothesized evolutionary insertions and deletions in nucleic acid sequences (indels) contain significant phylogenetic information and can be integrated in phylogenomic analyses. However, assemblies of short reads obtained from next-generation sequencing (NGS) technologies can contain errors that result in falsely inferred indels that need to be detected and omitted to avoid inclusion in phylogenetic analysis. Here, we detail the commands that comprise a new version of the NGS-Indel Coder pipeline, which was developed to validate indels using assembly read depth.
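The general idea of validating an indel against assembly read depth can be sketched as follows. This is not the NGS-Indel Coder pipeline or any of its commands: the function name, the flank width, and the depth threshold are hypothetical values chosen only to show the kind of check the abstract describes.

```python
def indel_supported(depth, start, end, flank=5, min_depth=10):
    """Accept an indel spanning columns [start, end) only if both flanks are well covered.

    depth is a list of per-column read depths; flank and min_depth are
    hypothetical defaults, not values taken from the published pipeline.
    """
    left = depth[max(0, start - flank):start]
    right = depth[end:end + flank]
    # An indel at the very edge of the assembly has no flank to check, so reject it.
    if not left or not right:
        return False
    return min(left + right) >= min_depth

well_covered = [30, 31, 29, 30, 28, 27, 30, 30, 29, 31, 30, 28]
poorly_covered = [30, 31, 29, 3, 2, 27, 30, 4, 29, 31, 30, 28]
print(indel_supported(well_covered, 5, 7))    # True
print(indel_supported(poorly_covered, 5, 7))  # False
```

A putative indel flanked by low-depth columns is more likely to be an assembly artifact, so in this sketch it would be omitted before indel characters are coded.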
Affiliation(s)
- Julien Boutte
  - Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA
- Mark Fishbein
  - Department of Plant Biology, Ecology and Evolution, Oklahoma State University, Stillwater, OK, USA
- Shannon C K Straub
  - Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA
|
11
|
Simmons MP, Springer MS, Gatesy J. Gene-tree misrooting drives conflicts in phylogenomic coalescent analyses of palaeognath birds. Mol Phylogenet Evol 2021; 167:107344. [PMID: 34748873 DOI: 10.1016/j.ympev.2021.107344] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/13/2021] [Revised: 10/08/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]
Abstract
Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.
Affiliation(s)
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
- John Gatesy
  - Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
|
12
|
Bravo GA, Schmitt CJ, Edwards SV. What Have We Learned from the First 500 Avian Genomes? Annual Review of Ecology, Evolution, and Systematics 2021. [DOI: 10.1146/annurev-ecolsys-012121-085928] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/14/2022]
Abstract
The increased capacity of DNA sequencing has significantly advanced our understanding of the phylogeny of birds and the proximate and ultimate mechanisms molding their genomic diversity. In less than a decade, the number of available avian reference genomes has increased to over 500—approximately 5% of bird diversity—placing birds in a privileged position to advance the fields of phylogenomics and comparative, functional, and population genomics. Whole-genome sequence data, as well as indels and rare genomic changes, are further resolving the avian tree of life. The accumulation of bird genomes, increasingly with long-read sequence data, greatly improves the resolution of genomic features such as germline-restricted chromosomes and the W chromosome, and is facilitating the comparative integration of genotypes and phenotypes. Community-based initiatives such as the Bird 10,000 Genomes Project and Vertebrate Genome Project are playing a fundamental role in amplifying and coalescing a vibrant international program in avian comparative genomics.
Affiliation(s)
- Gustavo A. Bravo
  - Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
- C. Jonathan Schmitt
  - Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
|
13
|
Protein Structure, Models of Sequence Evolution, and Data Type Effects in Phylogenetic Analyses of Mitochondrial Data: A Case Study in Birds. Diversity 2021. [DOI: 10.3390/d13110555] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein-coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution matrix for transmembrane helices estimated using a variety of nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.
|
14
|
Forthman M, Braun EL, Kimball RT. Gene tree quality affects empirical coalescent branch length estimation. ZOOL SCR 2021. [DOI: 10.1111/zsc.12512] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Affiliation(s)
- Michael Forthman
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
- California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, Sacramento, CA, USA
| | - Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
| | | |
|
15
|
Urantówka AD, Kroczak A, Strzała T, Zaniewicz G, Kurkowski M, Mackiewicz P. Mitogenomes of Accipitriformes and Cathartiformes Were Subjected to Ancestral and Recent Duplications Followed by Gradual Degeneration. Genome Biol Evol 2021; 13:6357707. [PMID: 34432018 PMCID: PMC8435663 DOI: 10.1093/gbe/evab193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/15/2021] [Indexed: 11/25/2022] Open
Abstract
The rearrangement of 37 genes with one control region, firstly identified in Gallus gallus mitogenome, is believed to be ancestral for all Aves. However, mitogenomic sequences obtained in recent years revealed that many avian mitogenomes contain duplicated regions that were omitted in previous genomic versions. Their evolution and mechanism of duplication are still poorly understood. The order of Accipitriformes is especially interesting in this context because its representatives contain a duplicated control region in various stages of degeneration. Therefore, we applied an appropriate PCR strategy to look for duplications within the mitogenomes of the early diverged species Sagittarius serpentarius and Cathartiformes, which is a sister order to Accipitriformes. The analyses revealed the same duplicated gene order in all examined taxa and the common ancestor of these groups. The duplicated regions were subjected to gradual degeneration and homogenization during concerted evolution. The latter process occurred recently in the species of Cathartiformes as well as in the early diverged lineages of Accipitriformes, that is, Sagittarius serpentarius and Pandion haliaetus. However, in other lineages, that is, Pernis ptilorhynchus, as well as representatives of Aegypiinae, Aquilinae, and five related subfamilies of Accipitriformes (Accipitrinae, Circinae, Buteoninae, Haliaeetinae, and Milvinae), the duplications were evolving independently for at least 14–47 Myr. Different portions of control regions in Cathartiformes showed conflicting phylogenetic signals indicating that some sections of these regions were homogenized at a frequency higher than the rate of speciation, whereas others have still evolved separately.
Affiliation(s)
- Adam Dawid Urantówka
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
| | - Aleksandra Kroczak
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland; Department of Bioinformatics and Genomics, Faculty of Biotechnology, Wrocław University, Poland
| | - Tomasz Strzała
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
| | - Grzegorz Zaniewicz
- Department of Vertebrate Ecology and Zoology, Avian Ecophysiology Unit, University of Gdańsk, Poland
| | - Marcin Kurkowski
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
| | - Paweł Mackiewicz
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, Wrocław University, Poland
| |
|
16
|
Kuhl H, Frankl-Vilches C, Bakker A, Mayr G, Nikolaus G, Boerno ST, Klages S, Timmermann B, Gahr M. An Unbiased Molecular Approach Using 3'-UTRs Resolves the Avian Family-Level Tree of Life. Mol Biol Evol 2021; 38:108-127. [PMID: 32781465 PMCID: PMC7783168 DOI: 10.1093/molbev/msaa191] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Indexed: 01/20/2023] Open
Abstract
Presumably, due to a rapid early diversification, major parts of the higher-level phylogeny of birds are still resolved controversially in different analyses or are considered unresolvable. To address this problem, we produced an avian tree of life, which includes molecular sequences of one or several species of ∼90% of the currently recognized family-level taxa (429 species, 379 genera) including all 106 family-level taxa of the nonpasserines and 115 of the passerines (Passeriformes). The unconstrained analyses of noncoding 3-prime untranslated region (3′-UTR) sequences and those of coding sequences yielded different trees. In contrast to the coding sequences, the 3′-UTR sequences resulted in a well-resolved and stable tree topology. The 3′-UTR contained, unexpectedly, transcription factor binding motifs that were specific for different higher-level taxa. In this tree, grebes and flamingos are the sister clade of all other Neoaves, which are subdivided into five major clades. All nonpasserine taxa were placed with robust statistical support including the long-time enigmatic hoatzin (Opisthocomiformes), which was found to be the sister taxon of the Caprimulgiformes. The comparatively late radiation of family-level clades of the songbirds (oscine Passeriformes) contrasts with the attenuated diversification of nonpasseriform taxa since the early Miocene. This correlates with the evolution of vocal production learning, an important speciation factor, which is ancestral for songbirds and evolved convergently only in hummingbirds and parrots. As 3′-UTR-based phylotranscriptomics resolved the avian family-level tree of life, we suggest that this procedure will also resolve the all-species avian tree of life.
Affiliation(s)
- Heiner Kuhl
- Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, Seewiesen, Germany; Max Planck Institute for Molecular Genetics, Sequencing Core Facility, Berlin, Germany; Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
| | - Carolina Frankl-Vilches
- Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, Seewiesen, Germany
| | - Antje Bakker
- Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, Seewiesen, Germany
| | - Gerald Mayr
- Ornithological Section, Senckenberg Research Institute, Frankfurt am Main, Germany
| | - Gerhard Nikolaus
- Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, Seewiesen, Germany
| | - Stefan T Boerno
- Max Planck Institute for Molecular Genetics, Sequencing Core Facility, Berlin, Germany
| | - Sven Klages
- Max Planck Institute for Molecular Genetics, Sequencing Core Facility, Berlin, Germany
| | - Bernd Timmermann
- Max Planck Institute for Molecular Genetics, Sequencing Core Facility, Berlin, Germany
| | - Manfred Gahr
- Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, Seewiesen, Germany
| |
|
17
|
Nasir A, Mughal F, Caetano-Anollés G. The tree of life describes a tripartite cellular world. Bioessays 2021; 43:e2000343. [PMID: 33837594 DOI: 10.1002/bies.202000343] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/25/2020] [Revised: 03/11/2021] [Accepted: 03/15/2021] [Indexed: 12/28/2022]
Abstract
The canonical view of a 3-domain (3D) tree of life was recently challenged by the discovery of Asgardarchaeota encoding eukaryote signature proteins (ESPs), which were treated as missing links of a 2-domain (2D) tree. Here we revisit the debate. We discuss methodological limitations of building trees with alignment-dependent approaches, which often fail to satisfactorily address the problem of "gaps." In addition, most phylogenies are reconstructed unrooted, neglecting the power of direct rooting methods. Alignment-free methodologies lift most difficulties but require employing realistic evolutionary models. We argue that the discoveries of Asgards and ESPs, by themselves, do not rule out the 3D tree, which is strongly supported by comparative and evolutionary genomic analyses and vast genomic and biochemical superkingdom distinctions. Given uncertainties of retrodiction and interpretation difficulties, we conclude that the 3D view has not been falsified but instead has been strengthened by genomic analyses. In turn, the objections to the 2D model have not been lifted. The debate remains open. Also see the video abstract here: https://youtu.be/-6TBN0bubI8.
Affiliation(s)
- Arshan Nasir
- Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, Los Alamos, New Mexico, USA
| | - Fizza Mughal
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| |
|
18
|
Dibaeinia P, Tabe-Bordbar S, Warnow T. FASTRAL: Improving scalability of phylogenomic analysis. Bioinformatics 2021; 37:2317-2324. [PMID: 33576396 PMCID: PMC8388037 DOI: 10.1093/bioinformatics/btab093] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/28/2020] [Revised: 02/02/2021] [Accepted: 02/04/2021] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION: ASTRAL is the current leading method for species tree estimation from phylogenomic datasets (i.e., hundreds to thousands of genes) that addresses gene tree discord resulting from incomplete lineage sorting (ILS). ASTRAL is statistically consistent under the multi-species coalescent (MSC) model, runs in polynomial time, and is able to run on large datasets. Key to ASTRAL's algorithm is the use of dynamic programming to find an optimal solution to the MQSST (maximum quartet support species tree) problem within a constraint space that it computes from the input. Yet, ASTRAL can fail to complete within reasonable timeframes on large datasets with many genes and species, because in these cases the constraint space it computes is too large. RESULTS: Here we introduce FASTRAL, a phylogenomic estimation method. FASTRAL is based on ASTRAL, but uses a different technique for constructing the constraint space. The technique we use to define the constraint space maintains statistical consistency and is polynomial time; thus we prove that FASTRAL is a polynomial time algorithm that is statistically consistent under the MSC. Our performance study on both biological and simulated data sets demonstrates that FASTRAL matches or improves on ASTRAL with respect to species tree topology accuracy (and under high ILS conditions it is statistically significantly more accurate), while being dramatically faster, especially on datasets with large numbers of genes and high ILS, due to using a significantly smaller constraint space. AVAILABILITY: FASTRAL is available in open-source form at https://github.com/PayamDiba/FASTRAL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois, Urbana, IL 61801, USA
| | - Shayan Tabe-Bordbar
- Department of Computer Science, University of Illinois, Urbana, IL 61801, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois, Urbana, IL 61801, USA. To whom correspondence should be addressed.
| |
|
19
|
Collapsing dubiously resolved gene-tree branches in phylogenomic coalescent analyses. Mol Phylogenet Evol 2021; 158:107092. [PMID: 33545272 DOI: 10.1016/j.ympev.2021.107092] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/10/2020] [Revised: 12/30/2020] [Accepted: 01/28/2021] [Indexed: 01/15/2023]
Abstract
In two-step coalescent analyses of phylogenomic data, gene-tree topologies are treated as fixed prior to species-tree inference. Although all gene-tree conflict is assumed to be caused by lineage sorting when applying these methods, in empirical datasets much of the conflict can be caused by estimation error. Weakly supported and even arbitrarily resolved clades are important sources of this estimation error for gene trees inferred from few informative characters relative to the number of sampled terminals, and the resulting extraneous conflict among gene trees can negatively impact species-tree inference. In this study, we quantified the relative severity of alternative methods for collapsing gene-tree branches for seven empirical datasets and quantified their effects on species-tree inference. The branch-collapsing methods that we employed were based on the strict consensus of optimal topologies, various bootstrap thresholds, and 0% approximate likelihood ratio test (SH-like aLRT) support. Up to 86% of internal gene-tree branches are dubiously or arbitrarily resolved in reanalyses of these published phylogenomic datasets, and collapsing these branches increased inferred species-tree coalescent branch lengths by up to 455%. For two datasets, the longer inferred branch lengths sometimes impacted inference of anomaly-zone conditions. Although branch-collapsing methods did not consistently affect the species-tree topology, they often increased branch support. The more severe and clearly justified gene-tree branch-collapsing methods, which we recommend be broadly applied for two-step coalescent analyses, are use of the strict consensus in parsimony analyses and the collapse of clades with 0% SH-like aLRT support in likelihood analyses. Collapsing dubiously or arbitrarily resolved branches in gene trees sometimes improved congruence between coalescent-based results and concatenation trees. In such cases, we contend that the resolution provided by concatenation should be preferred and that incomplete lineage sorting is a poor explanation for the initial conflict between phylogenetic approaches.
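As an illustration of support-based branch collapsing, the following is a minimal Python sketch. The tree representation (nested dicts with hypothetical "support" and "children" keys, leaves as strings) and the function name `collapse_low_support` are assumptions made for this example and are not taken from the study.

```python
def collapse_low_support(node, threshold=0.0):
    """Return a copy of the tree with weakly supported internal branches collapsed.

    A leaf is a string. An internal node is a dict with keys "support"
    (a number, or None for the root) and "children" (a list of nodes).
    Any non-root internal node whose support is <= threshold is removed,
    and its children are attached directly to its parent (a polytomy).
    """
    if isinstance(node, str):
        return node
    new_children = []
    for child in node["children"]:
        child = collapse_low_support(child, threshold)
        if not isinstance(child, str) and child["support"] <= threshold:
            # Splice the grandchildren into the parent, dropping the branch.
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    return {"support": node["support"], "children": new_children}


# Hypothetical gene tree: ((A,B)0,(C,D)95);
tree = {
    "support": None,
    "children": [
        {"support": 0, "children": ["A", "B"]},
        {"support": 95, "children": ["C", "D"]},
    ],
}
collapsed = collapse_low_support(tree, threshold=0)
```

With a threshold of 0, only the (A,B) branch is dissolved, leaving A and B attached directly to the root next to the well-supported (C,D) clade.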
|
20
|
Abstract
The phylogeny of Neoaves, the largest clade of extant birds, has remained unclear despite intense study. The difficulty associated with resolving the early branches in Neoaves is likely driven by the rapid radiation of this group. However, conflicts among studies may be exacerbated by the data type analyzed. For example, analyses of coding exons typically yield trees that place Strisores (nightjars and allies) sister to the remaining Neoaves, while analyses of non-coding data typically yield trees where Mirandornites (flamingos and grebes) is the sister of the remaining Neoaves. Our understanding of data type effects is hampered by the fact that previous analyses have used different taxa, loci, and types of non-coding data. Herein, we provide strong corroboration of the data type effects hypothesis for Neoaves by comparing trees based on coding and non-coding data derived from the same taxa and gene regions. A simple analytical method known to minimize biases due to base composition (coding nucleotides as purines and pyrimidines) resulted in coding exon data with increased congruence to the non-coding topology using concatenated analyses. These results improve our understanding of the resolution of neoavian phylogeny and point to a challenge—data type effects—that is likely to be an important factor in phylogenetic analyses of birds (and many other taxonomic groups). Using our results, we provide a summary phylogeny that identifies well-corroborated relationships and highlights specific nodes where future efforts should focus.
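The purine/pyrimidine (RY) recoding mentioned in this abstract can be sketched in a few lines of Python. The function name `ry_recode` and the treatment of gaps and ambiguous symbols are assumptions chosen for illustration, not the authors' pipeline.

```python
# Purines (A, G) map to R; pyrimidines (C, T, U) map to Y.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_recode(seq):
    """Recode a nucleotide sequence as purines (R) and pyrimidines (Y).

    Alignment gaps ('-') are kept as-is; any other symbol (for example N)
    is written as '?'. Input is case-insensitive.
    """
    recoded = []
    for base in seq.upper():
        if base == "-":
            recoded.append("-")
        else:
            recoded.append(RY_MAP.get(base, "?"))
    return "".join(recoded)


example = ry_recode("ACGTN-a")
```

Because transitions (A↔G, C↔T) become invisible after recoding, only transversions contribute signal, which is why this scheme is commonly used to reduce base-composition effects.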
|
21
|
Skoracki M, Kosicki JZ, Hromada M. Unusual parasite from an enigmatic host – a new group of mites infesting feather quills of the hoatzin. THE EUROPEAN ZOOLOGICAL JOURNAL 2021. [DOI: 10.1080/24750263.2020.1849437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022] Open
Affiliation(s)
- M. Skoracki
- Department of Animal Morphology, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
- Laboratory and Museum of Evolutionary Ecology, Department of Ecology, Faculty of Humanities and Natural Sciences, University of Presov, Prešov, Slovakia
| | - J. Z. Kosicki
- Department of Avian Biology and Ecology, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
| | - M. Hromada
- Laboratory and Museum of Evolutionary Ecology, Department of Ecology, Faculty of Humanities and Natural Sciences, University of Presov, Prešov, Slovakia
- Faculty of Biological Sciences, University of Zielona Góra, Zielona Góra, Poland
| |
|
22
|
Deep-Time Demographic Inference Suggests Ecological Release as Driver of Neoavian Adaptive Radiation. DIVERSITY-BASEL 2020. [DOI: 10.3390/d12040164] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Abstract
Assessing the applicability of theory to major adaptive radiations in deep time represents an extremely difficult problem in evolutionary biology. Neoaves, which includes 95% of living birds, is believed to have undergone a period of rapid diversification roughly coincident with the Cretaceous–Paleogene (K-Pg) boundary. We investigate whether basal neoavian lineages experienced an ecological release in response to ecological opportunity, as evidenced by density compensation. We estimated effective population sizes (Ne) of basal neoavian lineages by combining coalescent branch lengths (CBLs) and the numbers of generations between successive divergences. We used a modified version of Accurate Species TRee Algorithm (ASTRAL) to estimate CBLs directly from insertion–deletion (indel) data, as well as from gene trees using DNA sequence and/or indel data. We found that some divergences near the K-Pg boundary involved unexpectedly high gene tree discordance relative to the estimated number of generations between speciation events. The simplest explanation for this result is an increase in Ne, despite the caveats discussed herein. It appears that at least some early neoavian lineages, similar to the ancestor of the clade comprising doves, mesites, and sandgrouse, experienced ecological release near the time of the K-Pg mass extinction.
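The link between coalescent branch lengths and effective population size can be made concrete with a small Python sketch. It assumes the standard diploid relation, in which a branch length in coalescent units equals the number of generations divided by 2Ne; the authors' exact procedure may differ, and the function name and example values are hypothetical.

```python
def effective_population_size(generations, cbl):
    """Estimate Ne from a branch's duration and its coalescent branch length.

    Assumes the standard diploid relation CBL = generations / (2 * Ne),
    so Ne = generations / (2 * CBL).
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if cbl <= 0:
        raise ValueError("coalescent branch length must be positive")
    return generations / (2.0 * cbl)


# Hypothetical branch: 1,000,000 generations and a CBL of 0.5 coalescent units.
ne_estimate = effective_population_size(1_000_000, 0.5)
```

Under this relation, a short coalescent branch length (high gene tree discordance) over a fixed number of generations implies a larger Ne, which matches the reasoning in the abstract.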
|
23
|
Ducrest A, Neuenschwander S, Schmid‐Siegert E, Pagni M, Train C, Dylus D, Nevers Y, Warwick Vesztrocy A, San‐Jose LM, Dupasquier M, Dessimoz C, Xenarios I, Roulin A, Goudet J. New genome assembly of the barn owl (Tyto alba alba). Ecol Evol 2020; 10:2284-2298. [PMID: 32184981 PMCID: PMC7069322 DOI: 10.1002/ece3.5991] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/16/2019] [Revised: 12/05/2019] [Accepted: 12/16/2019] [Indexed: 12/25/2022] Open
Abstract
New genomic tools open doors to study ecology, evolution, and population genomics of wild animals. For the Barn owl species complex, a cosmopolitan nocturnal raptor, a very fragmented draft genome was assembled for the American species (Tyto furcata pratincola) (Jarvis et al. 2014). To improve the genome, we assembled de novo Illumina and Pacific Biosciences (PacBio) long reads sequences of its European counterpart (Tyto alba alba). This genome assembly of 1.219 Gbp comprises 21,509 scaffolds and has an N50 of 4,615,526 bp. BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis revealed an assembly completeness of 94.8% with only 1.8% of the genes missing out of 4,915 avian orthologs searched, a proportion similar to that found in the genomes of the zebra finch (Taeniopygia guttata) or the collared flycatcher (Ficedula albicollis). By mapping the reads of the female American barn owl to the male European barn owl reads, we detected several structural variants and identified 70 Mbp of the Z chromosome. The barn owl scaffolds were further mapped to the chromosomes of the zebra finch. In addition, the completeness of the European barn owl genome is demonstrated with 94 of 128 proteins missing in the chicken genome retrieved in the European barn owl transcripts. This improved genome will help future barn owl population genomic investigations.
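For readers unfamiliar with the N50 statistic quoted above, here is a minimal Python sketch of the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly with a total length of 32; the two longest scaffolds (8 + 8)
# already cover half of it.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```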
Affiliation(s)
- Anne‐Lyse Ducrest
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | | | | | - Marco Pagni
- Vital‐IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Clément Train
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - David Dylus
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Alex Warwick Vesztrocy
- Center for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
| | - Luis M. San‐Jose
- Laboratory Evolution and Biological Diversity, UMR 5174, CNRS, University of Toulouse III Paul Sabatier, Toulouse, France
| | | | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ioannis Xenarios
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Alexandre Roulin
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | - Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
24
|
Skoracki M, Sikora B, Jerzak L, Hromada M. Tanopicobia gen. nov., a new genus of quill mites, its phylogenetic placement in the subfamily Picobiinae (Acariformes: Syringophilidae) and picobiine relationships with avian hosts. PLoS One 2020; 15:e0225982. [PMID: 31940314 PMCID: PMC6961858 DOI: 10.1371/journal.pone.0225982] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/10/2019] [Accepted: 10/12/2019] [Indexed: 11/25/2022] Open
Abstract
A new monotypic genus Tanopicobia gen. nov. is established for a new species Tanopicobia trachyphoni sp. nov., parasitizing Trachyphonus erythrocephalus Cabanis, 1878 (Piciformes: Lybiidae) from Tanzania. In phylogenetic analyses based on morphological data and constructed using the maximum parsimony approach, this taxon falls within the subfamily Picobiinae Johnston and Kethley, 1973 in the Neopicobia-species-group as closely related to the genus Pipicobia Glowska and Schmidt, 2014. Tanopicobia differs from Pipicobia by the following features in females: genital setae absent; setae ve are situated far and posteromedial to the level of setal bases vi; setae 3a are thick and knobbed. Additionally, a new generic key for subfamily Picobiinae is constructed and general host-parasite ecological and phylogenetic relationships are discussed. Picobiines are present in several lineages of neoavian birds, from basal Galloanseres to terminal Telluraves, which are infested by 70 (89.7% of all) species of these ectoparasites.
Affiliation(s)
- Maciej Skoracki
- Department of Animal Morphology, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
- Laboratory and Museum of Evolutionary Ecology, Department of Ecology, Faculty of Humanities and Natural Sciences, University of Presov, Prešov, Slovakia
| | - Bozena Sikora
- Department of Animal Morphology, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
| | - Leszek Jerzak
- Faculty of Biological Sciences, University of Zielona Góra, Zielona Góra, Poland
| | - Martin Hromada
- Laboratory and Museum of Evolutionary Ecology, Department of Ecology, Faculty of Humanities and Natural Sciences, University of Presov, Prešov, Slovakia
- Faculty of Biological Sciences, University of Zielona Góra, Zielona Góra, Poland
| |
|
25
|
Springer MS, Molloy EK, Sloan DB, Simmons MP, Gatesy J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. J Hered 2019; 111:147-168. [DOI: 10.1093/jhered/esz076] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/13/2019] [Accepted: 12/12/2019] [Indexed: 12/20/2022] Open
Abstract
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
| | - Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
| | - Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO
| | - Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
| |
|