1
Chen G, Yu D, Yang Y, Li X, Wang X, Sun D, Lu Y, Ke R, Zhang G, Cui J, Feng S. Adaptive expansion of ERVK solo-LTRs is associated with Passeriformes speciation events. Nat Commun 2024; 15:3151. [PMID: 38605055 PMCID: PMC11009239 DOI: 10.1038/s41467-024-47501-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 04/02/2024] [Indexed: 04/13/2024] Open
Abstract
Endogenous retroviruses (ERVs) are ancient retroviral remnants integrated in host genomes, and commonly deleted through unequal homologous recombination, leaving solitary long terminal repeats (solo-LTRs). This study, analysing the genomes of 362 bird species and their reptilian and mammalian outgroups, reveals an unusually higher level of solo-LTRs formation in birds, indicating evolutionary forces might have purged ERVs during evolution. Strikingly in the order Passeriformes, and especially the parvorder Passerida, endogenous retrovirus K (ERVK) solo-LTRs showed bursts of formation and recurrent accumulations coinciding with speciation events over past 22 million years. Moreover, our results indicate that the ongoing expansion of ERVK solo-LTRs in these bird species, marked by high transcriptional activity of ERVK retroviral genes in reproductive organs, caused variation of solo-LTRs between individual zebra finches. We experimentally demonstrated that cis-regulatory activity of recently evolved ERVK solo-LTRs may significantly increase the expression level of ITGA2 in the brain of zebra finches compared to chickens. These findings suggest that ERVK solo-LTRs expansion may introduce novel genomic sequences acting as cis-regulatory elements and contribute to adaptive evolution. Overall, our results underscore that the residual sequences of ancient retroviruses could influence the adaptive diversification of species by regulating host gene expression.
Affiliation(s)
- Guangji Chen
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Center for Evolutionary & Organismal Biology, Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- BGI Research, Wuhan, China
- Dan Yu
- Center for Evolutionary & Organismal Biology, Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Yu Yang
- School of Medicine, Huaqiao University, Xiamen, Fujian, 361021, China
- Xiang Li
- CAS Key Laboratory of Molecular Virology & Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
- Xiaojing Wang
- CAS Key Laboratory of Molecular Virology & Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
- Danyang Sun
- Center for Evolutionary & Organismal Biology, Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Yanlin Lu
- Center for Evolutionary & Organismal Biology, Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Rongqin Ke
- School of Medicine, Huaqiao University, Xiamen, Fujian, 361021, China
- Guojie Zhang
- Center for Evolutionary & Organismal Biology, Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, China
- Jie Cui
- Department of Infectious Diseases, National Medical Center for Infectious Diseases, Huashan Hospital, Institute of Infection and Health Research, Fudan University, Shanghai, China.
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, China.
- Shanghai Sci-Tech Inno Center for Infection & Immunity, Shanghai, 200052, China.
- Shanghai Key Laboratory of Infectious Diseases and Biosafety Emergency Response, Huashan Hospital, Fudan University, Shanghai, China.
- Shaohong Feng
- Center for Evolutionary & Organismal Biology, Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, China.
- Department of General Surgery of Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China.
2
Chesters D, Ferrari RR, Lin X, Orr MC, Staab M, Zhu CD. Launching insectphylo.org; a new hub facilitating construction and use of synthesis molecular phylogenies of insects. Mol Ecol Resour 2023; 23:1556-1573. [PMID: 37265018 DOI: 10.1111/1755-0998.13817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 05/07/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract
The Holy Grail of an Insect Tree of Life can only be 'discovered' through extensive collaboration among taxon specialists, phylogeneticists and centralized frameworks such as Open Tree of Life, but insufficient effort from stakeholders has so far hampered this promising approach. The resultant unavailability of synthesis phylogenies is an unfortunate situation given the numerous practical usages of phylogenies in the near term and against the backdrop of the ongoing biodiversity crisis. To resolve this issue, we establish a new online hub that centralizes the collation of relevant phylogenetic data and provides the resultant synthesis molecular phylogenies. This is achieved through key developments in a proposed pipeline for the construction of a species-level insect phylogeny. The functionality of the framework is demonstrated through the construction of a highly supported, species-comprehensive phylogeny of Diptera, built from integrated omics data, COI DNA barcodes, and a compiled database of over 100 standardized, published Diptera phylogenies. Machine-readable forms of the phylogeny (and subsets thereof) are publicly available at insectphylo.org, a new public repository for species-comprehensive phylogenies for biological research.
Affiliation(s)
- Douglas Chesters
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- International College, University of Chinese Academy of Sciences, Beijing, China
- Rafael R Ferrari
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Xiaolong Lin
- Engineering Research Center of Environmental DNA and Ecological Water Health Assessment, Shanghai Ocean University, Shanghai, China
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China
- Michael C Orr
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Entomologie, Staatliches Museum für Naturkunde Stuttgart, Stuttgart, Germany
- Michael Staab
- Ecological Networks, Technische Universität Darmstadt, Darmstadt, Germany
- Chao-Dong Zhu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- International College, University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Pest Management, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Biological Sciences, University of Chinese Academy of Sciences, Beijing, China
3
Bravo GA, Schmitt CJ, Edwards SV. What Have We Learned from the First 500 Avian Genomes? Annual Review of Ecology, Evolution, and Systematics 2021. [DOI: 10.1146/annurev-ecolsys-012121-085928] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/14/2022]
Abstract
The increased capacity of DNA sequencing has significantly advanced our understanding of the phylogeny of birds and the proximate and ultimate mechanisms molding their genomic diversity. In less than a decade, the number of available avian reference genomes has increased to over 500—approximately 5% of bird diversity—placing birds in a privileged position to advance the fields of phylogenomics and comparative, functional, and population genomics. Whole-genome sequence data, as well as indels and rare genomic changes, are further resolving the avian tree of life. The accumulation of bird genomes, increasingly with long-read sequence data, greatly improves the resolution of genomic features such as germline-restricted chromosomes and the W chromosome, and is facilitating the comparative integration of genotypes and phenotypes. Community-based initiatives such as the Bird 10,000 Genomes Project and Vertebrate Genome Project are playing a fundamental role in amplifying and coalescing a vibrant international program in avian comparative genomics.
Affiliation(s)
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
- C. Jonathan Schmitt
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
4
McTavish EJ, Sánchez-Reyes LL, Holder MT. OpenTree: A Python Package for Accessing and Analyzing Data from the Open Tree of Life. Syst Biol 2021; 70:1295-1301. [PMID: 33970279 PMCID: PMC8513759 DOI: 10.1093/sysbio/syab033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/15/2020] [Revised: 04/27/2021] [Accepted: 05/03/2021] [Indexed: 11/14/2022] Open
Abstract
The Open Tree of Life project constructs a comprehensive, dynamic, and digitally available tree of life by synthesizing published phylogenetic trees along with taxonomic data. Open Tree of Life provides web-service application programming interfaces (APIs) to make the tree estimate, unified taxonomy, and input phylogenetic data available to anyone. Here, we describe the Python package opentree, which provides a user-friendly Python wrapper for these APIs and a set of scripts and tutorials for straightforward downstream data analyses. We demonstrate the utility of these tools by generating an estimate of the phylogenetic relationships of all bird families, and by capturing a phylogenetic estimate for all taxa observed at the University of California Merced Vernal Pools and Grassland Reserve. [Evolution; open science; phylogenetics; Python; taxonomy.]
Affiliation(s)
- Emily Jane McTavish
- Department of Life and Environmental Sciences, University of California, Merced, CA 95343, USA
- Mark T Holder
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
- Biodiversity Institute, University of Kansas, Lawrence, KS 66045, USA
5
Kimball RT, Hosner PA, Braun EL. A phylogenomic supermatrix of Galliformes (Landfowl) reveals biased branch lengths. Mol Phylogenet Evol 2021; 158:107091. [PMID: 33545275 DOI: 10.1016/j.ympev.2021.107091] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/23/2020] [Revised: 01/16/2021] [Accepted: 01/27/2021] [Indexed: 11/25/2022]
Abstract
Building taxon-rich phylogenies is foundational for macroevolutionary studies. One approach to improve taxon sampling beyond individual studies is to build supermatrices of publicly available data, incorporating taxa sampled across different studies and utilizing different loci. Most existing supermatrix studies have focused on loci commonly sequenced with Sanger technology ("legacy" markers, such as mitochondrial data and small numbers of nuclear loci). However, incorporating phylogenomic studies into supermatrices allows problem nodes to be targeted and resolved with considerable amounts of data, while improving taxon sampling with legacy data. Here we estimate phylogeny from a galliform supermatrix which includes well-known model and agricultural species such as the chicken and turkey. We assembled a supermatrix comprising 4500 ultra-conserved elements (UCEs) collected as part of recent phylogenomic studies in this group and legacy mitochondrial and nuclear (intron and exon) sequences. Our resulting phylogeny included 88% of extant species and recovered well-accepted relationships with strong support. However, branch lengths, which are particularly important in downstream macroevolutionary studies, appeared vastly skewed. Taxa represented only by rapidly evolving mitochondrial data had high proportions of missing data and exhibited long terminal branches. Conversely, taxa sampled for slowly evolving UCEs with low proportions of missing data exhibited substantially shorter terminal branches. We explored several branch length re-estimation methods with particular attention to terminal branches and conclude that re-estimation using well-sampled mitochondrial sequences may be a pragmatic approach to obtain trees suitable for macroevolutionary analysis.
Affiliation(s)
- Rebecca T Kimball
- Department of Biology, University of Florida, Gainesville, FL 32607, USA.
- Peter A Hosner
- Department of Biology, University of Florida, Gainesville, FL 32607, USA; Natural History Museum of Denmark and Center for Macroecology, Evolution and Climate, University of Copenhagen, Copenhagen, Denmark
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL 32607, USA
6
Dense sampling of bird diversity increases power of comparative genomics. Nature 2020; 587:252-257. [PMID: 33177665 PMCID: PMC7759463 DOI: 10.1038/s41586-020-2873-9] [Citation(s) in RCA: 186] [Impact Index Per Article: 46.5] [Received: 08/09/2019] [Accepted: 07/27/2020] [Indexed: 12/13/2022]
Abstract
Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity [1–4]. Sparse taxon sampling has previously been proposed to confound phylogenetic inference [5], and captures only a fraction of the genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity, by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We use this comparative genome dataset in combination with a pipeline that leverages a reference-free whole-genome alignment to identify orthologous regions in greater numbers than has previously been possible and to recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, has more than doubled the fraction of bases that are confidently predicted to be under conservation and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation, and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species. A dataset of the genomes of 363 species from the Bird 10,000 Genomes Project shows increased power to detect shared and lineage-specific variation, demonstrating the importance of phylogenetically diverse taxon sampling in whole-genome sequencing.
7
Larson DA, Walker JF, Vargas OM, Smith SA. A consensus phylogenomic approach highlights paleopolyploid and rapid radiation in the history of Ericales. American Journal of Botany 2020; 107:773-789. [PMID: 32350864 DOI: 10.1002/ajb2.1469] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/08/2019] [Accepted: 02/12/2020] [Indexed: 05/27/2023]
Abstract
PREMISE: Large genomic data sets offer the promise of resolving historically recalcitrant species relationships. However, different methodologies can yield conflicting results, especially when clades have experienced ancient, rapid diversification. Here, we analyzed the ancient radiation of Ericales and explored sources of uncertainty related to species tree inference, conflicting gene tree signal, and the inferred placement of gene and genome duplications. METHODS: We used a hierarchical clustering approach, with tree-based homology and orthology detection, to generate six filtered phylogenomic matrices consisting of data from 97 transcriptomes and genomes. Support for species relationships was inferred from multiple lines of evidence including shared gene duplications, gene tree conflict, gene-wise edge-based analyses, concatenation, and coalescent-based methods, and is summarized in a consensus framework. RESULTS: Our consensus approach supported a topology largely concordant with previous studies, but suggests that the data are not capable of resolving several ancient relationships because of lack of informative characters, sensitivity to methodology, and extensive gene tree conflict correlated with paleopolyploidy. We found evidence of a whole-genome duplication before the radiation of all or most ericalean families, and demonstrate that tree topology and heterogeneous evolutionary rates affect the inferred placement of genome duplications. CONCLUSIONS: We provide several hypotheses regarding the history of Ericales, and confidently resolve most nodes, but demonstrate that a series of ancient divergences are unresolvable with these data. Whether paleopolyploidy is a major source of the observed phylogenetic conflict warrants further investigation.
Affiliation(s)
- Drew A Larson
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Joseph F Walker
- Sainsbury Laboratory (SLCU), University of Cambridge, Cambridge, CB2 1LR, UK
- Oscar M Vargas
- Department of Ecology & Evolutionary Biology, University of California, Santa Cruz, CA, 95060, USA
- Stephen A Smith
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
8
Comparative Phylogenomics, a Stepping Stone for Bird Biodiversity Studies. Diversity-Basel 2019. [DOI: 10.3390/d11070115] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 01/02/2023]
Abstract
Birds are a group with immense availability of genomic resources, and hundreds of forthcoming genomes at the doorstep. We review recent developments in whole genome sequencing, phylogenomics, and comparative genomics of birds. Short read based genome assemblies are common, largely due to efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to increase due to improved long-read sequencing. The available genomic data has enabled the reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in the early splits of Neoaves due to their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not just better reflect their evolutionary history but also shine new light onto the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system to study the molecular foundation of bird specific traits. Birds are on the verge of becoming an extremely resourceful system to study biodiversity from the nucleotide up.
9
Abstract
It has long been appreciated that analyses of genomic data (e.g., whole genome sequencing or sequence capture) have the potential to reveal the tree of life, but it remains challenging to move from sequence data to a clear understanding of evolutionary history, in part due to the computational challenges of phylogenetic estimation using genome-scale data. Supertree methods solve that challenge because they facilitate a divide-and-conquer approach for large-scale phylogeny inference by integrating smaller subtrees in a computationally efficient manner. Here, we combined information from sequence capture and whole-genome phylogenies using supertree methods. However, the available phylogenomic trees had limited overlap so we used taxon-rich (but not phylogenomic) megaphylogenies to weave them together. This allowed us to construct a phylogenomic supertree, with support values, that included 707 bird species (~7% of avian species diversity). We estimated branch lengths using mitochondrial sequence data and we used these branch lengths to estimate divergence times. Our time-calibrated supertree supports radiation of all three major avian clades (Palaeognathae, Galloanseres, and Neoaves) near the Cretaceous-Paleogene (K-Pg) boundary. The approach we used will permit the continued addition of taxa to this supertree as new phylogenomic data are published, and it could be applied to other taxa as well.
10
Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571 PMCID: PMC6378093 DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022] Open
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly rely and suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Alexandre Antonelli
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Gothenburg Botanical Garden, Göteborg, Sweden
- Christine D. Bacon
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Krzysztof Bartoszek
- Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Stella Huynh
- Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
- Graham Jones
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Sangeet Lamichhaney
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Thomas Marcussen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
- Hélène Morlon
- Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
- Luay K. Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Bengt Oxelman
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Bernard Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Alexander Schliep
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Fernanda P. Werneck
- Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
- John Wiedenhoeft
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Sandi Willows-Munro
- School of Life Sciences, University of Kwazulu-Natal, Pietermaritzburg, South Africa
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
11
Franz NM, Musher LJ, Brown JW, Yu S, Ludäscher B. Verbalizing phylogenomic conflict: Representation of node congruence across competing reconstructions of the neoavian explosion. PLoS Comput Biol 2019; 15:e1006493. [PMID: 30768597 PMCID: PMC6395011 DOI: 10.1371/journal.pcbi.1006493] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/13/2017] [Revised: 02/28/2019] [Accepted: 09/10/2018] [Indexed: 11/24/2022] Open
Abstract
Phylogenomic research is accelerating the publication of landmark studies that aim to resolve deep divergences of major organismal groups. Meanwhile, systems for identifying and integrating the products of phylogenomic inference, such as newly supported clade concepts, have not kept pace. However, the ability to verbalize node concept congruence and conflict across multiple, in effect simultaneously endorsed phylogenomic hypotheses, is a prerequisite for building synthetic data environments for biological systematics and other domains impacted by these conflicting inferences. Here we develop a novel solution to the conflict verbalization challenge, based on a logic representation and reasoning approach that utilizes the language of Region Connection Calculus (RCC-5) to produce consistent alignments of node concepts endorsed by incongruent phylogenomic studies. The approach employs clade concept labels to individuate concepts used by each source, even if these carry identical names. Indirect RCC-5 modeling of intensional (property-based) node concept definitions, facilitated by the local relaxation of coverage constraints, allows parent concepts to attain congruence in spite of their differentially sampled children. To demonstrate the feasibility of this approach, we align two recent phylogenomic reconstructions of higher-level avian groups that entail strong conflict in the "neoavian explosion" region. According to our representations, this conflict is constituted by 26 instances of input "whole concept" overlap. These instances are further resolvable in the output labeling schemes and visualizations as "split concepts", which provide the labels and relations needed to build truly synthetic phylogenomic data environments. Because the RCC-5 alignments fundamentally reflect the trained, logic-enabled judgments of systematic experts, future designs for such environments need to promote a culture where experts routinely assess the intensionalities of node concepts published by our peers, even and especially when we are not in agreement with each other.
Affiliation(s)
- Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Lukas J. Musher
- Richard Gilder Graduate School and Department of Ornithology, American Museum of Natural History, New York, New York, United States of America
- Joseph W. Brown
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Shizhuo Yu
- Department of Computer Science, University of California at Davis, Davis, California, United States of America
- Bertram Ludäscher
- School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
12
Dornburg A, Su Z, Townsend JP. Optimal Rates for Phylogenetic Inference and Experimental Design in the Era of Genome-Scale Data Sets. Syst Biol 2018; 68:145-156. [PMID: 29939341 DOI: 10.1093/sysbio/syy047] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 02/13/2018] [Accepted: 06/13/2018] [Indexed: 02/02/2023] Open
Abstract
With the rise of genome-scale data sets, there has been a call for increased data scrutiny and careful selection of loci that are appropriate to use in an attempt to resolve a phylogenetic problem. Such loci should maximize phylogenetic information content while minimizing the risk of homoplasy. Theory posits the existence of characters that evolve at an optimum rate, and efforts to determine optimal rates of inference have been a cornerstone of phylogenetic experimental design for over two decades. However, both theoretical and empirical investigations of optimal rates have varied dramatically in their conclusions: spanning no relationship to a tight relationship between the rate of change and phylogenetic utility. Herein, we synthesize these apparently contradictory views, demonstrating both empirical and theoretical conditions under which each is correct. We find that optimal rates of characters-not genes-are generally robust to most experimental design decisions. Moreover, consideration of site rate heterogeneity within a given locus is critical to accurate predictions of utility. Factors such as taxon sampling or the targeted number of characters providing support for a topology are additionally critical to the predictions of phylogenetic utility based on the rate of character change. Further, optimality of rates and predictions of phylogenetic utility are not equivalent, demonstrating the need for further development of comprehensive theory of phylogenetic experimental design. [Divergence time; GC bias; homoplasy; incongruence; information content; internode length; optimal rates; phylogenetic informativeness; phylogenetic theory; phylogenetic utility; phylogenomics; signal and noise; subtending branch length; state space; taxon and character sampling.].
Affiliation(s)
- Alex Dornburg
- North Carolina Museum of Natural Sciences, Raleigh, 1671 Goldstar Drive, NC 27601, USA
- Zhuo Su
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, 165 Prospect Street, CT 06525, USA
- Jeffrey P Townsend
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, 165 Prospect Street, CT 06525, USA
- Department of Biostatistics, Yale University, New Haven, 60 College Street, CT 06510, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, 300 George Street, CT 06511, USA
|
13
|
Liang B, Wang N, Li N, Kimball RT, Braun EL. Comparative Genomics Reveals a Burst of Homoplasy-Free Numt Insertions. Mol Biol Evol 2018; 35:2060-2064. [DOI: 10.1093/molbev/msy112] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Affiliation(s)
- Bin Liang
- Department of Biology, University of Florida, Gainesville, FL
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Forestry Research Institute of Hainan Province, Haikou, Hainan, P. R. China
- Ning Wang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Nan Li
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, CA
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL
|
14
|
Franz NM, Sterner BW. To increase trust, change the social design behind aggregated biodiversity data. Database (Oxford) 2018; 2018:4791171. [PMID: 29315357 PMCID: PMC7206650 DOI: 10.1093/database/bax100] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/14/2017] [Revised: 12/05/2017] [Accepted: 12/06/2017] [Indexed: 01/07/2023]
Abstract
Growing concerns about the quality of aggregated biodiversity data are lowering trust in large-scale data networks. Aggregators frequently respond to quality concerns by recommending that biologists work with original data providers to correct errors 'at the source.' We show that this strategy falls systematically short of a full diagnosis of the underlying causes of distrust. In particular, trust in an aggregator is not just a feature of the data signal quality provided by the sources to the aggregator, but also a consequence of the social design of the aggregation process and the resulting power balance between individual data contributors and aggregators. The latter have created an accountability gap by downplaying the authorship and significance of the taxonomic hierarchies (frequently called 'backbones') they generate, and which are in effect novel classification theories that operate at the core of the data-structuring process. The Darwin Core standard for sharing occurrence records plays an under-appreciated role in maintaining the accountability gap, because this standard lacks the syntactic structure needed to preserve the taxonomic coherence of data packages submitted for aggregation, potentially leading to inferences that no individual source would support. Since high-quality data packages can mirror competing and conflicting classifications, i.e. unsettled systematic research, this plurality must be accommodated in the design of biodiversity data integration. Looking forward, a key directive is to develop new technical pathways and social incentives for experts to contribute directly to the validation of taxonomically coherent data packages as part of a greater, trustworthy aggregation process.
Affiliation(s)
- Nico M Franz
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Beckett W Sterner
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
|