51
Arcila D, Hughes LC, Meléndez-Vazquez F, Baldwin CC, White W, Carpenter K, Williams JT, Santos MD, Pogonoski J, Miya M, Ortí G, Betancur-R R. Testing the utility of alternative metrics of branch support to address the ancient evolutionary radiation of tunas, stromateoids, and allies (Teleostei: Pelagiaria). Syst Biol 2021; 70:1123-1144. [PMID: 33783539 DOI: 10.1093/sysbio/syab018] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/23/2021] [Accepted: 03/13/2021] [Indexed: 12/19/2022]
Abstract
The use of high-throughput sequencing technologies to produce genome-scale datasets was expected to settle some long-standing controversies across the Tree of Life, particularly in areas where short branches occur at deep timescales. Instead, these datasets have often yielded many well-supported but conflicting topologies, and highly variable gene-tree distributions. A variety of branch-support metrics beyond the nonparametric bootstrap are now available to assess how robust a phylogenetic hypothesis may be, as well as new methods to quantify gene-tree discordance. We applied multiple branch support metrics to an ancient group of marine fishes (Teleostei: Pelagiaria) whose interfamilial relationships have proven difficult to resolve due to a rapid accumulation of lineages very early in its history. We analyzed hundreds of loci including published UCE data and newly generated exonic data along with their flanking regions to represent all 16 extant families for more than 150 out of 284 valid species in the group. Branch support was lower for interfamilial relationships (except the SH-like aLRT and aBayes methods) regardless of the type of marker used. Several nodes that were highly supported with bootstrap had very low site and gene-tree concordance, revealing underlying conflict. Despite this conflict, we were able to identify four consistent interfamilial clades, each comprised of two or three families. Combining exons with their flanking regions also produced increased branch lengths in the deep branches of the pelagiarian tree. Our results demonstrate the limitations of employing current metrics of branch support and species-tree estimation when assessing the confidence of ancient evolutionary radiations and emphasize the necessity to embrace alternative measurements to explore phylogenetic uncertainty and discordance in phylogenomic datasets.
Affiliation(s)
- Dahiana Arcila
  - Department of Ichthyology, Sam Noble Oklahoma Museum of Natural History, Norman, Oklahoma, U.S.A.
  - Department of Biology, University of Oklahoma, Norman, Oklahoma, U.S.A.
- Lily C Hughes
  - Department of Biological Sciences, The George Washington University, Washington, District of Columbia, U.S.A.
  - Department of Organismal Biology and Anatomy, The University of Chicago, Chicago, Illinois, U.S.A.
  - Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A.
- Fernando Meléndez-Vazquez
  - Department of Ichthyology, Sam Noble Oklahoma Museum of Natural History, Norman, Oklahoma, U.S.A.
  - Department of Biology, University of Oklahoma, Norman, Oklahoma, U.S.A.
- Carole C Baldwin
  - Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A.
- William White
  - CSIRO Australian National Fish Collection, National Research Collections Australia, Hobart, Tasmania, Australia
- Kent Carpenter
  - Department of Biological Sciences, Old Dominion University, Norfolk, Virginia, U.S.A.
- Jeffrey T Williams
  - Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A.
- John Pogonoski
  - CSIRO Australian National Fish Collection, National Research Collections Australia, Hobart, Tasmania, Australia
- Masaki Miya
  - Natural History Museum and Institute, Chiba, Aoba-cho, Chuo-ku, Chiba, Japan
- Guillermo Ortí
  - Department of Biological Sciences, The George Washington University, Washington, District of Columbia, U.S.A.
  - Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A.
52
Chrisman BS, Paskov K, Stockham N, Tabatabaei K, Jung JY, Washington P, Varma M, Sun MW, Maleki S, Wall DP. Indels in SARS-CoV-2 occur at template-switching hotspots. BioData Min 2021; 14:20. [PMID: 33743803 PMCID: PMC7980745 DOI: 10.1186/s13040-021-00251-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/16/2020] [Accepted: 02/23/2021] [Indexed: 11/10/2022]
Abstract
The evolutionary dynamics of SARS-CoV-2 have been carefully monitored since the COVID-19 pandemic began in December 2019. However, analysis has focused primarily on single nucleotide polymorphisms and largely ignored the role of insertions and deletions (indels) as well as recombination in SARS-CoV-2 evolution. Using sequences from the GISAID database, we catalogue over 100 insertions and deletions in the SARS-CoV-2 consensus sequences. We hypothesize that these indels are artifacts of recombination events between SARS-CoV-2 replicates whereby RNA-dependent RNA polymerase (RdRp) re-associates with a homologous template at a different loci ("imperfect homologous recombination"). We provide several independent pieces of evidence that suggest this. (1) The indels from the GISAID consensus sequences are clustered at specific regions of the genome. (2) These regions are also enriched for 5' and 3' breakpoints in the transcription regulatory site (TRS) independent transcriptome, presumably sites of RNA-dependent RNA polymerase (RdRp) template-switching. (3) Within raw reads, these indel hotspots have cases of both high intra-host heterogeneity and intra-host homogeneity, suggesting that these indels are both consequences of de novo recombination events within a host and artifacts of previous recombination. We briefly analyze the indels in the context of RNA secondary structure, noting that indels preferentially occur in "arms" and loop structures of the predicted folded RNA, suggesting that secondary structure may be a mechanism for TRS-independent template-switching in SARS-CoV-2 or other coronaviruses. These insights into the relationship between structural variation and recombination in SARS-CoV-2 can improve our reconstructions of the SARS-CoV-2 evolutionary history as well as our understanding of the process of RdRp template-switching in RNA viruses.
Affiliation(s)
- Kelley Paskov
  - Department of Biomedical Data Science, Stanford University, Stanford, USA
- Nate Stockham
  - Department of Neuroscience, Stanford University, Stanford, USA
- Kevin Tabatabaei
  - Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Jae-Yoon Jung
  - Department of Biomedical Data Science, Stanford University, Stanford, USA
- Peter Washington
  - Department of Bioengineering, Stanford University, Stanford, USA
- Maya Varma
  - Department of Computer Science, Stanford University, Stanford, USA
- Min Woo Sun
  - Department of Biomedical Data Science, Stanford University, Stanford, USA
- Sepideh Maleki
  - Department of Computer Science, University of Texas Austin, Austin, USA
- Dennis P Wall
  - Department of Biomedical Data Science, Stanford University, Stanford, USA
  - Department of Pediatrics (Systems Medicine), Stanford University, Stanford, USA
53
Allio R, Tilak MK, Scornavacca C, Avenant NL, Kitchener AC, Corre E, Nabholz B, Delsuc F. High-quality carnivoran genomes from roadkill samples enable comparative species delineation in aardwolf and bat-eared fox. eLife 2021; 10:e63167. [PMID: 33599612 PMCID: PMC7963486 DOI: 10.7554/elife.63167] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/16/2020] [Accepted: 02/16/2021] [Indexed: 12/26/2022]
Abstract
In a context of ongoing biodiversity erosion, obtaining genomic resources from wildlife is essential for conservation. The thousands of yearly mammalian roadkill provide a useful source material for genomic surveys. To illustrate the potential of this underexploited resource, we used roadkill samples to study the genomic diversity of the bat-eared fox (Otocyon megalotis) and the aardwolf (Proteles cristatus), both having subspecies with similar disjunct distributions in Eastern and Southern Africa. First, we obtained reference genomes with high contiguity and gene completeness by combining Nanopore long reads and Illumina short reads. Then, we showed that the two subspecies of aardwolf might warrant species status (P. cristatus and P. septentrionalis) by comparing their genome-wide genetic differentiation to pairs of well-defined species across Carnivora with a new Genetic Differentiation index (GDI) based on only a few resequenced individuals. Finally, we obtained a genome-scale Carnivora phylogeny including the new aardwolf species.
Affiliation(s)
- Rémi Allio
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
- Marie-Ka Tilak
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
- Celine Scornavacca
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
- Nico L Avenant
  - National Museum and Centre for Environmental Management, University of the Free State, Bloemfontein, South Africa
- Andrew C Kitchener
  - Department of Natural Sciences, National Museums Scotland, Edinburgh, United Kingdom
- Erwan Corre
  - CNRS, Sorbonne Université, CNRS, ABiMS, Station Biologique de Roscoff, Roscoff, France
- Benoit Nabholz
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
  - Institut Universitaire de France (IUF), Paris, France
- Frédéric Delsuc
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
54
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022]
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
Affiliation(s)
- Roberto Del Amparo
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Jesús Arenas
  - Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
- Alberto Vicens
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
55
Chan KO, Hutter CR, Wood PL, Grismer LL, Brown RM. Target-capture phylogenomics provide insights on gene and species tree discordances in Old World treefrogs (Anura: Rhacophoridae). Proc Biol Sci 2020; 287:20202102. [PMID: 33290680 PMCID: PMC7739936 DOI: 10.1098/rspb.2020.2102] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/27/2020] [Accepted: 11/13/2020] [Indexed: 11/12/2022]
Abstract
Genome-scale data have greatly facilitated the resolution of recalcitrant nodes that Sanger-based datasets have been unable to resolve. However, phylogenomic studies continue to use traditional methods such as bootstrapping to estimate branch support; and high bootstrap values are still interpreted as providing strong support for the correct topology. Furthermore, relatively little attention has been given to assessing discordances between gene and species trees, and the underlying processes that produce phylogenetic conflict. We generated novel genomic datasets to characterize and determine the causes of discordance in Old World treefrogs (Family: Rhacophoridae)-a group that is fraught with conflicting and poorly supported topologies among major clades. Additionally, a suite of data filtering strategies and analytical methods were applied to assess their impact on phylogenetic inference. We showed that incomplete lineage sorting was detected at all nodes that exhibited high levels of discordance. Those nodes were also associated with extremely short internal branches. We also clearly demonstrate that bootstrap values do not reflect uncertainty or confidence for the correct topology and, hence, should not be used as a measure of branch support in phylogenomic datasets. Overall, we showed that phylogenetic discordances in Old World treefrogs resulted from incomplete lineage sorting and that species tree inference can be improved using a multi-faceted, total-evidence approach, which uses the most amount of data and considers results from different analytical methods and datasets.
Affiliation(s)
- Kin Onn Chan
  - Lee Kong Chian Natural History Museum, National University of Singapore, 2 Conservatory Drive, Singapore 117377, Republic of Singapore
- Carl R. Hutter
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L. Wood
  - Department of Biological Sciences and Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- L. Lee Grismer
  - Herpetology Laboratory, Department of Biology, La Sierra University, Riverside, CA 92505, USA
- Rafe M. Brown
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
56
Comprehensive phylogeny of Myrmecocystus honey ants highlights cryptic diversity and infers evolution during aridification of the American Southwest. Mol Phylogenet Evol 2020; 155:107036. [PMID: 33278587 DOI: 10.1016/j.ympev.2020.107036] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/04/2020] [Revised: 10/06/2020] [Accepted: 11/30/2020] [Indexed: 11/22/2022]
Abstract
The New World ant genus Myrmecocystus Wesmael, 1838 (Formicidae: Formicinae: Lasiini) is endemic to arid and semi-arid habitats of the western United States and Mexico. Several intriguing life history traits have been described for the genus, the best-known of which are replete workers, that store liquified food in their largely expanded crops and are colloquially referred to as "honeypots". Despite their interesting biology and ecological importance for arid ecosystems, the evolutionary history of Myrmecocystus ants is largely unknown and the current taxonomy presents an unsatisfactory systematic framework. We use ultraconserved elements to infer the evolutionary history of Myrmecocystus ants and provide a comprehensive, dated phylogenetic framework that clarifies the molecular systematics within the genus with high statistical support, reveals cryptic diversity, and reconstructs ancestral foraging activity. Using maximum likelihood, Bayesian and species tree approaches on a data set of 134 ingroup specimens (including samples from natural history collections and type material), we recover largely identical topologies that leave the position of only few clades uncertain and cover the intra- and interspecific variation of 28 of the 29 described and six undescribed species. In addition to traditional support values, such as bootstrap and posterior probability, we quantify genealogical concordance to estimate the effects of conflicting evolutionary histories on phylogenetic inference. Our analyses reveal that the current taxonomic classification of the genus is inconsistent with the molecular phylogenetic inference, and we identify cryptic diversity in seven species. Divergence dating suggests that the split between Myrmecocystus and its sister taxon Lasius occurred in the early Miocene. 
Crown group Myrmecocystus started diversifying about 14.08 Ma ago when the gradual aridification of the southwestern United States and northern Mexico led to formation of the American deserts and to adaptive radiations of many desert taxa.
57
Tayyrov A, Schnetzler M, Gillis-Germitsch N, Schnyder M. Genetic diversity of the cardiopulmonary canid nematode Angiostrongylus vasorum within and between rural and urban fox populations. Infect Genet Evol 2020; 87:104618. [PMID: 33188914 DOI: 10.1016/j.meegid.2020.104618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/03/2020] [Revised: 10/24/2020] [Accepted: 11/02/2020] [Indexed: 02/08/2023]
Abstract
Angiostrongylus vasorum is an emerging parasitic cardiopulmonary nematode of dogs, foxes, and other canids. In dogs, the infection causes respiratory and bleeding disorders along with other clinical signs collectively known as canine angiostrongylosis, while foxes represent an important wildlife reservoir. Despite the spread of A. vasorum across various countries in Europe and the Americas, little is known about the genetic diversity of A. vasorum populations at a local level in a highly endemic area. Thus, in the present study, we investigated the genetic diversity of 323 adult A. vasorum nematodes from 64 foxes living in the canton of Zurich, Switzerland. Among those, 279 worms isolated from 20 foxes were analyzed separately to investigate the genetic diversity of multiple worms within individual foxes. Part of the mitochondrial cytochrome c oxidase subunit I (mtCOI) gene was amplified and sequenced. Overall, 16 mitochondrial haplotypes were identified. The analysis of multiple worms per host revealed 12 haplotypes, with up to 5 different haplotypes in single individuals. Higher haplotype diversity (n = 10) of nematodes from foxes of urban areas than in rural areas (n = 7) was observed, with 5 shared haplotypes. Comparing our data with published GenBank sequences, five haplotypes were found to be unique within the Zurich nematode population. Interestingly, A. vasorum nematodes obtained from foxes in London and Zurich shared the same dominating haplotype. Further studies are needed to clarify if this haplotype has a different pathogenicity that may contribute to its dominance. Our findings show the importance of foxes as a reservoir for genetic parasite recombination and indicate that high fox population densities in urban areas with small and overlapping home ranges allow multiple infection events that lead to high genetic variability of A. vasorum.
Affiliation(s)
- Annageldi Tayyrov
  - Institute of Parasitology, Vetsuisse-Faculty, University of Zurich, Winterthurerstrasse 266a, 8057 Zurich, Switzerland
- Michèle Schnetzler
  - Institute of Parasitology, Vetsuisse-Faculty, University of Zurich, Winterthurerstrasse 266a, 8057 Zurich, Switzerland
- Nina Gillis-Germitsch
  - Institute of Parasitology, Vetsuisse-Faculty, University of Zurich, Winterthurerstrasse 266a, 8057 Zurich, Switzerland
  - Graduate School for Cellular and Biomedical Sciences, University of Bern, Switzerland
- Manuela Schnyder
  - Institute of Parasitology, Vetsuisse-Faculty, University of Zurich, Winterthurerstrasse 266a, 8057 Zurich, Switzerland
58
Pyott SJ, van Tuinen M, Screven LA, Schrode KM, Bai JP, Barone CM, Price SD, Lysakowski A, Sanderford M, Kumar S, Santos-Sacchi J, Lauer AM, Park TJ. Functional, Morphological, and Evolutionary Characterization of Hearing in Subterranean, Eusocial African Mole-Rats. Curr Biol 2020; 30:4329-4341.e4. [PMID: 32888484 DOI: 10.1016/j.cub.2020.08.035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/03/2020] [Revised: 06/09/2020] [Accepted: 08/07/2020] [Indexed: 12/26/2022]
Abstract
Naked mole-rats are highly vocal, eusocial, subterranean rodents with, counterintuitively, poor hearing. The causes underlying their altered hearing are unknown. Moreover, whether altered hearing is degenerate or adaptive to their unique lifestyles is controversial. We used various methods to identify the factors contributing to altered hearing in naked and the related Damaraland mole-rats and to examine whether these alterations result from relaxed or adaptive selection. Remarkably, we found that cochlear amplification was absent from both species despite normal prestin function in outer hair cells isolated from naked mole-rats. Instead, loss of cochlear amplification appears to result from abnormal hair bundle morphologies observed in both species. By exploiting a well-curated deafness phenotype-genotype database, we identified amino acid substitutions consistent with abnormal hair bundle morphology and reduced hearing sensitivity. Amino acid substitutions were found in unique groups of six hair bundle link proteins. Molecular evolutionary analyses revealed shifts in selection pressure at both the gene and the codon level for five of these six hair bundle link proteins. Substitutions in three of these proteins are associated exclusively with altered hearing. Altogether, our findings identify the likely mechanism of altered hearing in African mole-rats, making them the only identified mammals naturally lacking cochlear amplification. Moreover, our findings suggest that altered hearing in African mole-rats is adaptive, perhaps tailoring hearing to eusocial and subterranean lifestyles. Finally, our work reveals multiple, unique evolutionary trajectories in African mole-rat hearing and establishes species members as naturally occurring disease models to investigate human hearing loss.
Affiliation(s)
- Sonja J Pyott
  - University Medical Center Groningen and University of Groningen, Department of Otorhinolaryngology and Head/Neck Surgery, 9713GZ Groningen, the Netherlands
- Marcel van Tuinen
  - University Medical Center Groningen and University of Groningen, Department of Otorhinolaryngology and Head/Neck Surgery, 9713GZ Groningen, the Netherlands
- Laurel A Screven
  - Johns Hopkins School of Medicine, Department of Otolaryngology, Baltimore, MD 21205, USA
- Katrina M Schrode
  - Johns Hopkins School of Medicine, Department of Otolaryngology, Baltimore, MD 21205, USA
- Jun-Ping Bai
  - Yale University School of Medicine, Department of Neurology, 333 Cedar Street, New Haven, CT 06510, USA
- Catherine M Barone
  - University of Illinois at Chicago, Department of Biological Sciences, Chicago, IL 60612, USA
- Steven D Price
  - University of Illinois at Chicago, Department of Anatomy and Cell Biology, Chicago, IL 60612, USA
- Anna Lysakowski
  - University of Illinois at Chicago, Department of Anatomy and Cell Biology, Chicago, IL 60612, USA
- Maxwell Sanderford
  - Temple University, Institute for Genomics and Evolutionary Medicine and Department of Biology, Philadelphia, PA 19122, USA
- Sudhir Kumar
  - Temple University, Institute for Genomics and Evolutionary Medicine and Department of Biology, Philadelphia, PA 19122, USA
  - King Abdulaziz University, Center for Excellence in Genome Medicine and Research, Jeddah, Saudi Arabia
- Joseph Santos-Sacchi
  - Yale University School of Medicine, Department of Surgery (Otolaryngology) and Department of Neuroscience and Cellular and Molecular Physiology, 333 Cedar Street, New Haven, CT 06510, USA
- Amanda M Lauer
  - Johns Hopkins School of Medicine, Department of Otolaryngology, Baltimore, MD 21205, USA
- Thomas J Park
  - University of Illinois at Chicago, Department of Biological Sciences, Chicago, IL 60612, USA
59
Le Kim T, Le Sy V. mPartition: A Model-Based Method for Partitioning Alignments. J Mol Evol 2020; 88:641-652. [PMID: 32864711 DOI: 10.1007/s00239-020-09963-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2019] [Accepted: 08/08/2020] [Indexed: 10/23/2022]
Abstract
Maximum likelihood (ML) analysis of nucleotide or amino-acid alignments is widely used to infer evolutionary relationships among species. Computing the likelihood of a phylogenetic tree from such alignments is a complicated task because the evolutionary processes typically vary across sites. A number of studies have shown that partitioning alignments into sub-alignments of sites, where each sub-alignment is analyzed using a different model of evolution (e.g., GTR + I + G), is a sensible strategy. Current partitioning methods group sites into subsets based on the inferred rates of evolution at the sites. However, these do not provide sufficient information to adequately reflect the substitution processes of characters at the sites. Moreover, the site rate-based methods group all invariant sites into one subset, potentially resulting in wrong phylogenetic trees. In this study, we propose a partitioning method, called mPartition, that combines not only the evolutionary rates but also substitution models at sites to partition alignments. Analyses of different partitioning methods on both real and simulated datasets showed that mPartition was better than the other partitioning methods tested. Notably, mPartition overcame the pitfall of grouping all invariant sites into one subset. Using mPartition may lead to increased accuracy of ML-based phylogenetic inference, especially for multiple loci or whole genome datasets.
Affiliation(s)
- Thu Le Kim
  - University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, 10000, Vietnam
  - Hanoi University of Science and Technology, 1st Dai Co Viet, Hai Ba Trung, Hanoi, 10000, Vietnam
- Vinh Le Sy
  - University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, 10000, Vietnam
60
Erséus C, Williams BW, Horn KM, Halanych KM, Santos SR, James SW, Creuzé des Châtelliers M, Anderson FE. Phylogenomic analyses reveal a Palaeozoic radiation and support a freshwater origin for clitellate annelids. Zool Scr 2020. [DOI: 10.1111/zsc.12426] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 01/03/2023]
Affiliation(s)
- Christer Erséus
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Bronwyn W. Williams
  - School of Biological Sciences, Southern Illinois University, Carbondale, IL, USA
  - Research Laboratory, North Carolina Museum of Natural Sciences, Raleigh, NC, USA
- Kevin M. Horn
  - School of Biological Sciences, Southern Illinois University, Carbondale, IL, USA
  - Division of Natural Sciences and Mathematics, Kentucky Wesleyan College, Owensboro, Kentucky, USA
- Kenneth M. Halanych
  - Molette Biology Laboratory for Environmental and Climate Change Studies, Department of Biological Sciences, Auburn University, Auburn, AL, USA
- Scott R. Santos
  - Molette Biology Laboratory for Environmental and Climate Change Studies, Department of Biological Sciences, Auburn University, Auburn, AL, USA
- Samuel W. James
  - Sustainable Living Department, Maharishi University of Management, Fairfield, IA, USA
- Frank E. Anderson
  - School of Biological Sciences, Southern Illinois University, Carbondale, IL, USA
61
Chan KO, Hutter CR, Wood PL, Grismer LL, Brown RM. Larger, unfiltered datasets are more effective at resolving phylogenetic conflict: Introns, exons, and UCEs resolve ambiguities in Golden-backed frogs (Anura: Ranidae; genus Hylarana). Mol Phylogenet Evol 2020; 151:106899. [PMID: 32590046 DOI: 10.1016/j.ympev.2020.106899] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 06/17/2020] [Indexed: 01/01/2023]
Abstract
Using FrogCap, a recently-developed sequence-capture protocol, we obtained >12,000 highly informative exons, introns, and ultraconserved elements (UCEs), which we used to illustrate variation in evolutionary histories of these classes of markers, and to resolve long-standing systematic problems in Southeast Asian Golden-backed frogs of the genus-complex Hylarana. We also performed a comprehensive suite of analyses to assess the relative performance of different genetic markers, data filtering strategies, tree inference methods, and different measures of branch support. To reduce gene tree estimation error, we filtered the data using different thresholds of taxon completeness (missing data) and parsimony informative sites (PIS). We then estimated species trees using concatenated datasets and Maximum Likelihood (IQ-TREE) in addition to summary (ASTRAL-III), distance-based (ASTRID), and site-based (SVDQuartets) multispecies coalescent methods. Topological congruence and branch support were examined using traditional bootstrap, local posterior probabilities, gene concordance factors, quartet frequencies, and quartet scores. Our results did not yield a single concordant topology. Instead, introns, exons, and UCEs clearly possessed different phylogenetic signals, resulting in conflicting, yet strongly-supported phylogenetic estimates. However, a combined analysis comprising the most informative introns, exons, and UCEs converged on a similar topology across all analyses, with the exception of SVDQuartets. Bootstrap values were consistently high despite high levels of incongruence and high proportions of gene trees supporting conflicting topologies. Although low bootstrap values did indicate low heuristic support, high bootstrap support did not necessarily reflect congruence or support for the correct topology. 
This study reiterates findings of some previous studies, which demonstrated that traditional bootstrap values can produce positively misleading measures of support in large phylogenomic datasets. We also showed a remarkably strong positive relationship between branch length and topological congruence across all datasets, implying that very short internodes remain a challenge to resolve, even with orders of magnitude more data than ever before. Overall, our results demonstrate that more data from unfiltered or combined datasets produced superior results. Although data filtering reduced gene tree incongruence, decreased amounts of data also biased phylogenetic estimation. A point of diminishing returns was evident, at which higher congruence (from more stringent filtering) at the expense of amount of data led to topological error as assessed by comparison to more complete datasets across different genomic markers. Additionally, we showed that applying a parameter-rich model to a partitioned analysis of concatenated data produces better results compared to unpartitioned, or even partitioned analysis using model selection. Despite some lingering uncertainties, a combined analysis of our genomic data and sequences supplemented from GenBank (on the basis of a few gene regions) revealed highly supported novel systematic arrangements. Based on these new findings, we transfer Amnirana nicobariensis into the genus Indosylvirana; and I. milleti and Hylarana celebensis to the genus Papurana. We also provisionally place H. attigua in the genus Papurana pending verification from positively identified (voucher substantiated) samples.
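The two filters named in the abstract above, taxon completeness and parsimony-informative sites (PIS), can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy loci, and the thresholds (`min_completeness`, `min_pis`) are hypothetical and chosen only for illustration. A site is counted as parsimony-informative when at least two character states each occur in at least two taxa.

```python
from collections import Counter

# Characters treated as missing or ambiguous (an assumption for this sketch).
AMBIGUOUS = set("-?NnXx")

def count_pis(alignment):
    """Count columns in which >= 2 states each occur in >= 2 taxa."""
    seqs = list(alignment.values())
    n_sites = len(seqs[0]) if seqs else 0
    pis = 0
    for i in range(n_sites):
        counts = Counter(s[i].upper() for s in seqs if s[i] not in AMBIGUOUS)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            pis += 1
    return pis

def taxon_completeness(alignment, all_taxa):
    """Fraction of the expected taxa that have any non-missing data at this locus."""
    present = sum(
        1 for t in all_taxa
        if t in alignment and any(ch not in AMBIGUOUS for ch in alignment[t])
    )
    return present / len(all_taxa)

def filter_loci(loci, all_taxa, min_completeness=0.5, min_pis=1):
    """Keep loci that meet both (hypothetical) thresholds."""
    return {
        name: aln for name, aln in loci.items()
        if taxon_completeness(aln, all_taxa) >= min_completeness
        and count_pis(aln) >= min_pis
    }

# Toy data: locus1 is complete and informative; locus2 is sparse and uninformative.
taxa = ["A", "B", "C", "D"]
loci = {
    "locus1": {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"},
    "locus2": {"A": "AAAA", "B": "AAAT"},
}
kept = filter_loci(loci, taxa, min_completeness=0.75, min_pis=1)
```

Raising either threshold shrinks the retained set. The abstract describes this trade-off: filtering reduced gene tree incongruence, but the reduced amount of data also biased phylogenetic estimation.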
Affiliation(s)
- Kin Onn Chan
  - Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377, Singapore
- Carl R Hutter
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- L Lee Grismer
  - Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk Parkway, Riverside, CA 92505, USA
- Rafe M Brown
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
|
62
|
Morales-Briones DF, Kadereit G, Tefarikis DT, Moore MJ, Smith SA, Brockington SF, Timoneda A, Yim WC, Cushman JC, Yang Y. Disentangling Sources of Gene Tree Discordance in Phylogenomic Data Sets: Testing Ancient Hybridizations in Amaranthaceae s.l. Syst Biol 2020; 70:219-235. [PMID: 32785686 PMCID: PMC7875436 DOI: 10.1093/sysbio/syaa066] [Citation(s) in RCA: 89] [Impact Index Per Article: 22.3] [Received: 10/04/2019] [Revised: 03/01/2020] [Accepted: 09/03/2020] [Indexed: 12/26/2022] Open
Abstract
Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations. 
[Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]
Affiliation(s)
- Diego F Morales-Briones
  - Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
- Gudrun Kadereit
  - Institut für Molekulare Physiologie, Johannes Gutenberg-Universität Mainz, D-55099 Mainz, Germany
- Delphine T Tefarikis
  - Institut für Molekulare Physiologie, Johannes Gutenberg-Universität Mainz, D-55099 Mainz, Germany
- Michael J Moore
  - Department of Biology, Oberlin College, Science Center K111, 119 Woodland Street, Oberlin, OH 44074-1097, USA
- Stephen A Smith
  - Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109-1048, USA
- Samuel F Brockington
  - Department of Plant Sciences, University of Cambridge, Tennis Court Road, Cambridge CB2 3EA, UK
- Alfonso Timoneda
  - Department of Plant Sciences, University of Cambridge, Tennis Court Road, Cambridge CB2 3EA, UK
- Won C Yim
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV, 89577, USA
- John C Cushman
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV, 89577, USA
- Ya Yang
  - Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
|
63
|
Vasilikopoulos A, Misof B, Meusemann K, Lieberz D, Flouri T, Beutel RG, Niehuis O, Wappler T, Rust J, Peters RS, Donath A, Podsiadlowski L, Mayer C, Bartel D, Böhm A, Liu S, Kapli P, Greve C, Jepson JE, Liu X, Zhou X, Aspöck H, Aspöck U. An integrative phylogenomic approach to elucidate the evolutionary history and divergence times of Neuropterida (Insecta: Holometabola). BMC Evol Biol 2020; 20:64. [PMID: 32493355 PMCID: PMC7268685 DOI: 10.1186/s12862-020-01631-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/22/2019] [Accepted: 05/19/2020] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The latest advancements in DNA sequencing technologies have facilitated the resolution of the phylogeny of insects, yet parts of the tree of Holometabola remain unresolved. The phylogeny of Neuropterida has been extensively studied, but no strong consensus exists concerning the phylogenetic relationships within the order Neuroptera. Here, we assembled a novel transcriptomic dataset to address previously unresolved issues in the phylogeny of Neuropterida and to infer divergence times within the group. We tested the robustness of our phylogenetic estimates by comparing summary coalescent and concatenation-based phylogenetic approaches and by employing different quartet-based measures of phylogenomic incongruence, combined with data permutations. RESULTS Our results suggest that the order Raphidioptera is sister to Neuroptera + Megaloptera. Coniopterygidae is inferred as sister to all remaining neuropteran families suggesting that larval cryptonephry could be a ground plan feature of Neuroptera. A clade that includes Nevrorthidae, Osmylidae, and Sisyridae (i.e. Osmyloidea) is inferred as sister to all other Neuroptera except Coniopterygidae, and Dilaridae is placed as sister to all remaining neuropteran families. Ithonidae is inferred as the sister group of monophyletic Myrmeleontiformia. The phylogenetic affinities of Chrysopidae and Hemerobiidae were dependent on the data type analyzed, and quartet-based analyses showed only weak support for the placement of Hemerobiidae as sister to Ithonidae + Myrmeleontiformia. Our molecular dating analyses suggest that most families of Neuropterida started to diversify in the Jurassic and our ancestral character state reconstructions suggest a primarily terrestrial environment of the larvae of Neuropterida and Neuroptera. 
CONCLUSION Our extensive phylogenomic analyses consolidate several key aspects in the backbone phylogeny of Neuropterida, such as the basal placement of Coniopterygidae within Neuroptera and the monophyly of Osmyloidea. Furthermore, they provide new insights into the timing of diversification of Neuropterida. Despite the vast amount of analyzed molecular data, we found that certain nodes in the tree of Neuroptera are not robustly resolved. Therefore, we emphasize the importance of integrating the results of morphological analyses with those of sequence-based phylogenomics. We also suggest that comparative analyses of genomic meta-characters should be incorporated into future phylogenomic studies of Neuropterida.
Affiliation(s)
- Alexandros Vasilikopoulos
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Bernhard Misof
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Karen Meusemann
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
  - Department of Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert-Ludwigs-Universität Freiburg, 79104, Freiburg, Germany
  - Australian National Insect Collection, National Research Collections Australia, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, ACT 2601, Australia
- Doria Lieberz
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Rolf G Beutel
  - Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, 07743, Jena, Germany
- Oliver Niehuis
  - Department of Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert-Ludwigs-Universität Freiburg, 79104, Freiburg, Germany
- Torsten Wappler
  - Natural History Department, Hessisches Landesmuseum Darmstadt, 64283, Darmstadt, Germany
- Jes Rust
  - Steinmann-Institut für Geologie, Mineralogie und Paläontologie, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany
- Ralph S Peters
  - Centre for Taxonomy and Evolutionary Research, Arthropoda Department, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Alexander Donath
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Lars Podsiadlowski
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Christoph Mayer
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Daniela Bartel
  - Department of Evolutionary Biology, University of Vienna, 1090, Vienna, Austria
- Alexander Böhm
  - Department of Evolutionary Biology, University of Vienna, 1090, Vienna, Austria
- Shanlin Liu
  - Department of Entomology, China Agricultural University, 100193, Beijing, People's Republic of China
- Paschalia Kapli
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Carola Greve
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), 60325, Frankfurt, Germany
- James E Jepson
  - School of Biological, Earth and Environmental Sciences, University College Cork, Distillery Fields, North Mall, T23 N73K, Cork, Ireland
- Xingyue Liu
  - Department of Entomology, China Agricultural University, 100193, Beijing, People's Republic of China
- Xin Zhou
  - Department of Entomology, China Agricultural University, 100193, Beijing, People's Republic of China
- Horst Aspöck
  - Institute of Specific Prophylaxis and Tropical Medicine, Medical Parasitology, Medical University of Vienna (MUW), 1090, Vienna, Austria
- Ulrike Aspöck
  - Department of Evolutionary Biology, University of Vienna, 1090, Vienna, Austria
  - Zoological Department II, Natural History Museum of Vienna, 1010, Vienna, Austria
|
64
|
Jermiin LS, Catullo RA, Holland BR. A new phylogenetic protocol: dealing with model misspecification and confirmation bias in molecular phylogenetics. NAR Genom Bioinform 2020; 2:lqaa041. [PMID: 33575594 PMCID: PMC7671319 DOI: 10.1093/nargab/lqaa041] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/17/2020] [Revised: 05/18/2020] [Accepted: 06/04/2020] [Indexed: 12/15/2022] Open
Abstract
Molecular phylogenetics plays a key role in comparative genomics and has increasingly significant impacts on science, industry, government, public health and society. In this paper, we posit that the current phylogenetic protocol is missing two critical steps, and that their absence allows model misspecification and confirmation bias to unduly influence phylogenetic estimates. Based on the potential offered by well-established but under-used procedures, such as assessment of phylogenetic assumptions and tests of goodness of fit, we introduce a new phylogenetic protocol that will reduce confirmation bias and increase the accuracy of phylogenetic estimates.
Affiliation(s)
- Lars S Jermiin
  - CSIRO Land & Water, Canberra, ACT 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
  - School of Biology & Environment Science, University College Dublin, Belfield, Dublin 4, Ireland
  - Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Renee A Catullo
  - CSIRO Land & Water, Canberra, ACT 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
  - School of Science and Health & Hawkesbury Institute of the Environment, Western Sydney University, Penrith, NSW 2751, Australia
- Barbara R Holland
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
|
65
|
Tomasello S, Karbstein K, Hodač L, Paetzold C, Hörandl E. Phylogenomics unravels Quaternary vicariance and allopatric speciation patterns in temperate‐montane plant species: A case study on the Ranunculus auricomus species complex. Mol Ecol 2020; 29:2031-2049. [DOI: 10.1111/mec.15458] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/05/2019] [Accepted: 04/21/2020] [Indexed: 01/06/2023]
Affiliation(s)
- Salvatore Tomasello
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), Albrecht‐von‐Haller Institute for Plant Sciences, University of Goettingen, Göttingen, Germany
- Kevin Karbstein
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), Albrecht‐von‐Haller Institute for Plant Sciences, University of Goettingen, Göttingen, Germany
  - Georg‐August University School of Science (GAUSS), University of Goettingen, Goettingen, Germany
- Ladislav Hodač
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), Albrecht‐von‐Haller Institute for Plant Sciences, University of Goettingen, Göttingen, Germany
- Claudia Paetzold
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), Albrecht‐von‐Haller Institute for Plant Sciences, University of Goettingen, Göttingen, Germany
- Elvira Hörandl
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), Albrecht‐von‐Haller Institute for Plant Sciences, University of Goettingen, Göttingen, Germany
|
66
|
Settlecowski AE, Cuervo AM, Tello JG, Harvey MG, Brumfield RT, Derryberry EP. Investigating the utility of traditional and genomic multi-locus datasets to resolve relationships in Lipaugus and Tijuca (Cotingidae). Mol Phylogenet Evol 2020; 147:106779. [PMID: 32135309 DOI: 10.1016/j.ympev.2020.106779] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/14/2019] [Revised: 01/27/2020] [Accepted: 02/26/2020] [Indexed: 12/26/2022]
Abstract
Rapid diversification limits our ability to resolve evolutionary relationships and examine diversification history, as in the case of the Neotropical cotingas. Here we present an analysis with complete taxon sampling for the cotinga genera Lipaugus and Tijuca, which include some of the most range-restricted (e.g., T. condita) and also the most widespread and familiar (e.g., L. vociferans) forest birds in the Neotropics. We used two datasets: (1) Sanger sequencing data sampled from eight loci in 34 individuals across all described taxa and (2) sequence capture data linked to 1,079 ultraconserved elements and conserved exons sampled from one or two individuals per species. Phylogenies estimated from the Sanger sequencing data failed to resolve three nodes, but the sequence capture data produced a well-supported tree. Lipaugus and Tijuca formed a single, highly supported clade, but Tijuca species were not sister and were embedded within Lipaugus. A dated phylogeny confirmed Lipaugus and Tijuca diversified rapidly in the Miocene. Our study provides a detailed evolutionary hypothesis for Lipaugus and Tijuca and demonstrates that increasing genomic sampling can prove instrumental in resolving the evolutionary history of recent radiations.
Affiliation(s)
- Amie E Settlecowski
  - Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA
- Andrés M Cuervo
  - Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA; Instituto de Ciencias Naturales, Universidad Nacional de Colombia, Bogotá, Colombia
- José G Tello
  - Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA; Department of Biology, Long Island University, Brooklyn, NY 11201, USA
- Michael G Harvey
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
- Robb T Brumfield
  - Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA; Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Elizabeth P Derryberry
  - Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA
|
67
|
Sun X, Ding Y, Orr MC, Zhang F. Streamlining universal single-copy orthologue and ultraconserved element design: A case study in Collembola. Mol Ecol Resour 2020; 20. [PMID: 32065730 DOI: 10.1111/1755-0998.13146] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/12/2019] [Revised: 01/18/2020] [Accepted: 02/10/2020] [Indexed: 11/27/2022]
Abstract
Genomic data sets are increasingly central to ecological and evolutionary biology, but far fewer resources are available for invertebrates. Powerful new computational tools and the rapidly decreasing cost of Illumina sequencing are beginning to change this, enabling rapid genome assembly and reference marker extraction. We have developed and tested a practical workflow for developing genomic resources in nonmodel groups with real-world data on Collembola (springtails), one of the most dominant soil animals on Earth. We designed universal molecular marker sets, single-copy orthologues (BUSCOs) and ultraconserved elements (UCEs), using three existing and 11 newly generated genomes. Both marker types were tested in silico via marker capture success and phylogenetic performance. The new genomes were assembled with Illumina short reads and 9,585-14,743 protein-coding genes were predicted with ab initio and protein homology evidence. We identified 1,997 benchmarking universal single-copy orthologues (BUSCOs) across 14 genomes and created and assessed a custom BUSCO data set for extracting single-copy genes. We also developed a new UCE probe set containing 46,087 baits targeting 1,885 loci. We successfully captured 1,437-1,865 BUSCOs and 975-1,186 UCEs across 14 genomes. Phylogenomic reconstructions using these markers proved robust, giving new insight on deep-time collembolan relationships. Our study demonstrates the feasibility of generating thousands of universal markers from highly efficient whole-genome sequencing, providing a valuable resource for genome-scale investigations in evolutionary biology and ecology.
Affiliation(s)
- Xin Sun
  - J. F. Blumenbach Institute of Zoology and Anthropology, University of Göttingen, Göttingen, Germany; Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- Yinhuan Ding
  - Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- Michael C Orr
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Feng Zhang
  - Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China; Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
|
68
|
Fadiji AE, Babalola OO. Metagenomics methods for the study of plant-associated microbial communities: A review. J Microbiol Methods 2020; 170:105860. [PMID: 32027927 DOI: 10.1016/j.mimet.2020.105860] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 12/09/2019] [Revised: 01/31/2020] [Accepted: 02/02/2020] [Indexed: 12/20/2022]
Abstract
Plant microbiota can have beneficial or pathogenic effects on the plant. In this study, we concentrated on beneficial microbes associated with plants, using endophytic microbes as a case study. Detailed knowledge of microbial diversity, abundance, composition, functional gene patterns, and metabolic pathways at the genome level could assist in understanding the contributions of the microbial community to plant growth and health. Recently, the study of microbial communities has improved greatly with the advent of next-generation sequencing and bioinformatics technologies. Analysis of next-generation sequencing data with a proper computational method plays a key role in examining the microbial metagenome. This review presents the general metagenomics and computational methods used in processing plant-associated metagenomes, with a focus on endophytes. It covers 1) an introduction to plant-associated microbiota and the factors driving their diversity; 2) the plant metagenome, focusing on DNA extraction, verification and quality control; 3) metagenomics methods used in community analysis of endophytes, focusing on the maize plant; and 4) computational methods used in the study of endophytic microbiomes. Limitations and future prospects of metagenomics and computational methods for the analysis of plant-associated metagenomes (endophytic metagenomes) are also discussed, with the aim of fostering their development. We conclude that there is a need to adopt advanced genomic features such as k-mers of random size, which do not depend on annotation and can represent other sequence alternatives.
Affiliation(s)
- Ayomide Emmanuel Fadiji
  - Food Security and Safety Niche, Faculty of Natural and Agricultural Sciences, North-West University, Private Mail Bag X2046, Mmabatho, South Africa
- Olubukola Oluranti Babalola
  - Food Security and Safety Niche, Faculty of Natural and Agricultural Sciences, North-West University, Private Mail Bag X2046, Mmabatho, South Africa
|
69
|
Han AX, Parker E, Scholer F, Maurer-Stroh S, Russell CA. Phylogenetic Clustering by Linear Integer Programming (PhyCLIP). Mol Biol Evol 2020; 36:1580-1595. [PMID: 30854550 PMCID: PMC6573476 DOI: 10.1093/molbev/msz053] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 12/14/2022] Open
Abstract
Subspecies nomenclature systems of pathogens are increasingly based on sequence data. The use of phylogenetics to identify and differentiate between clusters of genetically similar pathogens is particularly prevalent in virology from the nomenclature of human papillomaviruses to highly pathogenic avian influenza (HPAI) H5Nx viruses. These nomenclature systems rely on absolute genetic distance thresholds to define the maximum genetic divergence tolerated between viruses designated as closely related. However, the phylogenetic clustering methods used in these nomenclature systems are limited by the arbitrariness of setting intra and intercluster diversity thresholds. The lack of a consensus ground truth to define well-delineated, meaningful phylogenetic subpopulations amplifies the difficulties in identifying an informative distance threshold. Consequently, phylogenetic clustering often becomes an exploratory, ad hoc exercise. Phylogenetic Clustering by Linear Integer Programming (PhyCLIP) was developed to provide a statistically principled phylogenetic clustering framework that negates the need for an arbitrarily defined distance threshold. Using the pairwise patristic distance distributions of an input phylogeny, PhyCLIP parameterizes the intra and intercluster divergence limits as statistical bounds in an integer linear programming model which is subsequently optimized to cluster as many sequences as possible. When applied to the hemagglutinin phylogeny of HPAI H5Nx viruses, PhyCLIP was not only able to recapitulate the current WHO/OIE/FAO H5 nomenclature system but also further delineated informative higher resolution clusters that capture geographically distinct subpopulations of viruses. PhyCLIP is pathogen-agnostic and can be generalized to a wide variety of research questions concerning the identification of biologically informative clusters in pathogen phylogenies. 
PhyCLIP is freely available at http://github.com/alvinxhan/PhyCLIP, last accessed March 15, 2019.
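The input the abstract describes, the distribution of pairwise patristic distances (the summed branch lengths along the path between two tips), can be illustrated with a short Python sketch. It is not PhyCLIP's code and omits the integer linear programming step entirely. The toy tree, its branch lengths, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical rooted tree: node -> (parent, branch length to parent).
# The root has no entry.
TREE = {
    "n1": ("root", 0.1),
    "A": ("n1", 0.2),
    "B": ("n1", 0.3),
    "C": ("root", 0.5),
}

def path_to_root(tree, node):
    """Map each ancestor of `node` (and node itself) to its distance from node."""
    dist = 0.0
    path = {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        path[parent] = dist
        node = parent
    return path

def patristic(tree, a, b):
    """Sum of branch lengths on the path between tips a and b."""
    pa, pb = path_to_root(tree, a), path_to_root(tree, b)
    # With non-negative branch lengths, the most recent common ancestor is the
    # shared ancestor that minimises the summed distances from both tips.
    return min(pa[n] + pb[n] for n in pa if n in pb)

def pairwise_patristic(tree, tips):
    """Patristic distance for every unordered pair of tips."""
    return {(a, b): patristic(tree, a, b) for a, b in combinations(sorted(tips), 2)}

distances = pairwise_patristic(TREE, ["A", "B", "C"])
```

In this toy tree, A and B share the internal node `n1`, so their distance is shorter than either tip's distance to C. A clustering method of the kind described would derive its within- and between-cluster bounds from the full distribution of such values.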
Affiliation(s)
- Alvin X Han
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore (NUS), Singapore; Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Edyth Parker
  - Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands; Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
- Frits Scholer
  - Department of Medical Microbiology, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Sebastian Maurer-Stroh
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore (NUS), Singapore; Department of Biological Sciences, National University of Singapore, Singapore
- Colin A Russell
  - Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
|
70
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. Infect Genet Evol 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted at new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples including transmission clusters and outbreak characterization.
Affiliation(s)
- Marcos Pérez-Losada
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Juan Carlos Galán
  - Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain
- Mª Alma Bracho
  - CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain
- Julia Hillung
  - Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain
- Neris García-González
  - Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain
- Fernando González-Candelas
  - CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain
|
71
|
Prasanna AN, Gerber D, Kijpornyongpan T, Aime MC, Doyle VP, Nagy LG. Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships. Syst Biol 2020; 69:17-37. [PMID: 31062852 DOI: 10.1093/sysbio/syz029] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/19/2017] [Revised: 04/21/2019] [Accepted: 04/26/2019] [Indexed: 11/12/2022]
Abstract
Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. Relationships between Basidiomycota subphyla, the rusts and allies (Pucciniomycotina), smuts and allies (Ustilaginomycotina), and mushroom-forming fungi and allies (Agaricomycotina) were found particularly recalcitrant both to traditional multigene and genome-scale phylogenetics. Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov Chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. On the contrary, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile, sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play. 
While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi, suggesting that the true Basidiomycota tree might be in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogenous models of amino acid evolution, this does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise as a result of inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole genome data.
Affiliation(s)
- Arun N Prasanna
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
- Daniel Gerber
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
  - Institute of Archaeology, Research Centre for the Humanities, Hungarian Academy of Sciences, Budapest 1097, Hungary
- M Catherine Aime
  - Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
- Vinson P Doyle
  - Department of Plant Pathology and Crop Physiology, Louisiana State University AgCenter, Baton Rouge, LA 70803, USA
- Laszlo G Nagy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
|
72
|
White ND, Braun MJ. Extracting phylogenetic signal from phylogenomic data: Higher-level relationships of the nightbirds (Strisores). Mol Phylogenet Evol 2019; 141:106611. [DOI: 10.1016/j.ympev.2019.106611] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/07/2019] [Revised: 09/04/2019] [Accepted: 09/06/2019] [Indexed: 12/22/2022]
|
73
|
Naser-Khdour S, Minh BQ, Zhang W, Stone EA, Lanfear R. The Prevalence and Impact of Model Violations in Phylogenetic Analysis. Genome Biol Evol 2019; 11:3341-3352. [PMID: 31536115 PMCID: PMC6893154 DOI: 10.1093/gbe/evz193] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Accepted: 09/03/2019] [Indexed: 12/24/2022]
Abstract
In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets. We show that roughly one-quarter of all the partitions we analyzed (23.5%) reject the SRH assumptions, and that for 25% of data sets, tree topologies inferred from all partitions differ significantly from topologies inferred using the subset of partitions that do not reject the SRH assumptions. This proportion increases when comparing trees inferred using the subset of partitions that rejects the SRH assumptions, to those inferred from partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. Our results also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available in https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).
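The matched-pairs idea in this abstract can be illustrated with a small sketch. It assumes that the building block is Bowker's matched-pairs test of symmetry applied to one pair of aligned sequences; the "maximal" step described by the authors (how the pair is chosen) is not reproduced here, and the function name `bowker_symmetry` is hypothetical. The sketch returns only the test statistic and its degrees of freedom; a p-value would come from a chi-square distribution with that many degrees of freedom.

```python
from itertools import combinations


def bowker_symmetry(seq_x: str, seq_y: str, alphabet: str = "ACGT"):
    """Return (statistic, degrees_of_freedom) for Bowker's test of symmetry.

    Illustrative helper, not the IQ-TREE implementation.
    Sites containing characters outside `alphabet` (gaps, N) are skipped.
    """
    index = {state: i for i, state in enumerate(alphabet)}
    k = len(alphabet)
    # Divergence matrix: counts[i][j] = sites with state i in x and state j in y.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in index and b in index:
            counts[index[a]][index[b]] += 1

    statistic, df = 0.0, 0
    for i, j in combinations(range(k), 2):
        pair_total = counts[i][j] + counts[j][i]
        if pair_total > 0:  # cells with no observations contribute nothing
            statistic += (counts[i][j] - counts[j][i]) ** 2 / pair_total
            df += 1
    return statistic, df


if __name__ == "__main__":
    print(bowker_symmetry("AAAACCGT", "CCCCCCGT"))
```

A large statistic relative to its degrees of freedom suggests that substitutions between the two sequences are not symmetric, which is the kind of signal the abstract treats as evidence against the SRH assumptions.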
Affiliation(s)
- Suha Naser-Khdour
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Bui Quang Minh
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
  - Research School of Computer Science, Australian National University, Canberra, Australian Capital Territory, Australia
- Wenqi Zhang
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Eric A Stone
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
|
74
|
Martín-Hernanz S, Aparicio A, Fernández-Mazuecos M, Rubio E, Reyes-Betancort JA, Santos-Guerra A, Olangua-Corral M, Albaladejo RG. Maximize Resolution or Minimize Error? Using Genotyping-By-Sequencing to Investigate the Recent Diversification of Helianthemum (Cistaceae). Front Plant Sci 2019; 10:1416. [PMID: 31781140 PMCID: PMC6859804 DOI: 10.3389/fpls.2019.01416] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/16/2019] [Accepted: 10/11/2019] [Indexed: 05/27/2023]
Abstract
A robust phylogenetic framework, in terms of extensive geographical and taxonomic sampling, well-resolved species relationships and high certainty of tree topologies and branch length estimations, is critical in the study of macroevolutionary patterns. Whereas Sanger sequencing-based methods usually recover insufficient phylogenetic signal, especially in recently diversified lineages, reduced-representation sequencing methods tend to provide well-supported phylogenetic relationships, but usually entail remarkable bioinformatic challenges due to the inherent trade-off between the number of SNPs and the magnitude of associated error rates. The genus Helianthemum (Cistaceae) is a species-rich and taxonomically complex Palearctic group of plants that diversified mainly since the Upper Miocene. It is a challenging case study since previous attempts using Sanger sequencing were unable to resolve the intrageneric phylogenetic relationships. Aiming to obtain a robust phylogenetic reconstruction based on genotyping-by-sequencing (GBS), we established a rigorous methodological workflow in which we i) explored how variable settings during dataset assembly have an impact on error rates and on the degree of resolution under concatenation and coalescent approaches, ii) assessed the effect of two extreme parameter configurations (minimizing error rates vs. maximizing phylogenetic resolution) on tree topology and branch lengths, and iii) evaluated the effects of these two configurations on estimates of divergence times and diversification rates. Our analyses produced highly supported topologically congruent phylogenetic trees for both configurations. However, minimizing error rates did produce more reliable branch lengths, critically affecting the accuracy of downstream analyses (i.e. divergence times and diversification rates). 
In addition to recommending a revision of intrageneric systematics, our results enabled us to identify three highly diversified lineages in Helianthemum in contrasting geographical areas and ecological conditions, which started radiating in the Upper Miocene.
Affiliation(s)
- Sara Martín-Hernanz
  - Departamento de Biología Vegetal y Ecología, Universidad de Sevilla, Sevilla, Spain
- Abelardo Aparicio
  - Departamento de Biología Vegetal y Ecología, Universidad de Sevilla, Sevilla, Spain
- Encarnación Rubio
  - Departamento de Biología Vegetal y Ecología, Universidad de Sevilla, Sevilla, Spain
- J. Alfredo Reyes-Betancort
  - Jardín de Aclimatación de la Orotava, Instituto Canario de Investigaciones Agrarias (ICIA), Santa Cruz de Tenerife, Spain
- Arnoldo Santos-Guerra
  - Jardín de Aclimatación de la Orotava, Instituto Canario de Investigaciones Agrarias (ICIA), Santa Cruz de Tenerife, Spain
- María Olangua-Corral
  - Departamento de Biología Reproductiva y Micro-morfología, Jardín Botánico Canario ‘Viera y Clavijo’—Unidad Asociada CSIC (Cabildo de Gran Canaria), Las Palmas de Gran Canaria, Spain
- Rafael G. Albaladejo
  - Departamento de Biología Vegetal y Ecología, Universidad de Sevilla, Sevilla, Spain
|
75
|
Hill V, Baele G. Bayesian Estimation of Past Population Dynamics in BEAST 1.10 Using the Skygrid Coalescent Model. Mol Biol Evol 2019; 36:2620-2628. [PMID: 31364710 PMCID: PMC6805224 DOI: 10.1093/molbev/msz172] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 04/08/2019] [Revised: 06/24/2019] [Accepted: 07/12/2019] [Indexed: 12/24/2022]
Abstract
Inferring past population dynamics over time from heterochronous molecular sequence data is often achieved using the Bayesian Skygrid model, a nonparametric coalescent model that estimates the effective population size over time. Available in BEAST, a cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo, this coalescent model is often estimated in conjunction with a molecular clock model to produce time-stamped phylogenetic trees. We here provide a practical guide to using BEAST and its accompanying applications for the purpose of drawing inference under these models. We focus on best practices, potential pitfalls, and recommendations that can be generalized to other software packages for Bayesian inference. This protocol shows how to use TempEst, BEAUti, and BEAST 1.10 (http://beast.community/; last accessed July 29, 2019), LogCombiner as well as Tracer in a complete workflow.
Affiliation(s)
- Verity Hill
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
|
76
|
Li YX, Li ZH, Schuiteman A, Chase MW, Li JW, Huang WC, Hidayat A, Wu SS, Jin XH. Phylogenomics of Orchidaceae based on plastid and mitochondrial genomes. Mol Phylogenet Evol 2019; 139:106540. [DOI: 10.1016/j.ympev.2019.106540] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/18/2019] [Revised: 06/05/2019] [Accepted: 06/18/2019] [Indexed: 10/26/2022]
|
77
|
Paetzold C, Wood KR, Eaton DAR, Wagner WL, Appelhans MS. Phylogeny of Hawaiian Melicope (Rutaceae): RAD-seq Resolves Species Relationships and Reveals Ancient Introgression. Front Plant Sci 2019; 10:1074. [PMID: 31608076 PMCID: PMC6758601 DOI: 10.3389/fpls.2019.01074] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/24/2019] [Accepted: 08/07/2019] [Indexed: 05/11/2023]
Abstract
Hawaiian Melicope are one of the major adaptive radiations of the Hawaiian Islands comprising 54 endemic species. The lineage is monophyletic with an estimated crown age predating the rise of the current high islands. Phylogenetic inference based on Sanger sequencing has not been sufficient to resolve species or deeper level relationships. Here, we apply restriction site-associated DNA sequencing (RAD-seq) to the lineage to infer phylogenetic relationships. We employ Quartet Sampling to assess information content and statistical support, and to quantify discordance as well as partitioned ABBA-BABA tests to uncover evidence of introgression. Our new results drastically improved resolution of relationships within Hawaiian Melicope. The lineage is divided into five fully supported main clades, two of which correspond to morphologically circumscribed infrageneric groups. We provide evidence for both ancestral and current hybridization events. We confirm the necessity for a taxonomic revision of the Melicope section Pelea, as well as a re-evaluation of several species complexes by combining genomic and morphological data.
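The ABBA-BABA test mentioned in this abstract rests on a simple site-pattern count. The sketch below is an illustration under stated assumptions: it computes the basic four-taxon D statistic, D = (ABBA - BABA) / (ABBA + BABA), from aligned sequences for a tree (((P1, P2), P3), O). It is not the partitioned variant used by the authors, and the function name `d_statistic` and the toy sequences are hypothetical.

```python
def d_statistic(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Basic four-taxon D from aligned sequences (illustrative sketch only).

    The outgroup state is treated as ancestral (A) and the other state as derived (B).
    ABBA: P2 and P3 share the derived state; BABA: P1 and P3 share it.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        states = {a, b, c, o}
        # Keep only biallelic sites made of unambiguous nucleotides.
        if len(states) != 2 or not states <= set("ACGT"):
            continue
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Toy alignment: two ABBA sites, one BABA site, one invariant site.
    print(d_statistic("AGAA", "GAGA", "GGGA", "AAAA"))
```

Under incomplete lineage sorting alone the two patterns are expected to be equally frequent, so D near zero is the null expectation; assessing significance needs a resampling scheme such as a block jackknife, which this sketch omits.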
Affiliation(s)
- Claudia Paetzold
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), University of Göttingen, Goettingen, Germany
- Kenneth R. Wood
  - National Tropical Botanical Garden, Kalaheo, HI, United States
- Deren A. R. Eaton
  - Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, NY, United States
- Warren L. Wagner
  - Department of Botany, Smithsonian Institution, Washington, DC, United States
- Marc S. Appelhans
  - Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), University of Göttingen, Goettingen, Germany
  - Department of Botany, Smithsonian Institution, Washington, DC, United States
|
78
|
Quartet-Based Computations of Internode Certainty Provide Robust Measures of Phylogenetic Incongruence. Syst Biol 2019; 69:308-324. [DOI: 10.1093/sysbio/syz058] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/03/2017] [Accepted: 08/26/2019] [Indexed: 11/14/2022]
Abstract
Incongruence, or topological conflict, is prevalent in genome-scale data sets. Internode certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internal branch among a set of phylogenetic trees and complement regular branch support measures (e.g., bootstrap, posterior probability) that instead assess the statistical confidence of inference. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions (or splits) on a set of trees, IC score calculation typically requires adjusting the frequencies of bipartitions from these partial gene trees. However, when the proportion of missing taxa is high, the scores yielded by current approaches that adjust bipartition frequencies in partial gene trees differ substantially from each other and tend to be overestimates. To overcome these issues, we developed three new IC measures based on the frequencies of quartets, which naturally apply to both complete and partial trees. Comparison of our new quartet-based measures to previous bipartition-based measures on simulated data shows that: (1) on complete data sets, both quartet-based and bipartition-based measures yield very similar IC scores; (2) IC scores of quartet-based measures on a given data set with and without missing taxa are more similar than the scores of bipartition-based measures; and (3) quartet-based measures are more robust to the absence of phylogenetic signal and errors in phylogenetic inference than bipartition-based measures. Additionally, the analysis of an empirical mammalian phylogenomic data set using our quartet-based measures reveals the presence of substantial levels of incongruence for numerous internal branches. An efficient open-source implementation of these quartet-based measures is freely available in the program QuartetScores (https://github.com/lutteropp/QuartetScores).
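As background for how an IC score can stem from bipartition frequencies, the following sketch computes internode certainty for a single branch. It assumes the commonly cited entropy-style definition over the two most frequent conflicting bipartitions, IC = 1 + p1*log2(p1) + p2*log2(p2); the abstract itself gives no formula, the quartet-based measures it introduces are not reproduced, and the function name `internode_certainty` is hypothetical.

```python
import math


def internode_certainty(freq_main: float, freq_conflict: float) -> float:
    """IC for one internal branch (illustrative sketch under the stated assumption).

    freq_main:     number of trees containing the bipartition of interest
    freq_conflict: number of trees containing its most frequent conflicting bipartition
    """
    total = freq_main + freq_conflict
    if total <= 0:
        raise ValueError("at least one bipartition must be observed")
    ic = 1.0  # log2(2): maximum entropy for two alternatives
    for freq in (freq_main, freq_conflict):
        p = freq / total
        if p > 0:  # by convention 0 * log2(0) is taken as 0
            ic += p * math.log2(p)
    return ic


if __name__ == "__main__":
    for main, conflict in [(100, 0), (75, 25), (50, 50)]:
        print(main, conflict, round(internode_certainty(main, conflict), 3))
```

Values near 1 indicate that the branch faces little conflict, and values near 0 indicate that the two alternatives are about equally frequent. The adjustment of frequencies for partial gene trees, which the abstract identifies as the difficult part, is outside this sketch.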
|
79
|
Redmond AK, Zou J, Secombes CJ, Macqueen DJ, Dooley H. Discovery of All Three Types in Cartilaginous Fishes Enables Phylogenetic Resolution of the Origins and Evolution of Interferons. Front Immunol 2019; 10:1558. [PMID: 31354716 PMCID: PMC6640115 DOI: 10.3389/fimmu.2019.01558] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 04/24/2019] [Accepted: 06/21/2019] [Indexed: 12/31/2022]
Abstract
Interferons orchestrate host antiviral responses in jawed vertebrates. They are categorized into three classes; IFN1 and IFN3 are the primary antiviral cytokine lineages, while IFN2 responds to a broader variety of pathogens. The evolutionary relationships within and between these three classes have proven difficult to resolve. Here, we reassess interferon evolution, considering key phylogenetic pitfalls including taxon sampling, alignment quality, model adequacy, and outgroup choice. We reveal that cartilaginous fishes, and hence the jawed vertebrate ancestor, possess(ed) orthologs of all three interferon classes. We show that IFN3 groups sister to IFN1, resolve the origins of the human IFN3 lineages, and find that intronless IFN3s emerged at least three times. IFN2 genes are highly conserved, except for IFN-γ-rel, which we confirm resulted from a teleost-specific duplication. Our analyses show that IFN1 phylogeny is highly sensitive to phylogenetic error. By accounting for this, we describe a new backbone IFN1 phylogeny that implies several IFN1 genes existed in the jawed vertebrate ancestor. One of these is represented by the intronless IFN1s of tetrapods, including mammalian-like repertoires of reptile IFN1s and a subset of amphibian IFN1s, in addition to newly-identified intron-containing shark IFN1 genes. IFN-f, previously only found in teleosts, likely represents another ancestral jawed vertebrate IFN1 family member, suggesting the current classification of fish IFN1s into two groups based on the number of cysteines may need revision. The provenance of the remaining fish IFN1s and the coelacanth IFN1s proved difficult to resolve, but they may also be ancestral jawed vertebrate IFN1 lineages. Finally, a large group of amphibian-specific IFN1s falls sister to all other IFN1s and was likely also present in the jawed vertebrate ancestor. 
Our results verify that intronless IFN1s have evolved multiple times in amphibians and indicate that no one-to-one orthology exists between mammal and reptile IFN1s. Our data also imply that diversification of the multiple IFN1s present in the jawed vertebrate ancestor has occurred through a rapid birth-death process, consistent with functional maintenance over a 450-million-year host-pathogen arms race. In summary, this study reveals a new model of interferon evolution important to our understanding of jawed vertebrate antiviral immunity.
Affiliation(s)
- Anthony K Redmond
  - School of Biological Sciences, University of Aberdeen, Aberdeen, United Kingdom
  - Centre for Genome-Enabled Biology and Medicine, University of Aberdeen, Aberdeen, United Kingdom
  - Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Jun Zou
  - School of Biological Sciences, University of Aberdeen, Aberdeen, United Kingdom
  - Scottish Fish Immunology Research Centre, Institute of Biological and Environmental Sciences, University of Aberdeen, Aberdeen, United Kingdom
  - Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources, Ministry of Education, Shanghai Ocean University, Shanghai, China
- Christopher J Secombes
  - School of Biological Sciences, University of Aberdeen, Aberdeen, United Kingdom
  - Scottish Fish Immunology Research Centre, Institute of Biological and Environmental Sciences, University of Aberdeen, Aberdeen, United Kingdom
- Daniel J Macqueen
  - School of Biological Sciences, University of Aberdeen, Aberdeen, United Kingdom
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom
- Helen Dooley
  - School of Biological Sciences, University of Aberdeen, Aberdeen, United Kingdom
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, United States
  - Institute of Marine and Environmental Technology, Baltimore, MD, United States
|
80
|
A Robust Phylogenomic Time Tree for Biotechnologically and Medically Important Fungi in the Genera Aspergillus and Penicillium. mBio 2019; 10:mBio.00925-19. [PMID: 31289177 PMCID: PMC6747717 DOI: 10.1128/mbio.00925-19] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Indexed: 02/05/2023]
Abstract
Understanding the evolution of traits across technologically and medically significant fungi requires a robust phylogeny. Even though species in the Aspergillus and Penicillium genera (family Aspergillaceae, class Eurotiomycetes) are some of the most significant technologically and medically relevant fungi, we still lack a genome-scale phylogeny of the lineage or knowledge of the parts of the phylogeny that exhibit conflict among analyses. Here, we used a phylogenomic approach to infer evolutionary relationships among 81 genomes that span the diversity of Aspergillus and Penicillium species, to identify conflicts in the phylogeny, and to determine the likely underlying factors of the observed conflicts. Using a data matrix comprised of 1,668 genes, we found that while most branches of the phylogeny of the Aspergillaceae are robustly supported and recovered irrespective of method of analysis, a few exhibit various degrees of conflict among our analyses. Further examination of the observed conflict revealed that it largely stems from incomplete lineage sorting and hybridization or introgression. Our analyses provide a robust and comprehensive evolutionary genomic roadmap for this important lineage, which will facilitate the examination of the diverse technologically and medically relevant traits of these fungi in an evolutionary context. The filamentous fungal family Aspergillaceae contains >1,000 known species, mostly in the genera Aspergillus and Penicillium. Several species are used in the food, biotechnology, and drug industries (e.g., Aspergillus oryzae and Penicillium camemberti), while others are dangerous human and plant pathogens (e.g., Aspergillus fumigatus and Penicillium digitatum). To infer a robust phylogeny and pinpoint poorly resolved branches and their likely underlying contributors, we used 81 genomes spanning the diversity of Aspergillus and Penicillium to construct a 1,668-gene data matrix. 
Phylogenies of the nucleotide and amino acid versions of this full data matrix as well as of several additional data matrices were generated using three different maximum likelihood schemes (i.e., gene-partitioned, unpartitioned, and coalescence) and using both site-homogenous and site-heterogeneous models (total of 64 species-level phylogenies). Examination of the topological agreement among these phylogenies and measures of internode certainty identified 11/78 (14.1%) bipartitions that were incongruent and pinpointed the likely underlying contributing factors, which included incomplete lineage sorting, hidden paralogy, hybridization or introgression, and reconstruction artifacts associated with poor taxon sampling. Relaxed molecular clock analyses suggest that Aspergillaceae likely originated in the lower Cretaceous and that the Aspergillus and Penicillium genera originated in the upper Cretaceous. Our results shed light on the ongoing debate on Aspergillus systematics and taxonomy and provide a robust evolutionary and temporal framework for comparative genomic analyses in Aspergillaceae. More broadly, our approach provides a general template for phylogenomic identification of resolved and contentious branches in densely genome-sequenced lineages across the tree of life.
|
81
|
Tagliacollo VA, Lanfear R. Estimating Improved Partitioning Schemes for Ultraconserved Elements. Mol Biol Evol 2019; 35:1798-1811. [PMID: 29659989 PMCID: PMC5995204 DOI: 10.1093/molbev/msy069] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/11/2022]
Abstract
Ultraconserved elements (UCEs) are popular markers for phylogenomic studies. They are relatively simple to collect from distantly-related organisms, and contain sufficient information to infer relationships at almost all taxonomic levels. Most studies of UCEs use partitioning to account for variation in rates and patterns of molecular evolution among sites, for example by estimating an independent model of molecular evolution for each UCE. However, rates and patterns of molecular evolution vary substantially within as well as between UCEs, suggesting that there may be opportunities to improve how UCEs are partitioned for phylogenetic inference. We propose and evaluate new partitioning methods for phylogenomic studies of UCEs: Sliding-Window Site Characteristics (SWSC), and UCE Site Position (UCESP). The first method uses site characteristics such as entropy, multinomial likelihood, and GC content to generate partitions that account for heterogeneity in rates and patterns of molecular evolution within each UCE. The second method groups together nucleotides that are found in similar physical locations within the UCEs. We examined the new methods with seven published data sets from a variety of taxa. We demonstrate that the UCESP method generates partitions that are worse than other strategies used to partition UCE data sets (e.g., one partition per UCE). The SWSC method, particularly when based on site entropies, generates partitions that account for within-UCE heterogeneity and leads to large increases in the model fit. All of the methods, code, and data used in this study are available from https://github.com/Tagliacollo/PartitionUCE. Simplified code for implementing the best method, the SWSC-EN, is available from https://github.com/Tagliacollo/PFinderUCE-SWSC-EN.
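The site characteristic that the abstract singles out, entropy, can be sketched for a single UCE alignment as follows. This is a minimal illustration, assuming Shannon entropy in bits over the unambiguous nucleotides of each column; the log base, the handling of gaps, and the function names are assumptions, and the sliding-window search that turns these values into partitions is not shown.

```python
import math
from collections import Counter


def site_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps and ambiguity codes."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    n = sum(counts.values())
    if n == 0:
        return 0.0  # column with no usable nucleotides
    return sum(-(c / n) * math.log2(c / n) for c in counts.values()) + 0.0


def alignment_entropies(sequences: list[str]) -> list[float]:
    """Per-site entropies for equal-length aligned sequences."""
    return [site_entropy("".join(column)) for column in zip(*sequences)]


if __name__ == "__main__":
    toy_uce = ["ACGTA", "ACGTC", "ACGAG", "ACG-T"]  # made-up alignment
    print([round(h, 3) for h in alignment_entropies(toy_uce)])
```

In a typical UCE the conserved core would show entropies near zero and the flanks higher values, which is the within-locus heterogeneity the abstract says the SWSC approach is designed to capture.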
Affiliation(s)
- Victor A Tagliacollo
  - Programa de Pós-graduação Ciências do Ambiente (CIAMB), Universidade Federal do Tocantins, Palmas, Tocantins, Brazil
  - Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
- Robert Lanfear
  - Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
|
82
|
de Lajudie PM, Andrews M, Ardley J, Eardly B, Jumas-Bilak E, Kuzmanović N, Lassalle F, Lindström K, Mhamdi R, Martínez-Romero E, Moulin L, Mousavi SA, Nesme X, Peix A, Puławska J, Steenkamp E, Stępkowski T, Tian CF, Vinuesa P, Wei G, Willems A, Zilli J, Young P. Minimal standards for the description of new genera and species of rhizobia and agrobacteria. Int J Syst Evol Microbiol 2019; 69:1852-1863. [PMID: 31140963 DOI: 10.1099/ijsem.0.003426] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Indexed: 11/18/2022]
Abstract
Herein the members of the Subcommittee on Taxonomy of Rhizobia and Agrobacteria of the International Committee on Systematics of Prokaryotes review recent developments in rhizobial and agrobacterial taxonomy and propose updated minimal standards for the description of new species (and genera) in these groups. The essential requirements (minimal standards) for description of a new species are (1) a genome sequence of at least the proposed type strain and (2) evidence for differentiation from other species based on genome sequence comparisons. It is also recommended that (3) genetic variation within the species is documented with sequence data from several clearly different strains and (4) phenotypic features are described, and their variation documented with data from a relevant set of representative strains. Furthermore, it is encouraged that information is provided on (5) nodulation or pathogenicity phenotypes, as appropriate, with relevant gene sequences. These guidelines supplement the current rules of general bacterial taxonomy, which require (6) a name that conforms to the International Code of Nomenclature of Prokaryotes, (7) validation of the name by publication either directly in the International Journal of Systematic and Evolutionary Microbiology or in a validation list when published elsewhere, and (8) deposition of the type strain in two international culture collections in separate countries.
Affiliation(s)
- Mitchell Andrews
- Faculty of Agriculture and Life Sciences, Lincoln University, Lincoln 7647, New Zealand
- Julie Ardley
- School of Veterinary and Life Sciences, Murdoch University, Murdoch, Australia
- Estelle Jumas-Bilak
- UMR 5569, Department of Microbiology, Faculty of Pharmacy, University of Montpellier, France
- Nemanja Kuzmanović
- Julius Kühn-Institut, Federal Research Centre for Cultivated Plants, Institute for Epidemiology and Pathogen Diagnostics, Messeweg 11/12, 38104 Braunschweig, Germany
- Florent Lassalle
- Department of Infectious Disease Epidemiology - MRC Centre for Outbreak Analysis and Modelling, St Mary's Hospital, Praed Street, London W2 1NY, UK
- Kristina Lindström
- Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki FI-00014, Finland
- Ridha Mhamdi
- Centre of Biotechnology of Borj-Cedria, BP 901 Hammam-lif 2050, Tunisia
- Esperanza Martínez-Romero
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de Mexico, Cuernavaca, Morelos, Mexico
- Lionel Moulin
- IRD, CIRAD, University of Montpellier, IPME, Montpellier, France
- Seyed Abdollah Mousavi
- Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki FI-00014, Finland
- Xavier Nesme
- LEM, UCBL, CNRS, INRA, Univ Lyon, Villeurbanne, France
- Alvaro Peix
- Instituto de Recursos Naturales y Agrobiología, IRNASA-CSIC, c/Cordel de Merinas 40-52, 37008 Salamanca, Spain
- Joanna Puławska
- Department of Phytopathology, Research Institute of Horticulture, ul. Konstytucji 3 Maja 1/3, 96-100 Skierniewice, Poland
- Emma Steenkamp
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria 0002, South Africa
- Tomasz Stępkowski
- Autonomous Department of Microbial Biology, Faculty of Agriculture and Biology, Warsaw University of Life Sciences (SGGW), Nowoursynowska 159, 02-776 Warsaw, Poland
- Chang-Fu Tian
- State Key Laboratory of Agrobiotechnology, MOA Key Laboratory of Soil Microbiology, Rhizobium Research Center, College of Biological Sciences, China Agricultural University, 100193, Beijing, PR China
- Pablo Vinuesa
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de Mexico, Cuernavaca, Morelos, Mexico
- Gehong Wei
- Northwest A&F University, Yangling, Shaanxi, PR China
- Anne Willems
- Department Biochemistry and Microbiology, Lab. Microbiology, Ghent University, Belgium
- Jerri Zilli
- Embrapa Agrobiologia, BR 465 km 07, Seropédica, Rio de Janeiro, Brazil, 23891-000, Brazil
- Peter Young
- Department of Biology, University of York, York YO10 5DD, UK
83
Palmer M, Venter SN, McTaggart AR, Coetzee MPA, Van Wyk S, Avontuur JR, Beukes CW, Fourie G, Santana QC, Van Der Nest MA, Blom J, Steenkamp ET. The synergistic effect of concatenation in phylogenomics: the case in Pantoea. PeerJ 2019; 7:e6698. [PMID: 31024760 PMCID: PMC6474361 DOI: 10.7717/peerj.6698] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/23/2018] [Accepted: 02/26/2019] [Indexed: 11/29/2022] Open
Abstract
With the increased availability of genome sequences for bacteria, it has become routine practice to construct genome-based phylogenies. These phylogenies have formed the basis for various taxonomic decisions, especially for resolving problematic relationships between taxa. Despite the popularity of concatenating shared genes to obtain well-supported phylogenies, various issues regarding this combined-evidence approach have been raised. These include the introduction of phylogenetic error into datasets, as well as incongruence due to organism-level evolutionary processes, particularly horizontal gene transfer and incomplete lineage sorting. Because of the huge effect that this could have on phylogenies, we evaluated the impact of phylogenetic conflict caused by organism-level evolutionary processes on the established species phylogeny for Pantoea, a member of the Enterobacterales. We explored the presence and distribution of phylogenetic conflict at the gene partition and nucleotide levels, by identifying putative inter-lineage recombination events that might have contributed to such conflict. Furthermore, we determined whether smaller, randomly constructed datasets had sufficient signal to reconstruct the current species tree hypothesis or if they would be overshadowed by phylogenetic incongruence. We found that no individual gene tree was fully congruent with the species phylogeny of Pantoea, although many of the expected nodes were supported by various individual genes across the genome. Evidence of recombination was found across all lineages within Pantoea, and provides support for organism-level evolutionary processes as a potential source of phylogenetic conflict. The phylogenetic signal from at least 70 random genes recovered robust, well-supported phylogenies for the backbone and most species relationships of Pantoea, and was unaffected by phylogenetic conflict within the dataset. 
Furthermore, despite providing limited resolution among taxa at the level of single gene trees, concatenated analyses of genes that were identified as having no signal resulted in a phylogeny that resembled the species phylogeny of Pantoea. This distribution of signal and noise across the genome presents the ideal situation for phylogenetic inference, as the topology from a ≥70-gene concatenated species phylogeny is not driven by single genes, and our data suggests that this finding may also hold true for smaller datasets. We thus argue that, by using a concatenation-based approach in phylogenomics, one can obtain robust phylogenies due to the synergistic effect of the combined signal obtained from multiple genes.
Affiliation(s)
- Marike Palmer
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Stephanus N Venter
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Alistair R McTaggart
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- Martin P A Coetzee
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Stephanie Van Wyk
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Juanita R Avontuur
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Chrizelle W Beukes
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Gerda Fourie
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Quentin C Santana
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Magriet A Van Der Nest
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
- Jochen Blom
- Bioinformatics and Systems Biology, Justus Liebig Universität Gießen, Giessen, Germany
- Emma T Steenkamp
- Department of Biochemistry, Genetics and Microbiology, DST-NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, Gauteng, South Africa
84
Abstract
The protein titin plays a key role in vertebrate muscle where it acts like a giant molecular spring. Despite its importance and conservation over vertebrate evolution, a lack of high quality annotations in non-model species makes comparative evolutionary studies of titin challenging. The PEVK region of titin—named for its high proportion of Pro-Glu-Val-Lys amino acids—is particularly difficult to annotate due to its abundance of alternatively spliced isoforms and short, highly repetitive exons. To understand PEVK evolution across mammals, we developed a bioinformatics tool, PEVK_Finder, to annotate PEVK exons from genomic sequences of titin and applied it to a diverse set of mammals. PEVK_Finder consistently outperforms standard annotation tools across a broad range of conditions and improves annotations of the PEVK region in non-model mammalian species. We find that the PEVK region can be divided into two subregions (PEVK-N, PEVK-C) with distinct patterns of evolutionary constraint and divergence. The bipartite nature of the PEVK region has implications for titin diversification. In the PEVK-N region, certain exons are conserved and may be essential, but natural selection also acts on particular codons. In the PEVK-C, exons are more homogenous and length variation of the PEVK region may provide the raw material for evolutionary adaptation in titin function. The PEVK-C region can be further divided into a highly repetitive region (PEVK-CA) and one that is more variable (PEVK-CB). Taken together, we find that the very complexity that makes titin a challenge for annotation tools may also promote evolutionary adaptation.
85
Jones CT, Youssef N, Susko E, Bielawski JP. Phenomenological Load on Model Parameters Can Lead to False Biological Conclusions. Mol Biol Evol 2019; 35:1473-1488. [PMID: 29596684 DOI: 10.1093/molbev/msy049] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/15/2022] Open
Abstract
When a substitution model is fitted to an alignment using maximum likelihood, its parameters are adjusted to account for as much site-pattern variation as possible. A parameter might therefore absorb a substantial quantity of the total variance in an alignment (or more formally, bring about a substantial reduction in the deviance of the fitted model) even if the process it represents played no role in the generation of the data. When this occurs, we say that the parameter estimate carries phenomenological load (PL). Large PL in a parameter estimate is a concern because it not only invalidates its mechanistic interpretation (if it has one) but also increases the likelihood that it will be found to be statistically significant. The problem of PL was not identified in the past because most off-the-shelf substitution models make simplifying assumptions that preclude the generation of realistic levels of variation. In this study, we use the more realistic mutation-selection framework as the basis of a generating model formulated to produce data that mimic an alignment of mammalian mitochondrial DNA. We show that a parameter estimate can carry PL when 1) the substitution model is underspecified and 2) the parameter represents a process that is confounded with other processes represented in the data-generating model. We then provide a method that can be used to identify signal for the process that a given parameter represents despite the existence of PL.
Affiliation(s)
- Christopher T Jones
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- Noor Youssef
- Department of Biology, Dalhousie University, Halifax, NS, Canada
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
86
Mongiardino Koch N. The phylogenomic revolution and its conceptual innovations: a text mining approach. Org Divers Evol 2019. [DOI: 10.1007/s13127-019-00397-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/24/2023]
87
Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571 PMCID: PMC6378093 DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022] Open
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly rely and suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Alexandre Antonelli
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Gothenburg Botanical Garden, Göteborg, Sweden
- Christine D. Bacon
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Krzysztof Bartoszek
- Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Stella Huynh
- Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
- Graham Jones
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Sangeet Lamichhaney
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Thomas Marcussen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
- Hélène Morlon
- Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
- Luay K. Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Bengt Oxelman
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Bernard Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Alexander Schliep
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Fernanda P. Werneck
- Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
- John Wiedenhoeft
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Sandi Willows-Munro
- School of Life Sciences, University of Kwazulu-Natal, Pietermaritzburg, South Africa
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
88
Herman JL. Enhancing Statistical Multiple Sequence Alignment and Tree Inference Using Structural Information. Methods Mol Biol 2019; 1851:183-214. [PMID: 30298398 DOI: 10.1007/978-1-4939-8736-8_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
For highly divergent sequences, there is often insufficient information to reliably construct alignments and phylogenetic trees. Since protein structure may be strongly conserved despite large divergences in sequence, structural information can be used to help identify homology in such cases. While there exist well-studied models of sequence evolution, structurally informed alignment methods have typically made use of geometric measures of deviation that do not take into account the underlying mutational processes. In order to integrate structural information into sequence-based evolutionary models, we recently developed a stochastic model of structural evolution on a phylogenetic tree and implemented this as the StructAlign plugin for the StatAlign statistical alignment package. In this chapter, we will outline the types of analyses that can be carried out using StructAlign, illustrating how the inclusion of structural information can be used to inform joint estimation of alignments and trees. StructAlign can also be used to infer branch-specific rates of structural evolution, and analysis of an example globin dataset highlights strong variation in the inferred rate across the tree. While structure is more highly conserved within clades, the rate of structural divergence as a function of sequence variation is larger between functionally divergent proteins. Allowing for the rate of structural divergence to vary over the tree results in an improved fit to the empirically observed pairwise RMSD values.
Affiliation(s)
- Joseph L Herman
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
89
Abstract
In this chapter, we focus on the computational challenges associated with statistical phylogenomics and how use of the broad-platform evolutionary analysis general likelihood evaluator (BEAGLE), a high-performance library for likelihood computation, can help to substantially reduce computation time in phylogenomic and phylodynamic analyses. We discuss computational improvements brought about by the BEAGLE library on a variety of state-of-the-art multicore hardware, and for a range of commonly used evolutionary models. For data sets of varying dimensions, we specifically focus on comparing performance in the Bayesian evolutionary analysis by sampling trees (BEAST) software between multicore central processing units (CPUs) and a wide range of graphics processing cards (GPUs). We put special emphasis on computational benchmarks from the field of phylodynamics, which combines the challenges of phylogenomics with those of modelling trait data associated with the observed sequence data. In conclusion, we show that for increasingly large molecular sequence data sets, GPUs can offer tremendous computational advancements through the use of the BEAGLE library, which is available for software packages for both Bayesian inference and maximum-likelihood frameworks.
90
Richards EJ, Brown JM, Barley AJ, Chong RA, Thomson RC. Variation Across Mitochondrial Gene Trees Provides Evidence for Systematic Error: How Much Gene Tree Variation Is Biological? Syst Biol 2018; 67:847-860. [PMID: 29471536 DOI: 10.1093/sysbio/syy013] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 08/01/2017] [Accepted: 02/15/2018] [Indexed: 12/28/2022] Open
Abstract
The use of large genomic data sets in phylogenetics has highlighted extensive topological variation across genes. Much of this discordance is assumed to result from biological processes. However, variation among gene trees can also be a consequence of systematic error driven by poor model fit, and the relative importance of biological vs. methodological factors in explaining gene tree variation is a major unresolved question. Using mitochondrial genomes to control for biological causes of gene tree variation, we estimate the extent of gene tree discordance driven by systematic error and employ posterior prediction to highlight the role of model fit in producing this discordance. We find that the amount of discordance among mitochondrial gene trees is similar to the amount of discordance found in other studies that assume only biological causes of variation. This similarity suggests that the role of systematic error in generating gene tree variation is underappreciated and critical evaluation of fit between assumed models and the data used for inference is important for the resolution of unresolved phylogenetic questions.
Affiliation(s)
- Emilie J Richards
- Department of Biology, University of Hawai'i, 2538 McCarthy Mall, Edmondson Hall 2016, Honolulu, HI 96822, USA
- Department of Biology, University of North Carolina, 120 South Road, Coker Hall CB 3280 Chapel Hill, NC 27599, USA
- Jeremy M Brown
- Department of Biological Sciences and Museum of Natural Science, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
- Anthony J Barley
- Department of Biology, University of Hawai'i, 2538 McCarthy Mall, Edmondson Hall 2016, Honolulu, HI 96822, USA
- Rebecca A Chong
- Department of Biology, University of Hawai'i, 2538 McCarthy Mall, Edmondson Hall 2016, Honolulu, HI 96822, USA
- Robert C Thomson
- Department of Biology, University of Hawai'i, 2538 McCarthy Mall, Edmondson Hall 2016, Honolulu, HI 96822, USA
91
Pérez-Losada M, Arenas M, Castro-Nallar E. Microbial sequence typing in the genomic era. Infect Genet Evol 2018; 63:346-359. [PMID: 28943406 PMCID: PMC5908768 DOI: 10.1016/j.meegid.2017.09.022] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/22/2017] [Revised: 09/18/2017] [Accepted: 09/19/2017] [Indexed: 12/18/2022]
Abstract
Next-generation sequencing (NGS), also known as high-throughput sequencing, is changing the field of microbial genomics research. NGS allows for a more comprehensive analysis of the diversity, structure and composition of microbial genes and genomes compared to the traditional automated Sanger capillary sequencing at a lower cost. NGS strategies have expanded the versatility of standard and widely used typing approaches based on nucleotide variation in several hundred DNA sequences and a few gene fragments (MLST, MLVA, rMLST and cgMLST). NGS can now accommodate variation in thousands or millions of sequences from selected amplicons to full genomes (WGS, NGMLST and HiMLST). To extract signals from high-dimensional NGS data and make valid statistical inferences, novel analytic and statistical techniques are needed. In this review, we describe standard and new approaches for microbial sequence typing at gene and genome levels and guidelines for subsequent analysis, including methods and computational frameworks. We also present several applications of these approaches to some disciplines, namely genotyping, phylogenetics and molecular epidemiology.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Ashburn, VA 20147, USA
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
- Children's National Medical Center, Washington, DC 20010, USA
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Eduardo Castro-Nallar
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago 8370146, Chile
92
Mongiardino Koch N, Gauthier JA. Noise and biases in genomic data may underlie radically different hypotheses for the position of Iguania within Squamata. PLoS One 2018; 13:e0202729. [PMID: 30133514 PMCID: PMC6105018 DOI: 10.1371/journal.pone.0202729] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/03/2018] [Accepted: 08/08/2018] [Indexed: 12/23/2022] Open
Abstract
Squamate reptiles are a major component of vertebrate biodiversity whose crown-clade traces its origin to a narrow window of time in the Mesozoic during which the main subclades diverged in rapid succession. Deciphering phylogenetic relationships among these lineages has proven challenging given the conflicting signals provided by genomic and phenomic data. Most notably, the placement of Iguania has routinely differed between data sources, with morphological evidence supporting a sister relationship to the remaining squamates (Scleroglossa hypothesis) and molecular data favoring a highly nested position alongside snakes and anguimorphs (Toxicofera hypothesis). We provide novel insights by generating an expanded morphological dataset and exploring the presence of phylogenetic signal, noise, and biases in molecular data. Our analyses confirm the presence of strong conflicting signals for the position of Iguania between morphological and molecular datasets. However, we also find that molecular data behave highly erratically when inferring the deepest branches of the squamate tree, a consequence of limited phylogenetic signal to resolve this ancient radiation with confidence. This, in turn, seems to result from a rate of evolution that is too high for historical signals to survive to the present. Finally, we detect significant systematic biases, with iguanians and snakes sharing faster rates of molecular evolution and a similarly biased nucleotide composition. A combination of scant phylogenetic signal, high levels of noise, and the presence of systematic biases could result in the misplacement of Iguania. We regard this explanation to be at least as plausible as the complex scenario of convergence and reversals required for morphological data to be misleading. 
We further evaluate and discuss the utility of morphological data to resolve ancient radiations, as well as its impact in combined-evidence phylogenomic analyses, with results relevant for the assessment of evidence and conflict across the Tree of Life.
Affiliation(s)
- Nicolás Mongiardino Koch
- Department of Geology and Geophysics, Yale University, New Haven, Connecticut, United States of America
- Jacques A. Gauthier
- Department of Geology and Geophysics, Yale University, New Haven, Connecticut, United States of America
- Yale Peabody Museum of Natural History, New Haven, Connecticut, United States of America
93
Mahfouz N, Caucci S, Achatz E, Semmler T, Guenther S, Berendonk TU, Schroeder M. High genomic diversity of multi-drug resistant wastewater Escherichia coli. Sci Rep 2018; 8:8928. [PMID: 29895899 PMCID: PMC5997705 DOI: 10.1038/s41598-018-27292-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 11/27/2017] [Accepted: 05/18/2018] [Indexed: 12/13/2022] Open
Abstract
Wastewater treatment plants play an important role in the emergence of antibiotic resistance. They provide a hot spot for exchange of resistance within and between species. Here, we analyse and quantify the genomic diversity of the indicator Escherichia coli in a German wastewater treatment plant and we relate it to isolates’ antibiotic resistance. Our results show a surprisingly large pan-genome, which mirrors how rich an environment a treatment plant is. We link the genomic analysis to a phenotypic resistance screen and pinpoint genomic hot spots, which correlate with a resistance phenotype. Besides well-known resistance genes, this forward genomics approach generates many novel genes, which correlated with resistance and which are partly completely unknown. A surprising overall finding of our analyses is that we do not see any difference in resistance and pan genome size between isolates taken from the inflow of the treatment plant and from the outflow. This means that while treatment plants reduce the amount of bacteria released into the environment, they do not reduce the potential for antibiotic resistance of these bacteria.
Affiliation(s)
- Serena Caucci
- Institute for Hydrobiology, TU Dresden, Dresden, Germany
- United Nations University Institute for Integrated Management of Material Fluxes and of Resources, Dresden, Germany
- Torsten Semmler
- Institute of Microbiology und Epizootics, FU, Berlin, Germany
- Sebastian Guenther
- Institute of Microbiology und Epizootics, FU, Berlin, Germany
- Institut für Pharmazie Pharmazeutische Biologie, Ernst-Moritz-Arndt-Universität Greifswald, Greifswald, Germany
| | | | | |
Collapse
|
94
|
Fang L, Leliaert F, Novis PM, Zhang Z, Zhu H, Liu G, Penny D, Zhong B. Improving phylogenetic inference of core Chlorophyta using chloroplast sequences with strong phylogenetic signals and heterogeneous models. Mol Phylogenet Evol 2018; 127:248-255. [PMID: 29885933 DOI: 10.1016/j.ympev.2018.06.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/22/2018] [Revised: 05/26/2018] [Accepted: 06/04/2018] [Indexed: 01/09/2023]
Abstract
Phylogenetic relationships within the green algal phylum Chlorophyta have proven difficult to resolve. The core Chlorophyta include Chlorophyceae, Ulvophyceae, Trebouxiophyceae, Pedinophyceae and Chlorodendrophyceae, but the relationships among these classes remain unresolved and the monophyly of Ulvophyceae and Trebouxiophyceae are highly controversial. We analyzed a dataset of 101 green algal species and 73 protein-coding genes sampled from complete and partial chloroplast genomes, including six newly sequenced ulvophyte genomes (Blidingia minima NIES-1837, Ulothrix zonata, Halochlorococcum sp. NIES-1838, Scotinosphaera sp. NIES-154, Caulerpa brownii and Cephaleuros sp. HZ-2017). We applied the Tree Certainty (TC) score to quantify the level of incongruence between phylogenetic trees in chloroplast genomic datasets, and show that the conflicting phylogenetic trees of core Chlorophyta stem from the most GC-heterogeneous sites. With removing the most GC-heterogeneous sites, our chloroplast phylogenomic analyses using heterogeneous models consistently support monophyly of the Chlorophyceae and of the Trebouxiophyceae, but the Ulvophyceae was resolved as polyphyletic. Our analytical framework provides an efficient approach to reconstruct the optimal phylogenetic relationships by minimizing conflicting signals.
Affiliation(s)
- Ling Fang
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Frederik Leliaert
- Botanic Garden Meise, 1860 Meise, Belgium; Phycology Research Group, Biology Department, Ghent University, 9000 Ghent, Belgium
- Phil M Novis
- Allan Herbarium, Manaaki Whenua-Landcare Research, Lincoln 7640, New Zealand
- Zhenhua Zhang
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Huan Zhu
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Guoxiang Liu
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- David Penny
- Institute of Fundamental Sciences, Massey University, Palmerston North 4442, New Zealand
- Bojian Zhong
- College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
|
95
|
Bell CD, Gonzalez LA. Exploring the utility of “next-generation” sequence data on inferring the phylogeny of the South American Valeriana (Valerianaceae). Mol Phylogenet Evol 2018; 123:44-49. [DOI: 10.1016/j.ympev.2018.02.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/20/2017] [Revised: 02/14/2018] [Accepted: 02/14/2018] [Indexed: 10/18/2022]
|
96
|
Vinuesa P, Ochoa-Sánchez LE, Contreras-Moreira B. GET_PHYLOMARKERS, a Software Package to Select Optimal Orthologous Clusters for Phylogenomics and Inferring Pan-Genome Phylogenies, Used for a Critical Geno-Taxonomic Revision of the Genus Stenotrophomonas. Front Microbiol 2018; 9:771. [PMID: 29765358 PMCID: PMC5938378 DOI: 10.3389/fmicb.2018.00771] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Received: 01/16/2018] [Accepted: 04/05/2018] [Indexed: 12/17/2022] Open
Abstract
The massive accumulation of genome-sequences in public databases promoted the proliferation of genome-level phylogenetic analyses in many areas of biological research. However, due to diverse evolutionary and genetic processes, many loci have undesirable properties for phylogenetic reconstruction. These, if undetected, can result in erroneous or biased estimates, particularly when estimating species trees from concatenated datasets. To deal with these problems, we developed GET_PHYLOMARKERS, a pipeline designed to identify high-quality markers to estimate robust genome phylogenies from the orthologous clusters, or the pan-genome matrix (PGM), computed by GET_HOMOLOGUES. In the first context, a set of sequential filters are applied to exclude recombinant alignments and those producing anomalous or poorly resolved trees. Multiple sequence alignments and maximum likelihood (ML) phylogenies are computed in parallel on multi-core computers. A ML species tree is estimated from the concatenated set of top-ranking alignments at the DNA or protein levels, using either FastTree or IQ-TREE (IQT). The latter is used by default due to its superior performance revealed in an extensive benchmark analysis. In addition, parsimony and ML phylogenies can be estimated from the PGM. We demonstrate the practical utility of the software by analyzing 170 Stenotrophomonas genome sequences available in RefSeq and 10 new complete genomes of Mexican environmental S. maltophilia complex (Smc) isolates reported herein. A combination of core-genome and PGM analyses was used to revise the molecular systematics of the genus. An unsupervised learning approach that uses a goodness of clustering statistic identified 20 groups within the Smc at a core-genome average nucleotide identity (cgANIb) of 95.9% that are perfectly consistent with strongly supported clades on the core- and pan-genome trees. In addition, we identified 16 misclassified RefSeq genome sequences, 14 of them labeled as S. maltophilia, demonstrating the broad utility of the software for phylogenomics and geno-taxonomic studies. The code, a detailed manual and tutorials are freely available for Linux/UNIX servers under the GNU GPLv3 license at https://github.com/vinuesa/get_phylomarkers. A docker image bundling GET_PHYLOMARKERS with GET_HOMOLOGUES is available at https://hub.docker.com/r/csicunam/get_homologues/, which can be easily run on any platform.
Affiliation(s)
- Pablo Vinuesa
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Luz E Ochoa-Sánchez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Bruno Contreras-Moreira
- Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas, Zaragoza, Spain; Fundación Agencia Aragonesa para la Investigacion y el Desarrollo (ARAID), Zaragoza, Spain
|
97
|
Dupuis JR, Bremer FT, Kauwe A, San Jose M, Leblanc L, Rubinoff D, Geib SM. HiMAP: Robust phylogenomics from highly multiplexed amplicon sequencing. Mol Ecol Resour 2018. [PMID: 29633537 DOI: 10.1101/213454] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 05/14/2023]
Abstract
High-throughput sequencing has fundamentally changed how molecular phylogenetic data sets are assembled, and phylogenomic data sets commonly contain 50- to 100-fold more loci than those generated using traditional Sanger sequencing-based approaches. Here, we demonstrate a new approach for building phylogenomic data sets using single-tube, highly multiplexed amplicon sequencing, which we name HiMAP (highly multiplexed amplicon-based phylogenomics) and present bioinformatic pipelines for locus selection based on genomic and transcriptomic data resources and postsequencing consensus calling and alignment. This method is inexpensive and amenable to sequencing a large number (hundreds) of taxa simultaneously and requires minimal hands-on time at the bench (<1/2 day), and data analysis can be accomplished without the need for read mapping or assembly. We demonstrate this approach by sequencing 878 amplicons in single reactions for 82 species of tephritid fruit flies across seven genera (384 individuals), including some of the most economically important agricultural insect pests. The resulting filtered data set (>150,000-bp concatenated alignment, ~20% missing character sites across all individuals and amplicons) contained >40,000 phylogenetically informative characters, and although some discordance was observed between analyses, it provided unparalleled resolution of many phylogenetic relationships in this group. Most notably, we found high support for the generic status of Zeugodacus and the sister relationship between Dacus and Zeugodacus. We discuss HiMAP, with regard to its molecular and bioinformatic strengths, and the insight the resulting data set provides into relationships of this diverse insect group.
Affiliation(s)
- Julian R Dupuis
- U.S. Department of Agriculture-Agricultural Research Service, Daniel K. Inouye U.S. Pacific Basin Agricultural Research Center, Hilo, Hawaii
- Department of Plant and Environmental Protection Services, University of Hawaii at Manoa, Honolulu, Hawaii
- Forest T Bremer
- U.S. Department of Agriculture-Agricultural Research Service, Daniel K. Inouye U.S. Pacific Basin Agricultural Research Center, Hilo, Hawaii
- Department of Plant and Environmental Protection Services, University of Hawaii at Manoa, Honolulu, Hawaii
- Angela Kauwe
- U.S. Department of Agriculture-Agricultural Research Service, Daniel K. Inouye U.S. Pacific Basin Agricultural Research Center, Hilo, Hawaii
- Michael San Jose
- Department of Plant and Environmental Protection Services, University of Hawaii at Manoa, Honolulu, Hawaii
- Luc Leblanc
- Department of Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, Idaho
- Daniel Rubinoff
- Department of Plant and Environmental Protection Services, University of Hawaii at Manoa, Honolulu, Hawaii
- Scott M Geib
- U.S. Department of Agriculture-Agricultural Research Service, Daniel K. Inouye U.S. Pacific Basin Agricultural Research Center, Hilo, Hawaii
|
98
|
Dupuis JR, Bremer FT, Kauwe A, San Jose M, Leblanc L, Rubinoff D, Geib SM. HiMAP: Robust phylogenomics from highly multiplexed amplicon sequencing. Mol Ecol Resour 2018; 18:1000-1019. [PMID: 29633537 DOI: 10.1111/1755-0998.12783] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/05/2018] [Revised: 03/07/2018] [Accepted: 03/19/2018] [Indexed: 01/22/2023]
|
99
|
Lemoine F, Domelevo Entfellner JB, Wilkinson E, Correia D, Dávila Felipe M, De Oliveira T, Gascuel O. Renewing Felsenstein's phylogenetic bootstrap in the era of big data. Nature 2018; 556:452-456. [PMID: 29670290 PMCID: PMC6030568 DOI: 10.1038/s41586-018-0043-0] [Citation(s) in RCA: 349] [Impact Index Per Article: 58.2] [Received: 06/23/2017] [Accepted: 03/01/2018] [Indexed: 12/29/2022]
Abstract
Felsenstein's application of the bootstrap method to evolutionary trees is one of the most cited scientific papers of all time. The bootstrap method, which is based on resampling and replications, is used extensively to assess the robustness of phylogenetic inferences. However, increasing numbers of sequences are now available for a wide variety of species, and phylogenies based on hundreds or thousands of taxa are becoming routine. With phylogenies of this size Felsenstein's bootstrap tends to yield very low supports, especially on deep branches. Here we propose a new version of the phylogenetic bootstrap in which the presence of inferred branches in replications is measured using a gradual 'transfer' distance rather than the binary presence or absence index used in Felsenstein's original version. The resulting supports are higher and do not induce falsely supported branches. The application of our method to large mammal, HIV and simulated datasets reveals their phylogenetic signals, whereas Felsenstein's bootstrap fails to do so.
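The contrast the abstract draws between a binary presence/absence index and a gradual 'transfer' distance can be illustrated with a small sketch. This is not the authors' implementation. It assumes each tree is given simply as a list of bipartitions (the set of taxa on one side of each branch), and the function names `transfer_distance` and `transfer_support` are hypothetical. The normalisation by p - 1, where p is the size of the smaller side of the reference branch, is taken from the published method rather than from the abstract.

```python
from typing import FrozenSet, Iterable, List

Side = FrozenSet[str]  # taxa on one side of a branch (one half of a bipartition)


def transfer_distance(ref: Side, other: Side, n_taxa: int) -> int:
    """Minimum number of taxa to move so that the two bipartitions match.

    A bipartition can be named by either of its sides, so the mismatch count
    is compared against its complement and the smaller value is kept.
    """
    mismatches = len(ref ^ other)  # size of the symmetric difference
    return min(mismatches, n_taxa - mismatches)


def transfer_support(ref: Side, bootstrap_trees: Iterable[List[Side]], n_taxa: int) -> float:
    """Support for one reference branch, averaged over bootstrap trees.

    Each bootstrap tree contributes the distance to its closest branch.
    The distance is capped at p - 1, the value reached against a terminal
    branch, so trivial bipartitions need not be listed explicitly.
    """
    p = min(len(ref), n_taxa - len(ref))
    if p < 2:
        raise ValueError("support is only defined for internal branches")
    trees = list(bootstrap_trees)
    total = 0
    for branches in trees:
        closest = min((transfer_distance(ref, b, n_taxa) for b in branches), default=p - 1)
        total += min(closest, p - 1)
    return 1.0 - total / (len(trees) * (p - 1))


# Toy data: six taxa, reference branch {A, B, C}, two bootstrap replicates.
reference = frozenset("ABC")
replicates = [
    [frozenset("ABC")],                   # branch recovered exactly
    [frozenset("AB"), frozenset("ABD")],  # branch missed by a single taxon
]
support = transfer_support(reference, replicates, n_taxa=6)
```

In this toy case the binary index would count the branch as present in only one of the two replicates, whereas the transfer-based value gives partial credit to the replicate that misses it by one taxon.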
Affiliation(s)
- F Lemoine
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Hub Bioinformatique et Biostatistique, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- J-B Domelevo Entfellner
- Department of Computer Science, University of the Western Cape, Cape Town, South Africa
- South African MRC Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa
- E Wilkinson
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- D Correia
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- M Dávila Felipe
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- T De Oliveira
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
- O Gascuel
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Méthodes et Algorithmes pour la Bioinformatique, LIRMM UMR 5506, Université de Montpellier & CNRS, Montpellier, France
|
100
|
Barley AJ, Brown JM, Thomson RC. Impact of Model Violations on the Inference of Species Boundaries Under the Multispecies Coalescent. Syst Biol 2018; 67:269-284. [PMID: 28945903 DOI: 10.1093/sysbio/syx073] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 03/15/2017] [Accepted: 08/31/2017] [Indexed: 11/14/2022] Open
Abstract
The use of genetic data for identifying species-level lineages across the tree of life has received increasing attention in the field of systematics over the past decade. The multispecies coalescent model provides a framework for understanding the process of lineage divergence and has become widely adopted for delimiting species. However, because these studies lack an explicit assessment of model fit, in many cases, the accuracy of the inferred species boundaries are unknown. This is concerning given the large amount of empirical data and theory that highlight the complexity of the speciation process. Here, we seek to fill this gap by using simulation to characterize the sensitivity of inference under the multispecies coalescent (MSC) to several violations of model assumptions thought to be common in empirical data. We also assess the fit of the MSC model to empirical data in the context of species delimitation. Our results show substantial variation in model fit across data sets. Posterior predictive tests find the poorest model performance in data sets that were hypothesized to be impacted by model violations. We also show that while the inferences assuming the MSC are robust to minor model violations, such inferences can be biased under some biologically plausible scenarios. Taken together, these results suggest that researchers can identify individual data sets in which species delimitation under the MSC is likely to be problematic, thereby highlighting the cases where additional lines of evidence to identify species boundaries are particularly important to collect. Our study supports a growing body of work highlighting the importance of model checking in phylogenetics, and the usefulness of tailoring tests of model fit to assess the reliability of particular inferences. [Populations structure, gene flow, demographic changes, posterior prediction, simulation, genetics.].
Affiliation(s)
- Anthony J Barley
- Department of Biology, University of Hawai'i, 2538 McCarthy Mall, Edmondson Hall 216, Honolulu, HI 96822, USA
- Jeremy M Brown
- Department of Biological Sciences and Museum of Natural Science, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
- Robert C Thomson
- Department of Biology, University of Hawai'i, 2538 McCarthy Mall, Edmondson Hall 216, Honolulu, HI 96822, USA
|