1
Wong EB, Kamaruddin N, Mokhtar M, Yusof N, Khairuddin RFR. Assessing sequence heterogeneity in Chlorellaceae DNA barcode markers for phylogenetic inference. J Genet Eng Biotechnol 2023; 21:104. [PMID: 37851281 PMCID: PMC10584744 DOI: 10.1186/s43141-023-00550-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 09/20/2023] [Indexed: 10/19/2023]
Abstract
Phylogenetic inference is an important approach that allows the recovery of the evolutionary history and the origin of the Chlorellaceae species. Despite the species' potential for biofuel feedstock production, their high phenotypic plasticity and similar morphological structures among the species have muddled the taxonomy and identification of the Chlorellaceae species. This study aimed to decipher Chlorellaceae DNA barcode marker heterogeneity by examining the sequence divergence and genomic properties of 18S rRNA, ITS (ITS1-5.8S rRNA-ITS2-28S rRNA), and rbcL from 655 orthologous sequences of 64 species across 31 genera in the Chlorellaceae family. The study assessed the distinct evolutionary properties of the DNA markers that may have caused the discordance between individual trees in the phylogenetic inference using the Robinson-Foulds distance and the Shimodaira-Hasegawa test. Our findings suggest that using the supermatrix approach improves the congruency between trees by reducing stochastic error and increasing the confidence of the inferred Chlorellaceae phylogenetic tree. This study also found that the phylogenies inferred through the supermatrix approach might not always be well supported by all markers. The study highlights that assessing sequence heterogeneity prior to the phylogenetic inference could allow the approach to accommodate sequence evolutionary properties and support species identification from the most congruent phylogeny, which can better represent the evolution of Chlorellaceae species.
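The Robinson-Foulds distance named in this abstract counts the non-trivial bipartitions (splits) found in one tree but not the other. The following is a minimal sketch, not the authors' pipeline: it assumes trees are given as nested Python tuples of taxon names over the same taxon set, and the function names `bipartitions` and `rf_distance` are hypothetical.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree.

    Each split is stored as a frozenset holding both sides, so the result
    does not depend on where the tuple representation happens to be rooted.
    """
    all_taxa = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - clade
        # A split is informative only if both sides hold at least two taxa.
        if len(clade) > 1 and len(other) > 1:
            splits.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Unnormalised Robinson-Foulds distance: size of the symmetric difference."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Illustrative five-taxon trees (hypothetical labels, not data from the study).
t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "C"), ("B", "D"), "E")
t3 = ((("A", "B"), "E"), ("C", "D"))  # same unrooted topology as t1

print(rf_distance(t1, t2))
print(rf_distance(t1, t3))
```

A distance of zero means the two trees share every informative split; larger values indicate more topological disagreement between, for example, single-marker trees.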
Affiliation(s)
- Ee Bhei Wong
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Nurhaida Kamaruddin
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Marina Mokhtar
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Norjan Yusof
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Raja Farhana R Khairuddin
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  - Centre of Research for Computational Sciences and Informatics for Biology, Bioindustry, Environment, Agriculture, and Healthcare (CRYSTAL), Universiti Malaya, Kuala Lumpur, Malaysia
2
Susko E. Complex statistical modelling for phylogenetic inference. Can J Stat 2022. [DOI: 10.1002/cjs.11741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5
3
Demirtaş S, Budak M, Korkmaz EM, Searle JB, Bilton DT, Gündüz İ. The complete mitochondrial genome of Talpa martinorum (Mammalia: Talpidae), a mole species endemic to Thrace: genome content and phylogenetic considerations. Genetica 2022; 150:317-325. [PMID: 36029420 DOI: 10.1007/s10709-022-00162-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 08/04/2022] [Indexed: 11/04/2022]
Abstract
The complete mitogenome sequence of Talpa martinorum, a recently described Balkan endemic mole, was assembled from next generation sequence data. The mitogenome is similar to that of the three other Talpa species sequenced to date, being 16,835 bp in length, and containing 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, an origin of L-strand replication, and a control region or D-loop. Compared to other Talpa mitogenomes sequenced to date, that of T. martinorum differs in the length of D-loop and stop codon usage. TAG and T-- are the stop codons for the ND1 and ATP8 genes, respectively, in T. martinorum, whilst TAA acts as a stop codon for both ND1 and ATP8 in the other three Talpa species sequenced. Phylogeny reconstructions based on Maximum Likelihood and Bayesian inference analyses yielded phylogenies with similar topologies, demonstrating that T. martinorum nests within the western lineage of the genus, being closely related to T. aquitania and T. occidentalis.
Affiliation(s)
- Sadık Demirtaş
  - Department of Biology, Faculty of Arts and Sciences, Ondokuz Mayis University, Samsun, Turkey
- Mahir Budak
  - Department of Molecular Biology and Genetics, Faculty of Science, Sivas Cumhuriyet University, Sivas, Turkey
- Ertan M Korkmaz
  - Department of Molecular Biology and Genetics, Faculty of Science, Sivas Cumhuriyet University, Sivas, Turkey
- Jeremy B Searle
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853-2701, USA
- David T Bilton
  - School of Biological and Marine Sciences, University of Plymouth, Plymouth, Devon, PL4 8AA, UK
  - Department of Zoology, University of Johannesburg, Auckland Park, PO Box 524, Johannesburg, 2006, South Africa
- İslam Gündüz
  - Department of Biology, Faculty of Arts and Sciences, Ondokuz Mayis University, Samsun, Turkey
4
Lucio J, Gonzalez-Jimenez I, Rivero-Menendez O, Alastruey-Izquierdo A, Pelaez T, Alcazar-Fuoli L, Mellado E. Point Mutations in the 14-α Sterol Demethylase Cyp51A or Cyp51C Could Contribute to Azole Resistance in Aspergillus flavus. Genes (Basel) 2020; 11:genes11101217. [PMID: 33080784 PMCID: PMC7602989 DOI: 10.3390/genes11101217] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/09/2020] [Revised: 10/08/2020] [Accepted: 10/13/2020] [Indexed: 12/26/2022]
Abstract
Infections caused by Aspergillus species are being increasingly reported. Aspergillus flavus is the second most common species within this genus causing invasive infections in humans, and isolates showing azole resistance have been recently described. A. flavus has three cyp51-related genes (cyp51A, cyp51B, and cyp51C) encoding 14-α sterol demethylase-like enzymes which are the target of azole drugs. In order to study triazole drug resistance in A. flavus, three strains showing reduced azole susceptibility and 17 azole susceptible isolates were compared. The three cyp51-related genes were amplified and sequenced. A comparison of the deduced Cyp51A, Cyp51B, and Cyp51C protein sequences with other protein sequences from orthologous genes in different filamentous fungi led to a protein identity that ranged from 50% to 80%. Cyp51A and Cyp51C presented several synonymous and non-synonymous point mutations among both susceptible and non-susceptible strains. However, two amino acid mutations were present only in two resistant isolates: one strain harbored a P214L substitution in Cyp51A, and another a H349R in Cyp51C that also showed an increase of cyp51A and cyp51C gene expression compared to the susceptible strain ATCC2004304. Isolates that showed reduced in vitro susceptibility to clinical azoles exhibited a different susceptibility profile to demethylation inhibitors (DMIs). Although P214L substitution might contribute to azole resistance, the role of H349R substitution together with changes in gene expression remains unclear.
Affiliation(s)
- Jose Lucio
  - Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Irene Gonzalez-Jimenez
  - Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Olga Rivero-Menendez
  - Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Ana Alastruey-Izquierdo
  - Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
  - Spanish Network for Research in Infectious Diseases (REIPI RD16/CIII/0004/0003), ISCIII, Majadahonda, 28220 Madrid, Spain
- Teresa Pelaez
  - Hospital Universitario Central de Asturias, Fundación para la Investigación Biosanitaria del Principado de Asturias (FINBA), Oviedo, 33011 Asturias, Spain
- Laura Alcazar-Fuoli
  - Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
  - Spanish Network for Research in Infectious Diseases (REIPI RD16/CIII/0004/0003), ISCIII, Majadahonda, 28220 Madrid, Spain
- Emilia Mellado
  - Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
  - Spanish Network for Research in Infectious Diseases (REIPI RD16/CIII/0004/0003), ISCIII, Majadahonda, 28220 Madrid, Spain
5
Wang HC, Susko E, Roger AJ. The Relative Importance of Modeling Site Pattern Heterogeneity Versus Partition-Wise Heterotachy in Phylogenomic Inference. Syst Biol 2020; 68:1003-1019. [PMID: 31140564 DOI: 10.1093/sysbio/syz021] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 07/16/2018] [Revised: 02/04/2019] [Accepted: 04/09/2019] [Indexed: 12/18/2022]
Abstract
Large taxa-rich genome-scale data sets are often necessary for resolving ancient phylogenetic relationships. But accurate phylogenetic inference requires that they are analyzed with realistic models that account for the heterogeneity in substitution patterns amongst the sites, genes and lineages. Two kinds of adjustments are frequently used: models that account for heterogeneity in amino acid frequencies at sites in proteins, and partitioned models that accommodate the heterogeneity in rates (branch lengths) among different proteins in different lineages (protein-wise heterotachy). Although partitioned and site-heterogeneous models are both widely used in isolation, their relative importance to the inference of correct phylogenies has not been carefully evaluated. We conducted several empirical analyses and a large set of simulations to compare the relative performances of partitioned models, site-heterogeneous models, and combined partitioned site-heterogeneous models. In general, site-homogeneous models (partitioned or not) performed worse than site-heterogeneous models, except in simulations with extreme protein-wise heterotachy. Furthermore, simulations using empirically-derived realistic parameter settings showed a marked long-branch attraction (LBA) problem for analyses employing protein-wise partitioning even when the generating model included partitioning. This LBA problem results from a small sample bias compounded over many single protein alignments. In some cases, this problem was ameliorated by clustering similarly-evolving proteins together into larger partitions using the PartitionFinder method. Similar results were obtained under simulations with larger numbers of taxa or heterogeneity in simulating topologies over genes. For an empirical Microsporidia test data set, all but one of the tested site-heterogeneous models (with or without partitioning) obtained the correct Microsporidia+Fungi grouping, whereas site-homogeneous models (with or without partitioning) did not. The single exception was the fully partitioned site-heterogeneous analysis that succumbed to the compounded small sample LBA bias. In general, unless protein-wise heterotachy effects are extreme, it is more important to model site-heterogeneity than protein-wise heterotachy in phylogenomic analyses. Complete protein-wise partitioning should be avoided as it can lead to a serious LBA bias. In cases of extreme protein-wise heterotachy, approaches that cluster similarly-evolving proteins together, coupled with site-heterogeneous models, work well for phylogenetic estimation.
Affiliation(s)
- Huai-Chun Wang
  - Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Road, Halifax, Nova Scotia B3H 4R2, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Road, Halifax, Nova Scotia B3H 4R2, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
- Andrew J Roger
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
  - Department of Biochemistry and Molecular Biology, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
6
A Robust Phylogenomic Time Tree for Biotechnologically and Medically Important Fungi in the Genera Aspergillus and Penicillium. mBio 2019; 10:mBio.00925-19. [PMID: 31289177 PMCID: PMC6747717 DOI: 10.1128/mbio.00925-19] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Indexed: 02/05/2023]
Abstract
Understanding the evolution of traits across technologically and medically significant fungi requires a robust phylogeny. Even though species in the Aspergillus and Penicillium genera (family Aspergillaceae, class Eurotiomycetes) are some of the most significant technologically and medically relevant fungi, we still lack a genome-scale phylogeny of the lineage or knowledge of the parts of the phylogeny that exhibit conflict among analyses. Here, we used a phylogenomic approach to infer evolutionary relationships among 81 genomes that span the diversity of Aspergillus and Penicillium species, to identify conflicts in the phylogeny, and to determine the likely underlying factors of the observed conflicts. Using a data matrix comprised of 1,668 genes, we found that while most branches of the phylogeny of the Aspergillaceae are robustly supported and recovered irrespective of method of analysis, a few exhibit various degrees of conflict among our analyses. Further examination of the observed conflict revealed that it largely stems from incomplete lineage sorting and hybridization or introgression. Our analyses provide a robust and comprehensive evolutionary genomic roadmap for this important lineage, which will facilitate the examination of the diverse technologically and medically relevant traits of these fungi in an evolutionary context. The filamentous fungal family Aspergillaceae contains >1,000 known species, mostly in the genera Aspergillus and Penicillium. Several species are used in the food, biotechnology, and drug industries (e.g., Aspergillus oryzae and Penicillium camemberti), while others are dangerous human and plant pathogens (e.g., Aspergillus fumigatus and Penicillium digitatum). To infer a robust phylogeny and pinpoint poorly resolved branches and their likely underlying contributors, we used 81 genomes spanning the diversity of Aspergillus and Penicillium to construct a 1,668-gene data matrix. 
Phylogenies of the nucleotide and amino acid versions of this full data matrix as well as of several additional data matrices were generated using three different maximum likelihood schemes (i.e., gene-partitioned, unpartitioned, and coalescence) and using both site-homogeneous and site-heterogeneous models (total of 64 species-level phylogenies). Examination of the topological agreement among these phylogenies and measures of internode certainty identified 11/78 (14.1%) bipartitions that were incongruent and pinpointed the likely underlying contributing factors, which included incomplete lineage sorting, hidden paralogy, hybridization or introgression, and reconstruction artifacts associated with poor taxon sampling. Relaxed molecular clock analyses suggest that Aspergillaceae likely originated in the lower Cretaceous and that the Aspergillus and Penicillium genera originated in the upper Cretaceous. Our results shed light on the ongoing debate on Aspergillus systematics and taxonomy and provide a robust evolutionary and temporal framework for comparative genomic analyses in Aspergillaceae. More broadly, our approach provides a general template for phylogenomic identification of resolved and contentious branches in densely genome-sequenced lineages across the tree of life.
7
Wang Y, Zhou X, Wang L, Liu X, Yang D, Rokas A. Gene Selection and Evolutionary Modeling Affect Phylogenomic Inference of Neuropterida Based on Transcriptome Data. Int J Mol Sci 2019; 20:E1072. [PMID: 30832228 PMCID: PMC6429444 DOI: 10.3390/ijms20051072] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/24/2018] [Revised: 02/18/2019] [Accepted: 02/26/2019] [Indexed: 11/30/2022]
Abstract
Neuropterida is a superorder of Holometabola that consists of the orders Megaloptera (dobsonflies, fishflies, and alderflies), Neuroptera (lacewings) and Raphidioptera (snakeflies). Several proposed higher-level relationships within Neuropterida, such as the relationships between the orders or between the families, have been extensively debated. To further understand the evolutionary history of Neuropterida, we conducted phylogenomic analyses of all 13 published transcriptomes of the neuropterid species, as well as of a new transcriptome of the fishfly species Ctenochauliodes similis of Liu and Yang, 2006 (Megaloptera: Corydalidae: Chauliodinae) that we sequenced. Our phylogenomic data matrix contained 1392 orthologous genes from 22 holometabolan species representing six families from Neuroptera, two families from Raphidioptera, and two families from Megaloptera as the ingroup taxa, and nine orders of Holometabola as outgroups. Phylogenetic reconstruction was performed using both concatenation and coalescent-based approaches under a site-homogeneous model as well as under a site-heterogeneous model. Surprisingly, analyses using the site-homogeneous model strongly supported a paraphyletic Neuroptera, with Coniopterygidae assigned as the sister group of all other Neuropterida. In contrast, analyses using the site-heterogeneous model recovered Neuroptera as monophyletic. The monophyly of Neuroptera was also recovered in concatenation and coalescent-based analyses using genes with stronger phylogenetic signals [i.e., higher average bootstrap support (ABS) values and higher relative tree certainty including all conflicting bipartitions (RTCA) values] under the site-homogeneous model. The present study comprehensively illustrates how both data selection and model selection influence phylogenomic analyses of large-scale data matrices.
Affiliation(s)
- Yuyu Wang
  - College of Plant Protection, Hebei Agricultural University, Baoding 071001, China
  - Department of Entomology, China Agricultural University, Beijing 100193, China
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Xiaofan Zhou
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
  - Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou 510642, China
- Liming Wang
  - College of Plant Protection, Hebei Agricultural University, Baoding 071001, China
- Xingyue Liu
  - Department of Entomology, China Agricultural University, Beijing 100193, China
- Ding Yang
  - Department of Entomology, China Agricultural University, Beijing 100193, China
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
8
Shen XX, Hittinger CT, Rokas A. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat Ecol Evol 2017; 1:126. [PMID: 28812701 PMCID: PMC5560076 DOI: 10.1038/s41559-017-0126] [Citation(s) in RCA: 256] [Impact Index Per Article: 36.6] [Received: 09/25/2016] [Accepted: 03/01/2017] [Indexed: 01/05/2023]
Abstract
Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain, contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
Affiliation(s)
- Xing-Xing Shen
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Chris Todd Hittinger
  - Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
9
Nagy LG, Szöllősi G. Fungal Phylogeny in the Age of Genomics: Insights Into Phylogenetic Inference From Genome-Scale Datasets. Adv Genet 2017; 100:49-72. [DOI: 10.1016/bs.adgen.2017.09.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/26/2023]
10
Shen XX, Salichos L, Rokas A. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference. Genome Biol Evol 2016; 8:2565-80. [PMID: 27492233 PMCID: PMC5010910 DOI: 10.1093/gbe/evw179] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Accepted: 07/25/2016] [Indexed: 12/13/2022]
Abstract
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers.
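Relative composition variability (RCV), one of the sequence-based properties mentioned in this abstract, is commonly defined as the summed absolute deviation of each taxon's character counts from the across-taxon mean counts, divided by the number of taxa times the number of sites. The sketch below follows that common definition and is not taken from the paper; the function name `rcv` and the dictionary input format are assumptions for illustration.

```python
from collections import Counter


def rcv(alignment):
    """Relative composition variability of an alignment.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    Every symbol, including gap characters, is counted as a character here;
    filtering gaps beforehand is left to the caller.
    """
    seqs = list(alignment.values())
    n_taxa = len(seqs)
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    counts = [Counter(s) for s in seqs]
    characters = set().union(*counts)

    total = 0.0
    for ch in characters:
        mean_count = sum(c[ch] for c in counts) / n_taxa
        # Absolute deviation of each taxon's count from the mean count.
        total += sum(abs(c[ch] - mean_count) for c in counts)
    return total / (n_taxa * n_sites)


# Toy alignments (hypothetical, not data from the study).
uniform = {"tax1": "ACGT", "tax2": "TGCA"}   # identical composition
skewed = {"tax1": "AAAA", "tax2": "CCCC"}    # maximally different composition

print(rcv(uniform))
print(rcv(skewed))
```

Lower values indicate more homogeneous base composition across taxa, which is why RCV appears in the denominator of the treeness/RCV ratio.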
Affiliation(s)
- Xing-Xing Shen
  - Department of Biological Sciences, Vanderbilt University
- Leonidas Salichos
  - Department of Biological Sciences, Vanderbilt University
  - Department of Molecular Biophysics and Biochemistry, Yale University
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University
11
Abstract
Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent (due to events such as incomplete lineage sorting or horizontal gene transfer), it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such "process-agnostic" approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward's method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl).
Affiliation(s)
- Kevin Gori
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
- Tomasz Suchan
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
- Nadir Alvarez
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
- Christophe Dessimoz
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
  - Department of Genetics, Evolution & Environment, University College London, London, United Kingdom
  - Department of Computer Science, University College London, London, United Kingdom
  - Centre for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Biophore, Lausanne, Switzerland
12
Takezaki N, Nishihara H. Resolving the Phylogenetic Position of Coelacanth: The Closest Relative Is Not Always the Most Appropriate Outgroup. Genome Biol Evol 2016; 8:1208-21. [PMID: 27026053 PMCID: PMC4860700 DOI: 10.1093/gbe/evw071] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022]
Abstract
Determining the phylogenetic relationship of two extant lineages of lobe-finned fish, coelacanths and lungfishes, and tetrapods is important for understanding the origin of tetrapods. We analyzed data sets from two previous studies along with a newly collected data set, each of which had varying numbers of species and genes and varying extent of missing sites. We found that in all the data sets the sister relationship of lungfish and tetrapods was constructed with the use of cartilaginous fish as the outgroup with a high degree of statistical support. In contrast, when ray-finned fish were used as the outgroup, which is taxonomically an immediate outgroup of lobe-finned fish and tetrapods, the sister relationship of coelacanth and tetrapods was supported most strongly, although the statistical support was weaker. Even though it is generally accepted that the closest relative is an appropriate outgroup, our analysis suggested that the large divergence of the ray-finned fish as indicated by their long branch lengths and different amino acid frequencies made them less suitable as an outgroup than cartilaginous fish.
Affiliation(s)
- Naoko Takezaki
  - Life Science Research Center, Kagawa University, Mikicho, Kitagun, Kagawa, Japan
- Hidenori Nishihara
  - Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Nagatsuta-Cho, Midori-Ku, Yokohama, Kanagawa, Japan
13
Mengual-Chuliá B, Bedhomme S, Lafforgue G, Elena SF, Bravo IG. Assessing parallel gene histories in viral genomes. BMC Evol Biol 2016; 16:32. [PMID: 26847371 PMCID: PMC4743424 DOI: 10.1186/s12862-016-0605-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/17/2015] [Accepted: 01/29/2016] [Indexed: 01/08/2023]
Abstract
BACKGROUND: The increasing abundance of sequence data has exacerbated a long known problem: gene trees and species trees for the same terminal taxa are often incongruent. Indeed, genes within a genome have not all followed the same evolutionary path due to events such as incomplete lineage sorting, horizontal gene transfer, gene duplication and deletion, or recombination. Considering conflicts between gene trees as an obstacle, numerous methods have been developed to deal with these incongruences and to reconstruct consensus evolutionary histories of species despite the heterogeneity in the history of their genes. However, inconsistencies can also be seen as a source of information about the specific evolutionary processes that have shaped genomes.
RESULTS: The goal of the approach here proposed is to exploit this conflicting information: we have compiled eleven variables describing phylogenetic relationships and evolutionary pressures and submitted them to dimensionality reduction techniques to identify genes with similar evolutionary histories. To illustrate the applicability of the method, we have chosen two viral datasets, namely papillomaviruses and Turnip mosaic virus (TuMV) isolates, largely dissimilar in genome, evolutionary distance and biology. Our method pinpoints viral genes with common evolutionary patterns. In the case of papillomaviruses, gene clusters match well our knowledge on viral biology and life cycle, illustrating the potential of our approach. For the less known TuMV, our results trigger new hypotheses about viral evolution and gene interaction.
CONCLUSIONS: The approach here presented allows turning phylogenetic inconsistencies into evolutionary information, detecting gene assemblies with similar histories, and could be a powerful tool for comparative pathogenomics.
Affiliation(s)
- Beatriz Mengual-Chuliá
  - Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain
- Stéphanie Bedhomme
  - Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain; Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France
- Guillaume Lafforgue
  - Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France; Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain
- Santiago F Elena
  - Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain; I2SysBio, Consejo Superior de Investigaciones Científicas-Universitat de València, València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Ignacio G Bravo
  - Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; MIVEGEC (UMR CNRS 5290, IRD 224, UM), National Center for Scientific Research (CNRS), Montpellier, France; National Center for Scientific Research (CNRS), Maladies Infectieuses et Vecteurs: Ecologie, Génétique, Evolution et Contrôle (MIVEGEC), UMR CNRS 5290, IRD 224, UM, 911 Avenue Agropolis, BP 64501, 34394, Montpellier, Cedex 5, France
|
14
|
Wang Y, Zhou X, Yang D, Rokas A. A Genome-Scale Investigation of Incongruence in Culicidae Mosquitoes. Genome Biol Evol 2015; 7:3463-71. [PMID: 26608059 PMCID: PMC4700963 DOI: 10.1093/gbe/evv235] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
Comparison of individual gene trees in several recent phylogenomic studies from diverse lineages has revealed a surprising amount of topological conflict or incongruence, but we still know relatively little about its distribution across the tree of life. To further our understanding of incongruence, the factors that contribute to it and how it can be ameliorated, we examined its distribution in a clade of 20 Culicidae mosquito species through the reconstruction and analysis of the phylogenetic histories of 2,007 groups of orthologous genes. Levels of incongruence were generally low, the three exceptions being the internodes concerned with the branching of Anopheles christyi, with the branching of the subgenus Anopheles as well as the already reported incongruence within the Anopheles gambiae species complex. Two of these incongruence events (A. gambiae species complex and A. christyi) are likely due to biological factors, whereas the third (subgenus Anopheles) is likely due to analytical factors. Similar to previous studies, the use of genes or internodes with high bootstrap support or internode certainty values, both of which were positively correlated with gene alignment length, substantially reduced the observed incongruence. However, the clade support values of the internodes concerned with the branching of the subgenus Anopheles as well as within the A. gambiae species complex remained very low. Based on these results, we infer that the prevalence of incongruence in Culicidae mosquitoes is generally low, that it likely stems from both analytical and biological factors, and that it can be ameliorated through the selection of genes with strong phylogenetic signal. More generally, selection of genes with strong phylogenetic signal may be a general empirical solution for reducing incongruence and increasing the robustness of inference in phylogenomic studies.
Affiliation(s)
- Yuyu Wang
  - Department of Entomology, China Agricultural University, Beijing, China; Department of Biological Sciences, Vanderbilt University
- Xiaofan Zhou
  - Department of Biological Sciences, Vanderbilt University
- Ding Yang
  - Department of Entomology, China Agricultural University, Beijing, China
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University
|
15
|
Doyle VP, Young RE, Naylor GJP, Brown JM. Can We Identify Genes with Increased Phylogenetic Reliability? Syst Biol 2015; 64:824-37. [DOI: 10.1093/sysbio/syv041] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 11/26/2014] [Accepted: 06/09/2015] [Indexed: 12/19/2022]
|
16
|
Black M, Moolhuijzen P, Barrero R, La T, Phillips N, Hampson D, Herbst W, Barth S, Bellgard M. Analysis of Multiple Brachyspira hyodysenteriae Genomes Confirms That the Species Is Relatively Conserved but Has Potentially Important Strain Variation. PLoS One 2015; 10:e0131050. [PMID: 26098837 PMCID: PMC4476648 DOI: 10.1371/journal.pone.0131050] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 03/03/2015] [Accepted: 05/28/2015] [Indexed: 12/19/2022]
Abstract
The intestinal spirochete Brachyspira hyodysenteriae is an important pathogen in swine, causing mucohemorrhagic colitis in a disease known as swine dysentery. Based on the detection of significant linkage disequilibrium in multilocus sequence data, the species is considered to be clonal. An analysis of the genome sequence of Western Australian B. hyodysenteriae strain WA1 has been published, and in the current study 19 further strains from countries around the world were sequenced with Illumina technology. The genomes were assembled and aligned to over 97.5% of the reference WA1 genome at a percentage sequence identity better than 80%. Strain regions not aligned to the reference ranged between 0.2 and 2.5%. Clustering of the strain genes found on average 2,354 (88%) core genes, 255 (8.6%) ancillary genes and 77 (2.9%) unique genes per strain. Depending on the strain the proportion of genes with 100% sequence identity to WA1 ranged from 85% to 20%. The result is a global comparative genomic analysis of B. hyodysenteriae genomes revealing potential differential phenotypic markers for numerous strains. Despite the differences found, the genomes were less varied than those of the related pathogenic species Brachyspira pilosicoli, and the analysis supports the clonal nature of the species. From this study, a public genome resource has been created that will serve as a repository for further genetic and phenotypic studies of these important porcine bacteria. This is the first intra-species B. hyodysenteriae comparative genomic analysis.
Affiliation(s)
- Michael Black
  - Centre for Comparative Genomics, Murdoch University, Murdoch, Western Australia, Australia
- Paula Moolhuijzen
  - Centre for Comparative Genomics, Murdoch University, Murdoch, Western Australia, Australia
- Roberto Barrero
  - Centre for Comparative Genomics, Murdoch University, Murdoch, Western Australia, Australia
- Tom La
  - School of Veterinary and Life Sciences, Murdoch University, Murdoch, Western Australia, Australia
- Nyree Phillips
  - School of Veterinary and Life Sciences, Murdoch University, Murdoch, Western Australia, Australia
- David Hampson
  - School of Veterinary and Life Sciences, Murdoch University, Murdoch, Western Australia, Australia
- Werner Herbst
  - Institute for Hygiene and Infectious Diseases of Animals, Justus-Liebig University Giessen, Giessen, Germany
- Stefanie Barth
  - Institute for Hygiene and Infectious Diseases of Animals, Justus-Liebig University Giessen, Giessen, Germany
- Matthew Bellgard
  - School of Veterinary and Life Sciences, Murdoch University, Murdoch, Western Australia, Australia
|
17
|
Liu L, Xi Z, Davis CC. Coalescent Methods Are Robust to the Simultaneous Effects of Long Branches and Incomplete Lineage Sorting. Mol Biol Evol 2014; 32:791-805. [DOI: 10.1093/molbev/msu331] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 11/13/2022]
|
18
|
Salichos L, Stamatakis A, Rokas A. Novel information theory-based measures for quantifying incongruence among phylogenetic trees. Mol Biol Evol 2014; 31:1261-71. [PMID: 24509691 DOI: 10.1093/molbev/msu061] [Citation(s) in RCA: 175] [Impact Index Per Article: 17.5] [Indexed: 12/20/2022]
Abstract
Phylogenies inferred from different data matrices often conflict with each other necessitating the development of measures that quantify this incongruence. Here, we introduce novel measures that use information theory to quantify the degree of conflict or incongruence among all nontrivial bipartitions present in a set of trees. The first measure, internode certainty (IC), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode (internal branch) in a given set of trees jointly with that of the most prevalent conflicting bipartition in the same tree set. The second measure, IC All (ICA), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode in a given set of trees in conjunction with that of all conflicting bipartitions in the same underlying tree set. Finally, the tree certainty (TC) and TC All (TCA) measures are the sum of IC and ICA values across all internodes of a phylogeny, respectively. IC, ICA, TC, and TCA can be calculated from different types of data that contain nontrivial bipartitions, including from bootstrap replicate trees to gene trees or individual characters. Given a set of phylogenetic trees, the IC and ICA values of a given internode reflect its specific degree of incongruence, and the TC and TCA values describe the global degree of incongruence between trees in the set. All four measures are implemented and freely available in version 8.0.0 and subsequent versions of the widely used program RAxML.
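The measures described in this abstract can be illustrated with a minimal sketch. It assumes the entropy-based form commonly given for these quantities: IC = 1 + p1·log2(p1) + p2·log2(p2), where p1 and p2 are the relative frequencies of the internode's bipartition and of its most prevalent conflicting bipartition, and ICA uses the same expression over n bipartitions with logarithm base n. The function names are hypothetical, the input counts are invented for illustration, and RAxML-specific details (such as any frequency threshold for conflicting bipartitions or sign conventions) are not reproduced.

```python
import math


def internode_certainty(freq_main, freq_conflict):
    """IC from the count of trees supporting an internode's bipartition
    and the count supporting its most prevalent conflicting bipartition.
    Assumed form: 1 + sum(p * log2(p)) over the two relative frequencies."""
    total = freq_main + freq_conflict
    if total <= 0:
        raise ValueError("at least one supporting tree is required")
    ic = 1.0
    for freq in (freq_main, freq_conflict):
        p = freq / total
        if p > 0:  # the limit of p*log(p) as p -> 0 is 0
            ic += p * math.log2(p)
    return ic


def internode_certainty_all(freqs):
    """ICA from the counts of the internode's bipartition followed by all
    conflicting bipartitions. Assumed form: 1 + sum(p * log_n(p))."""
    n = len(freqs)
    total = sum(freqs)
    if total <= 0:
        raise ValueError("at least one supporting tree is required")
    if n < 2:
        return 1.0  # no conflict recorded
    ica = 1.0
    for freq in freqs:
        p = freq / total
        if p > 0:
            ica += p * math.log(p, n)
    return ica


def tree_certainty(ic_values):
    """TC (or TCA) as the sum of IC (or ICA) values over all internodes."""
    return sum(ic_values)


# Illustrative counts only: 75 trees support the internode, 25 support
# the strongest conflicting bipartition.
ic_example = internode_certainty(75, 25)
tc_example = tree_certainty([internode_certainty(100, 0), ic_example])
```

Under this form, an internode with no conflict scores 1 and an internode whose bipartition is exactly as frequent as its strongest conflict scores 0, which matches the abstract's description of IC as a degree of certainty.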
|
19
|
Koufopanou V, Swire J, Lomas S, Burt A. Primers for fourteen protein-coding genes and the deep phylogeny of the true yeasts. FEMS Yeast Res 2013; 13:574-84. [PMID: 23786589 PMCID: PMC3906836 DOI: 10.1111/1567-1364.12059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/23/2013] [Revised: 06/13/2013] [Accepted: 06/13/2013] [Indexed: 11/30/2022]
Abstract
The Saccharomycetales or 'true yeasts' consist of more than 800 described species, including many of scientific, medical and commercial importance. Considerable progress has been made in determining the phylogenetic relationships of these species, largely based on rDNA sequences, but many nodes for early-diverging lineages cannot be resolved with rDNA alone. rDNA is also not ideal for delineating recently diverged species. From published full-genome sequence data, we have identified 14 regions of protein-coding genes that can be PCR-amplified in a large proportion of a diverse collection of 25 yeast species using degenerate primers. Phylogenetic analysis of the sequences thus obtained reveals a well-resolved phylogeny of the Saccharomycetales with many branches having high bootstrap support. Analysis of published sequences from the Saccharomyces paradoxus species complex shows that these protein-coding gene fragments are also informative about genealogical relationships amongst closely related strains. Our set of protein-coding gene fragments is therefore suitable for analysing both ancient and recent evolutionary relationships amongst yeasts.
|
20
|
Salichos L, Rokas A. Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 2013; 497:327-31. [DOI: 10.1038/nature12130] [Citation(s) in RCA: 466] [Impact Index Per Article: 42.4] [Received: 12/06/2012] [Accepted: 03/28/2013] [Indexed: 11/09/2022]
|
21
|
Dunn KA, Jiang W, Field C, Bielawski JP. Improving evolutionary models for mitochondrial protein data with site-class specific amino acid exchangeability matrices. PLoS One 2013; 8:e55816. [PMID: 23383286 PMCID: PMC3561347 DOI: 10.1371/journal.pone.0055816] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/23/2012] [Accepted: 01/02/2013] [Indexed: 11/24/2022]
Abstract
Adequate modeling of mitochondrial sequence evolution is an essential component of mitochondrial phylogenomics (comparative mitogenomics). There is wide recognition within the field that lineage-specific aspects of mitochondrial evolution should be accommodated through lineage-specific amino-acid exchangeability matrices (e.g., mtMam for mammalian data). However, such a matrix must be applied to all sites and this implies that all sites are subject to the same, or largely similar, evolutionary constraints. This assumption is unjustified. Indeed, substantial differences are expected to arise from three-dimensional structures that impose different physiochemical environments on individual amino acid residues. The objectives of this paper are (1) to investigate the extent to which amino acid evolution varies among sites of mitochondrial proteins, and (2) to assess the potential benefits of explicitly modeling such variability. To achieve this, we developed a novel method for partitioning sites based on amino acid physiochemical properties. We apply this method to two datasets derived from complete mitochondrial genomes of mammals and fish, and use maximum likelihood to estimate amino acid exchangeabilities for the different groups of sites. Using this approach we identified large groups of sites evolving under unique physiochemical constraints. Estimates of amino acid exchangeabilities differed significantly among such groups. Moreover, we found that joint estimates of amino acid exchangeabilities do not adequately represent the natural variability in evolutionary processes among sites of mitochondrial proteins. Significant improvements in likelihood are obtained when the new matrices are employed. We also find that maximum likelihood estimates of branch lengths can be strongly impacted. We provide sets of matrices suitable for groups of sites subject to similar physiochemical constraints, and discuss how they might be used to analyze real data. We also discuss how the general approach might be employed to improve a variety of mitogenomic-based research activities.
Affiliation(s)
- Katherine A Dunn
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
|