1
McKibben MTW, Finch G, Barker MS. Species-tree topology impacts the inference of ancient whole-genome duplications across the angiosperm phylogeny. Am J Bot 2024; 111:e16378. [PMID: 39039654 DOI: 10.1002/ajb2.16378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 07/24/2024]
Abstract
PREMISE The history of angiosperms is marked by repeated rounds of ancient whole-genome duplications (WGDs). Here we used state-of-the-art methods to provide an up-to-date view of the distribution of WGDs in the history of angiosperms that considers both uncertainty introduced by different WGD inference methods and different underlying species-tree hypotheses. METHODS We used the distribution of synonymous divergences (Ks) of paralogs and orthologs from transcriptomic and genomic data to infer and place WGDs across two hypothesized angiosperm phylogenies. We further tested these WGD hypotheses with syntenic inferences and Bayesian models of duplicate gene gain and loss. RESULTS The predicted number of WGDs in the history of angiosperms (~170) based on the current taxon sampling is largely similar across different inference methods, but varies in the precise placement of WGDs on the phylogeny. Ks-based methods often yield alternative hypothesized WGD placements due to variation in substitution rates among lineages. Phylogenetic models of duplicate gene gain and loss are more robust to topological variation. However, errors in species-tree inference can still produce spurious WGD hypotheses, regardless of method used. CONCLUSIONS Here we showed that different WGD inference methods largely agree on an average of 3.5 WGDs in the history of individual angiosperm species. However, the precise placement of WGDs on the phylogeny is subject to the WGD inference method and tree topology. As researchers continue to test hypotheses regarding the impacts ancient WGDs have on angiosperm evolution, it is important to consider the uncertainty of the phylogeny as well as WGD inference methods.
Affiliation(s)
- Michael T W McKibben
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Geoffrey Finch
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Michael S Barker
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA
2
Wang M, Zhu M, Qian J, Yang Z, Shang F, Egan AN, Li P, Liu L. Phylogenomics of mulberries (Morus, Moraceae) inferred from plastomes and single copy nuclear genes. Mol Phylogenet Evol 2024; 197:108093. [PMID: 38740145 DOI: 10.1016/j.ympev.2024.108093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 04/30/2024] [Accepted: 05/06/2024] [Indexed: 05/16/2024]
Abstract
Mulberries (genus Morus), belonging to the order Rosales, family Moraceae, are important woody plants due to their economic value in sericulture, as well as for nutritional benefits and medicinal value. However, the taxonomy and phylogeny of Morus, especially for the Asian species, remain challenging due to its wide geographical distribution, morphological plasticity, and interspecific hybridization. To better understand the evolutionary history of Morus, we combined plastomes and large-scale nuclear gene analyses to investigate their phylogenetic relationships. We assembled the plastomes and screened 211 single-copy nuclear genes from 13 Morus species and related taxa. The plastomes of Morus species were relatively conserved in terms of genome size, gene content, synteny, IR boundary and codon usage. Using nuclear data, our results elucidated identical topologies based on coalescent and concatenation methods. The genus Morus was supported as monophyletic, with M. notabilis as the first diverging lineage and the two North American Morus species, M. microphylla and M. rubra, as sister to the other Asian species. In the Asian Morus species, interspecific relationships were completely resolved. However, cyto-nuclear discordances and gene tree-species tree conflicts were detected in the phylogenies of Morus, with multiple lines of evidence supporting hybridization/introgression as the main cause of discordances between nuclear and plastid phylogenies, while gene tree-species tree conflicts were mainly caused by incomplete lineage sorting (ILS).
Affiliation(s)
- Meizhen Wang
- College of Life Sciences, Henan Normal University, Xinxiang 453000, China; Systematic & Evolutionary Botany and Biodiversity Group, MOE Key Laboratory of Biosystems Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Mengmeng Zhu
- Laboratory of Plant Germplasm and Genetic Engineering, School of Life Sciences, Henan University, Kaifeng 475001, China
- Jiayi Qian
- College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Zhaoping Yang
- College of Life Sciences and Technologies, Tarim University, Alar 843300, China
- Fude Shang
- Laboratory of Plant Germplasm and Genetic Engineering, School of Life Sciences, Henan University, Kaifeng 475001, China; College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China.
- Ashley N Egan
- Department of Biology, Utah Valley University, Orem, UT 84058, United States.
- Pan Li
- Systematic & Evolutionary Botany and Biodiversity Group, MOE Key Laboratory of Biosystems Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou 310058, China.
- Luxian Liu
- College of Life Sciences, Henan Normal University, Xinxiang 453000, China; Laboratory of Plant Germplasm and Genetic Engineering, School of Life Sciences, Henan University, Kaifeng 475001, China.
3
Schutz K, Melie T, Smith SD, Quandt CA. Patterns recovered in phylogenomic analysis of Candida auris and close relatives implicate broad environmental flexibility in Candida/Clavispora clade yeasts. Microb Genom 2024; 10. [PMID: 38630608 DOI: 10.1099/mgen.0.001233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open
Abstract
Fungal pathogens commonly originate from benign or non-pathogenic strains living in the natural environment. The recently emerged human pathogen, Candida auris, is one example of a fungus believed to have originated in the environment and recently transitioned into a clinical setting. To date, however, there is limited evidence about the origins of this species in the natural environment and when it began associating with humans. One approach to overcome this gap is to reconstruct phylogenetic relationships (1) between strains isolated from clinical and non-clinical environments and (2) between species known to cause disease in humans and benign environmental saprobes. C. auris belongs to the Candida/Clavispora clade, a diverse group of 45 yeast species including human pathogens and environmental saprobes. We present a phylogenomic analysis of the Candida/Clavispora clade aimed at understanding the ecological breadth and evolutionary relationships between an expanded sample of environmentally and clinically isolated yeasts. To build a robust framework for investigating these relationships, we developed a whole-genome sequence dataset of 108 isolates representing 18 species, including four newly sequenced species and 18 environmentally isolated strains. Our phylogeny, based on 619 orthologous genes, shows environmentally isolated species and strains interspersed with clinically isolated counterparts, suggesting that there have been many transitions between humans and the natural environment in this clade. Our findings highlight the breadth of environments these yeasts inhabit and imply that many clinically isolated yeasts in this clade could just as easily live outside the human body in diverse natural environments and vice versa.
Affiliation(s)
- Kyle Schutz
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, USA
- Tina Melie
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, USA
- Stacey D Smith
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, USA
- C Alisha Quandt
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, USA
4
Morel B, Williams TA, Stamatakis A, Szöllősi GJ. AleRax: a tool for gene and species tree co-estimation and reconciliation under a probabilistic model of gene duplication, transfer, and loss. Bioinformatics 2024; 40:btae162. [PMID: 38514421 PMCID: PMC10990685 DOI: 10.1093/bioinformatics/btae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 01/30/2024] [Accepted: 03/19/2024] [Indexed: 03/23/2024] Open
Abstract
MOTIVATION Genomes are a rich source of information on the pattern and process of evolution across biological scales. How best to make use of that information is an active area of research in phylogenetics. Ideally, phylogenetic methods should not only model substitutions along gene trees, which explain differences between homologous gene sequences, but also the processes that generate the gene trees themselves along a shared species tree. To conduct accurate inferences, one needs to account for uncertainty at both levels, that is, in gene trees estimated from inherently short sequences and in their diverse evolutionary histories along a shared species tree. RESULTS We present AleRax, software that can infer reconciled gene trees together with a shared species tree using a simple, yet powerful, probabilistic model of gene duplication, transfer, and loss. A key feature of AleRax is its ability to account for uncertainty in the gene tree and its reconciliation by using an efficient approximation to calculate the joint phylogenetic-reconciliation likelihood and sample reconciled gene trees accordingly. Simulations and analyses of empirical data show that AleRax is one order of magnitude faster than competing gene tree inference tools while attaining the same accuracy. It is consistently more robust than species tree inference methods such as SpeciesRax and ASTRAL-Pro 2 under gene tree uncertainty. Finally, AleRax can process multiple gene families in parallel, thereby allowing users to compare competing phylogenetic hypotheses and estimate model parameters, such as duplication, transfer, and loss probabilities, for genome-scale datasets with hundreds of taxa. AVAILABILITY AND IMPLEMENTATION GNU GPL at https://github.com/BenoitMorel/AleRax and data are made available at https://cme.h-its.org/exelixis/material/alerax_data.tar.gz.
Affiliation(s)
- Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany
- Tom A Williams
- School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, United Kingdom
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany
- Institute of Computer Science, Biodiversity Computing Group, Heraklion GR-70013, Greece
- Gergely J Szöllősi
- ELTE-MTA “Lendület”, Evolutionary Genomics Research Group, Budapest H-1117, Hungary
- Institute of Evolution, HUN-REN Centre for Ecological Research, Budapest H-1121, Hungary
- Model-Based Evolutionary Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan
5
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
- Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
- Departmento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
6
Hellmuth M, Stadler PF. The Theory of Gene Family Histories. Methods Mol Biol 2024; 2802:1-32. [PMID: 38819554 DOI: 10.1007/978-1-0716-3838-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Most genes are part of larger families of evolutionarily related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to the phylogenetic tree of the underlying species, has remained a challenging topic despite their importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implications for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer as well as gene trees, species trees, and reconciliations between them without using a priori knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.
Affiliation(s)
- Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, Stockholm, Sweden
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Leipzig University, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
7
Qin HT, Möller M, Milne R, Luo YH, Zhu GF, Li DZ, Liu J, Gao LM. Multiple paternally inherited chloroplast capture events associated with Taxus speciation in the Hengduan Mountains. Mol Phylogenet Evol 2023; 189:107915. [PMID: 37666379 DOI: 10.1016/j.ympev.2023.107915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 06/16/2023] [Accepted: 09/01/2023] [Indexed: 09/06/2023]
Abstract
Mountainous regions provide a multitude of habitats and opportunities for complex speciation scenarios. Hybridization leading to chloroplast capture, which can be revealed by incongruent phylogenetic trees, is one possible outcome. Four allopatric Taxus lineages (three species and an undescribed lineage) from the Hengduan Mountains, southwest China, exhibit conflicting phylogenetic relationships between nuclear and chloroplast phylogenies. Here, we use multi-omic data at the population level to investigate their historical speciation processes. Population genomic analysis based on ddRAD-seq data revealed limited contemporary inter-specific gene flow involving only populations located close to another species. In a historical context, chloroplast and nuclear data (transcriptome) consistently showed conflicting phylogenetic relationships for T. florinii and the Emei type lineage. ILS and chloroplast recombination were excluded as possible causes, and transcriptome and ddRAD-seq data revealed an absence of the mosaic nuclear genomes that characterize hybrid origin scenarios. Therefore, T. florinii appears to have originated when a lineage of T. florinii captured the T. chinensis plastid type, whereas plastid introgression in the opposite direction generated the Emei Type. All four species have distinct ecological niches based on community investigations and ecological niche analyses. We propose that the origins of both species represent very rare examples of chloroplast capture events despite the paternal cpDNA inheritance of gymnosperms. Specifically, allopatrically and/or ecologically diverged parental species experienced a rare secondary contact, subsequent hybridization and reciprocal chloroplast capture, generating two new lineages, each of which acquired a unique ecological niche.
These events might have been triggered by orogenic activities of the Hengduan Mountains and an intensification of the Asian monsoon in the late Miocene, and may represent a scenario more common in these mountains than presently known.
Affiliation(s)
- Han-Tao Qin
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Michael Möller
- Royal Botanic Garden Edinburgh, Edinburgh EH3 5LR, United Kingdom
- Richard Milne
- Institute of Molecular Plant Sciences, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, United Kingdom
- Ya-Huang Luo
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; Lijiang Forest Biodiversity National Observation and Research Station, Kunming Institute of Botany, Chinese Academy of Sciences, Lijiang 674100, Yunnan, China
- Guang-Fu Zhu
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; University of Chinese Academy of Sciences, Beijing 100049, China
- De-Zhu Li
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; University of Chinese Academy of Sciences, Beijing 100049, China; Lijiang Forest Biodiversity National Observation and Research Station, Kunming Institute of Botany, Chinese Academy of Sciences, Lijiang 674100, Yunnan, China.
- Jie Liu
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- Lian-Ming Gao
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China; Lijiang Forest Biodiversity National Observation and Research Station, Kunming Institute of Botany, Chinese Academy of Sciences, Lijiang 674100, Yunnan, China.
8
Tan X, Qi J, Liu Z, Fan P, Liu G, Zhang L, Shen Y, Li J, Roos C, Zhou X, Li M. Phylogenomics Reveals High Levels of Incomplete Lineage Sorting at the Ancestral Nodes of the Macaque Radiation. Mol Biol Evol 2023; 40:msad229. [PMID: 37823401 PMCID: PMC10638670 DOI: 10.1093/molbev/msad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 09/06/2023] [Accepted: 10/08/2023] [Indexed: 10/13/2023] Open
Abstract
The genus Macaca includes 23 species assigned into 4 to 7 groups. It exhibits the largest geographic range and represents the most successful example of adaptive radiation of nonhuman primates. However, intrageneric phylogenetic relationships among species remain controversial and have not been resolved so far. In this study, we conducted a phylogenomic analysis on 16 newly generated and 8 published macaque genomes. We found strong evidence supporting the division of this genus into 7 species groups. Incomplete lineage sorting (ILS) was the primary factor contributing to the discordance observed among gene trees; however, we also found evidence of hybridization events, specifically between the ancestral arctoides/sinica and silenus/nigra lineages that resulted in the hybrid formation of the fascicularis/mulatta group. Combined with fossil data, our phylogenomic data were used to establish a scenario for macaque radiation. These findings provide insights into ILS and potential ancient introgression events that were involved in the radiation of macaques, which will lead to a better understanding of the rapid speciation occurring in nonhuman primates.
Affiliation(s)
- Xinxin Tan
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Geneplus-Beijing Institute, Beijing 102206, China
- Jiwei Qi
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Zhijin Liu
- College of Life Sciences, Capital Normal University, Beijing 100049, China
- Pengfei Fan
- School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- Gaoming Liu
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Liye Zhang
- Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen 37077, Germany
- Ying Shen
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Jing Li
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610064, China
- Christian Roos
- Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen 37077, Germany
- Gene Bank of Primates, German Primate Center, Leibniz Institute for Primate Research, Göttingen 37077, Germany
- Xuming Zhou
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Ming Li
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
9
Bhalla D, van Noort V. Molecular Evolution of Aryl Hydrocarbon Receptor Signaling Pathway Genes. J Mol Evol 2023; 91:628-646. [PMID: 37392220 DOI: 10.1007/s00239-023-10124-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 06/13/2023] [Indexed: 07/03/2023]
Abstract
The Aryl hydrocarbon receptor is an ancient transcriptional factor originally discovered as a sensor of dioxin. In addition to its function as a receptor of environmental toxicants, it plays an important role in development. Although a significant amount of research has been carried out to understand the AHR signal transduction pathway and its involvement in species' susceptibility to environmental toxicants, none of them to date has comprehensively studied its evolutionary origins. Studying the evolutionary origins of molecules can inform ancestral relationships of genes. The vertebrate genome has been shaped by two rounds of whole-genome duplications (WGD) at the base of vertebrate evolution approximately 600 million years ago, followed by lineage-specific gene losses, which often complicate the assignment of orthology. It is crucial to understand the evolutionary origins of this transcription factor and its partners, to distinguish orthologs from ancient non-orthologous homologs. In this study, we have investigated the evolutionary origins of proteins involved in the AHR pathway. Our results provide evidence of gene loss and duplications, crucial for understanding the functional connectivity of humans and model species. Multiple studies have shown that 2R-ohnologs (genes and proteins that have survived from the 2R-WGD) are enriched in signaling components relevant to developmental disorders and cancer. Our findings provide a link between the AHR pathway's evolutionary trajectory and its potential mechanistic involvement in pathogenesis.
Affiliation(s)
- Diksha Bhalla
- Centre of Microbial and Plant Genetics, Faculty of Bioscience Engineering, KU Leuven, Leuven, Belgium.
- Vera van Noort
- Centre of Microbial and Plant Genetics, Faculty of Bioscience Engineering, KU Leuven, Leuven, Belgium
- Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
10
Langschied F, Leisegang MS, Brandes RP, Ebersberger I. ncOrtho: efficient and reliable identification of miRNA orthologs. Nucleic Acids Res 2023; 51:e71. [PMID: 37260093 PMCID: PMC10359484 DOI: 10.1093/nar/gkad467] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/06/2022] [Revised: 05/04/2023] [Accepted: 05/30/2023] [Indexed: 06/02/2023] Open
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators that fine-tune gene expression via translational repression or degradation of their target mRNAs. Despite their functional relevance, frameworks for the scalable and accurate detection of miRNA orthologs are missing. Consequently, there is still no comprehensive picture of how miRNAs and their associated regulatory networks have evolved. Here we present ncOrtho, a synteny-informed pipeline for the targeted search of miRNA orthologs in unannotated genome sequences. ncOrtho matches miRNA annotations from multi-tissue transcriptomes in precision, while scaling to the analysis of hundreds of custom-selected species. The presence-absence pattern of orthologs to 266 human miRNA families across 402 vertebrate species reveals four bursts of miRNA acquisition, of which the most recent event occurred in the last common ancestor of higher primates. miRNA families are rarely modified or lost, but notable exceptions for both events exist. miRNA co-ortholog numbers faithfully indicate lineage-specific whole genome duplications, and miRNAs are powerful markers for phylogenomic analyses. Their exceptionally low genetic diversity makes them suitable to resolve clades where the phylogenetic signal is blurred by incomplete lineage sorting of ancestral alleles. In summary, ncOrtho allows miRNAs to be routinely considered in evolutionary analyses that were thus far reserved for protein-coding genes.
Affiliation(s)
- Felix Langschied
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Matthias S Leisegang
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ralf P Brandes
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
11
Auerbach AA, Becker JT, Moraes SN, Moghadasi SA, Duda JM, Salamango DJ, Harris RS. Ancestral APOBEC3B Nuclear Localization Is Maintained in Humans and Apes and Altered in Most Other Old World Primate Species. mSphere 2022; 7:e0045122. [PMID: 36374108 PMCID: PMC9769932 DOI: 10.1128/msphere.00451-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022] Open
Abstract
APOBEC3B is an innate immune effector enzyme capable of introducing mutations in viral genomes through DNA cytosine-to-uracil editing. Recent studies have shown that gamma-herpesviruses, such as Epstein-Barr virus (EBV), have evolved a potent APOBEC3B neutralization mechanism to protect lytic viral DNA replication intermediates in the nuclear compartment. APOBEC3B is additionally unique as the only human DNA deaminase family member that is constitutively nuclear. Nuclear localization has therefore been inferred to be essential for innate antiviral function. Here, we combine evolutionary, molecular, and cell biology approaches to address whether nuclear localization is a conserved feature of APOBEC3B in primates. Despite the relatively recent emergence of APOBEC3B approximately 30 to 40 million years ago (MYA) in Old World primates by genetic recombination (after the split from the New World monkey lineage 40 to 50 MYA), we find that the hallmark nuclear localization of APOBEC3B shows variability. For instance, although human and several nonhuman primate APOBEC3B enzymes are predominantly nuclear, rhesus macaque and other Old World primate APOBEC3B proteins are clearly cytoplasmic or cell wide. A series of human/rhesus macaque chimeras and mutants combined to map localization determinants to the N-terminal half of the protein with residues 15, 19, and 24 proving critical. Ancestral APOBEC3B reconstructed from present-day primate species also shows strong nuclear localization. Together, these results indicate that the ancestral nuclear localization of APOBEC3B is maintained in present-day human and ape proteins, but nuclear localization is not conserved in all Old World monkey species despite a need for antiviral functions in the nuclear compartment. IMPORTANCE APOBEC3 enzymes are single-stranded DNA cytosine-to-uracil deaminases with beneficial roles in antiviral immunity and detrimental roles in cancer mutagenesis. Regarding viral infection, all seven human APOBEC3 enzymes have overlapping roles in restricting virus types that require DNA for replication, including EBV, HIV, human papillomavirus (HPV), and human T-cell leukemia virus (HTLV). Regarding cancer, at least two APOBEC3 enzymes, APOBEC3B and APOBEC3A, are prominent sources of mutation capable of influencing clinical outcomes. Here, we combine evolutionary, molecular, and cell biology approaches to characterize primate APOBEC3B enzymes. We show that nuclear localization is an ancestral property of APOBEC3B that is maintained in present-day human and ape enzymes, but not conserved in other nonhuman primates. This partial mechanistic conservation indicates that APOBEC3B is important for limiting the replication of DNA-based viruses in the nuclear compartment. Understanding these pathogen-host interactions may contribute to the development of future antiviral and antitumor therapies.
Affiliation(s)
- Ashley A Auerbach
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, Texas, USA
- Institute for Molecular Virology, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Jordan T Becker
- Institute for Molecular Virology, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Department of Microbiology and Immunology, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Sofia N Moraes
- Institute for Molecular Virology, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Seyed Arad Moghadasi
- Institute for Molecular Virology, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Jolene M Duda
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Daniel J Salamango
- Institute for Molecular Virology, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota - Twin Cities, Minneapolis, Minnesota, USA
- Department of Microbiology and Immunology, Stony Brook University, Stony Brook, New York, USA
- Reuben S Harris
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, Texas, USA
- Howard Hughes Medical Institute, University of Texas Health San Antonio, San Antonio, Texas, USA

12
McCormack A, Hoff P. The Stein effect for Fréchet means. Ann Stat 2022. [DOI: 10.1214/22-aos2245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Affiliation(s)
- Peter Hoff
- Department of Statistical Science, Duke University

13
Menet H, Daubin V, Tannier E. Phylogenetic reconciliation. PLoS Comput Biol 2022; 18:e1010621. [PMID: 36327227 PMCID: PMC9632901 DOI: 10.1371/journal.pcbi.1010621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Hugo Menet
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Vincent Daubin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Inria, centre de recherche de Lyon, Villeurbanne, France

14
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023]
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA

15
Borges R, Boussau B, Höhna S, Pereira RJ, Kosiol C. Polymorphism‐aware estimation of species trees and evolutionary forces from genomic sequences with RevBayes. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Affiliation(s)
- Rui Borges
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Bastien Boussau
- Université de Lyon, Université Claude Bernard Lyon 1, Villeurbanne, France
- Sebastian Höhna
- GeoBio‐Center, Ludwig‐Maximilians‐Universität München, Munich, Germany
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig‐Maximilians‐Universität München, Munich, Germany
- Ricardo J. Pereira
- Division of Evolutionary Biology, Department of Biology II, Ludwig‐Maximilians‐Universität München, Martinsried, Germany
- Carolin Kosiol
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Centre for Biological Diversity, University of St Andrews, St Andrews, UK

16
Mulvey LPA, Warnock RCM, De Baets K. Where traditional extinction estimates fall flat: using novel cophylogenetic methods to estimate extinction risk in platyhelminths. Proc Biol Sci 2022; 289:20220432. [PMID: 36043279 PMCID: PMC9428535 DOI: 10.1098/rspb.2022.0432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 08/08/2022] [Indexed: 11/12/2022]
Abstract
Today parasites comprise a huge proportion of living biodiversity and play a major role in shaping community structure. Given their ecological significance, parasite extinctions could result in massive cascading effects across ecosystems. It is therefore crucial that we have a way of estimating their extinction risk. Attempts to do this have often relied on information about host extinction risk, without explicitly incorporating information about the parasites. However, assuming an identical risk may be misleading. Here, we apply a novel metric to estimate the cophylogenetic extinction rate, Ec, of parasites with their hosts. This metric incorporates information about the evolutionary history of parasites and hosts that can be estimated using event-based cophylogenetic methods. To explore this metric, we investigated the use of different cophylogenetic methods to inform the Ec rate, based on the analysis of polystome parasites and their anuran hosts. We show using both parsimony- and model-based approaches that different methods can have a large effect on extinction risk estimation. Further, we demonstrate that model-based approaches offer greater potential to provide insights into cophylogenetic history and extinction risk.
Affiliation(s)
- Laura P. A. Mulvey
- GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen 91054, Germany
- Rachel C. M. Warnock
- GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen 91054, Germany
- Kenneth De Baets
- Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 00-927 Warszawa, Poland

17
Gühmann M, Porter ML, Bok MJ. The Gluopsins: Opsins without the Retinal Binding Lysine. Cells 2022; 11:cells11152441. [PMID: 35954284 PMCID: PMC9368030 DOI: 10.3390/cells11152441] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/09/2022] [Revised: 07/23/2022] [Accepted: 07/28/2022] [Indexed: 12/14/2022]
Abstract
Opsins allow us to see. They are G-protein-coupled receptors that bind retinal as their ligand; the retinal is bound covalently to a lysine in the seventh transmembrane domain. This makes opsins light-sensitive. The lysine is so conserved that it is used to define a sequence as an opsin, and thus phylogenetic opsin reconstructions discard any sequence without it. However, recently, opsins were found that function not only as photoreceptors but also as chemoreceptors. For chemoreception, the lysine is not needed. Therefore, we wondered: Do opsins exist that have lost this lysine during evolution? To find such opsins, we built an automatic pipeline for reconstructing a large-scale opsin phylogeny. The pipeline compiles and aligns sequences from public sources, reconstructs the phylogeny, prunes rogue sequences, and visualizes the resulting tree. Our final opsin phylogeny is the largest to date with 4956 opsins. Among them is a clade of 33 opsins that have the lysine replaced by glutamic acid. Thus, we call them gluopsins. The gluopsins are mainly dragonfly and butterfly opsins, closely related to the RGR-opsins and the retinochromes. Like those, they have a derived NPxxY motif. However, their particular function remains to be seen.
Affiliation(s)
- Martin Gühmann
- School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
- Megan L. Porter
- Department of Biology, University of Hawai’i at Mānoa, Honolulu, HI 96822, USA
- Michael J. Bok
- Lund Vision Group, Department of Biology, University of Lund, 223 62 Lund, Sweden

18
Flouri T, Huang J, Jiao X, Kapli P, Rannala B, Yang Z. Bayesian phylogenetic inference using relaxed-clocks and the multispecies coalescent. Mol Biol Evol 2022; 39:6652437. [PMID: 35907248 PMCID: PMC9366188 DOI: 10.1093/molbev/msac161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescent and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Jun Huang
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK; School of Biomedical Engineering, Capital Medical University, Beijing, 100069, China
- Xiyun Jiao
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK; Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
- Paschalia Kapli
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Bruce Rannala
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Ziheng Yang
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK

19
Francis A, Jarvis PD. Brauer and partition diagram models for phylogenetic trees and forests. Proc Math Phys Eng Sci 2022; 478:20220044. [PMID: 35702594 PMCID: PMC9185836 DOI: 10.1098/rspa.2022.0044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Accepted: 05/04/2022] [Indexed: 11/24/2022]
Abstract
We introduce a correspondence between phylogenetic trees and Brauer diagrams, inspired by links between binary trees and matchings described by Diaconis and Holmes (1998 Proc. Natl Acad. Sci. USA 95, 14 600-14 602. (doi:10.1073/pnas.95.25.14600)). This correspondence gives rise to a range of semigroup structures on the set of phylogenetic trees, and opens the prospect of many applications. We furthermore extend the Diaconis-Holmes correspondence from binary trees to non-binary trees and to forests, showing for instance that the set of all forests is in bijection with the set of partitions of finite sets.
Affiliation(s)
- Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith South, New South Wales, Australia
- Peter D. Jarvis
- School of Mathematics and Physics, University of Tasmania, Hobart, Tasmania, Australia

20
Tahiri N, Veriga A, Koshkarov A, Morozov B. Invariant transformers of Robinson and Foulds distance matrices for convolutional neural network. J Bioinform Comput Biol 2022; 20:2250012. [DOI: 10.1142/s0219720022500123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

21
Feng S, Bai M, Rivas-González I, Li C, Liu S, Tong Y, Yang H, Chen G, Xie D, Sears KE, Franco LM, Gaitan-Espitia JD, Nespolo RF, Johnson WE, Yang H, Brandies PA, Hogg CJ, Belov K, Renfree MB, Helgen KM, Boomsma JJ, Schierup MH, Zhang G. Incomplete lineage sorting and phenotypic evolution in marsupials. Cell 2022; 185:1646-1660.e18. [PMID: 35447073 PMCID: PMC9200472 DOI: 10.1016/j.cell.2022.03.034] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 06/17/2020] [Revised: 12/22/2021] [Accepted: 03/21/2022] [Indexed: 12/19/2022]
Abstract
Incomplete lineage sorting (ILS) makes ancestral genetic polymorphisms persist during rapid speciation events, inducing incongruences between gene trees and species trees. ILS has complicated phylogenetic inference in many lineages, including hominids. However, we lack empirical evidence that ILS leads to incongruent phenotypic variation. Here, we performed phylogenomic analyses to show that the South American monito del monte is the sister lineage of all Australian marsupials, although over 31% of its genome is closer to the Diprotodontia than to other Australian groups due to ILS during ancient radiation. Pervasive conflicting phylogenetic signals across the whole genome are consistent with some of the morphological variation among extant marsupials. We detected hundreds of genes that experienced stochastic fixation during ILS, encoding the same amino acids in non-sister species. Using functional experiments, we confirm how ILS may have directly contributed to hemiplasy in morphological traits that were established during rapid marsupial speciation ca. 60 mya.
Affiliation(s)
- Shaohong Feng
- BGI-Shenzhen, Shenzhen 518083, China; State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Ming Bai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; School of Agriculture, Ningxia University, Yinchuan 750021, China; College of Plant Protection, Hebei Agricultural University, Baoding 071001, China
- Cai Li
- School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Yijie Tong
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding, Hebei 071001, China; Hainan Yazhou Bay Seed Lab, Building 1, No. 7 Yiju Road, Yazhou District, Sanya, Hainan 572024, China
- Haidong Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Guangdong Public Laboratory of Wild Animal Conservation and Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, China
- Guangji Chen
- BGI-Shenzhen, Shenzhen 518083, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Duo Xie
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Karen E Sears
- Department of Ecology and Evolutionary Biology, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Lida M Franco
- Facultad de Ciencias Naturales y Matemáticas, Universidad de Ibagué, Carrera 22 Calle 67, Ibagué, Colombia
- Juan Diego Gaitan-Espitia
- The Swire Institute of Marine Science and School of Biological Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Roberto F Nespolo
- Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Campus Isla Teja, Valdivia 5090000, Chile; Center of Applied Ecology and Sustainability (CAPES), Facultad de Ciencias Biológicas, Universidad Católica de Chile, Santiago 6513677, Chile; Millennium Institute for Integrative Biology (iBio), Santiago, Chile; Millennium Nucleus of Patagonian Limit of Life (LiLi), Valdivia, Chile
- Warren E Johnson
- Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, 1500 Remount Road, Front Royal, VA 22630, USA; The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, 4210 Silver Hill Rd., Suitland, MD 20746-2863, USA; Walter Reed Army Institute of Research, 503 Robert Grant Avenue, Silver Spring, MD 20910, USA
- Huanming Yang
- BGI-Shenzhen, Shenzhen 518083, China; James D. Watson Institute of Genome Sciences, Hangzhou 310058, China
- Parice A Brandies
- School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia
- Carolyn J Hogg
- School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia
- Katherine Belov
- School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia
- Marilyn B Renfree
- School of BioSciences, The University of Melbourne, Melbourne, VIC 3010, Australia
- Kristofer M Helgen
- Australian Museum Research Institute, Australian Museum, Sydney, NSW 2010, Australia; Australian Research Council Centre of Excellence for Australian Biodiversity and Heritage, University of New South Wales, Sydney, NSW 2052, Australia
- Jacobus J Boomsma
- Section for Ecology and Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, 2100 Copenhagen, Denmark
- Guojie Zhang
- BGI-Shenzhen, Shenzhen 518083, China; State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, 2100 Copenhagen, Denmark; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China

22
Using ultraconserved elements to reconstruct the termite tree of life. Mol Phylogenet Evol 2022; 173:107520. [DOI: 10.1016/j.ympev.2022.107520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/03/2022] [Revised: 04/22/2022] [Accepted: 05/10/2022] [Indexed: 11/17/2022]

23
Tree Reconciliation Methods for Host-Symbiont Cophylogenetic Analyses. Life (Basel) 2022; 12:life12030443. [PMID: 35330194 PMCID: PMC8951107 DOI: 10.3390/life12030443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/11/2022] [Revised: 03/05/2022] [Accepted: 03/10/2022] [Indexed: 12/16/2022]
Abstract
Phylogenetic reconciliation is a fundamental method in the study of pairs of coevolving species. This paper provides an overview of the underlying theory of reconciliation in the context of host-symbiont cophylogenetics, identifying some of the major challenges to users of these methods, such as selecting event costs and selecting representative reconciliations. Next, recent advances to address these challenges are discussed followed by a discussion of several established and recent software tools.

24
Borges R, Boussau B, Szöllősi GJ, Kosiol C. Nucleotide Usage Biases Distort Inferences of the Species Tree. Genome Biol Evol 2022; 14:6496956. [PMID: 34983052 PMCID: PMC8829901 DOI: 10.1093/gbe/evab290] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Accepted: 12/27/2021] [Indexed: 12/15/2022]
Abstract
Despite the importance of natural selection in species’ evolutionary history, phylogenetic methods that take into account population-level processes typically ignore selection. The assumption of neutrality is often based on the idea that selection occurs at a minority of loci in the genome and is unlikely to compromise phylogenetic inferences significantly. However, genome-wide processes like GC-bias and some variation segregating at the coding regions are known to evolve in the nearly neutral range. As we are now using genome-wide data to estimate species trees, it is natural to ask whether weak but pervasive selection is likely to blur species tree inferences. We developed a polymorphism-aware phylogenetic model tailored for measuring signatures of nucleotide usage biases to test the impact of selection in the species tree. Our analyses indicate that although the inferred relationships among species are not significantly compromised, the genetic distances are systematically underestimated in a node-height-dependent manner: that is, the deeper nodes tend to be more underestimated than the shallow ones. Such biases have implications for molecular dating. We dated the evolutionary history of 30 worldwide fruit fly populations, and we found signatures of GC-bias considerably affecting the estimated divergence times (up to 23%) in the neutral model. Our findings call for the need to account for selection when quantifying divergence or dating species evolution.
Affiliation(s)
- Rui Borges
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Bastien Boussau
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5558, LBBE, Villeurbanne, France
- Gergely J Szöllősi
- Department of Biological Physics, Eötvös University, Budapest, Hungary; MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary; Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary
- Carolin Kosiol
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Centre for Biological Diversity, University of St Andrews, St Andrews, United Kingdom

25
Abstract
Motivation Phylogenomics faces a dilemma: on the one hand, most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. The summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction. Results We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division to quartets enables fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees. Availability and implementation QuCo is available on https://github.com/maryamrabiee/quco. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maryam Rabiee
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA

26
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discussed multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA

27
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
|
28
|
Anselmetti Y, El-Mabrouk N, Lafond M, Ouangraoua A. Gene tree and species tree reconciliation with endosymbiotic gene transfer. Bioinformatics 2021; 37:i120-i132. [PMID: 34252921 PMCID: PMC8312264 DOI: 10.1093/bioinformatics/btab328] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION It is largely established that all extant mitochondria originated from a unique endosymbiotic event integrating an α-proteobacterial genome into a eukaryotic cell. Subsequently, eukaryote evolution has been marked by episodes of gene transfer, mainly from the mitochondria to the nucleus, resulting in a significant reduction of the mitochondrial genome, eventually completely disappearing in some lineages. However, in other lineages such as in land plants, a high variability in gene repertoire distribution, including genes encoded in both the nuclear and mitochondrial genome, is an indication of an ongoing process of Endosymbiotic Gene Transfer (EGT). Understanding how both nuclear and mitochondrial genomes have been shaped by gene loss, duplication and transfer is expected to shed light on a number of open questions regarding the evolution of eukaryotes, including rooting of the eukaryotic tree. RESULTS We address the problem of inferring the evolution of a gene family through duplication, loss and EGT events, the latter considered as a special case of horizontal gene transfer occurring between the mitochondrial and nuclear genomes of the same species (in one direction or the other). We consider EGT events that either maintain (EGTcopy) or remove (EGTcut) the gene copy in the source genome. We present a linear-time algorithm for computing the DLE (Duplication, Loss and EGT) distance, as well as an optimal reconciled tree, for the unitary cost, and a dynamic programming algorithm that outputs all optimal reconciliations for an arbitrary cost of operations. We illustrate the application of our EndoRex software, analyze different cost settings on a plant dataset, and discuss the resulting reconciled trees. AVAILABILITY AND IMPLEMENTATION EndoRex implementation and supporting data are available on the GitHub repository via https://github.com/AEVO-lab/EndoRex.
Affiliation(s)
- Yoann Anselmetti
- Département d'informatique, Université de Sherbrooke, 2500, boulevard de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Nadia El-Mabrouk
- Département d'informatique et de recherche opérationnelle, Université de Montréal, CP 6128 succ Centre-Ville, Montréal, Québec H3C 3J7, Canada
- Manuel Lafond
- Département d'informatique, Université de Sherbrooke, 2500, boulevard de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
- Aïda Ouangraoua
- Département d'informatique, Université de Sherbrooke, 2500, boulevard de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
|
29
|
Paszek J, Markin A, Górecki P, Eulenstein O. Taming the Duplication-Loss-Coalescence Model with Integer Linear Programming. J Comput Biol 2021; 28:758-773. [PMID: 34125600 DOI: 10.1089/cmb.2021.0011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
The duplication-loss-coalescence (DLC) parsimony model is invaluable for analyzing the complex scenarios of concurrent duplication, loss, and deep coalescence events in the evolution of gene families. However, inferring such scenarios is prohibitive even for moderately sized families owing to the computational complexity involved. To overcome this stringent limitation, we take a first step by describing a flexible integer linear programming (ILP) formulation for inferring DLC evolutionary scenarios. Then, to make the DLC model more scalable, we introduce four sensibly constrained versions of the model and describe modified versions of our ILP formulation reflecting these constraints. Our simulation studies showcase that our constrained ILP formulations compute evolutionary scenarios that are substantially larger than scenarios computable under our original ILP formulation and the original dynamic programming algorithm by Wu et al. Furthermore, scenarios computed under our constrained DLC models are remarkably accurate compared with corresponding scenarios under the original DLC model, which we also confirm in an empirical study with thousands of gene families.
Affiliation(s)
- Jarosław Paszek
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Alexey Markin
- Department of Computer Science, Iowa State University, Ames, Iowa, USA
- Paweł Górecki
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, Iowa, USA
|
30
|
Morel B, Kozlov AM, Stamatakis A, Szöllősi GJ. GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss. Mol Biol Evol 2021; 37:2763-2774. [PMID: 32502238 PMCID: PMC8312565 DOI: 10.1093/molbev/msaa141] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Indexed: 12/15/2022]
Abstract
Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
Affiliation(s)
- Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexey M Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary; Department of Biological Physics, Eötvös University, Budapest, Hungary; Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary
|
31
|
Costello R, Emms DM, Kelly S. Gene Duplication Accelerates the Pace of Protein Gain and Loss from Plant Organelles. Mol Biol Evol 2021; 37:969-981. [PMID: 31750917 PMCID: PMC7086175 DOI: 10.1093/molbev/msz275] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Abstract
Organelle biogenesis and function is dependent on the concerted action of both organellar-encoded (if present) and nuclear-encoded proteins. Differences between homologous organelles across the Plant Kingdom arise, in part, as a result of differences in the cohort of nuclear-encoded proteins that are targeted to them. However, neither the rate at which differences in protein targeting accumulate nor the evolutionary consequences of these changes are known. Using phylogenomic approaches coupled to ancestral state estimation, we show that the plant organellar proteome has diversified in proportion with molecular sequence evolution such that the proteomes of plant chloroplasts and mitochondria lose or gain on average 3.6 proteins per million years. We further demonstrate that changes in organellar protein targeting are associated with an increase in the rate of molecular sequence evolution and that such changes predominantly occur in genes with regulatory rather than metabolic functions. Finally, we show that gain and loss of protein target signals occurs at a higher rate following gene duplication, revealing that gene and genome duplication are a key facilitator of plant organelle evolution.
Affiliation(s)
- Rona Costello
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
- David M Emms
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
- Steven Kelly
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
|
32
|
Scossa F, Fernie AR. Ancestral sequence reconstruction - An underused approach to understand the evolution of gene function in plants? Comput Struct Biotechnol J 2021; 19:1579-1594. [PMID: 33868595 PMCID: PMC8039532 DOI: 10.1016/j.csbj.2021.03.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/03/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 02/06/2023]
Abstract
Whilst substantial research effort has been placed on understanding the interactions of plant proteins with their molecular partners, relatively few studies in plants - in contrast to work in other organisms - address how these interactions evolve. It is thought that ancestral proteins were more promiscuous than modern proteins and that specificity often evolved following gene duplication and subsequent functional refining. However, ancestral protein resurrection studies have found that some modern proteins have evolved de novo from ancestors lacking those functions. Intriguingly, the new interactions evolved as a consequence of just a few mutations and, as such, acquisition of new functions appears to be neither difficult nor rare; however, only a few of them are incorporated into biological processes before they are lost to subsequent mutations. Here, we detail the approach of ancestral sequence reconstruction (ASR), providing a primer to reconstruct the sequence of an ancestral gene. We will present case studies from a range of different eukaryotes before discussing the few instances where ancestral reconstructions have been used in plants. As ASR is used to dig into the remote evolutionary past, we will also present some alternative genetic approaches to investigate molecular evolution on shorter timescales. We argue that the study of plant secondary metabolism is particularly well suited for ancestral reconstruction studies. Indeed, its ancient evolutionary roots and highly diverse landscape provide an ideal context in which to address the focal issue around the emergence of evolutionary novelties and how this affects the chemical diversification of plant metabolism.
Key Words
- APR, ancestral protein resurrection
- ASR, ancestral sequence reconstruction
- Ancestral sequence reconstruction
- CDS, coding sequence
- Evolution
- GR, glucocorticoid receptor
- GWAS, genome wide association study
- Genomics
- InDel, insertion/deletion
- MCMC, Markov Chain Monte Carlo
- ML, maximum likelihood
- MP, maximum parsimony
- MR, mineralocorticoid receptor
- MSA, multiple sequence alignment
- Metabolism
- NJ, neighbor-joining
- Phylogenetics
- Plants
- SFS, site frequency spectrum
Affiliation(s)
- Federico Scossa
- Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
- Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (CREA-GB), Rome, Italy
- Alisdair R. Fernie
- Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
- Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
|
33
|
Comte N, Morel B, Hasić D, Guéguen L, Boussau B, Daubin V, Penel S, Scornavacca C, Gouy M, Stamatakis A, Tannier E, Parsons DP. Treerecs: an integrated phylogenetic tool, from sequences to reconciliations. Bioinformatics 2021; 36:4822-4824. [PMID: 33085745 DOI: 10.1093/bioinformatics/btaa615] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/25/2019] [Revised: 06/22/2020] [Accepted: 07/09/2020] [Indexed: 11/15/2022]
Abstract
MOTIVATION Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated in standard phylogenetic software and they either lack performance on certain functions, or usability for biologists. RESULTS We present Treerecs, a phylogenetic software based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. AVAILABILITY AND IMPLEMENTATION Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.
Affiliation(s)
- Nicolas Comte
- Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France
- Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Damir Hasić
- Department of Mathematics, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Laurent Guéguen
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Bastien Boussau
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Vincent Daubin
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Simon Penel
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Celine Scornavacca
- ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier 34000, France
- Manolo Gouy
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Eric Tannier
- Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France; Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
|
34
|
Han X, Guo J, Pang E, Song H, Lin K. Ab Initio Construction and Evolutionary Analysis of Protein-Coding Gene Families with Partially Homologous Relationships: Closely Related Drosophila Genomes as a Case Study. Genome Biol Evol 2021; 12:185-202. [PMID: 32108239 PMCID: PMC7144356 DOI: 10.1093/gbe/evaa041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/18/2020] [Indexed: 01/05/2023]
Abstract
How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.
Affiliation(s)
- Xia Han
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Jindan Guo
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Erli Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Hongtao Song
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Kui Lin
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
|
35
|
New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021; 37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]
|
36
|
Zhu T, Yang Z. Complexity of the simplest species tree problem. Mol Biol Evol 2021; 38:3993-4009. [PMID: 33492385 PMCID: PMC8382899 DOI: 10.1093/molbev/msab009] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/18/2020] [Revised: 01/04/2021] [Accepted: 01/13/2021] [Indexed: 02/06/2023]
Abstract
The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Ziheng Yang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
|
37
|
Shah D, Freas C, Weber IT, Harrison RW. Evolution of drug resistance in HIV protease. BMC Bioinformatics 2020; 21:497. [PMID: 33375936 PMCID: PMC7772915 DOI: 10.1186/s12859-020-03825-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/18/2020] [Accepted: 10/19/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND Drug resistance is a critical problem limiting effective antiviral therapy for HIV/AIDS. Computational techniques for predicting drug resistance profiles from genomic data can accelerate the appropriate choice of therapy. These techniques can also be used to identify protease mutants for experimental studies of resistance and thereby assist in the development of next-generation therapies. Few studies, however, have assessed the evolution of resistance from genotype-phenotype data. RESULTS The machine learning produced highly accurate and robust classification of resistance to HIV protease inhibitors. Genotype data were mapped to the enzyme structure and encoded using Delaunay triangulation. Estimates of evolutionary relationships, based on this encoding, and using Minimum Spanning Trees, showed clusters of mutations that closely resemble the wild type. These clusters appear to evolve uniquely to more resistant phenotypes. CONCLUSIONS Using the triangulation metric and spanning trees results in paths that are consistent with evolutionary theory. The majority of the paths show bifurcation, namely they switch once from non-resistant to resistant or from resistant to non-resistant. Paths that lose resistance almost uniformly have far lower levels of resistance than those which either gain resistance or are stable. This strongly suggests that selection for stability in the face of a rapid rate of mutation is as important as selection for resistance in retroviral systems.
Affiliation(s)
- Dhara Shah
- Department of Computer Science, 25 Park Place, Atlanta, GA 30303 USA
- Christopher Freas
- Department of Computer Science, 25 Park Place, Atlanta, GA 30303 USA
- Irene T. Weber
- Department of Biology, 100 Piedmont Ave., Atlanta, GA 30303 USA
- Robert W. Harrison
- Department of Computer Science, 25 Park Place, Atlanta, GA 30303 USA
- Department of Biology, 100 Piedmont Ave., Atlanta, GA 30303 USA
|
38
|
Phylogenomics reveals the basis of adaptation of Pseudorhizobium species to extreme environments and supports a taxonomic revision of the genus. Syst Appl Microbiol 2020; 44:126165. [PMID: 33360413 DOI: 10.1016/j.syapm.2020.126165] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/30/2019] [Revised: 11/10/2020] [Accepted: 11/11/2020] [Indexed: 11/21/2022]
Abstract
The family Rhizobiaceae includes many genera of soil bacteria, often isolated for their association with plants. Herein, we investigate the genomic diversity of a group of Rhizobium species and unclassified strains isolated from atypical environments, including seawater, rock matrix or polluted soil. Based on whole-genome similarity and core genome phylogeny, we show that this group corresponds to the genus Pseudorhizobium. We thus reclassify Rhizobium halotolerans, R. marinum, R. flavum and R. endolithicum as P. halotolerans sp. nov., P. marinum comb. nov., P. flavum comb. nov. and P. endolithicum comb. nov., respectively, and show that P. pelagicum is a synonym of P. marinum. We also delineate a new chemolithoautotroph species, P. banfieldiae sp. nov., whose type strain is NT-26T (=DSM 106348T=CFBP 8663T). This genome-based classification was supported by a chemotaxonomic comparison, with increasing taxonomic resolution provided by fatty acid, protein and metabolic profiles. In addition, we used a phylogenetic approach to infer scenarios of duplication, horizontal transfer and loss for all genes in the Pseudorhizobium pangenome. We thus identify the key functions associated with the diversification of each species and higher clades, shedding light on the mechanisms of adaptation to their respective ecological niches. Respiratory proteins acquired at the origin of Pseudorhizobium were combined with clade-specific genes to enable different strategies for detoxification and nutrition in harsh, nutrient-poor environments.
|
39
|
Zhang C, Scornavacca C, Molloy EK, Mirarab S. ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy. Mol Biol Evol 2020; 37:3292-3307. [PMID: 32886770 PMCID: PMC7751180 DOI: 10.1093/molbev/msaa139] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Indexed: 12/18/2022]
Abstract
Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programming. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, University of California San Diego, San Diego, CA
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA
|
40
|
Rossi A, Treu L, Toppo S, Zschach H, Campanaro S, Dutilh BE. Evolutionary Study of the Crassphage Virus at Gene Level. Viruses 2020; 12:v12091035. [PMID: 32957679 PMCID: PMC7551546 DOI: 10.3390/v12091035] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/07/2020] [Revised: 09/03/2020] [Accepted: 09/14/2020] [Indexed: 12/15/2022]
Abstract
crAss-like viruses are a putative family of bacteriophages recently discovered. The eponym of the clade, crAssphage, is an enteric bacteriophage estimated to be present in at least half of the human population and it constitutes up to 90% of the sequences in some human fecal viral metagenomic datasets. We focused on the evolutionary dynamics of the genes encoded on the crAssphage genome. By investigating the conservation of the genes, a consistent variation in the evolutionary rates across the different functional groups was found. Gene duplications in crAss-like genomes were detected. By exploring the differences among the functional categories of the genes, we confirmed that the genes encoding capsid proteins were the most ubiquitous, despite their overall low sequence conservation. It was possible to identify a core of proteins whose evolutionary trees strongly correlate with each other, suggesting their genetic interaction. This group includes the capsid proteins, which are thus established as extremely suitable for rebuilding the phylogenetic tree of this viral clade. A negative correlation between the ubiquity and the conservation of viral protein sequences was shown. Together, this study provides an in-depth picture of the evolution of different genes in crAss-like viruses.
Affiliation(s)
- Alessandro Rossi
- Department of Biology, University of Padova, 35131 Padova, Italy
- Laura Treu
- Department of Biology, University of Padova, 35131 Padova, Italy
- Correspondence: Tel.: +39-049-827-6165
- Stefano Toppo
- Department of Molecular Medicine, University of Padova, 35131 Padova, Italy
- Henrike Zschach
- Department of Biology, University of Copenhagen, 1017 Copenhagen, Denmark
- Stefano Campanaro
- Department of Biology, University of Padova, 35131 Padova, Italy
- CRIBI Biotechnology Center, University of Padua, 35131 Padova, Italy
- Bas E. Dutilh
- Institute of Biodynamics and Biocomplexity, University of Utrecht, 3508 Utrecht, The Netherlands
|
41
|
Teulet A, Gully D, Rouy Z, Camuel A, Koebnik R, Giraud E, Lassalle F. Phylogenetic distribution and evolutionary dynamics of nod and T3SS genes in the genus Bradyrhizobium. Microb Genom 2020; 6:mgen000407. [PMID: 32783800 PMCID: PMC7643967 DOI: 10.1099/mgen.0.000407] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/20/2019] [Accepted: 06/26/2020] [Indexed: 01/22/2023]
Abstract
Bradyrhizobium are abundant soil bacteria and the major symbiont of legumes. The recent availability of Bradyrhizobium genome sequences provides a large source of information for analysis of symbiotic traits. In this study, we investigated the evolutionary dynamics of the nodulation genes (nod) and their relationship with the genes encoding type III secretion systems (T3SS) and their effectors among bradyrhizobia. Based on the comparative analysis of 146 Bradyrhizobium genome sequences, we identified six different types of T3SS gene clusters. The two predominant cluster types are designated RhcIa and RhcIb and both belong to the RhcI-T3SS family previously described in other rhizobia. They are found in 92/146 strains, most of them also containing nod genes. RhcIa and RhcIb gene clusters differ in the genes they carry: while the translocon-encoding gene nopX is systematically found in strains containing RhcIb, the nopE and nopH genes are specifically conserved in strains containing RhcIa, suggesting that these last two genes might functionally substitute nopX and play a role related to effector translocation. Phylogenetic analysis suggests that bradyrhizobia simultaneously gained nod and RhcI-T3SS gene clusters via horizontal transfer or subsequent vertical inheritance of a symbiotic island containing both. Sequence similarity searches for known Nop effector proteins in bradyrhizobial proteomes revealed the absence of a so-called core effectome, i.e. that no effector is conserved among all Bradyrhizobium strains. However, NopM and SUMO proteases were found to be the main effector families, being represented in the majority of the genus. This study indicates that bradyrhizobial T3SSs might play a more significant symbiotic role than previously thought and provides new candidates among T3SS structural proteins and effectors for future functional investigations.
Affiliation(s)
- Albin Teulet
- IRD, Laboratoire des Symbioses Tropicales et Méditerranéennes (LSTM), UMR IRD/SupAgro/INRA/Université de Montpellier/CIRAD, TA-A82/J – Campus de Baillarguet 34398, Montpellier cedex 5, France
| | - Djamel Gully
- IRD, Laboratoire des Symbioses Tropicales et Méditerranéennes (LSTM), UMR IRD/SupAgro/INRA/Université de Montpellier/CIRAD, TA-A82/J – Campus de Baillarguet 34398, Montpellier cedex 5, France
| | - Zoe Rouy
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Alicia Camuel
- IRD, Laboratoire des Symbioses Tropicales et Méditerranéennes (LSTM), UMR IRD/SupAgro/INRA/Université de Montpellier/CIRAD, TA-A82/J – Campus de Baillarguet 34398, Montpellier cedex 5, France
| | - Ralf Koebnik
- IRD, CIRAD, Université de Montpellier, IPME, Montpellier, France
| | - Eric Giraud
- IRD, Laboratoire des Symbioses Tropicales et Méditerranéennes (LSTM), UMR IRD/SupAgro/INRA/Université de Montpellier/CIRAD, TA-A82/J – Campus de Baillarguet 34398, Montpellier cedex 5, France
| | - Florent Lassalle
- Department of Infectious Disease Epidemiology, Imperial College London, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
- Pathogen and Microbes Program, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Present address: Pathogen and Microbes Program, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
| |
|
42
|
Gabaldón T. Patterns and impacts of nonvertical evolution in eukaryotes: a paradigm shift. Ann N Y Acad Sci 2020; 1476:78-92. [PMID: 32860228 PMCID: PMC7589212 DOI: 10.1111/nyas.14471] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/20/2020] [Revised: 07/19/2020] [Accepted: 07/27/2020] [Indexed: 12/14/2022]
Abstract
Evolution of eukaryotic species and their genomes has been traditionally understood as a vertical process in which genetic material is transmitted from parents to offspring along a lineage, and in which genetic exchange is restricted within species boundaries. However, mounting evidence from comparative genomics indicates that this paradigm is often violated. Horizontal gene transfer and mating between diverged lineages blur species boundaries and challenge the reconstruction of evolutionary histories of species and their genomes. Nonvertical evolution might be more restricted in eukaryotes than in prokaryotes, yet it is not negligible and can be common in certain groups. Recognition of such processes brings about the need to incorporate this complexity into our models, as well as to conceptually reframe eukaryotic diversity and evolution. Here, I review the recent work from genomics studies that supports the effects of nonvertical modes of evolution including introgression, hybridization, and horizontal gene transfer in different eukaryotic groups. I then discuss emerging patterns and effects, illustrated by specific examples, that support the conclusion that nonvertical processes are often at the root of important evolutionary transitions and adaptations. I will argue that a paradigm shift is needed to naturally accommodate nonvertical processes in eukaryotic evolution.
Affiliation(s)
- Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Spain; Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
|
43
|
Van Dam MH, Henderson JB, Esposito L, Trautwein M. Genomic Characterization and Curation of UCEs Improves Species Tree Reconstruction. Syst Biol 2020; 70:307-321. [PMID: 32750133 PMCID: PMC7875437 DOI: 10.1093/sysbio/syaa063] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/16/2019] [Revised: 07/26/2020] [Accepted: 07/29/2020] [Indexed: 12/12/2022]
Abstract
Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets, apparently driven by the increase in locus length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As locus length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.]
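The merging step described in this abstract (concatenating UCEs that fall in the same gene into one longer locus) can be illustrated with a minimal sketch. The function name `merge_cogenic_uces`, the dictionary layout, and the toy sequences are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from collections import defaultdict

def merge_cogenic_uces(uce_alignments, uce_to_gene):
    """Concatenate UCE alignments that map to the same gene (hypothetical helper).

    uce_alignments: {uce_id: {taxon: aligned_sequence}}
    uce_to_gene:    {uce_id: gene_id}; UCEs without an entry stay separate loci.
    """
    # Group UCE ids by gene; an unannotated UCE forms its own group.
    groups = defaultdict(list)
    for uce_id in sorted(uce_alignments):
        groups[uce_to_gene.get(uce_id, uce_id)].append(uce_id)

    merged = {}
    for locus, uce_ids in groups.items():
        taxa = sorted({t for u in uce_ids for t in uce_alignments[u]})
        merged[locus] = {t: "" for t in taxa}
        for u in uce_ids:
            aln = uce_alignments[u]
            length = len(next(iter(aln.values())))
            for t in taxa:
                # Pad taxa missing from this UCE with gaps so columns stay aligned.
                merged[locus][t] += aln.get(t, "-" * length)
    return merged

# Toy data (invented): uce-1 and uce-2 lie in the same gene, uce-3 does not.
uces = {
    "uce-1": {"ant": "ACGT", "bee": "ACGA"},
    "uce-2": {"ant": "TT", "wasp": "TA"},
    "uce-3": {"ant": "GGG", "bee": "GGC"},
}
gene_map = {"uce-1": "geneA", "uce-2": "geneA"}
merged = merge_cogenic_uces(uces, gene_map)
```

Each merged locus would then be passed to gene tree inference as a single alignment, which is where the longer locus length can help bootstrap support.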
Affiliation(s)
- Matthew H Van Dam
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
| | - James B Henderson
- Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
| | - Lauren Esposito
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
| | - Michelle Trautwein
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
| |
|
44
|
Parey E, Louis A, Cabau C, Guiguen Y, Roest Crollius H, Berthelot C. Synteny-Guided Resolution of Gene Trees Clarifies the Functional Impact of Whole-Genome Duplications. Mol Biol Evol 2020; 37:3324-3337. [DOI: 10.1093/molbev/msaa149] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
Whole-genome duplications (WGDs) have major impacts on the evolution of species, as they produce new gene copies contributing substantially to adaptation, isolation, phenotypic robustness, and evolvability. They result in large, complex gene families with recurrent gene losses in descendant species that sequence-based phylogenetic methods fail to reconstruct accurately. As a result, orthologs and paralogs are difficult to identify reliably in WGD-descended species, which hinders the exploration of functional consequences of WGDs. Here, we present Synteny-guided CORrection of Paralogies and Orthologies (SCORPiOs), a novel method to reconstruct gene phylogenies in the context of a known WGD event. WGDs generate large duplicated syntenic regions, which SCORPiOs systematically leverages as a complement to sequence evolution to infer the evolutionary history of genes. We applied SCORPiOs to the 320-My-old WGD at the origin of teleost fish. We find that almost one in four teleost gene phylogenies in the Ensembl database (3,394) are inconsistent with their syntenic contexts. For 70% of these gene families (2,387), we were able to propose an improved phylogenetic tree consistent with both the molecular substitution distances and the local syntenic information. We show that these synteny-guided phylogenies are more congruent with the species tree, with sequence evolution and with expected expression conservation patterns than those produced by state-of-the-art methods. Finally, we show that synteny-guided gene trees emphasize contributions of WGD paralogs to evolutionary innovations in the teleost clade.
Affiliation(s)
- Elise Parey
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
| | - Alexandra Louis
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
| | - Cédric Cabau
- SIGENAE, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
| | | | - Hugues Roest Crollius
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
| | - Camille Berthelot
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
| |
|
45
|
Delabre M, El-Mabrouk N, Huber KT, Lafond M, Moulton V, Noutahi E, Castellanos MS. Evolution through segmental duplications and losses: a Super-Reconciliation approach. Algorithms Mol Biol 2020; 15:12. [PMID: 32508979 PMCID: PMC7249433 DOI: 10.1186/s13015-020-00171-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/26/2019] [Accepted: 05/05/2020] [Indexed: 02/02/2023]
Abstract
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes that each family evolves independently. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem from a single gene tree to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
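For orientation, the classical single-gene-tree Duplication-Loss reconciliation that this abstract extends can be sketched with the standard LCA mapping: each gene tree node is mapped to the smallest species clade containing its genes, and a node is a duplication when it maps to the same clade as one of its children. This is a minimal sketch of that classical step only, not of the Super-Reconciliation algorithm; the nested-tuple tree encoding and all names are assumptions for illustration.

```python
def leaf_set(tree):
    """Leaves below a node; trees are nested 2-tuples with string leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """All clades (leaf sets of every node) of a rooted tree."""
    out = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            out.extend(clades(child))
    return out

def lca_clade(species_clades, species):
    """Smallest species-tree clade containing the given species set (the LCA)."""
    return min((c for c in species_clades if species <= c), key=len)

def count_duplications(gene_tree, species_tree, gene_to_species):
    """Count duplication nodes under the LCA mapping."""
    sp_clades = clades(species_tree)
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return lca_clade(sp_clades, frozenset([gene_to_species[node]]))
        child_maps = [visit(child) for child in node]
        mapped = lca_clade(sp_clades, frozenset().union(*child_maps))
        # Duplication: the node maps to the same species clade as a child.
        if mapped in child_maps:
            dups += 1
        return mapped

    visit(gene_tree)
    return dups

# Toy example (invented): two gene copies in species A and B, one in C.
species_tree = (("A", "B"), "C")
gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
duplicated = ((("a1", "b1"), ("a2", "b2")), "c1")
single_copy = (("a1", "b1"), "c1")
n_dup = count_duplications(duplicated, species_tree, gene_to_species)
n_single = count_duplications(single_copy, species_tree, gene_to_species)
```

In the toy example the node joining the two (A, B) gene pairs maps to the same species clade as both children, so it is counted as one duplication in the ancestor of A and B.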
|
46
|
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
|
47
|
Rabiee M, Mirarab S. INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores. Syst Biol 2020; 69:384-391. [PMID: 31290974 DOI: 10.1093/sysbio/syz045] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/04/2018] [Accepted: 07/02/2019] [Indexed: 11/13/2022]
Abstract
Phylogenomic analyses have increasingly adopted species tree reconstruction with methods that account for gene tree discordance, in pipelines that require both human effort and computational resources. As the number of available genomes continues to increase, a new problem is facing researchers. Once more species become available, they have to repeat the whole process from the beginning because updating species trees is currently not possible. However, the de novo inference can be prohibitively costly in human effort or machine time. In this article, we introduce INSTRAL, a method that extends ASTRAL to enable phylogenetic placement. INSTRAL is designed to place a new species on an existing species tree after sequences from the new species have already been added to gene trees; thus, INSTRAL is complementary to existing placement methods that update gene trees. [ASTRAL; ILS; phylogenetic placement; species tree reconstruction.]
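The idea of quartet-score placement can be shown with a brute-force sketch: try the new species on every branch of the backbone tree and keep the placement whose quartets agree most often with the gene trees. This is a deliberately naive stand-in, assuming small binary trees encoded as nested tuples; it is not INSTRAL's actual algorithm, and every function name here is hypothetical.

```python
from itertools import combinations

def leaf_set(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    out = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            out.extend(clades(child))
    return out

def quartet_split(tree_clades, quartet):
    """Return the 2|2 split a binary tree induces on four leaves."""
    q = frozenset(quartet)
    for clade in tree_clades:
        inside = clade & q
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None  # only reached for unresolved (non-binary) trees

def quartet_score(candidate, gene_trees):
    """Number of gene tree quartets that the candidate tree also displays."""
    cand_clades, cand_leaves = clades(candidate), leaf_set(candidate)
    score = 0
    for gene_tree in gene_trees:
        g_clades = clades(gene_tree)
        shared = sorted(cand_leaves & leaf_set(gene_tree))
        for quartet in combinations(shared, 4):
            split = quartet_split(g_clades, quartet)
            if split is not None and split == quartet_split(cand_clades, quartet):
                score += 1
    return score

def insertions(tree, new_leaf):
    """Yield every tree obtained by attaching new_leaf to one branch."""
    yield (tree, new_leaf)
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, new_leaf):
            yield (t, right)
        for t in insertions(right, new_leaf):
            yield (left, t)

def place(backbone, new_leaf, gene_trees):
    return max(insertions(backbone, new_leaf),
               key=lambda t: quartet_score(t, gene_trees))

# Toy data (invented): both gene trees put X next to A.
backbone = ((("A", "B"), "C"), "D")
gene_trees = [((("A", "X"), "B"), ("C", "D")), (((("A", "X"), "B"), "C"), "D")]
best = place(backbone, "X", gene_trees)
```

Enumerating all branches and all quartets is only workable for toy inputs, which is why a practical method needs a more efficient search.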
Affiliation(s)
- Maryam Rabiee
- Department of Computer Science and Engineering, UC San Diego, La Jolla, CA 92093, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| |
|
48
|
Nagy LG, Merényi Z, Hegedüs B, Bálint B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res 2020; 48:2209-2219. [PMID: 31943056 PMCID: PMC7049691 DOI: 10.1093/nar/gkz1241] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/23/2019] [Revised: 12/15/2019] [Accepted: 12/31/2019] [Indexed: 12/21/2022]
Abstract
Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm current analytical capabilities of evolutionary genomics. In contrast to population genomics, there are no standardized methods in evolutionary genomics for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this aspect and point out that many genomic datasets are under-utilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data is already available, resulting in a growing gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that will not be mitigated by sequencing more and more genomes, unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole genome datasets requires the integration of phylogenetic comparative methods into genomics, a rich but underutilized toolbox for looking into the past.
Affiliation(s)
- László G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Zsolt Merényi
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Botond Hegedüs
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Balázs Bálint
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| |
|
49
|
Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models. J Math Biol 2020; 80:1353-1388. [PMID: 32060618 PMCID: PMC7052048 DOI: 10.1007/s00285-019-01465-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/16/2019] [Revised: 11/18/2019] [Indexed: 10/28/2022]
Abstract
Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation ($\mathbb{S}$), gene duplication ($\mathbb{D}$), gene loss ($\mathbb{L}$), and horizontal gene transfer ($\mathbb{T}$). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the $\mathbb{DLT}$-model). In this work, we introduce the notion of evolutionary histories defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the $\mathbb{DLT}$-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the $\mathbb{DLT}$-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
|
50
|
Borges R, Kosiol C. Consistency and identifiability of the polymorphism-aware phylogenetic models. J Theor Biol 2020; 486:110074. [PMID: 31711991 DOI: 10.1016/j.jtbi.2019.110074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/31/2019] [Accepted: 11/06/2019] [Indexed: 10/25/2022]
Abstract
Polymorphism-aware phylogenetic models (PoMo) constitute an alternative approach for species tree estimation from genome-wide data. PoMo builds on the standard substitution models of DNA evolution but expands the classic alphabet of the four nucleotide bases to include polymorphic states. By doing so, PoMo accounts for ancestral and current intra-population variation, while also accommodating population-level processes ruling the substitution process (e.g. genetic drift, mutations, allelic selection). PoMo has been shown to be a valuable tool in several phylogenetic applications, but a proof of statistical consistency (and identifiability, a necessary condition for consistency) is lacking. Here, we prove that PoMo is identifiable and, using this result, we further show that the maximum a posteriori (MAP) tree estimator of PoMo is a consistent estimator of the species tree. We complement our theoretical results with a simulated data set mimicking the diversity observed in natural populations exhibiting incomplete lineage sorting. We implemented PoMo in a Bayesian framework and show that the MAP tree easily recovers the true tree for typical numbers of sites that are sampled in genome-wide analyses.
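The expanded alphabet mentioned above can be made concrete with a short enumeration. The sketch assumes the usual PoMo formulation, which the abstract does not spell out: a virtual population of size n, four fixed states, and biallelic polymorphic states recording how many of the n individuals carry each allele. The function name `pomo_states` and the tuple encoding are illustrative choices.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

def pomo_states(n):
    """Enumerate PoMo states for an assumed virtual population of size n.

    Fixed states are ((allele, n),); polymorphic states are
    ((allele_1, i), (allele_2, n - i)) with 0 < i < n.
    """
    fixed = [((a, n),) for a in NUCLEOTIDES]
    polymorphic = [
        ((a, i), (b, n - i))
        for a, b in combinations(NUCLEOTIDES, 2)
        for i in range(1, n)
    ]
    return fixed + polymorphic

states = pomo_states(10)
n_states = len(states)
```

Under these assumptions the state count is 4 + 6(n - 1), since there are six unordered allele pairs and n - 1 interior frequencies per pair.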
Affiliation(s)
- Rui Borges
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien 1210, Austria
| | - Carolin Kosiol
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien 1210, Austria; Centre for Biological Diversity, University of St Andrews, St Andrews, Fife KY16 9TH, UK.
| |
|