1. Li Q, Chan YB, Galtier N, Scornavacca C. The Effect of Copy Number Hemiplasy on Gene Family Evolution. Syst Biol 2024; 73:355-374. [PMID: 38330161 DOI: 10.1093/sysbio/syae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 01/24/2024] [Accepted: 02/03/2024] [Indexed: 02/10/2024] Open
Abstract
The evolution of gene families is complex, involving gene-level evolutionary events such as gene duplication, horizontal gene transfer, and gene loss, and other processes such as incomplete lineage sorting (ILS). Because of this, topological differences often exist between gene trees and species trees. A number of models have been recently developed to explain these discrepancies, the most realistic of which attempts to consider both gene-level events and ILS. When unified in a single model, the interaction between ILS and gene-level events can cause polymorphism in gene copy number, which we refer to as copy number hemiplasy (CNH). In this paper, we extend the Wright-Fisher process to include duplications and losses over several species, and show that the probability of CNH for this process can be significant. We study how well two unified models approximate the Wright-Fisher process with duplication and loss: the multilocus multispecies coalescent (MLMSC), which models CNH, and duplication, loss, and coalescence (DLCoal), which does not. We then study the effect of CNH on gene family evolution by comparing MLMSC and DLCoal. We generate comparable gene trees under both models, showing significant differences in various summary statistics; most importantly, CNH reduces the number of gene copies greatly. If this is not taken into account, the traditional method of estimating duplication rates (by counting the number of gene copies) becomes inaccurate. The simulated gene trees are also used for species tree inference with the summary methods ASTRAL and ASTRAL-Pro, demonstrating that their accuracy, based on CNH-unaware simulations calibrated on real data, may have been overestimated.
Affiliation(s)
- Qiuyi Li
  - School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Melbourne 3010, Australia
  - Alibaba Cloud, Hangzhou, China
- Yao-Ban Chan
  - School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Melbourne 3010, Australia
- Nicolas Galtier
  - Institut des Sciences de l'Evolution, Université Montpellier, CNRS, IRD, EPHE, Montpellier 34095, France
- Celine Scornavacca
  - Institut des Sciences de l'Evolution, Université Montpellier, CNRS, IRD, EPHE, Montpellier 34095, France
2. Liu Z, Yang J, Long Y, Zhang C, Wang D, Zhang X, Dong W, Zhao L, Liu C, Zhai J, Wang E. Single-nucleus transcriptomes reveal spatiotemporal symbiotic perception and early response in Medicago. Nature Plants 2023; 9:1734-1748. [PMID: 37749242 DOI: 10.1038/s41477-023-01524-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/08/2022] [Accepted: 08/25/2023] [Indexed: 09/27/2023]
Abstract
Establishing legume-rhizobial symbiosis requires precise coordination of complex responses in a time- and cell type-specific manner. Upon encountering Rhizobium, host plants undergo rapid changes in gene expression within the first few hours, which prepare the plants to turn off defence and form a symbiotic relationship with the microbes. Here, we applied single-nucleus RNA sequencing to characterize the roots of Medicago truncatula at 30 min, 6 h and 24 h after nod factor treatment. We found drastic global gene expression reprogramming at 30 min in the epidermis and cortex, and most of these changes were restored at 6 h. Moreover, plant defence response genes are activated at 30 min and subsequently suppressed at 6 h in non-meristem cells. We found that the flavonoid synthase genes required to recruit rhizobia are highly expressed 30 min after inoculation with nod factors only in the cortical cells, not in other cell types. A gene module enriched for symbiotic nitrogen fixation genes showed that MtFER (MtFERONIA) and LYK3 (LysM domain receptor-like kinase 3) share similar responses to symbiotic signals. We further found that MtFER can be phosphorylated by LYK3 and that it participates in rhizobial symbiosis. Our results expand our understanding of dynamic spatiotemporal symbiotic responses at the single-cell level.
Affiliation(s)
- Zhijian Liu
  - Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology (SUSTech), Shenzhen, China
- Jun Yang
  - New Cornerstone Science Laboratory, National Key Laboratory of Plant Molecular Genetics, Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Yanping Long
  - Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology (SUSTech), Shenzhen, China
- Chi Zhang
  - New Cornerstone Science Laboratory, National Key Laboratory of Plant Molecular Genetics, Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
  - State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Key Laboratory of Biotechnology in Plant Protection of MOA of China and Zhejiang Province, Institute of Virology and Biotechnology, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Dapeng Wang
  - New Cornerstone Science Laboratory, National Key Laboratory of Plant Molecular Genetics, Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Xiaowei Zhang
  - New Cornerstone Science Laboratory, National Key Laboratory of Plant Molecular Genetics, Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Wentao Dong
  - New Cornerstone Science Laboratory, National Key Laboratory of Plant Molecular Genetics, Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Li Zhao
  - School of Life Sciences, Division of Life Sciences and Medicine, MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, University of Science and Technology of China, Hefei, China
- Chengwu Liu
  - School of Life Sciences, Division of Life Sciences and Medicine, MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, University of Science and Technology of China, Hefei, China
- Jixian Zhai
  - Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology (SUSTech), Shenzhen, China
- Ertao Wang
  - New Cornerstone Science Laboratory, National Key Laboratory of Plant Molecular Genetics, Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
3. LeMay M, Libeskind-Hadas R, Wu YC. A Polynomial-Time Algorithm for Minimizing the Deep Coalescence Cost for Level-1 Species Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2642-2653. [PMID: 34406946 DOI: 10.1109/tcbb.2021.3105922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Phylogenetic analyses commonly assume that the species history can be represented as a tree. However, in the presence of hybridization, the species history is more accurately captured as a network. Despite several advances in modeling phylogenetic networks, there is no known polynomial-time algorithm for parsimoniously reconciling gene trees with species networks while accounting for incomplete lineage sorting. To address this issue, we present a polynomial-time algorithm for the case of level-1 networks, in which no hybrid species is the direct ancestor of another hybrid species. This work enables more efficient reconciliation of gene trees with species networks, which in turn, enables more efficient reconstruction of species networks.
4. Paszek J, Markin A, Górecki P, Eulenstein O. Taming the Duplication-Loss-Coalescence Model with Integer Linear Programming. J Comput Biol 2021; 28:758-773. [PMID: 34125600 DOI: 10.1089/cmb.2021.0011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
The duplication-loss-coalescence (DLC) parsimony model is invaluable for analyzing the complex scenarios of concurrent duplication, loss, and deep coalescence events in the evolution of gene families. However, inferring such scenarios is prohibitive even for moderately sized families owing to the computational complexity involved. To overcome this stringent limitation, we make the first step by describing a flexible integer linear programming (ILP) formulation for inferring DLC evolutionary scenarios. Then, to make the DLC model more scalable, we introduce four sensibly constrained versions of the model and describe modified versions of our ILP formulation reflecting these constraints. Our simulation studies show that our constrained ILP formulations compute evolutionary scenarios that are substantially larger than scenarios computable under our original ILP formulation and the original dynamic programming algorithm by Wu et al. Furthermore, scenarios computed under our constrained DLC models are remarkably accurate compared with corresponding scenarios under the original DLC model, which we also confirm in an empirical study with thousands of gene families.
Affiliation(s)
- Jarosław Paszek
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Alexey Markin
  - Department of Computer Science, Iowa State University, Ames, Iowa, USA
- Paweł Górecki
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Oliver Eulenstein
  - Department of Computer Science, Iowa State University, Ames, Iowa, USA
5. Li Q, Scornavacca C, Galtier N, Chan YB. The Multilocus Multispecies Coalescent: A Flexible New Model of Gene Family Evolution. Syst Biol 2020; 70:822-837. [PMID: 33169795 DOI: 10.1093/sysbio/syaa084] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2020] [Revised: 05/07/2020] [Accepted: 10/19/2020] [Indexed: 02/06/2023] Open
Abstract
Incomplete lineage sorting (ILS), the interaction between coalescence and speciation, can generate incongruence between gene trees and species trees, as can gene duplication (D), transfer (T), and loss (L). These processes are usually modeled independently, but in reality, ILS can affect gene copy number polymorphism, that is, interfere with DTL. This has been previously recognized, but not treated in a satisfactory way, mainly because DTL events are naturally modeled forward-in-time, while ILS is naturally modeled backward-in-time with the coalescent. Here, we consider the joint action of ILS and DTL on the gene tree/species tree problem in all its complexity. In particular, we show that the interaction between ILS and duplications/transfers (without losses) can result in patterns usually interpreted as resulting from gene loss, and that the realized rate of D, T, and L becomes nonhomogeneous in time when ILS is taken into account. We introduce algorithmic solutions to these problems. Our new model, the multilocus multispecies coalescent, which also accounts for any level of linkage between loci, generalizes the multispecies coalescent (MSC) model and offers a versatile, powerful framework for proper simulation, and inference of gene family evolution. [Gene duplication; gene loss; horizontal gene transfer; incomplete lineage sorting; multispecies coalescent; hemiplasy; recombination.].
Affiliation(s)
- Qiuyi Li
  - School of Mathematics and Statistics / Melbourne Integrative Genomics, The University of Melbourne, Melbourne 3010, Australia
- Celine Scornavacca
  - Institut des Sciences de l'Evolution, Université Montpellier, CNRS, IRD, EPHE, Montpellier, 34095, France
- Nicolas Galtier
  - Institut des Sciences de l'Evolution, Université Montpellier, CNRS, IRD, EPHE, Montpellier, 34095, France
- Yao-Ban Chan
  - School of Mathematics and Statistics / Melbourne Integrative Genomics, The University of Melbourne, Melbourne 3010, Australia
6. Mawhorter R, Liu N, Libeskind-Hadas R, Wu YC. Inferring Pareto-optimal reconciliations across multiple event costs under the duplication-loss-coalescence model. BMC Bioinformatics 2019; 20:639. [PMID: 31842732 PMCID: PMC6916210 DOI: 10.1186/s12859-019-3206-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: Reconciliation methods are widely used to explain incongruence between a gene tree and species tree. However, the common approach of inferring maximum parsimony reconciliations (MPRs) relies on user-defined costs for each type of event, which can be difficult to estimate. Prior work has explored the relationship between event costs and maximum parsimony reconciliations in the duplication-loss and duplication-transfer-loss models, but no studies have addressed this relationship in the more complicated duplication-loss-coalescence model. RESULTS: We provide a fixed-parameter tractable algorithm for computing Pareto-optimal reconciliations and recording all events that arise in those reconciliations, along with their frequencies. We apply this method to a case study of 16 fungi to systematically characterize the complexity of MPR space across event costs and identify events supported across this space. CONCLUSION: This work provides a new framework for studying the relationship between event costs and reconciliations that incorporates both macro-evolutionary events and population effects and is thus broadly applicable across eukaryotic species.
Affiliation(s)
- Ross Mawhorter
  - Department of Computer Science, Harvey Mudd College, Claremont, 91711, CA, USA
- Nuo Liu
  - Department of Computer Science, Harvey Mudd College, Claremont, 91711, CA, USA
- Ran Libeskind-Hadas
  - Department of Computer Science, Harvey Mudd College, Claremont, 91711, CA, USA
- Yi-Chieh Wu
  - Department of Computer Science, Harvey Mudd College, Claremont, 91711, CA, USA