1
Cribbie EP, Doerr D, Chauve C. AGO, a Framework for the Reconstruction of Ancestral Syntenies and Gene Orders. Methods Mol Biol 2024; 2802:247-265. [PMID: 38819563 DOI: 10.1007/978-1-0716-3838-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Reconstructing ancestral gene orders from the genome data of extant species is an important problem in comparative and evolutionary genomics. In a phylogenomics setting that accounts for gene family evolution through gene duplication and gene loss, the reconstruction of ancestral gene orders involves several steps, including multiple sequence alignment, the inference of reconciled gene trees, and the inference of ancestral syntenies and gene adjacencies. For each step of such a process, several methods can be used and implemented using a growing corpus of often parameterized tools; in practice, interfacing such tools into an ancestral gene order reconstruction pipeline is far from trivial. This chapter introduces AGO, a Python-based framework for creating ancestral gene order reconstruction pipelines that interface and parameterize different bioinformatics tools. The authors illustrate the features of AGO by reconstructing ancestral gene orders for the X chromosome of three ancestral Anopheles species using three different pipelines. AGO is freely available at https://github.com/cchauve/AGO-pipeline.
Affiliation(s)
- Evan P Cribbie
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Daniel Doerr
  - Department for Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, German Diabetes Center (DDZ), Leibniz Institute for Diabetes Research, and Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
2
3
Luhmann N, Lafond M, Thevenin A, Ouangraoua A, Wittler R, Chauve C. The SCJ Small Parsimony Problem for Weighted Gene Adjacencies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1364-1373. [PMID: 28166504 DOI: 10.1109/tcbb.2017.2661761] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features in extant genomes in the phylogeny to define potential ancestral gene adjacencies, and either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion, or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming at reducing the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We consider that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can be obtained, for example, from ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or through probabilistic analysis of gene adjacency evolution. We show the NP-hardness of our problem variant and propose a Fixed-Parameter Tractable algorithm, based on the Sankoff-Rousseau dynamic programming algorithm, that also allows sampling of co-optimal solutions. We apply our approach to mammalian and bacterial data providing different degrees of complexity. We show that including adjacency weights in the objective has a significant impact in reducing the fragmentation of the reconstructed ancestral gene orders. An implementation is available at http://github.com/nluhmann/PhySca.
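The objective described in this abstract can be illustrated with a small sketch. It is not the PhySca implementation: the function names, the toy tree, and in particular the form of the weight term (a penalty of 1 - w for each adjacency kept at an internal node) are assumptions made for illustration. The SCJ cost along a tree edge is taken as the size of the symmetric difference of the two adjacency sets, which holds for genomes with the same gene content.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Adjacency = FrozenSet[str]  # unordered pair of gene extremities, e.g. {"1h", "2t"}


def adj(x: str, y: str) -> Adjacency:
    """Build an unordered adjacency between two gene extremities."""
    return frozenset((x, y))


def scj_distance(a: Set[Adjacency], b: Set[Adjacency]) -> int:
    """SCJ distance for equal gene content: adjacencies cut plus adjacencies joined."""
    return len(a ^ b)


def combined_objective(
    edges: List[Tuple[str, str]],
    labeling: Dict[str, Set[Adjacency]],
    weights: Dict[str, Dict[Adjacency, float]],
    alpha: float,
) -> float:
    """Convex combination of SCJ cost and a weight penalty (assumed form, lower is better)."""
    rearrangement_cost = sum(scj_distance(labeling[u], labeling[v]) for u, v in edges)
    # Assumption: keeping an adjacency of weight w at an internal node costs 1 - w.
    weight_penalty = sum(
        1.0 - node_weights.get(a, 0.0)
        for node, node_weights in weights.items()
        for a in labeling[node]
    )
    return alpha * rearrangement_cost + (1.0 - alpha) * weight_penalty


# Hypothetical toy instance: root R with two leaves A and B.
edges = [("R", "A"), ("R", "B")]
leaves = {"A": {adj("1h", "2t"), adj("2h", "3t")}, "B": {adj("1h", "2t")}}
root_weights = {"R": {adj("1h", "2t"): 0.9, adj("2h", "3t"): 0.4}}
candidates = [{adj("1h", "2t")}, {adj("1h", "2t"), adj("2h", "3t")}]

scores = [
    combined_objective(edges, {**leaves, "R": c}, root_weights, alpha=0.5)
    for c in candidates
]
best = candidates[scores.index(min(scores))]
```

Both candidate root labelings have the same SCJ cost here, so the weight term decides between them; changing `alpha` shifts the balance between parsimony and adjacency support.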
4
Anselmetti Y, Duchemin W, Tannier E, Chauve C, Bérard S. Phylogenetic signal from rearrangements in 18 Anopheles species by joint scaffolding extant and ancestral genomes. BMC Genomics 2018; 19:96. [PMID: 29764366 PMCID: PMC5954271 DOI: 10.1186/s12864-018-4466-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/02/2022]
Abstract
Background: Genome rearrangements carry valuable information for phylogenetic inference or the elucidation of molecular mechanisms of adaptation. However, the detection of genome rearrangements is often hampered by current deficiencies in data and methods: genomes obtained from short sequence reads generally have very fragmented assemblies, and comparing multiple gene orders generally leads to computationally intractable algorithmic questions.
Results: We present a computational method, ADseq, which, by combining ancestral gene order reconstruction, comparative scaffolding and de novo scaffolding methods, overcomes these two caveats. ADseq simultaneously provides improved assemblies and ancestral genomes, with statistical supports on all local features. Compared to previous comparative methods, it runs in polynomial time, it samples solutions in a probabilistic space, and it can handle a significantly larger gene complement from the considered extant genomes, with complex histories including gene duplications and losses. We use ADseq to provide improved assemblies and a genome history, made of duplications, losses, gene translocations and rearrangements, of 18 complete Anopheles genomes, including several important malaria vectors. We also provide additional support for a differentiated mode of evolution of the sex chromosome and of the autosomes in these mosquito genomes.
Conclusions: We demonstrate the method’s ability to improve extant assemblies accurately through a procedure simulating realistic assembly fragmentation. We study a debated issue regarding the phylogeny of the Gambiae complex group of Anopheles genomes in the light of the evolution of chromosomal rearrangements, suggesting that the phylogenetic signal they carry can differ from the phylogenetic signal carried by gene sequences, which are more prone to introgression.
Affiliation(s)
- Yoann Anselmetti
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
- Wandrille Duchemin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
  - INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
  - INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A1S6, BC, Canada
- Sèverine Bérard
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
5
Duchemin W, Anselmetti Y, Patterson M, Ponty Y, Bérard S, Chauve C, Scornavacca C, Daubin V, Tannier E. DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies. Genome Biol Evol 2018; 9:1312-1319. [PMID: 28402423 PMCID: PMC5441342 DOI: 10.1093/gbe/evx069] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 04/07/2017] [Indexed: 12/15/2022]
Abstract
DeCoSTAR is software that aims at reconstructing the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann–Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability: http://pbil.univ-lyon1.fr/software/DeCoSTAR (Last accessed April 24, 2017).
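The scoring and sampling idea in this abstract (a weighted sum of adjacency gains and breakages, with scenarios sampled from a Boltzmann-Gibbs distribution) can be sketched on an explicit list of candidate scenarios. This is an illustration only, not DeCoSTAR's algorithm or interface: DeCoSTAR works by dynamic programming over reconciled trees, and every name, cost value, and the temperature parameter below is an assumption.

```python
import math
import random
from typing import List, Set


def scenario_cost(gains: int, breaks: int,
                  gain_cost: float = 1.0, break_cost: float = 1.0) -> float:
    """Weighted sum of adjacency gains and breakages."""
    return gains * gain_cost + breaks * break_cost


def boltzmann_probabilities(costs: List[float], temperature: float) -> List[float]:
    """Probability of each scenario, proportional to exp(-cost / temperature)."""
    min_cost = min(costs)  # shift by the minimum for numerical stability
    factors = [math.exp(-(c - min_cost) / temperature) for c in costs]
    total = sum(factors)
    return [f / total for f in factors]


def sample_scenarios(costs: List[float], temperature: float,
                     n: int, seed: int = 0) -> List[int]:
    """Draw n scenario indices according to the Boltzmann-Gibbs distribution."""
    rng = random.Random(seed)
    probs = boltzmann_probabilities(costs, temperature)
    return rng.choices(range(len(costs)), weights=probs, k=n)


def adjacency_support(scenarios: List[Set[str]], sampled: List[int],
                      adjacency: str) -> float:
    """Fraction of sampled scenarios that contain the given adjacency."""
    return sum(1 for i in sampled if adjacency in scenarios[i]) / len(sampled)


# Hypothetical scenarios, each a set of ancestral adjacencies with (gains, breaks).
scenarios = [{"ab", "bc"}, {"ab"}, {"ab", "cd"}]
costs = [scenario_cost(1, 1), scenario_cost(1, 2), scenario_cost(2, 3)]
probs = boltzmann_probabilities(costs, temperature=0.1)
sampled = sample_scenarios(costs, temperature=0.1, n=1000)
support_ab = adjacency_support(scenarios, sampled, "ab")
```

At a low temperature almost all probability mass sits on the most parsimonious scenario, while a higher temperature spreads samples over near-optimal scenarios; the support of an adjacency is then the share of samples in which it appears.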
Affiliation(s)
- Wandrille Duchemin
  - Inria Grenoble Rhône-Alpes, Montbonnot, France
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Yoann Anselmetti
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Murray Patterson
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Experimental Algorithmics Lab (AlgoLab), Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, Viale Sarca, Milano, Italy
- Yann Ponty
  - CNRS, Ecole Polytechnique, LIX UMR7161, Palaiseau, France
  - Inria Saclay, EP AMIB, Palaiseau, France
- Sèverine Bérard
  - Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
  - LIRMM, Université de Montpellier, CNRS, Montpellier, France
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Celine Scornavacca
  - Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Vincent Daubin
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Eric Tannier
  - Inria Grenoble Rhône-Alpes, Montbonnot, France
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
6
Anselmetti Y, Luhmann N, Bérard S, Tannier E, Chauve C. Comparative Methods for Reconstructing Ancient Genome Organization. Methods Mol Biol 2018; 1704:343-362. [PMID: 29277873 DOI: 10.1007/978-1-4939-7463-4_13] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022]
Abstract
Comparative genomics considers the detection of similarities and differences between extant genomes, and, based on more or less formalized hypotheses regarding the involved evolutionary processes, inferring ancestral states explaining the similarities and an evolutionary history explaining the differences. In this chapter, we focus on the reconstruction of the organization of ancient genomes into chromosomes. We review different methodological approaches and software, applied to a wide range of datasets from different kingdoms of life and at different evolutionary depths. We discuss relations with genome assembly, and potential approaches to validate computational predictions on ancient genomes that are almost always only accessible through these predictions.
Affiliation(s)
- Yoann Anselmetti
  - Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
- Nina Luhmann
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
  - International Research Training Group 1906, Bielefeld University, Bielefeld, Germany
- Sèverine Bérard
  - Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
- Eric Tannier
  - UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive", Inria Grenoble Rhône-Alpes and University of Lyon, Lyon, France
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada, V5A 1S6
7
Feng B, Zhou L, Tang J. Ancestral Genome Reconstruction on Whole Genome Level. Curr Genomics 2017; 18:306-315. [PMID: 29081686 PMCID: PMC5635614 DOI: 10.2174/1389202918666170307120943] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/09/2016] [Revised: 10/08/2016] [Accepted: 11/03/2016] [Indexed: 11/22/2022]
Abstract
Comparative genomics, evolutionary biology, and cancer research require tools to elucidate evolutionary trajectories and reconstruct ancestral genomes. Various methods have been developed to infer the genome content and gene ordering of ancestral genomes by using genomic structural variants. There are two main kinds of computational approaches to ancestral genome reconstruction. Distance/event-based approaches employ genome evolutionary models and reconstruct the ancestral genomes that minimize the total distance or number of events over the edges of the given phylogeny. Homology/adjacency-based approaches search for conserved gene adjacencies and genome structures, and assemble these regions into ancestral genomes at the internal nodes of the given phylogeny. We review the principles and algorithms of the approaches that can reconstruct ancestral genomes at the whole-genome level. We discuss the advantages and limitations of these approaches in dealing with various genome datasets, evolutionary events, and reconstruction problems. We also discuss the improvements and developments of these approaches in subsequent research. We select the four best-known and most powerful approaches from the distance/event-based and homology/adjacency-based categories, and analyze and compare their performance on different kinds of datasets and evolutionary events. Based on our experiment, GASTS has the best performance on problems with equal genome content that involve only genome rearrangement events. PMAG++ achieves the best performance on problems with unequal genome content that involve all possible complicated evolutionary events.
Affiliation(s)
- Bing Feng
  - School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Lingxi Zhou
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
8
Abstract
Background: During evolution, genomes are modified by large-scale structural events, such as rearrangements, deletions or insertions of large blocks of DNA. Of particular interest, in order to better understand how this type of genomic evolution happens, is the reconstruction of ancestral genomes, given a phylogenetic tree with extant genomes at its leaves. One way of solving this problem is to assume a rearrangement model, such as Double Cut and Join (DCJ), and find a set of ancestral genomes that minimizes the number of events on the input tree. Since this problem is NP-hard for most rearrangement models, exact solutions are practical only for small instances, and heuristics have to be used for larger datasets. This type of approach can be called event-based. Another common approach is based on finding conserved structures between the input genomes, such as adjacencies between genes, possibly also assigning weights that indicate a measure of confidence or probability that a particular structure is present in each ancestral genome, and then finding a set of non-conflicting adjacencies that optimizes some given function, usually maximizing total weight and minimizing character changes in the tree. We call this type of method homology-based.
Results: In previous work, we proposed an ancestral reconstruction method that combines homology- and event-based ideas, using the concept of intermediate genomes, which arise in DCJ rearrangement scenarios. This method showed a better rate of correctly reconstructed adjacencies than other methods, while also being faster, since the use of intermediate genomes greatly reduces the search space. Here, we generalize the intermediate genome concept to genomes with unequal gene content, extending our method to account for gene insertions and deletions of any length. In many of the simulated datasets, our proposed method had better results than MLGO and MGRA, two state-of-the-art algorithms for ancestral reconstruction with unequal gene content, while running much faster, making it more scalable to larger datasets.
Conclusion: Studying ancestral reconstruction problems in a new light, using the concept of intermediate genomes, allows the design of very fast algorithms by greatly reducing the solution search space, while also giving very good results. The algorithms introduced in this paper were implemented in an open-source software called RINGO (ancestral Reconstruction with INtermediate GenOmes), available at https://github.com/pedrofeijao/RINGO.
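The homology-based idea mentioned in the background (finding gene adjacencies conserved between input genomes) can be sketched as follows. This is a generic illustration, not RINGO's intermediate-genome method: genes are assumed to be nonzero signed integers on linear chromosomes, telomeres are ignored, and all function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Adjacency = FrozenSet[str]


def extremities(gene: int) -> Tuple[str, str]:
    """Return the (entry, exit) extremities of a signed gene read left to right."""
    name = str(abs(gene))
    head, tail = name + "h", name + "t"
    # A forward gene is read tail to head, a reversed gene head to tail.
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(chromosome: List[int]) -> Set[Adjacency]:
    """Adjacencies between consecutive genes of one linear chromosome."""
    result = set()
    for left, right in zip(chromosome, chromosome[1:]):
        result.add(frozenset((extremities(left)[1], extremities(right)[0])))
    return result


def conserved_adjacencies(genomes: List[List[List[int]]],
                          min_genomes: int) -> Set[Adjacency]:
    """Adjacencies present in at least min_genomes of the input genomes."""
    counts: Counter = Counter()
    for genome in genomes:
        genome_adjs: Set[Adjacency] = set()
        for chromosome in genome:
            genome_adjs |= adjacencies(chromosome)
        counts.update(genome_adjs)  # each genome counts an adjacency once
    return {a for a, c in counts.items() if c >= min_genomes}


# Hypothetical genomes, each a list of chromosomes.
g1 = [[1, 2, 3, 4]]
g2 = [[1, 2, -3, 4]]
g3 = [[-4, -3, -2, -1]]  # g1 read from the other end
shared = conserved_adjacencies([g1, g2, g3], min_genomes=3)
```

Because adjacencies are unordered pairs of extremities, a chromosome and its reverse complement yield the same set, so `g1` and `g3` agree on every adjacency while the inversion in `g2` leaves only the adjacency between genes 1 and 2 conserved in all three genomes.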
Affiliation(s)
- Pedro Feijão
  - Technische Fakultät and CeBiTec, Universität Bielefeld, Universitätsstr. 25, Bielefeld, 33615, Germany
- Eloi Araujo
  - Technische Fakultät and CeBiTec, Universität Bielefeld, Universitätsstr. 25, Bielefeld, 33615, Germany
  - Faculdade de Computação, Universidade Federal de Mato Grosso do Sul - UFMS, Campo Grande, MS, Brazil
9
Anselmetti Y, Berry V, Chauve C, Chateau A, Tannier E, Bérard S. Ancestral gene synteny reconstruction improves extant species scaffolding. BMC Genomics 2015; 16 Suppl 10:S11. [PMID: 26450761 PMCID: PMC4603332 DOI: 10.1186/1471-2164-16-s10-s11] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]
Abstract
We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that pursues the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measuring how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes.
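The evaluation protocol described at the end of this abstract (fragment a well assembled genome, then count how many of the removed adjacencies a method recovers) can be sketched as below. The fragmentation model, the representation of adjacencies as gene-name pairs, and the function names are assumptions for illustration; they are not taken from ARt-DeCo.

```python
from typing import FrozenSet, List, Set, Tuple

Adjacency = FrozenSet[str]


def fragment(chromosome: List[str],
             cut_after: List[int]) -> Tuple[List[List[str]], Set[Adjacency]]:
    """Cut a chromosome after the given gene indices.

    Returns the resulting contigs and the set of adjacencies removed by the cuts.
    """
    contigs, removed, start = [], set(), 0
    for i in sorted(cut_after):
        contigs.append(chromosome[start:i + 1])
        removed.add(frozenset((chromosome[i], chromosome[i + 1])))
        start = i + 1
    contigs.append(chromosome[start:])
    return contigs, removed


def precision_sensitivity(predicted: Set[Adjacency],
                          removed: Set[Adjacency]) -> Tuple[float, float]:
    """Precision and sensitivity of predicted adjacencies against the removed ones."""
    true_positives = len(predicted & removed)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(removed) if removed else 0.0
    return precision, sensitivity


# Hypothetical example: one chromosome cut into three contigs.
chromosome = ["a", "b", "c", "d", "e", "f"]
contigs, removed = fragment(chromosome, cut_after=[1, 3])
# Suppose a scaffolding method proposes these two links between contigs.
predicted = {frozenset(("b", "c")), frozenset(("b", "e"))}
precision, sensitivity = precision_sensitivity(predicted, removed)
```

Here one of the two proposed links is a removed adjacency and one of the two removed adjacencies is recovered, so both measures equal 0.5.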
Affiliation(s)
- Yoann Anselmetti
  - Institut des Sciences de l'Évolution de Montpellier (ISE-M), Place Eugène Bataillon, Montpellier, 34095, France
  - Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
- Vincent Berry
  - Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, Canada
- Annie Chateau
  - Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
- Eric Tannier
  - Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
  - Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France
- Sèverine Bérard
  - Institut des Sciences de l'Évolution de Montpellier (ISE-M), Place Eugène Bataillon, Montpellier, 34095, France
  - Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France