1
|
Tricou T, Tannier E, de Vienne DM. Response to "On the impact of incomplete taxon sampling on the relative timing of gene transfer events". PLoS Biol 2024; 22:e3002557. [PMID: 38502645 PMCID: PMC10950208 DOI: 10.1371/journal.pbio.3002557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Accepted: 02/21/2024] [Indexed: 03/21/2024] Open
Affiliation(s)
- Théo Tricou
- Université Clermont Auvergne, INRAE, VetAgro Sup, UMR EPIA, F-63122 Saint-Genès-Champanelle, France
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
- INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Damien M. de Vienne
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| |
Collapse
|
2
|
Comte A, Tricou T, Tannier E, Joseph J, Siberchicot A, Penel S, Allio R, Delsuc F, Dray S, de Vienne DM. PhylteR: Efficient Identification of Outlier Sequences in Phylogenomic Datasets. Mol Biol Evol 2023; 40:msad234. [PMID: 37879113 PMCID: PMC10655845 DOI: 10.1093/molbev/msad234] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Revised: 09/29/2023] [Accepted: 10/18/2023] [Indexed: 10/27/2023] Open
Abstract
In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. In PhylteR, these distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to ILS-induced incongruences, which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (docker and singularity).
Collapse
Affiliation(s)
- Aurore Comte
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- IRD, CIRAD, INRAE, Institut Agro, PHIM Plant Health Institute, Montpellier University, Montpellier, France
| | - Théo Tricou
- Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
| | - Eric Tannier
- Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Centre de Recherches Inria de Lyon, Villeurbanne, France
| | - Julien Joseph
- Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
| | - Aurélie Siberchicot
- Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
| | - Simon Penel
- Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
| | - Rémi Allio
- CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ. Montpellier, Montpellier, France
| | | | - Stéphane Dray
- Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
| | - Damien M de Vienne
- Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
| |
Collapse
|
3
|
Menet H, Daubin V, Tannier E. Phylogenetic reconciliation. PLoS Comput Biol 2022; 18:e1010621. [PMID: 36327227 PMCID: PMC9632901 DOI: 10.1371/journal.pcbi.1010621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open
Affiliation(s)
- Hugo Menet
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558,Villeurbanne, France
| | - Vincent Daubin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558,Villeurbanne, France
- * E-mail: (VD); (ET)
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558,Villeurbanne, France
- Inria, centre de recherche de Lyon, Villeurbanne, France
- * E-mail: (VD); (ET)
| |
Collapse
|
4
|
Abstract
Introgression, endosymbiosis, and gene transfer, i.e., horizontal gene flow (HGF), are primordial sources of innovation in all domains of life. Our knowledge on HGF relies on detection methods that exploit some of its signatures left on extant genomes. One of them is the effect of HGF on branch lengths of constructed phylogenies. This signature has been formalized in statistical tests for HGF detection and used for example to detect massive adaptive gene flows in malaria vectors or to order evolutionary events involved in eukaryogenesis. However, these studies rely on the assumption that ghost lineages (all unsampled extant and extinct taxa) have little influence. We demonstrate here with simulations and data reanalysis that when considering the more realistic condition that unsampled taxa are legion compared to sampled ones, the conclusion of these studies become unfounded or even reversed. This illustrates the necessity to recognize the existence of ghosts in evolutionary studies.
Collapse
Affiliation(s)
- Théo Tricou
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
- INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Damien M. de Vienne
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| |
Collapse
|
5
|
Tannier E. A hapless mathematical contribution to biology : Chromosome inversions in Drosophila, 1937-1941. Hist Philos Life Sci 2022; 44:34. [PMID: 35918616 DOI: 10.1007/s40656-022-00514-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Accepted: 06/12/2022] [Indexed: 06/15/2023]
Abstract
This is the story, told in the light of a new analysis of historical data, of a mathematical biology problem that was explored in the 1930s in Thomas Morgan's laboratory at the California Institute of Technology. It is one of the early developments of evolutionary genetics and quantitative phylogeny, and deals with the identification and counting of chromosomal inversions in Drosophila species from comparisons of genetic maps. A re-analysis of the data produced in the 1930s using current mathematics and computational technologies reveals how a team of biologists, with the help of a renowned mathematician and against their first intuition, came to an erroneous conclusion regarding the presence of phylogenetic signals in gene arrangements. This example illustrates two different aspects of a same piece: (1) the appearance of a mathematical in biology problem solved with the development of a combinatorial algorithm, which was unusual at the time, and (2) the role of errors in scientific activity. Also underlying is the possible influence of computational complexity in understanding the directions of research in biology.
Collapse
Affiliation(s)
- Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive UMR5558, Université de Lyon, 69622, Villeurbanne, France.
- Centre de Recherches de Lyon, Inria, Institut National de Recherches en Informatique et Automatique, Villeurbanne, France.
| |
Collapse
|
6
|
Penel S, Menet H, Tricou T, Daubin V, Tannier E. Thirdkind: displaying phylogenetic encounters beyond 2-level reconciliation. Bioinformatics 2022; 38:2350-2352. [PMID: 35139153 DOI: 10.1093/bioinformatics/btac062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 01/26/2022] [Accepted: 02/03/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Reconciliation between a host and its symbiont phylogenies or between a species and a gene phylogenies is a prevalent approach in evolution, however no simple generic tool (i.e. virtually usable by all reconciliation software, from host/symbiont to species/gene comparisons) is available to visualize reconciliation results. Moreover there is no tool to visualize 3-levels reconciliations, i.e. to visualize 2 nested reconciliations as for example in a host/symbiont/gene complex. RESULTS Thirdkind is a light and easy to install command line software producing svg files displaying reconciliations, including 3-levels reconciliations. It takes a standard format recPhyloXML as input, and is thus usable with most reconciliation software. AVAILABILITY AND IMPLEMENTATION https://github.com/simonpenel/thirdkind/wiki. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Simon Penel
- Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
| | - Hugo Menet
- Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
| | - Théo Tricou
- Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
| | - Vincent Daubin
- Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France.,Centre de Recherche Inria Lyon, Villeurbanne 69622, France
| |
Collapse
|
7
|
Abstract
Most species are extinct, those that are not are often unknown. Sequenced and sampled species are often a minority of known ones. Past evolutionary events involving horizontal gene flow, such as horizontal gene transfer, hybridization, introgression, and admixture, are therefore likely to involve “ghosts,” that is extinct, unknown, or unsampled lineages. The existence of these ghost lineages is widely acknowledged, but their possible impact on the detection of gene flow and on the identification of the species involved is largely overlooked. It is generally considered as a possible source of error that, with reasonable approximation, can be ignored. We explore the possible influence of absent species on an evolutionary study by quantifying the effect of ghost lineages on introgression as detected by the popular D-statistic method. We show from simulated data that under certain frequently encountered conditions, the donors and recipients of horizontal gene flow can be wrongly identified if ghost lineages are not taken into account. In particular, having a distant outgroup, which is usually recommended, leads to an increase in the error probability and to false interpretations in most cases. We conclude that introgression from ghost lineages should be systematically considered as an alternative possible, even probable, scenario. [ABBA–BABA; D-statistic; gene flow; ghost lineage; introgression; simulation.]
Collapse
Affiliation(s)
- Théo Tricou
- Correspondence to be sent to: CNRS Université Claude Bernard Lyon 1, Laboratoire de Biométrie et Biologie Évolutive (LBBE), Bâtiment Mendel, 43 boulevard du 11 Novembre 1918, Villeurbanne, 69622 Cedex, France; E-mail:
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive UMR5558, Univ Lyon, Université Lyon 1, CNRS, F-69622 Villeurbanne, France
- Inria, Centre de Recherche de Lyon, F-69603 Villeurbanne, France
| | - Damien M de Vienne
- Laboratoire de Biométrie et Biologie Évolutive UMR5558, Univ Lyon, Université Lyon 1, CNRS, F-69622 Villeurbanne, France
| |
Collapse
|
8
|
Laverre A, Tannier E, Necsulea A. Long-range promoter-enhancer contacts are conserved during evolution and contribute to gene expression robustness. Genome Res 2021; 32:280-296. [PMID: 34930799 PMCID: PMC8805723 DOI: 10.1101/gr.275901.121] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2021] [Accepted: 12/16/2021] [Indexed: 11/25/2022]
Abstract
Gene expression is regulated through complex molecular interactions, involving cis-acting elements that can be situated far away from their target genes. Data on long-range contacts between promoters and regulatory elements are rapidly accumulating. However, it remains unclear how these regulatory relationships evolve and how they contribute to the establishment of robust gene expression profiles. Here, we address these questions by comparing genome-wide maps of promoter-centered chromatin contacts in mouse and human. We show that there is significant evolutionary conservation of cis-regulatory landscapes, indicating that selective pressures act to preserve not only regulatory element sequences but also their chromatin contacts with target genes. The extent of evolutionary conservation is remarkable for long-range promoter–enhancer contacts, illustrating how the structure of regulatory landscapes constrains large-scale genome evolution. We show that the evolution of cis-regulatory landscapes, measured in terms of distal element sequences, synteny, or contacts with target genes, is significantly associated with gene expression evolution.
Collapse
Affiliation(s)
- Alexandre Laverre
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive
| | - Eric Tannier
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive, Centre de recherche Inria de Lyon
| | | |
Collapse
|
9
|
Comte N, Morel B, Hasić D, Guéguen L, Boussau B, Daubin V, Penel S, Scornavacca C, Gouy M, Stamatakis A, Tannier E, Parsons DP. Treerecs: an integrated phylogenetic tool, from sequences to reconciliations. Bioinformatics 2021; 36:4822-4824. [PMID: 33085745 DOI: 10.1093/bioinformatics/btaa615] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 06/22/2020] [Accepted: 07/09/2020] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated in standard phylogenetic software and they either lack performance on certain functions, or usability for biologists. RESULTS We present Treerecs, a phylogenetic software based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. AVAILABILITY AND IMPLEMENTATION Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.
Collapse
Affiliation(s)
- Nicolas Comte
- Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France
| | - Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Damir Hasić
- Department of Mathematics, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Laurent Guéguen
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
| | - Bastien Boussau
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
| | - Vincent Daubin
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
| | - Simon Penel
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
| | - Celine Scornavacca
- ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier 34000, France
| | - Manolo Gouy
- Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.,Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Eric Tannier
- Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France.,Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
| | | |
Collapse
|
10
|
Gouy M, Tannier E, Comte N, Parsons DP. Seaview Version 5: A Multiplatform Software for Multiple Sequence Alignment, Molecular Phylogenetic Analyses, and Tree Reconciliation. Methods Mol Biol 2021; 2231:241-260. [PMID: 33289897 DOI: 10.1007/978-1-0716-1036-7_15] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
We present Seaview version 5, a multiplatform program to perform multiple alignment and phylogenetic tree building from molecular sequence data. Seaview provides network access to sequence databases, alignment with arbitrary algorithm, parsimony, distance and maximum likelihood tree building with PhyML, and display, printing, and copy-to-clipboard or to SVG files of rooted or unrooted, binary or multifurcating phylogenetic trees. While Seaview is primarily a program providing a graphical user interface to guide the user into performing desired analyses, Seaview possesses also a command-line mode adequate for user-provided scripts. Seaview version 5 introduces the ability to reconcile a gene tree with a reference species tree and use this reconciliation to root and rearrange the gene tree. Seaview is freely available at http://doua.prabi.fr/software/seaview .
Collapse
Affiliation(s)
- Manolo Gouy
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France.
| | - Eric Tannier
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
- INRIA Grenoble-Rhône-Alpes, Montbonnot, France
| | | | | |
Collapse
|
11
|
Davín AA, Tricou T, Tannier E, de Vienne DM, Szöllősi GJ. Zombi: a phylogenetic simulator of trees, genomes and sequences that accounts for dead linages. Bioinformatics 2020; 36:1286-1288. [PMID: 31566657 PMCID: PMC7031779 DOI: 10.1093/bioinformatics/btz710] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2019] [Revised: 09/09/2019] [Accepted: 09/26/2019] [Indexed: 11/14/2022] Open
Abstract
Summary Here we present Zombi, a tool to simulate the evolution of species, genomes and sequences in silico, that considers for the first time the evolution of genomes in extinct lineages. It also incorporates various features that have not to date been combined in a single simulator, such as the possibility of generating species trees with a pre-defined variation of speciation and extinction rates through time, simulating explicitly intergenic sequences of variable length and outputting gene tree—species tree reconciliations. Availability and implementation Source code and manual are freely available in https://github.com/AADavin/ZOMBI/. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Adrián A Davín
- MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd, Budapest, Hungary
| | - Théo Tricou
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne F-69622, France
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne F-69622, France.,INRIA Grenoble Rhône-Alpes, Montbonnot-Saint-Martin F-38334, France
| | - Damien M de Vienne
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne F-69622, France
| | - Gergely J Szöllősi
- MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd, Budapest, Hungary.,Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany H-8237, Hungary
| |
Collapse
|
12
|
Waterhouse RM, Aganezov S, Anselmetti Y, Lee J, Ruzzante L, Reijnders MJMF, Feron R, Bérard S, George P, Hahn MW, Howell PI, Kamali M, Koren S, Lawson D, Maslen G, Peery A, Phillippy AM, Sharakhova MV, Tannier E, Unger MF, Zhang SV, Alekseyev MA, Besansky NJ, Chauve C, Emrich SJ, Sharakhov IV. Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies. BMC Biol 2020; 18:1. [PMID: 31898513 PMCID: PMC6939337 DOI: 10.1186/s12915-019-0728-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2019] [Accepted: 11/26/2019] [Indexed: 11/18/2022] Open
Abstract
Background New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
Collapse
Affiliation(s)
- Robert M Waterhouse
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| | - Sergey Aganezov
- Department of Computer Science, Princeton University, Princeton, NJ, 08450, USA.,Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | | | - Jiyoung Lee
- The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Livio Ruzzante
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Maarten J M F Reijnders
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Romain Feron
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Sèverine Bérard
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Phillip George
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Matthew W Hahn
- Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
| | - Paul I Howell
- Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
| | - Maryam Kamali
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.,Department of Medical Entomology and Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Daniel Lawson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Ashley Peery
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Maria V Sharakhova
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.,Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Unité Mixte de Recherche 5558 Centre National de la Recherche Scientifique, 69622, Villeurbanne, France.,Institut national de recherche en informatique et en automatique, Montbonnot, 38334, Grenoble, Rhône-Alpes, France
| | - Maria F Unger
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
| | - Simo V Zhang
- Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
| | - Max A Alekseyev
- Department of Mathematics and Computational Biology Institute, George Washington University, Ashburn, VA, 20147, USA
| | - Nora J Besansky
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
| | - Scott J Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA
| | - Igor V Sharakhov
- The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA. .,Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA. .,Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050.
| |
Collapse
|
13
|
Anselmetti Y, Duchemin W, Tannier E, Chauve C, Bérard S. Phylogenetic signal from rearrangements in 18 Anopheles species by joint scaffolding extant and ancestral genomes. BMC Genomics 2018; 19:96. [PMID: 29764366 PMCID: PMC5954271 DOI: 10.1186/s12864-018-4466-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
Background Genomes rearrangements carry valuable information for phylogenetic inference or the elucidation of molecular mechanisms of adaptation. However, the detection of genome rearrangements is often hampered by current deficiencies in data and methods: Genomes obtained from short sequence reads have generally very fragmented assemblies, and comparing multiple gene orders generally leads to computationally intractable algorithmic questions. Results We present a computational method, ADseq, which, by combining ancestral gene order reconstruction, comparative scaffolding and de novo scaffolding methods, overcomes these two caveats. ADseq provides simultaneously improved assemblies and ancestral genomes, with statistical supports on all local features. Compared to previous comparative methods, it runs in polynomial time, it samples solutions in a probabilistic space, and it can handle a significantly larger gene complement from the considered extant genomes, with complex histories including gene duplications and losses. We use ADseq to provide improved assemblies and a genome history made of duplications, losses, gene translocations, rearrangements, of 18 complete Anopheles genomes, including several important malaria vectors. We also provide additional support for a differentiated mode of evolution of the sex chromosome and of the autosomes in these mosquito genomes. Conclusions We demonstrate the method’s ability to improve extant assemblies accurately through a procedure simulating realistic assembly fragmentation. We study a debated issue regarding the phylogeny of the Gambiae complex group of Anopheles genomes in the light of the evolution of chromosomal rearrangements, suggesting that the phylogenetic signal they carry can differ from the phylogenetic signal carried by gene sequences, more prone to introgression. Electronic supplementary material The online version of this article (10.1186/s12864-018-4466-7) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Yoann Anselmetti
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.,Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
| | - Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France.,INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France.,INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A1S6, BC, Canada
| | - Sèverine Bérard
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.
| |
Collapse
|
14
|
Duchemin W, Anselmetti Y, Patterson M, Ponty Y, Bérard S, Chauve C, Scornavacca C, Daubin V, Tannier E. DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies. Genome Biol Evol 2018; 9:1312-1319. [PMID: 28402423 PMCID: PMC5441342 DOI: 10.1093/gbe/evx069] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/07/2017] [Indexed: 12/15/2022] Open
Abstract
DeCoSTAR is a software that aims at reconstructing the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann–Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability:http://pbil.univ-lyon1.fr/software/DeCoSTAR (Last accessed April 24, 2017).
Collapse
Affiliation(s)
- Wandrille Duchemin
- Inria Grenoble Rhône-Alpes, Montbonnot, France.,Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
| | - Yoann Anselmetti
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France.,Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Murray Patterson
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France.,Experimental Algorithmics Lab (AlgoLab), Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, Viale Sarca, Milano, Italy
| | - Yann Ponty
- CNRS, Ecole Polytechnique, LIX UMR7161, Palaiseau, France.,Inria Saclay, EP AMIB, Palaiseau, France
| | - Sèverine Bérard
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.,LIRMM, Université de Montpellier, CNRS, Montpellier, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Celine Scornavacca
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Vincent Daubin
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
| | - Eric Tannier
- Inria Grenoble Rhône-Alpes, Montbonnot, France.,Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
| |
Collapse
|
15
|
Abstract
Comparative genomics considers the detection of similarities and differences between extant genomes, and, based on more or less formalized hypotheses regarding the involved evolutionary processes, inferring ancestral states explaining the similarities and an evolutionary history explaining the differences. In this chapter, we focus on the reconstruction of the organization of ancient genomes into chromosomes. We review different methodological approaches and software, applied to a wide range of datasets from different kingdoms of life and at different evolutionary depths. We discuss relations with genome assembly, and potential approaches to validate computational predictions on ancient genomes that are almost always only accessible through these predictions.
Collapse
Affiliation(s)
- Yoann Anselmetti
- Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
| | - Nina Luhmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany.,Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.,International Research Training Group1906, Bielefeld University, Bielefeld, Germany
| | - Sèverine Bérard
- Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
| | - Eric Tannier
- UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive", Inria Grenoble Rhône-Alpes and University of Lyon, Lyon, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada, V5A 1S6.
| |
Collapse
|
16
|
Fertin G, Jean G, Tannier E. Algorithms for computing the double cut and join distance on both gene order and intergenic sizes. Algorithms Mol Biol 2017; 12:16. [PMID: 28592988 PMCID: PMC5460591 DOI: 10.1186/s13015-017-0107-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2017] [Accepted: 05/15/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Combinatorial works on genome rearrangements have so far ignored the influence of intergene sizes, i.e. the number of nucleotides between consecutive genes, although it was recently shown decisive for the accuracy of inference methods (Biller et al. in Genome Biol Evol 8:1427-39, 2016; Biller et al. in Beckmann A, Bienvenu L, Jonoska N, editors. Proceedings of Pursuit of the Universal-12th conference on computability in Europe, CiE 2016, Lecture notes in computer science, vol 9709, Paris, France, June 27-July 1, 2016. Berlin: Springer, p. 35-44, 2016). In this line, we define a new genome rearrangement model called wDCJ, a generalization of the well-known double cut and join (or DCJ) operation that modifies both the gene order and the intergene size distribution of a genome. RESULTS We first provide a generic formula for the wDCJ distance between two genomes, and show that computing this distance is strongly NP-complete. We then propose an approximation algorithm of ratio 4/3, and two exact ones: a fixed-parameter tractable (FPT) algorithm and an integer linear programming (ILP) formulation. CONCLUSIONS We provide theoretical and empirical bounds on the expected growth of the parameter at the center of our FPT and ILP algorithms, assuming a probabilistic model of evolution under wDCJ, which shows that both these algorithms should run reasonably fast in practice.
Collapse
Affiliation(s)
- Guillaume Fertin
- LS2N UMR CNRS 6004, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes, France
| | - Géraldine Jean
- LS2N UMR CNRS 6004, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes, France
| | - Eric Tannier
- Institut National de Recherche en Informatique et en Automatique (Inria) Grenoble Rhône-Alpes, 655 avenue de l’Europe, 38330 Montbonnot-Saint-Martin, France
- CNRS, Laboratoire de Biomètrie et Biologie Evolutive UMR5558, Univ Lyon, Université Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, Villeurbanne France
| |
Collapse
|
17
|
Jacox E, Weller M, Tannier E, Scornavacca C. Resolution and reconciliation of non-binary gene trees with transfers, duplications and losses. Bioinformatics 2017; 33:980-987. [PMID: 28073758 DOI: 10.1093/bioinformatics/btw778] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2016] [Accepted: 12/02/2016] [Indexed: 11/14/2022] Open
Abstract
Summary Gene trees reconstructed from sequence alignments contain poorly supported branches when the phylogenetic signal in the sequences is insufficient to determine them all. When a species tree is available, the signal of gains and losses of genes can be used to correctly resolve the unsupported parts of the gene history. However finding a most parsimonious binary resolution of a non-binary tree obtained by contracting the unsupported branches is NP-hard if transfer events are considered as possible gene scale events, in addition to gene origination, duplication and loss. We propose an exact, parameterized algorithm to solve this problem in single-exponential time, where the parameter is the number of connected branches of the gene tree that show low support from the sequence alignment or, equivalently, the maximum number of children of any node of the gene tree once the low-support branches have been collapsed. This improves on the best known algorithm by an exponential factor. We propose a way to choose among optimal solutions based on the available information. We show the usability of this principle on several simulated and biological datasets. The results are comparable in quality to several other tested methods having similar goals, but our approach provides a lower running time and a guarantee that the produced solution is optimal. Availability and Implementation Our algorithm has been integrated into the ecceTERA phylogeny package, available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and which can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera . Contact celine.scornavacca@umontpellier.fr. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Edwin Jacox
- ISE-M, Université Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Mathias Weller
- Institut de Biologie Computationnelle (IBC), Montpellier, France.,LIRMM, Université Montpellier, CNRS, Montpellier, France
| | - Eric Tannier
- INRIA Rhône-Alpes, LBBE, Université Lyon 1, Lyon, France
| | - Celine Scornavacca
- ISE-M, Université Montpellier, CNRS, IRD, EPHE, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| |
Collapse
|
18
|
Abstract
Background Given two genomes that have diverged by a series of rearrangements, we infer minimum Double Cut-and-Join (DCJ) scenarios to explain their organization differences, coupled with indel scenarios to explain their intergene size distribution, where DCJs themselves also alter the sizes of broken intergenes. Results We give a polynomial-time algorithm that, given two genomes with arbitrary intergene size distributions, outputs a DCJ scenario which optimizes on the number of DCJs, and given this optimal number of DCJs, optimizes on the total sum of the sizes of the indels. Conclusions We show that there is a valuable information in the intergene sizes concerning the rearrangement scenario itself. On simulated data we show that statistical properties of the inferred scenarios are closer to the true ones than DCJ only scenarios, i.e. scenarios which do not handle intergene sizes.
Collapse
Affiliation(s)
- Laurent Bulteau
- Laboratoire d'Informatique Gaspard Monge, CNRS UMR 8049, Université Paris-Est, 5 Bd Descartes, Marne-la-Vallée, 77454, France.
| | - Guillaume Fertin
- LINA UMR CNRS 6241, Université de Nantes, 2 rue de la Houssinière, Nantes, 44322, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive (LBBE), 43 boulevard du 11 novembre 1918, Villeurbanne, 69622, France.,Institut National de Recherche en Informatique et en Automatique (INRIA) Rhône-Alpes, 655 avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| |
Collapse
|
19
|
Abstract
Type VI secretion systems (T6SS) are bacterial molecular machines translocating effector proteins into target cells. T6SS are widely present in Gram-negative bacteria where they predominantly act to kill neighboring bacteria. This secretion system is reminiscent of the tail of contractile bacteriophages and consists of a contractile sheath anchored in the bacterial envelope and an inner tube made of stacks of the Hcp protein. The Hcp tube is capped with a VgrG trimer and a spike protein termed PAAR, which acts as the membrane-puncturing device. Francisella tularensis, the agent of tularemia, is an intracellular bacterium replicating within the host cytosol. Upon entry into the host cell, F. tularensis rapidly lyses the host vacuolar membrane to reach the host cytosol. This escape is dependent on the Francisella Pathogenicity Island (FPI), which is encoding an atypical T6SS. Among the 17 proteins encoded by the FPI, most of them required for virulence, eight have some homology to canonical T6SS proteins. We recently identified the function of one protein of unknown function encoded within the FPI, IglG. By three-dimensional modelling and following validation by different techniques, we found that IglG adopts a fold resembling the one of PAAR proteins. Importantly, IglG features a domain of unknown function DUF4280, present in numerous bacterial species. We thus propose to rename this domain of unknown function, PAAR-like domain, and discuss here the characteristics of this domain and its distribution in both Gram-negative and Gram-positive bacteria.
Collapse
Affiliation(s)
- Claire Lays
- CIRI, Centre International de Recherche en Infectiologie, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Univ Lyon, F-69007, LYON, France. ; LabEX Ecofect, Eco-evolutionary dynamics of infectious diseases, Lyon, France
| | - Eric Tannier
- LabEX Ecofect, Eco-evolutionary dynamics of infectious diseases, Lyon, France. ; INRIA Rhône-Alpes, 655 av de l'Europe, F-38334 Montbonnot, France, LBBE, UMR5558, Université Claude Bernard Lyon 1, Univ Lyon, F-69007, Lyon, France
| | - Thomas Henry
- CIRI, Centre International de Recherche en Infectiologie, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Univ Lyon, F-69007, LYON, France. ; LabEX Ecofect, Eco-evolutionary dynamics of infectious diseases, Lyon, France
| |
Collapse
|
20
|
Noutahi E, Semeria M, Lafond M, Seguin J, Boussau B, Guéguen L, El-Mabrouk N, Tannier E. Efficient Gene Tree Correction Guided by Genome Evolution. PLoS One 2016; 11:e0159559. [PMID: 27513924 PMCID: PMC4981423 DOI: 10.1371/journal.pone.0159559] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2016] [Accepted: 07/04/2016] [Indexed: 12/31/2022] Open
Abstract
MOTIVATIONS Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. RESULTS We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. AVAILABILITY A web interface called RefineTree, including ProfileNJ as well as a other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available.
Collapse
Affiliation(s)
- Emmanuel Noutahi
- Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
| | - Magali Semeria
- LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
| | - Manuel Lafond
- Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
| | - Jonathan Seguin
- Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
| | - Bastien Boussau
- LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
| | - Laurent Guéguen
- LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
| | - Nadia El-Mabrouk
- Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
| | - Eric Tannier
- LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
- INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| |
Collapse
|
21
|
Szöllősi GJ, Davín AA, Tannier E, Daubin V, Boussau B. Genome-scale phylogenetic analysis finds extensive gene transfer among fungi. Philos Trans R Soc Lond B Biol Sci 2016; 370:20140335. [PMID: 26323765 PMCID: PMC4571573 DOI: 10.1098/rstb.2014.0335] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Although the role of lateral gene transfer is well recognized in the evolution of bacteria, it is generally assumed that it has had less influence among eukaryotes. To explore this hypothesis, we compare the dynamics of genome evolution in two groups of organisms: cyanobacteria and fungi. Ancestral genomes are inferred in both clades using two types of methods: first, Count, a gene tree unaware method that models gene duplications, gains and losses to explain the observed numbers of genes present in a genome; second, ALE, a more recent gene tree-aware method that reconciles gene trees with a species tree using a model of gene duplication, loss and transfer. We compare their merits and their ability to quantify the role of transfers, and assess the impact of taxonomic sampling on their inferences. We present what we believe is compelling evidence that gene transfer plays a significant role in the evolution of fungi.
Collapse
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA 'Lendület' Biophysics Research Group, Pázmány P. stny. 1A, 1117 Budapest, Hungary
| | - Adrián Arellano Davín
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, 38334 Montbonnot, France
| | - Vincent Daubin
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, 69622 Villeurbanne, France
| | - Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, 69622 Villeurbanne, France
| |
Collapse
|
22
|
Abstract
Models of evolution by genome rearrangements are prone to two types of flaws: One is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call "solid" the regions that are improbably broken by rearrangements and "fragile" the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotidic sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model where all fragile regions have the same probability to break. Estimations of rearrangement distances based on the pseudouniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model, and to a lesser extent with the truly uniform model. This incoherence is solved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.
Collapse
Affiliation(s)
- Priscila Biller
- INRIA Grenoble Rhône-Alpes, Montbonnot, France University of Campinas, São Paulo, Brazil
| | | | - Carole Knibbe
- INRIA Grenoble Rhône-Alpes, Montbonnot, France Université Lyon 1, LIRIS, UMR5205, Villeurbanne, France
| | - Eric Tannier
- INRIA Grenoble Rhône-Alpes, Montbonnot, France Université Lyon 1, LBBE, UMR5558, Villeurbanne, France
| |
Collapse
|
23
|
Abstract
BACKGROUND Most models of genome evolution concern either genetic sequences, gene content or gene order. They sometimes integrate two of the three levels, but rarely the three of them. Probabilistic models of gene order evolution usually have to assume constant gene content or adopt a presence/absence coding of gene neighborhoods which is blind to complex events modifying gene content. RESULTS We propose a probabilistic evolutionary model for gene neighborhoods, allowing genes to be inserted, duplicated or lost. It uses reconciled phylogenies, which integrate sequence and gene content evolution. We are then able to optimize parameters such as phylogeny branch lengths, or probabilistic laws depicting the diversity of susceptibility of syntenic regions to rearrangements. We reconstruct a structure for ancestral genomes by optimizing a likelihood, keeping track of all evolutionary events at the level of gene content and gene synteny. Ancestral syntenies are associated with a probability of presence.
Collapse
Affiliation(s)
- Magali Semeria
- Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
- INRIA Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France
| | - Laurent Guéguen
- Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| |
Collapse
|
24
|
Abstract
We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes, that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measure how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes.
Collapse
Affiliation(s)
- Yoann Anselmetti
- Institut des Sciences de l'Évolution de Montpellier (ISE-M), Place Eugène Bataillon, Montpellier, 34095, France
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Vincent Berry
- Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, Canada
| | - Annie Chateau
- Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France
| | - Sèverine Bérard
- Institut des Sciences de l'Évolution de Montpellier (ISE-M), Place Eugène Bataillon, Montpellier, 34095, France
- Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
| |
Collapse
|
25
|
Abstract
We study statistical estimators of the number of genomic events separating two genomes under a Double Cut-and Join (DCJ) rearrangement model, by a method of moment estimation. We first propose an exact, closed, analytically invertible formula for the expected number of breakpoints after a given number of DCJs. This improves over the heuristic, recursive and computationally slower previously proposed one. Then we explore the analogies of genome evolution by DCJ with evolution of binary sequences under substitutions, permutations under transpositions, and random graphs. Each of these are presented in the literature with intuitive justifications, and are used to import results from better known fields. We formalize the relations by proving a correspondence between moments in sequence and genome evolution, provided substitutions appear four by four in the corresponding model. Eventually we prove a bounded error on two estimators of the number of cycles in the breakpoint graph after a given number of rearrangements, by an analogy with cycles in permutations and components in random graphs.
Collapse
Affiliation(s)
- Priscila Biller
- Institute of Computing, University of Campinas, São Paulo, Brazil
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alps, 655 avenue de L'Europe, 38330 Montbonnot, France
| | - Laurent Guéguen
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622, Villeurbanne, France
| | - Eric Tannier
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alps, 655 avenue de L'Europe, 38330 Montbonnot, France
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622, Villeurbanne, France
| |
Collapse
|
26
|
Abstract
BACKGROUND We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the late middle ages plague agent. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. But the dynamics of indels, duplications, insertion sequences and rearrangements has impacted all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. RESULTS We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing nearly a complete genome sequence for the ancestor, containing a chromosome and three plasmids. CONCLUSION The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. But the quality of the sequencing and assembly of the ancient sequence can also be questioned by this comparison.
Collapse
Affiliation(s)
- Wandrille Duchemin
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Vincent Daubin
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France
| |
Collapse
|
27
|
Murat F, Zhang R, Guizard S, Gavranović H, Flores R, Steinbach D, Quesneville H, Tannier E, Salse J. Karyotype and gene order evolution from reconstructed extinct ancestors highlight contrasts in genome plasticity of modern rosid crops. Genome Biol Evol 2015; 7:735-49. [PMID: 25637221 PMCID: PMC5322550 DOI: 10.1093/gbe/evv014] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
We used nine complete genome sequences, from grape, poplar, Arabidopsis, soybean, lotus, apple, strawberry, cacao, and papaya, to investigate the paleohistory of rosid crops. We characterized an ancestral rosid karyotype, structured into 7/21 protochomosomes, with a minimal set of 6,250 ordered protogenes and a minimum physical coding gene space of 50 megabases. We also proposed ancestral karyotypes for the Caricaceae, Brassicaceae, Malvaceae, Fabaceae, Rosaceae, Salicaceae, and Vitaceae families with 9, 8, 10, 6, 12, 9, 12, and 19 protochromosomes, respectively. On the basis of these ancestral karyotypes and present-day species comparisons, we proposed a two-step evolutionary scenario based on allohexaploidization involving the newly characterized A, B, and C diploid progenitors leading to dominant (stable) and sensitive (plastic) genomic compartments in any modern rosid crops. Finally, a new user-friendly online tool, “DicotSyntenyViewer” (available from http://urgi.versailles.inra.fr/synteny-dicot), has been made available for accurate translational genomics in rosids.
Collapse
Affiliation(s)
- Florent Murat
- INRA/UBP UMR 1095 GDEC 'Génétique, Diversité et Ecophysiologie des Céréales', Clermont Ferrand, France
| | - Rongzhi Zhang
- INRA/UBP UMR 1095 GDEC 'Génétique, Diversité et Ecophysiologie des Céréales', Clermont Ferrand, France
| | - Sébastien Guizard
- INRA/UBP UMR 1095 GDEC 'Génétique, Diversité et Ecophysiologie des Céréales', Clermont Ferrand, France
| | - Haris Gavranović
- Faculty of Engineering and Natural Sciences, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Raphael Flores
- INRA 'Unité de Recherche en Génomique et Informatique', Centre INRA de Versailles, Versailles, France
| | - Delphine Steinbach
- INRA 'Unité de Recherche en Génomique et Informatique', Centre INRA de Versailles, Versailles, France
| | - Hadi Quesneville
- INRA 'Unité de Recherche en Génomique et Informatique', Centre INRA de Versailles, Versailles, France
| | - Eric Tannier
- INRIA Rhône-Alpes, Université de Lyon 1, CNRS UMR5558, Laboratoire Biométrie et Biologie Évolutive, Villeurbanne Cedex, France
| | - Jérôme Salse
- INRA/UBP UMR 1095 GDEC 'Génétique, Diversité et Ecophysiologie des Céréales', Clermont Ferrand, France
| |
Collapse
|
28
|
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Collapse
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Eric Tannier
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Bastien Boussau
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France;
| |
Collapse
|
29
|
Abstract
Background Reconciled gene trees yield orthology and paralogy relationships between genes. This information may however contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. Results We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree satisfying the set of constraints, that is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that often the two trees are statistically equivalent. Availability Software and data available upon request to the corresponding author.
Collapse
|
30
|
Abstract
MOTIVATIONS Recent progress in ancient DNA sequencing technologies and protocols has lead to the sequencing of whole ancient bacterial genomes, as illustrated by the recent sequence of the Yersinia pestis strain that caused the Black Death pandemic. However, sequencing ancient genomes raises specific problems, because of the decay and fragmentation of ancient DNA among others, making the scaffolding of ancient contigs challenging. RESULTS We show that computational paleogenomics methods aimed at reconstructing the organization of ancestral genomes from the comparison of extant genomes can be adapted to correct, order and orient ancient bacterial contigs. We describe the method FPSAC (fast phylogenetic scaffolding of ancient contigs) and apply it on a set of 2134 ancient contigs assembled from the recently sequenced Black Death agent genome. We obtain a unique scaffold for the whole chromosome of this ancient genome that allows to gain precise insights into the structural evolution of the Yersinia clade.
Collapse
Affiliation(s)
- Ashok Rajaraman
- Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A1S6, Canada, International Graduate Training Center in Mathematical Biology, Pacific Institute for the Mathematical Sciences, Vancouver (BC), Canada, INRIA Grenoble Rhône-Alpes, Montbonnot 38334, France, Université de Lyon 1, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558 F-69622 Villeurbanne, France and LaBRI, Université Bordeaux I, 33405 Talence, France
| | | | | |
Collapse
|
31
|
Abstract
Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with respectively. 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family. The open source implementation of ALE is available from https://github.com/ssolo/ALE.git. [amalgamation; gene tree reconciliation; gene tree reconstruction; lateral gene transfer; phylogeny.]
Collapse
Affiliation(s)
- Gergely J Szöllõsi
- Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne
| | | | | | | | | |
Collapse
|
32
|
Abstract
Motivation: Most models of genome evolution integrating gene duplications, losses and chromosomal rearrangements are computationally intract
able, even when comparing only two genomes. This prevents large-scale studies that consider different types of genome structural variations. Results: We define an ‘adjacency phylogenetic tree’ that describes the evolution of an adjacency, a neighborhood relation between two genes, by speciation, duplication or loss of one or both genes, and rearrangement. We describe an algorithm that, given a species tree and a set of gene trees where the leaves are connected by adjacencies, computes an adjacency forest that minimizes the number of gains and breakages of adjacencies (caused by rearrangements) and runs in polynomial time. We use this algorithm to reconstruct contiguous regions of mammalian and plant ancestral genomes in a few minutes for a dozen species and several thousand genes. We show that this method yields reduced conflict between ancestral adjacencies. We detect duplications involving several genes and compare the different modes of evolution between phyla and among lineages. Availability: C++ implementation using BIO++ package, available upon request to Sèverine Bérard. Contact:Severine.Berard@cirad.fr or Eric.Tannier@inria.fr Supplementary information:Supplementary material is available at Bioinformatics online.
Collapse
|
33
|
Abstract
In phylogenetic studies, the evolution of molecular sequences is assumed to have taken place along the phylogeny traced by the ancestors of extant species. In the presence of lateral gene transfer, however, this may not be the case, because the species lineage from which a gene was transferred may have gone extinct or not have been sampled. Because it is not feasible to specify or reconstruct the complete phylogeny of all species, we must describe the evolution of genes outside the represented phylogeny by modeling the speciation dynamics that gave rise to the complete phylogeny. We demonstrate that if the number of sampled species is small compared with the total number of existing species, the overwhelming majority of gene transfers involve speciation to and evolution along extinct or unsampled lineages. We show that the evolution of genes along extinct or unsampled lineages can to good approximation be treated as those of independently evolving lineages described by a few global parameters. Using this result, we derive an algorithm to calculate the probability of a gene tree and recover the maximum-likelihood reconciliation given the phylogeny of the sampled species. Examining 473 near-universal gene families from 36 cyanobacteria, we find that nearly a third of transfer events (28%) appear to have topological signatures of evolution along extinct species, but only approximately 6% of transfers trace their ancestry to before the common ancestor of the sampled cyanobacteria. [Gene tree reconciliation; lateral gene transfer; macroevolution; phylogeny.]
Collapse
Affiliation(s)
- Gergely J Szöllosi
- Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France.
| | | | | | | |
Collapse
|
34
|
Chauve C, El-Mabrouk N, Guéguen L, Semeria M, Tannier E. Duplication, Rearrangement and Reconciliation: A Follow-Up 13 Years Later. Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
|
35
|
Abstract
BACKGROUND Recovering the structure of ancestral genomes can be formalized in terms of properties of binary matrices such as the Consecutive-Ones Property (C1P). The Linearization Problem asks to extract, from a given binary matrix, a maximum weight subset of rows that satisfies such a property. This problem is in general intractable, and in particular if the ancestral genome is expected to contain only linear chromosomes or a unique circular chromosome. In the present work, we consider a relaxation of this problem, which allows ancestral genomes that can contain several chromosomes, each either linear or circular. RESULT We show that, when restricted to binary matrices of degree two, which correspond to adjacencies, the genomic characters used in most ancestral genome reconstruction methods, this relaxed version of the Linearization Problem is polynomially solvable using a reduction to a matching problem. This result holds in the more general case where columns have bounded multiplicity, which models possibly duplicated ancestral genes. We also prove that for matrices with rows of degrees 2 and 3, without multiplicity and without weights on the rows, the problem is NP-complete, thus tracing sharp tractability boundaries. CONCLUSION As it happened for the breakpoint median problem, also used in ancestral genome reconstruction, relaxing the definition of a genome turns an intractable problem into a tractable one. The relaxation is adapted to some biological contexts, such as bacterial genomes with several replicons, possibly partially assembled. Algorithms can also be used as heuristics for hard variants. More generally, this work opens a way to better understand linearization results for ancestral genome structure inference.
Collapse
Affiliation(s)
- Ján Maňuch
- INRIA Rhône-Alpes, 655 avenue de I'Europe, F-38344 Montbonnot, France.
| | | | | | | | | |
Collapse
|
36
|
Abstract
Comparisons of gene trees and species trees are key to understanding major processes of genome evolution such as gene duplication and loss. Because current methods to reconstruct phylogenies fail to model the two-way dependency between gene trees and the species tree, they often misrepresent gene and species histories. We present a new probabilistic model to jointly infer rooted species and gene trees for dozens of genomes and thousands of gene families. We use simulations to show that this method accurately infers the species tree and gene trees, is robust to misspecification of the models of sequence and gene family evolution, and provides a precise historic record of gene duplications and losses throughout genome evolution. We simultaneously reconstruct the history of mammalian species and their genes based on 36 completely sequenced genomes, and use the reconstructed gene trees to infer the gene content and organization of ancestral mammalian genomes. We show that our method yields a more accurate picture of ancestral genomes than the trees available in the authoritative database Ensembl.
Collapse
Affiliation(s)
- Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Villeurbanne F-69622, France.
| | | | | | | | | | | |
Collapse
|
37
|
|
38
|
|
39
|
Abstract
MOTIVATION Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible by these methods, such as the proto-monocotyledon whose evolution involved massive gene loss following a whole genome duplication. RESULTS We propose a mapping approach based on the combinatorial notion of 'sandwich consecutive ones matrix', which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content. AVAILABILITY Upon request to the authors. CONTACT haris.gavranovic@gmail.com; eric.tannier@inria.fr.
Collapse
Affiliation(s)
- Haris Gavranović
- Faculty of Natural Sciences, University of Sarajevo, Bosnia and Herzegovina.
| | | | | | | |
Collapse
|
40
|
Murat F, Xu JH, Tannier E, Abrouk M, Guilhot N, Pont C, Messing J, Salse J. Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution. Genome Res 2010; 20:1545-57. [PMID: 20876790 PMCID: PMC2963818 DOI: 10.1101/gr.109744.110] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2010] [Accepted: 08/24/2010] [Indexed: 11/25/2022]
Abstract
The comparison of the chromosome numbers of today's species with common reconstructed paleo-ancestors has led to intense speculation of how chromosomes have been rearranged over time in mammals. However, similar studies in plants with respect to genome evolution as well as molecular mechanisms leading to mosaic synteny blocks have been lacking due to relevant examples of evolutionary zooms from genomic sequences. Such studies require genomes of species that belong to the same family but are diverged to fall into different subfamilies. Our most important crops belong to the family of the grasses, where a number of genomes have now been sequenced. Based on detailed paleogenomics, using inference from n = 5-12 grass ancestral karyotypes (AGKs) in terms of gene content and order, we delineated sequence intervals comprising a complete set of junction break points of orthologous regions from rice, maize, sorghum, and Brachypodium genomes, representing three different subfamilies and different polyploidization events. By focusing on these sequence intervals, we could show that the chromosome number variation/reduction from the n = 12 common paleo-ancestor was driven by nonrandom centric double-strand break repair events. It appeared that the centromeric/telomeric illegitimate recombination between nonhomologous chromosomes led to nested chromosome fusions (NCFs) and synteny break points (SBPs). When intervals comprising NCFs were compared in their structure, we concluded that SBPs (1) were meiotic recombination hotspots, (2) corresponded to high sequence turnover loci through repeat invasion, and (3) might be considered as hotspots of evolutionary novelty that could act as a reservoir for producing adaptive phenotypes.
Collapse
Affiliation(s)
- Florent Murat
- INRA, UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 63100 Clermont Ferrand, France
| | - Jian-Hong Xu
- The Plant Genome Initiative at Rutgers (PGIR), Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
| | - Eric Tannier
- INRIA Rhône-Alpes, Université de Lyon 1, CNRS UMR5558, Laboratoire Biométrie et Biologie Évolutive, 69622 Villeurbanne Cedex, France
| | - Michael Abrouk
- INRA, UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 63100 Clermont Ferrand, France
| | - Nicolas Guilhot
- INRA, UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 63100 Clermont Ferrand, France
| | - Caroline Pont
- INRA, UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 63100 Clermont Ferrand, France
| | - Joachim Messing
- The Plant Genome Initiative at Rutgers (PGIR), Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
| | - Jérôme Salse
- INRA, UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 63100 Clermont Ferrand, France
| |
Collapse
|
41
|
Abstract
MOTIVATION When comparing the organization of two genomes, it is important not to draw conclusions on their modes of evolution from a single most parsimonious scenario explaining their differences. Better estimations can be obtained by sampling many different genomic rearrangement scenarios. For this problem, the Double Cut and Join (DCJ) model, while less relevant, is computationally easier than the Hannenhalli-Pevzner (HP) model. Indeed, in some special cases, the total number of DCJ sorting scenarios can be analytically calculated, and uniformly distributed random DCJ scenarios can be drawn in polynomial running time, while the complexity of counting the number of HP scenarios and sampling from the uniform distribution of their space is unknown, and conjectured to be #P-complete. Statistical methods, like Markov chain Monte Carlo (MCMC) for sampling from the uniform distribution of the most parsimonious or the Bayesian distribution of all possible HP scenarios are required. RESULTS We use the computational facilities of the DCJ model to draw a sampling of HP scenarios. It is based on a parallel MCMC method that cools down DCJ scenarios to HP scenarios. We introduce two theorems underlying the theoretical mixing properties of this parallel MCMC method. The method was tested on yeast and mammalian genomic data, and allowed us to provide estimates of the different modes of evolution in diverse lineages. AVAILABILITY The program implemented in Java 1.5 programming language is available from http://www.renyi.hu/~miklosi/DCJ2HP/.
Collapse
Affiliation(s)
- István Miklós
- Department of Stochastics, Rényi Institute, Budapest, Hungary.
| | | |
Collapse
|
42
|
Chauve C, Gavranovic H, Ouangraoua A, Tannier E. Yeast Ancestral Genome Reconstructions: The Possibilities of Computational Methods II. J Comput Biol 2010; 17:1097-112. [DOI: 10.1089/cmb.2010.0092] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Affiliation(s)
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
| | - Haris Gavranovic
- Faculty of Natural Sciences, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Aida Ouangraoua
- INRIA Lille-Nord-Europe, Université Lille 1, LIFL, UMR CNRS 8022, Villeneuve d'Ascq, France
| | - Eric Tannier
- INRIA Rhône-Alpes, Université de Lyon, Lyon, and Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
| |
Collapse
|
43
|
Abrouk M, Murat F, Pont C, Messing J, Jackson S, Faraut T, Tannier E, Plomion C, Cooke R, Feuillet C, Salse J. Palaeogenomics of plants: synteny-based modelling of extinct ancestors. Trends Plant Sci 2010; 15:479-87. [PMID: 20638891 DOI: 10.1016/j.tplants.2010.06.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/19/2010] [Revised: 05/28/2010] [Accepted: 06/16/2010] [Indexed: 05/03/2023]
Abstract
In the past ten years, international initiatives have led to the development of large sets of genomic resources that allow comparative genomic studies between plant genomes at a high level of resolution. Comparison of map-based genomic sequences revealed shared intra-genomic duplications, providing new insights into the evolution of flowering plant genomes from common ancestors. Plant genomes can be presented as concentric circles, providing a new reference for plant chromosome evolutionary relationships and an efficient tool for gene annotation and cross-genome markers development. Recent palaeogenomic data demonstrate that whole-genome duplications have provided a motor for the evolutionary success of flowering plants over the last 50-70 million years.
Collapse
Affiliation(s)
- Michael Abrouk
- INRA, UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100 Clermont Ferrand, France
| | | | | | | | | | | | | | | | | | | | | |
Collapse
|
44
|
Abstract
Summary: Genomes undergo large structural changes that alter their organization. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. Lemaitre et al. presented a new method to precisely delimit rearrangement breakpoints in a genome by comparison with the genome of a related species. Receiving as input a list of one2one orthologous genes found in the genomes of two species, the method builds a set of reliable and non-overlapping synteny blocks and refines the regions that are not contained into them. Through the alignment of each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Here, we present the package Cassis that implements this method of precise detection of genomic rearrangement breakpoints. Availability: Perl and R scripts are freely available for download at http://pbil.univ-lyon1.fr/software/Cassis/. Documentation with methodological background, technical aspects, download and setup instructions, as well as examples of applications are available together with the package. The package was tested on Linux and Mac OS environments and is distributed under the GNU GPL License. Contact:Marie-France.Sagot@inria.fr Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Christian Baudet
- Equipe BAMBOO, INRIA Grenoble Rhône-Alpes et Laboratoire de Biométrie et Biologie Evolutive (UMR 5558) CNRS, Université Lyon 1, Villeurbanne, France
| | | | | | | | | | | |
Collapse
|
45
|
Gavranović H, Tannier E. Guided genome halving: provably optimal solutions provide good insights into the preduplication ancestral genome of Saccharomyces cerevisiae. Pac Symp Biocomput 2010:21-30. [PMID: 19908354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
We present theoretical and practical advances on the Guided Genome Halving problem, a combinatorial optimisation problem which aims at proposing ancestral configurations of extant genomes when one of them has undergone a whole genome duplication. We provide a lower bound on the optimal solution, devise a heuristic algorithm based on subgraph identification, and apply it to yeast gene order data. On some instances, the computation of the bound yields a proof that the obtained solutions are optimal. We analyse a set of optimal solutions, compare them with a manually curated standard ancestor, showing that on yeast data, results coming from different methodologies are largely convergent: the optimal solutions are distant of at most one rearrangement from the reference.
Collapse
Affiliation(s)
- Haris Gavranović
- Faculty of Natural Sciences, University of Sarajevo, Bosnia and Herzegovina
| | | |
Collapse
|
46
|
Bérard S, Chateau A, Chauve C, Paul C, Tannier E. Computation of perfect DCJ rearrangement scenarios with linear and circular chromosomes. J Comput Biol 2009; 16:1287-309. [PMID: 19803733 DOI: 10.1089/cmb.2009.0088] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
We study the problem of transforming a multichromosomal genome into another using Double Cut-and-Join (DCJ) operations, which simulates several types of rearrangements, as reversals, translocations, and block-interchanges. We introduce the notion of a DCJ scenario that does not break families of common intervals (groups of genes co-localized in both genomes). Such scenarios are called perfect, and their properties are well known when the only considered rearrangements are reversals. We show that computing the minimum perfect DCJ rearrangement scenario is NP-hard, and describe an exact algorithm which exponential running time is bounded in terms of a specific pattern used in the NP-completeness proof. The study of perfect DCJ rearrangement leads to some surprising properties. The DCJ model has often yielded algorithmic problems which complexities are comparable to the reversal-only model. In the perfect rearrangement framework, however, while perfect sorting by reversals is NP-hard if the family of common intervals to be preserved is nested, we show that finding a shortest perfect DCJ scenario can be answered in polynomial time in this case. Conversely, while perfect sorting by reversals is tractable when the family of common intervals is weakly separable, we show that the corresponding problem is still NP-hard in the DCJ case. This shows that despite the similarity of the two operations, easy patterns for reversals are hard ones for DCJ, and vice versa.
Collapse
|
47
|
Lemaitre C, Zaghloul L, Sagot MF, Gautier C, Arneodo A, Tannier E, Audit B. Analysis of fine-scale mammalian evolutionary breakpoints provides new insight into their relation to genome organisation. BMC Genomics 2009; 10:335. [PMID: 19630943 PMCID: PMC2722678 DOI: 10.1186/1471-2164-10-335] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2009] [Accepted: 07/24/2009] [Indexed: 11/21/2022] Open
Abstract
Background The Intergenic Breakage Model, which is the current model of structural genome evolution, considers that evolutionary rearrangement breakages happen with a uniform propensity along the genome but are selected against in genes, their regulatory regions and in-between. However, a growing body of evidence shows that there exists regions along mammalian genomes that present a high susceptibility to breakage. We reconsidered this question taking advantage of a recently published methodology for the precise detection of rearrangement breakpoints based on pairwise genome comparisons. Results We applied this methodology between the genome of human and those of five sequenced eutherian mammals which allowed us to delineate evolutionary breakpoint regions along the human genome with a finer resolution (median size 26.6 kb) than obtained before. We investigated the distribution of these breakpoints with respect to genome organisation into domains of different activity. In agreement with the Intergenic Breakage Model, we observed that breakpoints are under-represented in genes. Surprisingly however, the density of breakpoints in small intergenes (1 per Mb) appears significantly higher than in gene deserts (0.1 per Mb). More generally, we found a heterogeneous distribution of breakpoints that follows the organisation of the genome into isochores (breakpoints are more frequent in GC-rich regions). We then discuss the hypothesis that regions with an enhanced susceptibility to breakage correspond to regions of high transcriptional activity and replication initiation. Conclusion We propose a model to describe the heterogeneous distribution of evolutionary breakpoints along human chromosomes that combines natural selection and a mutational bias linked to local open chromatin state.
Collapse
Affiliation(s)
- Claire Lemaitre
- Université de Bordeaux, Centre de Bioinformatique - Génomique Fonctionnelle Bordeaux, F-33000 Bordeaux, France.
| | | | | | | | | | | | | |
Collapse
|
48
|
Lemaitre C, Braga MDV, Gautier C, Sagot MF, Tannier E, Marais GAB. Footprints of inversions at present and past pseudoautosomal boundaries in human sex chromosomes. Genome Biol Evol 2009; 1:56-66. [PMID: 20333177 PMCID: PMC2817401 DOI: 10.1093/gbe/evp006] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/12/2009] [Indexed: 12/23/2022] Open
Abstract
The human sex chromosomes have stopped recombining gradually, which has left five evolutionary strata on the X chromosome. Y inversions are thought to have suppressed X–Y recombination but clear evidence is missing. Here, we looked for such evidence by focusing on a region—the X-added region (XAR)—that includes the pseudoautosomal region and the most recent strata 3 to 5. We estimated and analyzed the whole set of parsimonious scenarios of Y inversions given the gene order in XAR and its Y homolog. Comparing these to scenarios for simulated sequences suggests that the strata 4 and 5 were formed by Y inversions. By comparing the X and Y DNA sequences, we found clear evidence of two Y inversions associated with duplications that coincide with the boundaries of strata 4 and 5. Divergence between duplicates is in agreement with the timing of strata 4 and 5 formation. These duplicates show a complex pattern of gene conversion that resembles the pattern previously found for AMELXY, a stratum 3 locus. This suggests that this locus—despite AMELY being unbroken—was possibly involved in a Y inversion that formed stratum 3. However, no clear evidence supporting the formation of stratum 3 by a Y inversion was found, probably because this stratum is too old for such an inversion to be detectable. Our results strongly support the view that the most recent human strata have arisen by Y inversions and suggest that inversions have played a major role in the differentiation of our sex chromosomes.
Collapse
Affiliation(s)
- Claire Lemaitre
- Université de Lyon, Université Lyon 1, Centre National de la Recherche Scientifique, Institut National de Recherche en Informatique et en Automatique, UMR5558, Laboratoire de Biométrie et Biologie évolutive, Villeurbanne, F-69622 cedex, France
| | | | | | | | | | | |
Collapse
|
49
|
Tannier E, Zheng C, Sankoff D. Multichromosomal median and halving problems under different genomic distances. BMC Bioinformatics 2009; 10:120. [PMID: 19386099 PMCID: PMC2683817 DOI: 10.1186/1471-2105-10-120] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2008] [Accepted: 04/22/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation so that complexity in this context does not follow from existing results, and is open for all distances. RESULTS We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems. CONCLUSION This theoretical study clears up a wide swathe of the algorithmical study of genome rearrangements with multiple multichromosomal genomes.
Collapse
Affiliation(s)
- Eric Tannier
- INRIA Rhône-Alpes, Inovallée, 655 avenue de l'Europe, Montbonnot, 38 334 Saint Ismier Cedex, France
- Université de Lyon, F-69000, Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622, Villeurbanne, France
| | - Chunfang Zheng
- Department of Biology and Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5
| | - David Sankoff
- Department of Biology and Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5
| |
Collapse
|
50
|
|