101
Mácha J, Teichmanová R, Sater AK, Wells DE, Tlapáková T, Zimmerman LB, Krylov V. Deep ancestry of mammalian X chromosome revealed by comparison with the basal tetrapod Xenopus tropicalis. BMC Genomics 2012; 13:315. [PMID: 22800176 PMCID: PMC3472169 DOI: 10.1186/1471-2164-13-315] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 01/30/2012] [Accepted: 06/25/2012] [Indexed: 01/08/2023]
Abstract
BACKGROUND The X and Y sex chromosomes are conspicuous features of placental mammal genomes. Mammalian sex chromosomes arose from an ordinary pair of autosomes after the proto-Y acquired a male-determining gene and degenerated due to suppression of X-Y recombination. Analysis of earlier steps in X chromosome evolution has been hampered by the long interval between the origins of teleost and amniote lineages as well as scarcity of X chromosome orthologs in incomplete avian genome assemblies. RESULTS This study clarifies the genesis and remodelling of the Eutherian X chromosome by using a combination of sequence analysis, meiotic map information, and cytogenetic localization to compare amniote genome organization with that of the amphibian Xenopus tropicalis. Nearly all orthologs of human X genes localize to X. tropicalis chromosomes 2 and 8, consistent with an ancestral X-conserved region and a single X-added region precursor. This finding contradicts a previous hypothesis of three evolutionary strata in this region. Homologies between human, opossum, chicken and frog chromosomes suggest a single X-added region predecessor in therian mammals, corresponding to opossum chromosomes 4 and 7. A more ancient X-added ancestral region, currently extant as a major part of chicken chromosome 1, is likely to have been present in the progenitor of synapsids and sauropsids. Analysis of X chromosome gene content emphasizes conservation of single protein coding genes and the role of tandem arrays in formation of novel genes. CONCLUSIONS Chromosomal regions orthologous to Therian X chromosomes have been located in the genome of the frog X. tropicalis. These X chromosome ancestral components experienced a series of fusion and breakage events to give rise to avian autosomes and mammalian sex chromosomes. The early-branching tetrapod X. tropicalis' simple diploid genome and robust synteny to amniotes greatly enhance studies of vertebrate chromosome evolution.
Affiliation(s)
- Jaroslav Mácha
- Department of Cell Biology, Faculty of Science, Charles University in Prague, Vinicna 7, Prague 2, Czech Republic
- Radka Teichmanová
- Department of Cell Biology, Faculty of Science, Charles University in Prague, Vinicna 7, Prague 2, Czech Republic
- Amy K Sater
- Department of Biology and Biochemistry, University of Houston, Houston, TX, 77204-5001, USA
- Dan E Wells
- Department of Biology and Biochemistry, University of Houston, Houston, TX, 77204-5001, USA
- Tereza Tlapáková
- Department of Cell Biology, Faculty of Science, Charles University in Prague, Vinicna 7, Prague 2, Czech Republic
- Lyle B Zimmerman
- Division of Developmental Biology, MRC-National Institute for Medical Research, Mill Hill, London, NW7 1AA, UK
- Vladimír Krylov
- Department of Cell Biology, Faculty of Science, Charles University in Prague, Vinicna 7, Prague 2, Czech Republic
102
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
- Biostatistics and Medical Informatics and Computer Sciences, Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA.
103
Zheng C, Sankoff D. Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes. BMC Bioinformatics 2012; 13 Suppl 10:S9. [PMID: 22759433 PMCID: PMC3389459 DOI: 10.1186/1471-2105-13-s10-s9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
Abstract
Background Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. Results We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than as traditionally assigned to the fabids. Conclusions Gene orders of ancestral eudicot species, involving 10,000 or more genes can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem.
Affiliation(s)
- Chunfang Zheng
- Department of Mathematics and Statistics, University of Ottawa, Canada
104
Romanenko SA, Volobouev V. Non-Sciuromorph rodent karyotypes in evolution. Cytogenet Genome Res 2012; 137:233-45. [PMID: 22699115 DOI: 10.1159/000339294] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
Abstract
Rodents are, taxonomically, the most species-rich mammalian order. They display a series of special genomic features including the highest karyotypic diversity, frequent occurrence of complex intraspecies chromosome variability, and a variety of unusual chromosomal sex determination mechanisms not encountered in other mammalian taxa. Rodents also have an abundance of cytochemically heterogeneous heterochromatin. There are also instances of extremely rapid karyotype reorganization and speciation not accompanied by significant genetic differentiation. All these peculiarities make it clear that a detailed study of rodent genomic evolution is indispensable to understand the mode and tempo of mammalian evolution. The aim of this review is to update the data obtained by classical and molecular cytogenetics as well as comparative genomics in order to outline the range of old and emerging problems that remain to be resolved.
Affiliation(s)
- S A Romanenko
- Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk, Russia.
105
Wittler R, Maňuch J, Patterson M, Stoye J. Consistency of sequence-based gene clusters. J Comput Biol 2011; 18:1023-39. [PMID: 21899413 DOI: 10.1089/cmb.2011.0083] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to specify gene clusters--groups of genes that are co-located in a set of genomes. Several approaches have been proposed to reconstruct putative ancestral gene clusters based on the gene order of contemporary species. One prevalent and natural reconstruction criterion is consistency: For a set of reconstructed gene clusters, there should exist a gene order that comprises all given clusters. For permutation-based gene cluster models, efficient methods exist to verify this condition. In this article, we discuss the consistency problem for different gene cluster models on sequences with restricted gene multiplicities. Our results range from linear-time algorithms for the simple model of adjacencies to NP-completeness proofs for more complex models like common intervals.
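To make the consistency criterion concrete for the simplest model, the following is a minimal sketch that assumes each gene occurs exactly once (the permutation case, not the multiplicity-restricted sequences studied in the article) and treats an adjacency as an unordered pair of genes. Under that assumption, a set of adjacencies fits into one linear gene order exactly when no gene has more than two neighbours and the adjacency graph contains no cycle. The function name `adjacencies_consistent` is illustrative only.

```python
from collections import defaultdict


def adjacencies_consistent(adjacencies):
    """Return True if one linear gene order can contain every given adjacency.

    Assumes the permutation model: every gene occurs exactly once and an
    adjacency is an unordered pair of distinct genes.
    """
    neighbours = defaultdict(set)
    for a, b in adjacencies:
        if a == b:
            return False  # a gene cannot be its own neighbour
        neighbours[a].add(b)
        neighbours[b].add(a)

    # In a linear order, each gene has at most two neighbours.
    if any(len(n) > 2 for n in neighbours.values()):
        return False

    # With maximum degree 2, each connected component is a path or a cycle.
    # Paths can be concatenated into one order; a cycle cannot be linearised.
    seen = set()
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        vertices = degree_sum = 0
        while stack:
            v = stack.pop()
            vertices += 1
            degree_sum += len(neighbours[v])
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if degree_sum // 2 >= vertices:  # as many edges as vertices: a cycle
            return False
    return True
```

For example, `adjacencies_consistent([("a", "b"), ("b", "c")])` is True, while adding `("c", "a")` closes a cycle and makes the set inconsistent. Each gene and adjacency is visited a constant number of times, so the check runs in linear time.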
Affiliation(s)
- Roland Wittler
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada.
106
Swain MT, Larkin DM, Caffrey CR, Davies SJ, Loukas A, Skelly PJ, Hoffmann KF. Schistosoma comparative genomics: integrating genome structure, parasite biology and anthelmintic discovery. Trends Parasitol 2011; 27:555-64. [PMID: 22024648 PMCID: PMC3223292 DOI: 10.1016/j.pt.2011.09.003] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 06/24/2011] [Revised: 09/09/2011] [Accepted: 09/20/2011] [Indexed: 12/11/2022]
Abstract
Schistosoma genomes provide a comprehensive resource for identifying the molecular processes that shape parasite evolution and for discovering novel chemotherapeutic or immunoprophylactic targets. Here, we demonstrate how intragenus and intergenus comparative genomics can be used to drive these investigations forward, illustrate the advantages and limitations of these approaches and review how post-genomic technologies offer complementary strategies for genome characterisation. Although sequencing and functional characterisation of other schistosome/platyhelminth genomes continues to expedite anthelmintic discovery, we contend that future priorities should equally focus on improving assembly quality, and chromosomal assignment, of existing schistosome/platyhelminth genomes.
Affiliation(s)
- Martin T Swain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
107
Abstract
This review summarizes aspects of the extensive literature on the patterns and processes underpinning chromosomal evolution in vertebrates and especially placental mammals. It highlights the growing synergy between molecular cytogenetics and comparative genomics, particularly with respect to fully or partially sequenced genomes, and provides novel insights into changes in chromosome number and structure across deep divisions of the vertebrate tree of life. The examination of basal numbers in the deeper branches of the vertebrate tree suggests a haploid (n) chromosome number of 10-13 in an ancestral vertebrate, with modest increases in tetrapods and amniotes most probably by chromosomal fissioning. Information drawn largely from cross-species chromosome painting in the data-dense Placentalia permits the confident reconstruction of an ancestral karyotype comprising n=23 chromosomes that is similarly retained in Boreoeutheria. Using in silico genome-wide scans that include the newly released frog genome, we show that of the nine ancient syntenies detected in conserved karyotypes of extant placentals (thought likely to reflect the structure of ancestral chromosomes), the human syntenic segmental associations 3p/21, 4pq/8p, 7a/16p, 14/15, 12qt/22q and 12pq/22qt predate the divergence of tetrapods. These findings underscore the enhanced quality of ancestral reconstructions based on integrative molecular cytogenetic and comparative genomic approaches, which collectively highlight a pattern of conserved syntenic associations that extends back ∼360 million years.
108
Farré M, Bosch M, López-Giráldez F, Ponsà M, Ruiz-Herrera A. Assessing the role of tandem repeats in shaping the genomic architecture of great apes. PLoS One 2011; 6:e27239. [PMID: 22076140 PMCID: PMC3208591 DOI: 10.1371/journal.pone.0027239] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 04/06/2011] [Accepted: 10/12/2011] [Indexed: 11/18/2022]
Abstract
Background Ancestral reconstructions of mammalian genomes have revealed that evolutionary breakpoint regions are clustered in regions that are more prone to break and reorganize. What is still unclear to evolutionary biologists is whether these regions are physically unstable due solely to sequence composition and/or genome organization, or whether they represent genomic areas where selection against breakpoints is minimal. Methodology and Principal Findings Here we present a comprehensive study of the distribution of tandem repeats in great apes. We analyzed the distribution of tandem repeats in relation to the localization of evolutionary breakpoint regions in the human, chimpanzee, orangutan and macaque genomes. We observed an accumulation of tandem repeats in the genomic regions implicated in chromosomal reorganizations. In the case of the human genome, our analyses revealed that evolutionary breakpoint regions contained more base pairs implicated in tandem repeats than synteny blocks, with the AAAT motif being the one most frequently involved in evolutionary regions. We found that those AAAT repeats located in evolutionary regions were preferentially associated with Alu elements. Significance Our observations provide evidence for the role of tandem repeats in shaping mammalian genome architecture. We hypothesize that an accumulation of specific tandem repeats in evolutionary regions can promote genome instability by altering the state of the chromatin conformation or by promoting the insertion of transposable elements.
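The comparison of tandem-repeat content between breakpoint regions and synteny blocks can be illustrated with a simple base-pair count. This sketch only counts perfect, uninterrupted copies of a single motif (AAAT by default) matched with a regular expression; that exact-match criterion, the `min_copies` threshold and the function name are assumptions for illustration, and real genome sequence would call for repeat detectors that tolerate mismatches and indels.

```python
import re


def tandem_repeat_fraction(seq, motif="AAAT", min_copies=3):
    """Fraction of bases in `seq` lying in perfect runs of >= min_copies motifs.

    Only exact, uninterrupted copies are counted (case-insensitive);
    real repeat annotation would also tolerate mismatches and indels.
    """
    if not seq:
        return 0.0
    pattern = re.compile(
        "(?:%s){%d,}" % (re.escape(motif), min_copies), re.IGNORECASE
    )
    # finditer yields greedy, non-overlapping runs scanned left to right
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)
```

Applied separately to breakpoint-region and synteny-block sequences, the two fractions can then be compared. For instance, `tandem_repeat_fraction("GGAAATAAATAAATGG")` returns 0.75, since 12 of the 16 bases lie in a run of three AAAT copies.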
Affiliation(s)
- Marta Farré
- Departament de Biologia Cel·lular, Fisiologia i Immunologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Francesc López-Giráldez
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Montserrat Ponsà
- Departament de Biologia Cel·lular, Fisiologia i Immunologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Aurora Ruiz-Herrera
- Departament de Biologia Cel·lular, Fisiologia i Immunologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Institut de Biotecnologia i Biomedicina (IBB), Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
109
Hou M, Yao P, Antonou A, Johns MA. Pico-inplace-inversions between human and chimpanzee. Bioinformatics 2011; 27:3266-75. [PMID: 21994225 DOI: 10.1093/bioinformatics/btr566] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION There have been several studies on the micro-inversions between human and chimpanzee, but there are large discrepancies among their results. Furthermore, all of them rely on alignment procedures or existing alignment results to identify inversions. However, the core alignment procedures do not take very small inversions into consideration. Therefore, their analyses cannot find inversions that are too small to be detected by a classic aligner. We call such inversions pico-inversions. RESULTS We re-analyzed the human-chimpanzee alignment from the UCSC Genome Browser for micro-inplace-inversions and screened for pico-inplace-inversions using a likelihood ratio test. We report that the quantity of inplace-inversions between human and chimpanzee is substantially greater than what had previously been discovered. We also present the software tool PicoInversionMiner to detect pico-inplace-inversions between closely related species. AVAILABILITY Software tools, scripts and result data are available at http://faculty.cs.niu.edu/~hou/PicoInversion.html. CONTACT mhou@cs.niu.edu.
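The basic signal behind an in-place inversion is that, over an aligned window, one species' segment matches the reverse complement of the other's better than it matches the segment itself. The sketch below scores that with plain percent identity on gapless, equal-length windows; it is a hypothetical simplification, not the likelihood ratio test or the PicoInversionMiner implementation described in the abstract, and the `margin` threshold is arbitrary.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def identity(x, y):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def looks_inverted(seg_a, seg_b, margin=0.2):
    """Flag seg_b as a candidate in-place inversion of seg_a.

    True when seg_b matches the reverse complement of seg_a better than it
    matches seg_a itself by at least `margin` (an arbitrary threshold).
    """
    if not seg_a or len(seg_a) != len(seg_b):
        raise ValueError("segments must be non-empty and of equal length")
    a, b = seg_a.upper(), seg_b.upper()
    direct = identity(a, b)
    inverted = identity(reverse_complement(a), b)
    return inverted - direct >= margin
```

Reverse-complement palindromes (for example ACGT) score identically in both orientations, so a comparison of this kind cannot flag them.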
Affiliation(s)
- Minmei Hou
- Department of Computer Science, Northern Illinois University, DeKalb, IL 60115, USA.
110
Graphodatsky AS, Trifonov VA, Stanyon R. The genome diversity and karyotype evolution of mammals. Mol Cytogenet 2011; 4:22. [PMID: 21992653 PMCID: PMC3204295 DOI: 10.1186/1755-8166-4-22] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Received: 08/01/2011] [Accepted: 10/12/2011] [Indexed: 01/30/2023]
Abstract
The past decade has witnessed an explosion of genome sequencing and mapping in evolutionarily diverse species. While full genome sequencing of mammals is rapidly progressing, assembling and aligning orthologous whole-chromosome regions from more than a few species is still not possible. The intense focus on building comparative maps for companion (dog and cat), laboratory (mouse and rat) and agricultural (cattle, pig, and horse) animals has traditionally been used as a means to understand the underlying basis of disease-related or economically important phenotypes. However, these maps also provide an unprecedented opportunity to use multispecies analysis as a tool for inferring karyotype evolution. Comparative chromosome painting and related techniques are now considered to be the most powerful approaches in comparative genome studies. Homologies can be identified with high accuracy using molecularly defined DNA probes for fluorescence in situ hybridization (FISH) on chromosomes of different species. Chromosome painting data are now available for members of nearly all mammalian orders. In most orders, there are species with rates of chromosome evolution that can be considered as 'default' rates. The number of rearrangements that have become fixed in evolutionary history seems comparatively low, bearing in mind the 180 million years of the mammalian radiation. Comparative chromosome maps record the history of karyotype changes that have occurred during evolution. The aim of this review is to provide an overview of these recent advances in our endeavor to decipher the karyotype evolution of mammals by integrating the published results together with some of our latest unpublished results.
111
Maňuch J, Patterson M. The complexity of the gapped consecutive-ones property problem for matrices of bounded maximum degree. J Comput Biol 2011; 18:1243-53. [PMID: 21899429 DOI: 10.1089/cmb.2011.0128] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
The Gapped Consecutive-Ones Property (C1P) Problem, or the (k, δ)-C1P Problem, is: given a binary matrix M and integers k and δ, decide if the columns of M can be ordered such that each row contains at most k blocks of 1's, and no two neighboring blocks of 1's are separated by a gap of more than δ 0's. This problem was introduced by Chauve et al. (2009b). The classical polynomial-time solvable C1P Problem is equivalent to the (1, 0)-C1P problem. It has been shown that, for every unbounded or bounded k ≥ 2 and unbounded or bounded δ ≥ 1, except when (k, δ) = (2, 1), the (k, δ)-C1P Problem is NP-complete (Maňuch et al., 2011; Goldberg et al., 1995). In this article, we study the Gapped C1P Problem with a third parameter d, namely the bound on the maximum number of 1's in any row of M, or the bound on the maximum degree of M. This is motivated by the reconstruction of ancestral genomes (Ma et al., 2006; Chauve and Tannier, 2008), where, in binary matrices obtained from the experiments of Chauve and Tannier (2008), we have observed that the majority of the rows have low degree, while each high degree row contains many rows of low degree. The (d, k, δ)-C1P Problem has been shown to be polynomial-time solvable when all three parameters are fixed (Chauve et al., 2009b). Since fixing d also fixes k (k ≤ d), the only case left to consider is the case when δ is unbounded, or the (d, k, ∞)-C1P Problem. Here we show that for every d > k ≥ 2, the (d, k, ∞)-C1P Problem is NP-complete.
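Checking whether a given column order witnesses the (k, δ)-C1P is straightforward; the hardness results above concern finding such an order. The following sketch verifies a candidate order for a matrix stored as a list of 0/1 rows, with `delta=None` standing in for the unbounded case δ = ∞. The function name and matrix representation are illustrative assumptions.

```python
def satisfies_gapped_c1p(matrix, order, k, delta):
    """Check that column order `order` witnesses the (k, delta)-C1P.

    matrix: list of rows, each a list of 0/1 values.
    order:  a permutation of the column indices.
    delta:  maximum gap (number of 0's) between neighbouring blocks of 1's,
            or None for an unbounded gap.
    """
    for row in matrix:
        permuted = [row[c] for c in order]
        ones = [i for i, x in enumerate(permuted) if x == 1]
        if not ones:
            continue  # an all-zero row has no blocks
        blocks = 1
        for prev, cur in zip(ones, ones[1:]):
            gap = cur - prev - 1  # zeros between two consecutive 1's
            if gap > 0:
                blocks += 1  # a gap starts a new block of 1's
                if delta is not None and gap > delta:
                    return False
        if blocks > k:
            return False
    return True
```

With k = 1 and δ = 0 the check reduces to the classical C1P condition for the given order.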
Affiliation(s)
- Ján Maňuch
- Department of Mathematics, Simon Fraser University, Burnaby, Canada
112
Feijão P, Meidanis J. SCJ: a breakpoint-like distance that simplifies several rearrangement problems. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1318-29. [PMID: 21339538 DOI: 10.1109/tcbb.2011.34] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 05/30/2023]
Abstract
The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, this is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy, and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable, first approximation to distances based on more realistic rearrangement models.
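Under SCJ, a genome is represented by its set of adjacencies between gene extremities, and the distance between genomes with adjacency sets A and B is |A \ B| + |B \ A|: each adjacency of A absent from B must be cut, and each adjacency of B absent from A must be joined. The sketch below assumes genomes are given as lists of `(genes, is_circular)` chromosomes with signed gene names such as `"+a"` and `"-b"`; this encoding and the helper names are illustrative, not taken from the paper.

```python
def extremities(signed_gene):
    """(left, right) extremities of a signed gene such as '+a' or '-a'.

    A gene read forward runs tail -> head; read in reverse, head -> tail.
    """
    sign, name = signed_gene[0], signed_gene[1:]
    if sign not in ("+", "-"):
        raise ValueError("gene must start with '+' or '-'")
    if sign == "+":
        return (name, "t"), (name, "h")
    return (name, "h"), (name, "t")


def adjacency_set(genome):
    """Adjacency set of a genome given as a list of (genes, is_circular)."""
    adjacencies = set()
    for genes, circular in genome:
        ends = [extremities(g) for g in genes]
        # join the right extremity of each gene to the left one of the next
        for (_, right), (left, _) in zip(ends, ends[1:]):
            adjacencies.add(frozenset((right, left)))
        if circular and ends:
            # close the circle: last gene's right end meets first's left end
            adjacencies.add(frozenset((ends[-1][1], ends[0][0])))
    return adjacencies


def scj_distance(genome_a, genome_b):
    """SCJ distance: adjacencies cut from A plus adjacencies joined from B."""
    a, b = adjacency_set(genome_a), adjacency_set(genome_b)
    return len(a - b) + len(b - a)
```

Inverting a single interior gene, as in `+a +b +c` versus `+a -b +c`, breaks two adjacencies and creates two new ones, giving an SCJ distance of 4, whereas reading the whole chromosome backwards (`-c -b -a`) yields the same adjacency set and a distance of 0.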
Affiliation(s)
- Pedro Feijão
- Institute of Computing, University of Campinas, Brazil.
113
Ouangraoua A, Tannier E, Chauve C. Reconstructing the architecture of the ancestral amniote genome. Bioinformatics 2011; 27:2664-71. [DOI: 10.1093/bioinformatics/btr461] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
114
Gavranović H, Chauve C, Salse J, Tannier E. Mapping ancestral genomes with massive gene loss: a matrix sandwich problem. Bioinformatics 2011; 27:i257-65. [PMID: 21685079 PMCID: PMC3117370 DOI: 10.1093/bioinformatics/btr224] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
Abstract
MOTIVATION Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible by these methods, such as the proto-monocotyledon whose evolution involved massive gene loss following a whole genome duplication. RESULTS We propose a mapping approach based on the combinatorial notion of 'sandwich consecutive ones matrix', which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content. AVAILABILITY Upon request to the authors. CONTACT haris.gavranovic@gmail.com; eric.tannier@inria.fr.
Affiliation(s)
- Haris Gavranović
- Faculty of Natural Sciences, University of Sarajevo, Bosnia and Herzegovina.
115
Braun EL, Kimball RT, Han KL, Iuhasz-Velez NR, Bonilla AJ, Chojnowski JL, Smith JV, Bowie RCK, Braun MJ, Hackett SJ, Harshman J, Huddleston CJ, Marks BD, Miglia KJ, Moore WS, Reddy S, Sheldon FH, Witt CC, Yuri T. Homoplastic microinversions and the avian tree of life. BMC Evol Biol 2011; 11:141. [PMID: 21612607 PMCID: PMC3123225 DOI: 10.1186/1471-2148-11-141] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 10/26/2010] [Accepted: 05/25/2011] [Indexed: 11/26/2022]
Abstract
BACKGROUND Microinversions are cytologically undetectable inversions of DNA sequences that accumulate slowly in genomes. Like many other rare genomic changes (RGCs), microinversions are thought to be virtually homoplasy-free evolutionary characters, suggesting that they may be very useful for difficult phylogenetic problems such as the avian tree of life. However, few detailed surveys of these genomic rearrangements have been conducted, making it difficult to assess this hypothesis or understand the impact of microinversions upon genome evolution. RESULTS We surveyed non-coding sequence data from a recent avian phylogenetic study and found substantially more microinversions than expected based upon prior information about vertebrate inversion rates, although this is likely due to underestimation of these rates in previous studies. Most microinversions were lineage-specific or united well-accepted groups. However, some homoplastic microinversions were evident among the informative characters. Hemiplasy, which reflects differences between gene trees and the species tree, did not explain the observed homoplasy. Two specific loci were microinversion hotspots, with high numbers of inversions that included both the homoplastic as well as some overlapping microinversions. Neither stem-loop structures nor detectable sequence motifs were associated with microinversions in the hotspots. CONCLUSIONS Microinversions can provide valuable phylogenetic information, although power analysis indicates that large amounts of sequence data will be necessary to identify enough inversions (and similar RGCs) to resolve short branches in the tree of life. Moreover, microinversions are not perfect characters and should be interpreted with caution, just as with any other character type. Independent of their use for phylogenetic analyses, microinversions are important because they have the potential to complicate alignment of non-coding sequences. 
Despite their low rate of accumulation, they have clearly contributed to genome evolution, suggesting that active identification of microinversions will prove useful in future phylogenomic studies.
Affiliation(s)
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Rebecca T Kimball
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Kin-Lan Han
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Amber J Bonilla
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Jena L Chojnowski
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Jordan V Smith
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Rauri CK Bowie
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Michael J Braun
- Department of Vertebrate Zoology, Smithsonian Institution, 4210 Silver Hill Road, Suitland, MD 20746, USA
- Behavior, Ecology, Evolution, and Systematics Program, University of Maryland, College Park, MD 20742, USA
- Shannon J Hackett
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- John Harshman
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- 4869 Pepperwood Way, San Jose, CA 95124, USA
- Christopher J Huddleston
- Department of Vertebrate Zoology, Smithsonian Institution, 4210 Silver Hill Road, Suitland, MD 20746, USA
- Ben D Marks
- Museum of Natural Science and Department of Biological Sciences, 119 Foster Hall, Louisiana State University, Baton Rouge, LA 70803, USA
- Kathleen J Miglia
- Department of Biological Sciences, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, USA
- William S Moore
- Department of Biological Sciences, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, USA
- Sushma Reddy
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- Biology Department, Loyola University Chicago, Chicago, IL 60626, USA
- Frederick H Sheldon
- Museum of Natural Science and Department of Biological Sciences, 119 Foster Hall, Louisiana State University, Baton Rouge, LA 70803, USA
- Christopher C Witt
- Museum of Natural Science and Department of Biological Sciences, 119 Foster Hall, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Biology and Museum of Southwestern Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Tamaki Yuri
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Department of Vertebrate Zoology, Smithsonian Institution, 4210 Silver Hill Road, Suitland, MD 20746, USA
- Sam Noble Oklahoma Museum of Natural History, University of Oklahoma, Norman, OK 73072, USA
116
Ma J. Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. J Comput Biol 2011; 18:879-93. [PMID: 21563973 DOI: 10.1089/cmb.2010.0189] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
In addition to point mutations, larger-scale structural changes (including rearrangements, duplications, insertions, and deletions) are also prevalent between different mammalian genomes. Capturing these large-scale changes is critical to unraveling the history of mammalian evolution in order to better understand the human genome. It also has profound biomedical significance, because many human diseases are associated with structural genomic aberrations. The increasing number of mammalian genomes being sequenced as well as the recent advancement in DNA sequencing technologies are allowing us to identify these structural genomic changes with vastly greater accuracy. However, there are a considerable number of computational challenges related to these problems. In this article, we introduce the ancestral genome reconstruction problem, which enables us to explain the large-scale genomic changes between species in an evolutionary context. The application of these methods to within-species structural variation and disease genome analysis is also discussed. The target audience of this article is advanced undergraduate students in biology.
Affiliation(s)
- Jian Ma
- Department of Bioengineering, Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
117
Chauve C, Haus UU, Stephen T, You VP. Minimal conflicting sets for the consecutive ones property in ancestral genome reconstruction. J Comput Biol 2010; 17:1167-81. [PMID: 20874402 DOI: 10.1089/cmb.2010.0113] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
A binary matrix has the Consecutive Ones Property (C1P) if its columns can be ordered in such a way that all 1s on each row are consecutive. A Minimal Conflicting Set is a set of rows that does not have the C1P, but every proper subset has the C1P. Such submatrices have been considered in comparative genomics applications, but very little is known about their combinatorial structure and efficient algorithms to compute them. We first describe an algorithm that detects rows that belong to Minimal Conflicting Sets. This algorithm has a polynomial time complexity when the number of 1s in each row of the considered matrix is bounded by a constant. Next, we show that the problem of computing all Minimal Conflicting Sets can be reduced to the joint generation of all minimal true clauses and maximal false clauses for some monotone boolean function. We use these methods on simulated data related to ancestral genome reconstruction to show that computing Minimal Conflicting Sets is useful in discriminating between true positive and false positive ancestral syntenies. We also study a dataset of yeast genomes and address the reliability of an ancestral genome proposal of the Saccharomycetaceae yeasts.
Affiliation(s)
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada.
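The Consecutive Ones Property and Minimal Conflicting Sets defined in the Chauve et al. abstract above can be illustrated with a brute-force check. What follows is a minimal Python sketch. It assumes a small 0/1 matrix given as a list of rows and tries every column order, so its running time is exponential and it suits only toy inputs. It is not the polynomial-time algorithm the authors describe. The names `has_c1p` and `is_minimal_conflicting_set` and the example matrix are hypothetical.

```python
from itertools import combinations, permutations

Matrix = list[list[int]]


def ones_consecutive(row: list[int]) -> bool:
    """True if the 1s in `row` form a single contiguous run (or there are none)."""
    ones = [i for i, value in enumerate(row) if value == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def has_c1p(rows: Matrix) -> bool:
    """Brute-force C1P test: try every column order (exponential; toy inputs only)."""
    if not rows:
        return True
    for order in permutations(range(len(rows[0]))):
        if all(ones_consecutive([row[c] for c in order]) for row in rows):
            return True
    return False


def is_minimal_conflicting_set(rows: Matrix) -> bool:
    """True if `rows` lacks the C1P but every proper subset of rows has it."""
    if has_c1p(rows):
        return False
    # Deleting rows never destroys the C1P, so checking the subsets that
    # drop exactly one row covers every proper subset.
    return all(has_c1p(list(subset)) for subset in combinations(rows, len(rows) - 1))


# Three rows, each covering a different pair of three columns (hypothetical data).
triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(has_c1p(triangle))                                   # False
print(is_minimal_conflicting_set(triangle))                # True
print(is_minimal_conflicting_set(triangle + [[1, 1, 1]]))  # False
```

The triangle fails the C1P because a linear order of three columns has only two adjacent pairs, yet each of the three rows needs its own pair to be adjacent. Adding a fourth row keeps the conflict but breaks minimality, because the triangle remains as a smaller conflicting subset.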
118
Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires. Genome Biol 2011; 12:R4. [PMID: 21241503 PMCID: PMC3091302 DOI: 10.1186/gb-2011-12-1-r4] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 11/23/2010] [Revised: 12/23/2010] [Accepted: 01/17/2011] [Indexed: 12/11/2022]
Abstract
Background Genome size and complexity, as measured by the number of genes or protein domains, is remarkably similar in most extant eukaryotes and generally exhibits no correlation with their morphological complexity. Underlying trends in the evolution of the functional content and capabilities of different eukaryotic genomes might be hidden by simultaneous gains and losses of genes. Results We reconstructed the domain repertoires of putative ancestral species at major divergence points, including the last eukaryotic common ancestor (LECA). We show that, surprisingly, during eukaryotic evolution domain losses in general outnumber domain gains. Only at the base of the animal and the vertebrate sub-trees do domain gains outnumber domain losses. The observed gain/loss balance has a distinct functional bias, most strikingly seen during animal evolution, where most of the gains represent domains involved in regulation and most of the losses represent domains with metabolic functions. This trend is so consistent that clustering of genomes according to their functional profiles results in an organization similar to the tree of life. Furthermore, our results indicate that metabolic functions lost during animal evolution are likely being replaced by the metabolic capabilities of symbiotic organisms such as gut microbes. Conclusions While protein domain gains and losses are common throughout eukaryote evolution, losses oftentimes outweigh gains and lead to significant differences in functional profiles. Results presented here provide additional arguments for a complex last eukaryotic common ancestor, but also show a general trend of losses in metabolic capabilities and gain in regulatory complexity during the rise of animals.
119
120
Alekseyev MA, Pevzner PA. Comparative genomics reveals birth and death of fragile regions in mammalian evolution. Genome Biol 2010; 11:R117. [PMID: 21118492 PMCID: PMC3156956 DOI: 10.1186/gb-2010-11-11-r117] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 07/15/2010] [Revised: 10/05/2010] [Accepted: 11/30/2010] [Indexed: 12/15/2022]
Abstract
Background An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals raised some doubts about their existence. Results Here we demonstrate that fragile regions are subject to a birth and death process, implying that fragility has a limited evolutionary lifespan. Conclusions This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions as a phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.
Affiliation(s)
- Max A Alekseyev
- Department of Computer Science & Engineering, University of South Carolina, 301 Main St, Columbia, SC 29208, USA.
121
Tuller T, Birin H, Kupiec M, Ruppin E. Reconstructing ancestral genomic sequences by co-evolution: formal definitions, computational issues, and biological examples. J Comput Biol 2010; 17:1327-44. [PMID: 20874411 DOI: 10.1089/cmb.2010.0112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or the most parsimonious ancestral genomes usually include considerable error rates. In general, these errors cannot be abolished by utilizing more exhaustive computational approaches, by using longer genomic sequences, or by analyzing more taxa. In recent studies, we showed that co-evolution is an important force that can be used for significantly improving the inference of ancestral genome content. In this work we formally define a computational problem for the inference of ancestral genome content by co-evolution. We show that this problem is NP-hard and hard to approximate, and we present both a Fixed Parameter Tractable (FPT) algorithm and heuristic approximation algorithms for solving it. On simulated inputs with hundreds of protein families and hundreds of co-evolutionary relations, these algorithms ran quickly (up to four minutes) and achieved an approximation ratio below 1.3. We use our approach to study the ancestral genome content of the Fungi. To this end, we implement our approach on a dataset of 33,931 protein families and 20,317 co-evolutionary relations. Our algorithm added and removed hundreds of proteins from the ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) while only slightly affecting the likelihood/parsimony score of the results. A biological analysis revealed various pieces of evidence that support the biological plausibility of the new solutions. In addition, we showed that our approach reconstructs missing values at the leaves of the Fungi evolutionary tree better than ML or MP.
Affiliation(s)
- Tamir Tuller
- Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot, Israel
122
Bergeron A, Medvedev P, Stoye J. Rearrangement models and single-cut operations. J Comput Biol 2010; 17:1213-25. [PMID: 20874405 DOI: 10.1089/cmb.2010.0091] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
There have been many widely used genome rearrangement models, such as reversals, Hannenhalli-Pevzner (HP), and double-cut and join. Though each one can be precisely defined, the general notion of a model remains undefined. In this paper, we give a formal set-theoretic definition, which allows us to investigate and prove relationships between distances under various existing and new models. Among our results is that sorting in the HP model is equivalent to sorting in the reversal model when the initial and final genomes are linear uni-chromosomal. We also initiate the formal study of single-cut operations by giving a linear time algorithm for the distance problem under a new single-cut and join model.
Affiliation(s)
- Anne Bergeron
- Département d'informatique, Université du Québec à Montréal, Montreal, QC, Canada
123
Chauve C, Gavranovic H, Ouangraoua A, Tannier E. Yeast Ancestral Genome Reconstructions: The Possibilities of Computational Methods II. J Comput Biol 2010; 17:1097-112. [DOI: 10.1089/cmb.2010.0092] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]
Affiliation(s)
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Haris Gavranovic
- Faculty of Natural Sciences, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Aida Ouangraoua
- INRIA Lille-Nord-Europe, Université Lille 1, LIFL, UMR CNRS 8022, Villeneuve d'Ascq, France
- Eric Tannier
- INRIA Rhône-Alpes, Université de Lyon, Lyon, and Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
124
Pham SK, Pevzner PA. DRIMM-Synteny: decomposing genomes into evolutionary conserved segments. Bioinformatics 2010; 26:2509-16. [PMID: 20736338 DOI: 10.1093/bioinformatics/btq465] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Indexed: 11/15/2022]
Abstract
MOTIVATION The rapidly increasing set of sequenced genomes highlights the importance of identifying the synteny blocks in multiple and/or highly duplicated genomes. Most synteny block reconstruction algorithms use genes shared over all genomes to construct the synteny blocks for multiple genomes. However, the number of genes shared among all genomes quickly decreases with the increase in the number of genomes. RESULTS We propose the Duplications and Rearrangements In Multiple Mammals (DRIMM)-Synteny algorithm to address this bottleneck and apply it to analyzing genomic architectures of yeast, plant and mammalian genomes. We further combine synteny block generation with rearrangement analysis to reconstruct the ancestral preduplicated yeast genome. CONTACT kspham@cs.ucsd.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Son K Pham
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.
125
Comparative analysis of DNA replication timing reveals conserved large-scale chromosomal architecture. PLoS Genet 2010; 6:e1001011. [PMID: 20617169 PMCID: PMC2895651 DOI: 10.1371/journal.pgen.1001011] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.1] [Received: 09/03/2009] [Accepted: 06/01/2010] [Indexed: 01/02/2023]
Abstract
Recent evidence suggests that the timing of DNA replication is coordinated across megabase-scale domains in metazoan genomes, yet the importance of this aspect of genome organization is unclear. Here we show that replication timing is remarkably conserved between human and mouse, uncovering large regions that may have been governed by similar replication dynamics since these species diverged. This conservation is both tissue-specific and independent of the conservation of genomic G+C content. Moreover, we show that time of replication is globally conserved despite numerous large-scale genome rearrangements. We systematically identify rearrangement fusion points and demonstrate that replication time can locally diverge at these loci. Conversely, rearrangements are shown to be correlated with early replication and physical chromosomal proximity. These results suggest that large chromosomal domains of coordinated replication are shuffled by evolution while conserving the large-scale nuclear architecture of the genome.
During S-phase of the cell cycle, chromosomal DNA is replicated in a complex process involving the coordinated activity of thousands of replication forks, each of which duplicates a long stretch of DNA. Recent experiments revealed that the genome is replicated as a mosaic of large-scale early and late chromosomal domains and that this high-level domain organization is correlated with genomic properties like gene density and nucleotide composition. We compared genome-wide replication time maps of compatible human and mouse cells and revealed that their organization into replication domains is highly conserved despite the numerous large-scale genome rearrangements separating the two species. Analysis of recent chromosomal interaction data shows that regions with similar time of replication interact with each other more frequently than expected. The data also show that evolutionary rearrangements have predominantly occurred between regions that have similar time of replication and higher-than-expected chromosomal proximity. Our data suggest that the genome, while being continuously rearranged by evolution, maintains a conserved domain organization. Whether this conservation is driven by selection, or is a consequence of the rearrangement process itself, can be resolved by enhancing the comparative approach proposed here.
126
Kuhn A, Dehnert M, Helm WE, Hütt MT. Statistical evidence for ancestral correlation patterns. Biosystems 2010; 100:215-24. [DOI: 10.1016/j.biosystems.2010.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2009] [Revised: 12/15/2009] [Accepted: 03/16/2010] [Indexed: 10/19/2022]
127
Muffato M, Louis A, Poisnel CE, Roest Crollius H. Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics 2010; 26:1119-21. [PMID: 20185404 PMCID: PMC2853686 DOI: 10.1093/bioinformatics/btq079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/09/2009] [Revised: 02/15/2010] [Accepted: 02/19/2010] [Indexed: 07/03/2023]
Abstract
UNLABELLED Comparative genomics remains a pivotal strategy to study the evolution of gene organization, and this primacy is reinforced by the growing number of full genome sequences available in public repositories. Despite this growth, bioinformatic tools available to visualize and compare genomes and to infer evolutionary events remain restricted to two or three genomes at a time, thus limiting the breadth and the nature of the question that can be investigated. Here we present Genomicus, a new synteny browser that can represent and compare unlimited numbers of genomes in a broad phylogenetic view. In addition, Genomicus includes reconstructed ancestral gene organization, thus greatly facilitating the interpretation of the data. AVAILABILITY Genomicus is freely available for online use at http://www.dyogen.ens.fr/genomicus while data can be downloaded at ftp://ftp.biologie.ens.fr/pub/dyogen/genomicus.
Affiliation(s)
- Matthieu Muffato
- Dyogen Group, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Centre National de la Recherche Scientifique UMR8197, Institut National de la Santé et de la Recherche Médicale U1024, 75005 Paris, France
128
Larkin DM. Role of chromosomal rearrangements and conserved chromosome regions in amniote evolution. Molecular Genetics, Microbiology and Virology 2010. [DOI: 10.3103/s0891416810010015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
129
Muffato M, Louis A, Poisnel CE, Roest Crollius H. Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics 2010; 26:1119-21. [PMID: 20185404 PMCID: PMC2853686 DOI: 10.1093/bioinformatics/btq079] [Citation(s) in RCA: 173] [Impact Index Per Article: 12.4] [Indexed: 11/25/2022]
Abstract
Summary: Comparative genomics remains a pivotal strategy to study the evolution of gene organization, and this primacy is reinforced by the growing number of full genome sequences available in public repositories. Despite this growth, bioinformatic tools available to visualize and compare genomes and to infer evolutionary events remain restricted to two or three genomes at a time, thus limiting the breadth and the nature of the question that can be investigated. Here we present Genomicus, a new synteny browser that can represent and compare unlimited numbers of genomes in a broad phylogenetic view. In addition, Genomicus includes reconstructed ancestral gene organization, thus greatly facilitating the interpretation of the data. Availability: Genomicus is freely available for online use at http://www.dyogen.ens.fr/genomicus while data can be downloaded at ftp://ftp.biologie.ens.fr/pub/dyogen/genomicus Contact: hrc@biologie.ens.fr
Affiliation(s)
- Matthieu Muffato
- Dyogen Group, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Centre National de la Recherche Scientifique UMR8197, Institut National de la Santé et de la Recherche Médicale U1024, 75005 Paris, France
130
Reconstruction of Ancestral Genome Subject to Whole Genome Duplication, Speciation, Rearrangement and Loss. Lecture Notes in Computer Science 2010. [DOI: 10.1007/978-3-642-15294-8_7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
131
Divergent patterns of breakpoint reuse in Muroid rodents. Mamm Genome 2009; 21:77-87. [PMID: 20033182 DOI: 10.1007/s00335-009-9242-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 07/02/2009] [Accepted: 11/09/2009] [Indexed: 01/09/2023]
Abstract
Multiple Genome Rearrangement (MGR) analysis was used to define the trajectory and pattern of chromosome rearrangement within muroid rodents. MGR was applied using 107 chromosome homologies between Mus, Rattus, Peromyscus, the muroid sister taxon Cricetulus griseus, and Sciurus carolinensis as a non-Muroidea outgroup, with specific attention paid to breakpoint reuse and centromere evolution. This analysis revealed a high level of chromosome breakpoint conservation between Rattus and Peromyscus and indicated that the chromosomes of Mus are highly derived. This analysis identified several conserved evolutionary breakpoints that have been reused multiple times during karyotypic evolution in rodents. Our data demonstrate a high level of reuse of breakpoints among muroid rodents, further supporting the "Fragile Breakage Model" of chromosome evolution. We provide the first analysis of rodent centromeres with respect to evolutionary breakpoints. By analyzing closely related rodent species we were able to clarify muroid rodent karyotypic evolution. We were also able to derive several high-resolution ancestral karyotypes and identify rearrangements specific to various stages of Muroidea evolution. These data were useful in further characterizing lineage-specific modes of chromosome evolution.
132
Bertrand D, Blanchette M, El-Mabrouk N. Genetic map refinement using a comparative genomic approach. J Comput Biol 2009; 16:1475-86. [PMID: 19754272 DOI: 10.1089/cmb.2009.0094] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Following various genetic mapping techniques conducted on different segregating populations, one or more genetic maps are obtained for a given species. However, recombination analyses and other methods for gene mapping often fail to resolve the ordering of some pairs of neighboring markers, thereby leading to sets of markers ambiguously mapped to the same position. Each individual map is thus a partial order defined on the set of markers, and can be represented as a Directed Acyclic Graph (DAG). In this article, given a phylogenetic tree with a set of DAGs labeling each leaf (species), the goal is to infer, at each leaf, a single combined DAG that is as resolved as possible, considering the complementary information provided by individual maps and the phylogenetic information provided by the species tree. After combining the individual maps of a leaf into a single DAG, we order incomparable markers by using two successive heuristics for minimizing two distances on the species tree: the breakpoint distance and the Kemeny distance. We apply our algorithms to the plant species represented in the Gramene database, and we evaluate the simplified maps we obtained.
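The breakpoint and Kemeny distances named in the Bertrand et al. abstract can be sketched for the simplest case: two total marker orders over the same marker set. This Python sketch assumes unsigned, linear, single-chromosome orders with unique markers. It does not handle partial orders (DAGs) or the species tree, both of which the article's heuristics work with. For total orders, the Kemeny distance reduces to counting discordant marker pairs (the Kendall tau distance), up to a constant factor that depends on convention. The function names and marker labels are hypothetical.

```python
from itertools import combinations


def breakpoint_distance(a: list[str], b: list[str]) -> int:
    """Number of adjacencies in `a` that do not occur (in either orientation) in `b`."""
    adjacencies_b = {frozenset(pair) for pair in zip(b, b[1:])}
    return sum(frozenset(pair) not in adjacencies_b for pair in zip(a, a[1:]))


def kemeny_distance(a: list[str], b: list[str]) -> int:
    """Number of marker pairs ordered differently in `a` and `b` (Kendall tau)."""
    position_in_b = {marker: i for i, marker in enumerate(b)}
    return sum(position_in_b[x] > position_in_b[y] for x, y in combinations(a, 2))


# Hypothetical marker orders from two genetic maps.
map_1 = ["m1", "m2", "m3", "m4"]
map_2 = ["m1", "m3", "m2", "m4"]
print(breakpoint_distance(map_1, map_2))  # 2
print(kemeny_distance(map_1, map_2))      # 1
```

The two measures capture different things. Reversing a whole map leaves every unsigned adjacency intact, giving a breakpoint distance of 0, while putting every marker pair out of order, giving the maximum Kemeny distance.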
133
Tuller T, Birin H, Gophna U, Kupiec M, Ruppin E. Reconstructing ancestral gene content by coevolution. Genome Res 2009; 20:122-32. [PMID: 19948819 DOI: 10.1101/gr.096115.109] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 11/24/2022]
Abstract
Inferring the gene content of ancestral genomes is a fundamental challenge in molecular evolution. Due to the statistical nature of this problem, ancestral genomes inferred by the maximum likelihood (ML) or the maximum-parsimony (MP) methods are prone to considerable error rates. In general, these errors are difficult to abolish by using longer genomic sequences or by analyzing more taxa. This study describes a new approach for improving ancestral genome reconstruction, the ancestral coevolver (ACE), which utilizes coevolutionary information to improve the accuracy of such reconstructions over previous approaches. The principal idea is to reduce the potentially large solution space by choosing a single optimal (or near optimal) solution that is in accord with the coevolutionary relationships between protein families. Simulation experiments, both on artificial and real biological data, show that ACE yields a marked decrease in error rate compared with ML or MP. Applied to a large data set (95 organisms, 4873 protein families, and 10,000 coevolutionary relationships), some of the ancestral genomes reconstructed by ACE were remarkably different in their gene content from those reconstructed by ML or MP alone (more than 10% in some nodes). These reconstructions, while having almost similar likelihood/parsimony scores as those obtained with ML/MP, had markedly higher concordance with the coevolutionary information. Specifically, when ACE was implemented to improve the results of ML, it added a large number of proteins to those encoded by LUCA (last universal common ancestor), most of them ribosomal proteins and components of the F(0)F(1)-type ATP synthase/ATPases, complexes that are vital in most living organisms. Our analysis suggests that LUCA appears to have been bacterial-like and had a genome size similar to the genome sizes of many extant organisms.
Affiliation(s)
- Tamir Tuller
- School of Computer Sciences, Tel Aviv University, Ramat Aviv, Israel.
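The Tuller et al. abstract contrasts ACE with maximum parsimony (MP) reconstruction of ancestral gene content. As background only, the sketch below applies Fitch's small-parsimony algorithm to the presence (1) or absence (0) of a single protein family on a rooted binary tree. It is not ACE and ignores co-evolutionary constraints. The `Node` class, the function names, and the example tree are hypothetical. Ties at the root are broken toward absence, which is only one of the equally parsimonious choices.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; leaves carry an observed 0/1 state."""
    name: str
    children: list["Node"] = field(default_factory=list)
    state: Optional[int] = None


def fitch_up(node: Node, sets: dict[str, set[int]]) -> int:
    """Bottom-up pass: store candidate state sets and return the parsimony cost."""
    if not node.children:
        sets[node.name] = {node.state}
        return 0
    cost = sum(fitch_up(child, sets) for child in node.children)
    left, right = (sets[child.name] for child in node.children)  # binary tree assumed
    common = left & right
    if common:
        sets[node.name] = common
    else:
        sets[node.name] = left | right
        cost += 1  # one gain or loss is needed below this node
    return cost


def fitch_down(node: Node, sets: dict[str, set[int]],
               parent_state: Optional[int] = None) -> None:
    """Top-down pass: fix one most-parsimonious state at each internal node."""
    if not node.children:
        return
    options = sets[node.name]
    # Keep the parent's state when allowed; otherwise take the smallest candidate.
    # A real tie (two candidates) can only remain at the root, where 0 is chosen.
    node.state = parent_state if parent_state in options else min(options)
    for child in node.children:
        fitch_down(child, sets, node.state)


# Family present in leaves A and B, absent in C and D (hypothetical data).
x = Node("X", [Node("A", state=1), Node("B", state=1)])
y = Node("Y", [Node("C", state=0), Node("D", state=0)])
root = Node("R", [x, y])
sets: dict[str, set[int]] = {}
print(fitch_up(root, sets))          # 1
fitch_down(root, sets)
print(root.state, x.state, y.state)  # 0 1 0
```

With the root tie broken toward absence, the single change is read as a gain on the branch leading to X. Breaking it toward presence would instead place a loss on the branch leading to Y at the same cost. This kind of ambiguity among equally parsimonious solutions is what the abstract's coevolution-based approach aims to resolve.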
134
Jean G, Sherman DJ, Nikolski M. Mining the semantics of genome super-blocks to infer ancestral architectures. J Comput Biol 2009; 16:1267-84. [PMID: 19772437 DOI: 10.1089/cmb.2008.0046] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
The study of evolutionary mechanisms is made more and more accurate by the increase in the number of fully sequenced genomes. One of the main problems is to reconstruct plausible ancestral genome architectures based on the comparison of contemporary genomes. Current methods have largely focused on finding complete architectures for ancestral genomes, and, due to the computational difficulty of the problem, stop after a small number of equivalent minimal solutions have been found. Recent results suggest, however, that the set of minimum complete architectures is very large and heterogeneous. In fact these solutions are collections of conserved blocks, freely rearranged. In this paper, we identify these conserved super-blocks, using a new method of analysis of ancestral architectures that reconciles both breakpoint and rearrangement analyses, as well as respects biological constraints. The resulting algorithms permit the first reliable reconstruction of plausible ancestral architectures for several non-WGD yeasts simultaneously, a problem hitherto intractable due to the extensive map reshuffling of these species. See online Supplementary Material at www.liebertonline.com.
Affiliation(s)
- Géraldine Jean
- Université Bordeaux, CNRS LaBRI, INRIA project-team MAGNOME, 351 cours de la Libération, 33405 Talence, France
135
Abstract
The analysis of genome rearrangements provides a global view on the evolution of a set of related species. We present a new algorithm called EMRAE (efficient method to recover ancestral events) to reliably predict a wide range of rearrangement events in the ancestry of a group of species. Using simulated data sets, we show that EMRAE achieves comparable sensitivity but significantly higher specificity when predicting evolutionary events relative to other tools to study genome rearrangements. We apply our approach to the synteny blocks of six mammalian genomes (human, chimpanzee, rhesus macaque, mouse, rat, and dog) and predict 1109 rearrangement events, including 831 inversions, 15 translocations, 237 transpositions, and 26 fusions/fissions. Studying the sequence features at the breakpoints of the primate rearrangement events, we demonstrate that they are not only enriched in segmental duplications (SDs), but that the enrichment of matching pairs of SDs is even stronger within the pairs of breakpoints associated with recovered events. We also show that pairs of L1 repeats are frequently associated with ancestral inversions across all studied lineages. Together, this substantiates the model that regions of high sequence identity have been associated with rearrangement events throughout the mammalian phylogeny.
Affiliation(s)
- Hao Zhao
- Computational and Mathematical Biology, Genome Institute of Singapore, Singapore
136
Salse J, Abrouk M, Murat F, Quraishi UM, Feuillet C. Improved criteria and comparative genomics tool provide new insights into grass paleogenomics. Brief Bioinform 2009; 10:619-30. [DOI: 10.1093/bib/bbp037] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022]
137
Bradley RK, Holmes I. Evolutionary triplet models of structured RNA. PLoS Comput Biol 2009; 5:e1000483. [PMID: 19714212 PMCID: PMC2725318 DOI: 10.1371/journal.pcbi.1000483] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/21/2008] [Accepted: 07/23/2009] [Indexed: 12/31/2022]
Abstract
The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a "transducer composition" algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). 
For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.
Affiliation(s)
- Robert K. Bradley
- Biophysics Graduate Group, University of California, Berkeley, California, United States of America
- Ian Holmes
- Biophysics Graduate Group, University of California, Berkeley, California, United States of America
- Department of Bioengineering, University of California, Berkeley, California, United States of America
138
Lemaitre C, Zaghloul L, Sagot MF, Gautier C, Arneodo A, Tannier E, Audit B. Analysis of fine-scale mammalian evolutionary breakpoints provides new insight into their relation to genome organisation. BMC Genomics 2009; 10:335. [PMID: 19630943 PMCID: PMC2722678 DOI: 10.1186/1471-2164-10-335] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 02/26/2009] [Accepted: 07/24/2009] [Indexed: 11/21/2022]
Abstract
Background The Intergenic Breakage Model, which is the current model of structural genome evolution, considers that evolutionary rearrangement breakages happen with a uniform propensity along the genome but are selected against in genes, their regulatory regions and in-between. However, a growing body of evidence shows that there exist regions along mammalian genomes that present a high susceptibility to breakage. We reconsidered this question, taking advantage of a recently published methodology for the precise detection of rearrangement breakpoints based on pairwise genome comparisons. Results We applied this methodology between the genome of human and those of five sequenced eutherian mammals. This allowed us to delineate evolutionary breakpoint regions along the human genome with a finer resolution (median size 26.6 kb) than obtained before. We investigated the distribution of these breakpoints with respect to genome organisation into domains of different activity. In agreement with the Intergenic Breakage Model, we observed that breakpoints are under-represented in genes. Surprisingly, however, the density of breakpoints in small intergenes (1 per Mb) appears significantly higher than in gene deserts (0.1 per Mb). More generally, we found a heterogeneous distribution of breakpoints that follows the organisation of the genome into isochores (breakpoints are more frequent in GC-rich regions). We then discuss the hypothesis that regions with an enhanced susceptibility to breakage correspond to regions of high transcriptional activity and replication initiation. Conclusion We propose a model to describe the heterogeneous distribution of evolutionary breakpoints along human chromosomes that combines natural selection and a mutational bias linked to local open chromatin state.
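The per-Mb densities quoted above are breakpoint counts normalized by the total length of each class of region. The minimal sketch below reduces each breakpoint region to a single representative position. The paper's breakpoint regions are intervals with a median size of 26.6 kb, so this is a simplification. The coordinates are invented for illustration and `density_per_mb` is a hypothetical helper.

```python
def density_per_mb(breakpoints, regions):
    """Breakpoints per Mb within a set of regions on one chromosome.

    breakpoints: positions in bp (one representative position per breakpoint)
    regions: non-overlapping half-open (start, end) intervals in bp
    """
    total_bp = sum(end - start for start, end in regions)
    if total_bp == 0:
        return 0.0
    hits = sum(1 for b in breakpoints
               for start, end in regions if start <= b < end)
    return hits / (total_bp / 1_000_000)

# Invented toy coordinates, for illustration only.
breakpoints = [100_000, 1_200_000, 5_000_000]
small_intergenes = [(0, 500_000), (1_000_000, 1_500_000)]   # 1 Mb in total
gene_deserts = [(4_000_000, 14_000_000)]                      # 10 Mb in total
print(density_per_mb(breakpoints, small_intergenes))  # 2.0
print(density_per_mb(breakpoints, gene_deserts))      # 0.1
```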
Affiliation(s)
- Claire Lemaitre
- Université de Bordeaux, Centre de Bioinformatique - Génomique Fonctionnelle Bordeaux, F-33000 Bordeaux, France.
139
Donthu R, Lewin HA, Larkin DM. SyntenyTracker: a tool for defining homologous synteny blocks using radiation hybrid maps and whole-genome sequence. BMC Res Notes 2009; 2:148. [PMID: 19627596 PMCID: PMC2726151 DOI: 10.1186/1756-0500-2-148] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 02/05/2009] [Accepted: 07/23/2009] [Indexed: 12/04/2022]
Abstract
Background The recent availability of genomic sequences and BAC libraries for a large number of mammals provides an excellent opportunity for identifying comparatively anchored markers that are useful for creating high-resolution radiation-hybrid (RH) and BAC-based comparative maps. To use these maps for multispecies genome comparison and evolutionary inference, robust bioinformatic tools are required to identify chromosomal regions shared between genomes and to localize the positions of evolutionary breakpoints that are the signatures of chromosomal rearrangements. Here we report an automated tool for the identification of homologous synteny blocks (HSBs) between genomes. The tool tolerates errors common in RH comparative maps and can be used for automated whole-genome analysis of chromosome rearrangements that occur during evolution. Findings We developed an algorithm and software tool (SyntenyTracker) that can be used for automated definition of HSBs using pairwise RH or gene-based comparative maps as input. To verify correct implementation of the underlying algorithm, SyntenyTracker was used to identify HSBs in the cattle and human genomes. Results demonstrated 96% agreement with HSBs defined manually using the same set of rules. A comparison of SyntenyTracker with the AutoGRAPH synteny tool was performed using identical datasets containing 14,380 genes with 1:1 orthology in human and mouse. Discrepancies between the results using the two tools and advantages of SyntenyTracker are reported. Conclusion SyntenyTracker was shown to be an efficient and accurate automated tool for defining HSBs using datasets that may contain minor errors resulting from limitations in map construction methodologies. The utility of SyntenyTracker will become more important for comparative genomics as the number of mapped and sequenced genomes increases.
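To make the notion of an error-tolerant HSB concrete, here is a minimal sketch. It is not SyntenyTracker's actual rule set, which the abstract does not spell out. Markers ordered along genome A are split into blocks wherever their genome-B chromosome changes. A lone marker that disagrees with two agreeing neighbours is treated as a probable map error and skipped. The name `synteny_blocks`, the tuple layout and the `min_markers` threshold are hypothetical.

```python
def synteny_blocks(markers, min_markers=2):
    """Group markers into blocks that stay on one chromosome of genome B.

    markers: (a_pos, b_chrom, b_pos) tuples sorted by position in genome A.
    A lone marker whose genome-B chromosome differs from two neighbours that
    agree with each other is treated as a probable mapping error and skipped.
    Returns (a_start, a_end, b_chrom, b_min, b_max) for each block.
    """
    cleaned = []
    for i, marker in enumerate(markers):
        prev_chrom = markers[i - 1][1] if i > 0 else None
        next_chrom = markers[i + 1][1] if i + 1 < len(markers) else None
        if prev_chrom is not None and prev_chrom == next_chrom != marker[1]:
            continue  # isolated outlier inside an otherwise continuous run
        cleaned.append(marker)

    blocks, current = [], []
    for marker in cleaned:
        if current and marker[1] != current[-1][1]:
            blocks.append(current)  # genome-B chromosome changed: close the block
            current = []
        current.append(marker)
    if current:
        blocks.append(current)

    return [(b[0][0], b[-1][0], b[0][1],
             min(m[2] for m in b), max(m[2] for m in b))
            for b in blocks if len(b) >= min_markers]

# Invented toy map: position in genome A, then chromosome and position in genome B.
markers = [(1, "7", 10), (2, "7", 20), (3, "3", 99), (4, "7", 30),
           (5, "12", 5), (6, "12", 8)]
print(synteny_blocks(markers))  # [(1, 4, '7', 10, 30), (5, 6, '12', 5, 8)]
```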
Affiliation(s)
- Ravikiran Donthu
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA.
140
Lewin HA, Larkin DM, Pontius J, O'Brien SJ. Every genome sequence needs a good map. Genome Res 2009; 19:1925-8. [PMID: 19596977 DOI: 10.1101/gr.094557.109] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.7] [Indexed: 01/05/2023]
Affiliation(s)
- Harris A Lewin
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA.
141
Cruciform-forming inverted repeats appear to have mediated many of the microinversions that distinguish the human and chimpanzee genomes. Chromosome Res 2009; 17:469-83. [PMID: 19475482 DOI: 10.1007/s10577-009-9039-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 01/15/2009] [Revised: 04/08/2009] [Accepted: 04/08/2009] [Indexed: 10/20/2022]
Abstract
Submicroscopic inversions have contributed significantly to the genomic divergence between humans and chimpanzees over evolutionary time. Those microinversions which are flanked by segmental duplications (SDs) are presumed to have originated via non-allelic homologous recombination between SDs arranged in inverted orientation. However, the nature of the mechanisms underlying those inversions which are not flanked by SDs remains unclear. We have investigated 35 such inversions, ranging in size from 51-nt to 22056-nt, with the goal of characterizing the DNA sequences in the breakpoint-flanking regions. Using the macaque genome as an outgroup, we determined the lineage specificity of these inversions and noted that the majority (N = 31; 89%) were associated with deletions (of length between 1-nt and 6754-nt) immediately adjacent to one or both inversion breakpoints. Overrepresentations of both direct and inverted repeats, ≥6-nt in length and capable of non-B DNA structure formation, were noted in the vicinity of breakpoint junctions suggesting that these repeats could have contributed to double strand breakage. Inverted repeats capable of cruciform structure formation were also found to be a common feature of the inversion breakpoint-flanking regions, consistent with these inversions having originated through the resolution of Holliday junction-like cruciforms. Sequences capable of non-B DNA structure formation have previously been implicated in promoting gross deletions and translocations causing human genetic disease. We conclude that non-B DNA forming sequences may also have promoted the occurrence of mutations in an evolutionary context, giving rise to at least some of the inversion/deletions which now serve to distinguish the human and chimpanzee genomes.
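An inverted repeat of the kind described above is an arm, an optional spacer, and then the arm's reverse complement. This is the sequence pattern that can fold into a cruciform. The minimal sketch below scans for perfect arms of a fixed length with a bounded spacer. The parameters `arm=6` and `max_spacer=10` and the function names are assumptions for illustration. It does not reproduce the authors' repeat-detection criteria or any assessment of structure-forming potential.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def inverted_repeats(seq, arm=6, max_spacer=10):
    """Return (start, arm, spacer) for every arm + spacer + reverse-complement(arm).

    Only perfect arms of exactly `arm` nucleotides are reported.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        target = reverse_complement(seq[i:i + arm])
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, arm, spacer))
    return hits

# Invented example: a 6-nt arm, a 4-nt spacer, then the arm's reverse complement.
example = "TT" + "ACGGTA" + "CCCC" + "TACCGT" + "TT"
print(inverted_repeats(example))
```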
142
Alekseyev MA, Pevzner PA. Breakpoint graphs and ancestral genome reconstructions. Genome Res 2009; 19:943-57. [PMID: 19218533 PMCID: PMC2675983 DOI: 10.1101/gr.082784.108] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.8] [Received: 06/30/2008] [Accepted: 01/22/2009] [Indexed: 11/24/2022]
Abstract
Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomic analysis of entire genomes. We developed an algorithm, MGRA, for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of multiple breakpoint graphs to overcome some limitations of existing approaches to ancestral genome reconstruction. MGRA also generates rearrangement-based characters that guide phylogenetic tree reconstruction when the phylogeny is unknown.
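MGRA works with multiple breakpoint graphs on many genomes at once. The simpler two-genome breakpoint graph it generalizes can be sketched as follows. The sketch is not MGRA. It assumes circular chromosomes and identical gene content, and the helper names are hypothetical. Each gene has a tail and a head extremity, and each genome contributes one edge per adjacency. The graph then decomposes into alternating cycles, and the double-cut-and-join (DCJ) distance for circular genomes is N - C, where N is the number of genes and C the number of cycles.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour; chromosomes are circular lists of signed ids."""
    adj = {}
    for chrom in genome:
        # For +g we read tail then head; for -g we read head then tail.
        ends = [((abs(g), "t"), (abs(g), "h")) if g > 0 else ((abs(g), "h"), (abs(g), "t"))
                for g in chrom]
        for k, (_, right) in enumerate(ends):
            next_left = ends[(k + 1) % len(ends)][0]  # wrap around: circular chromosome
            adj[right] = next_left
            adj[next_left] = right
    return adj

def dcj_distance(genome_a, genome_b):
    """DCJ distance N - C for circular genomes with the same gene content."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    n_genes = len({abs(g) for chrom in genome_a for g in chrom})
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        v = start
        while True:  # alternate A-edges and B-edges until the cycle closes
            u = adj_a[v]
            seen.update((v, u))
            v = adj_b[u]
            if v == start:
                break
    return n_genes - cycles

print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))        # one inversion: 1
print(dcj_distance([[1, 2, 3, 4]], [[1, 2], [3, 4]]))  # one fission: 1
```

Shared adjacencies form trivial two-vertex cycles. Each rearrangement lowers the cycle count, which is why the distance is a count of "missing" cycles.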
Affiliation(s)
- Max A. Alekseyev
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
143
Kemkemer C, Kohn M, Cooper DN, Froenicke L, Högel J, Hameister H, Kehrer-Sawatzki H. Gene synteny comparisons between different vertebrates provide new insights into breakage and fusion events during mammalian karyotype evolution. BMC Evol Biol 2009; 9:84. [PMID: 19393055 PMCID: PMC2681463 DOI: 10.1186/1471-2148-9-84] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 02/26/2008] [Accepted: 04/24/2009] [Indexed: 12/25/2022]
Abstract
BACKGROUND Genome comparisons have not only made possible the reconstruction of the eutherian ancestral karyotype but also have the potential to provide new insights into the evolutionary inter-relationships of the different eutherian orders within the mammalian phylogenetic tree. Such comparisons can additionally reveal (i) the nature of the DNA sequences present within the evolutionary breakpoint regions and (ii) whether or not the evolutionary breakpoints occur randomly across the genome. Gene synteny analysis (E-painting) not only greatly reduces the complexity of comparative genome sequence analysis but also extends its evolutionary reach. RESULTS E-painting was used to compare the genome sequences of six different mammalian species and chicken. A total of 526 evolutionary breakpoint intervals were identified, and these were mapped to a median resolution of 120 kb, the highest level of resolution so far obtained. A marked correlation was noted between evolutionary breakpoint frequency and gene density. This correlation was significant not only at the chromosomal level but also sub-chromosomally when comparing genome intervals of lengths as short as 40 kb. Contrary to previous findings, a comparison of evolutionary breakpoint locations with the chromosomal positions of well-mapped common fragile sites and cancer-associated breakpoints failed to reveal any evidence for significant co-location. Primate-specific chromosomal rearrangements were, however, found to occur preferentially in regions containing segmental duplications and copy number variants. CONCLUSION Specific chromosomal regions appear to be prone to recurring rearrangement in different mammalian lineages ('breakpoint reuse') even if the breakpoints themselves are likely to be non-identical.
The putative ancestral eutherian genome, reconstructed on the basis of the synteny analysis of 7 vertebrate genome sequences, not only confirmed the results of previous molecular cytogenetic studies but also increased the definition of the inferred structure of ancestral eutherian chromosomes. For the first time in such an analysis, the opossum was included as an outgroup species. This served to confirm our previous model of the ancestral eutherian genome since all ancestral syntenic segment associations were also noted in this marsupial.
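A breakpoint-frequency versus gene-density correlation at a given window size can be sketched as counting both features in fixed windows and then correlating the counts. The abstract does not state which statistic was used. The sketch assumes Spearman's rank correlation computed as the Pearson correlation of average ranks, and it uses invented positions on a toy 200 kb chromosome with the 40 kb window size mentioned above. All function names are hypothetical.

```python
def counts_per_window(positions, chrom_len, window):
    """Count positions (bp, each < chrom_len) in consecutive windows of `window` bp."""
    counts = [0] * ((chrom_len + window - 1) // window)
    for p in positions:
        counts[p // window] += 1
    return counts

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented toy positions on a 200 kb chromosome, binned into 40 kb windows.
genes = [5_000, 12_000, 30_000, 41_000, 90_000, 95_000, 99_000, 150_000]
breaks = [8_000, 33_000, 92_000, 97_000, 160_000]
g = counts_per_window(genes, 200_000, 40_000)
b = counts_per_window(breaks, 200_000, 40_000)
print(g, b, spearman(g, b))
```

Significance testing (for example by permuting window labels) would be a separate step and is not shown.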
Affiliation(s)
- Claus Kemkemer
- Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
- LMU München, Biozentrum Martinsried, München, Germany
- Matthias Kohn
- Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
- Lutz Froenicke
- Dept. of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA 95616, USA
- Josef Högel
- Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
- Horst Hameister
- Institute of Human Genetics, University of Ulm, 89081 Ulm, Germany
144
Larkin DM, Pape G, Donthu R, Auvil L, Welge M, Lewin HA. Breakpoint regions and homologous synteny blocks in chromosomes have different evolutionary histories. Genome Res 2009; 19:770-7. [PMID: 19342477 DOI: 10.1101/gr.086546.108] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Indexed: 01/22/2023]
Abstract
The persistence of large blocks of homologous synteny and a high frequency of breakpoint reuse are distinctive features of mammalian chromosomes that are not well understood in evolutionary terms. To gain a better understanding of the evolutionary forces that affect genome architecture, synteny relationships among 10 amniotes (human, chimp, macaque, rat, mouse, pig, cattle, dog, opossum, and chicken) were compared at <1 human-Mbp resolution. Homologous synteny blocks (HSBs; N = 2233) and chromosome evolutionary breakpoint regions (EBRs; N = 1064) were identified from pairwise comparisons of all genomes. Analysis of the size distribution of HSBs shared in all 10 species' chromosomes (msHSBs) identified three msHSBs (>20 Mbp) that are larger than expected by chance. Gene network analysis of msHSBs >3 human-Mbp and EBRs <1 Mbp demonstrated that msHSBs are significantly enriched for genes involved in development of the central nervous and other organ systems, whereas EBRs are enriched for genes associated with adaptive functions. In addition, we found EBRs are significantly enriched for structural variations (segmental duplications, copy number variants, and indels), retrotransposed and zinc finger genes, and single nucleotide polymorphisms. These results demonstrate that chromosome breakage in evolution is nonrandom and that HSBs and EBRs are evolving in distinctly different ways. We suggest that natural selection acts on the genome to maintain combinations of genes and their regulatory elements that are essential to fundamental processes of amniote development and biological organization. Furthermore, EBRs may be used extensively to generate new genetic variation and novel combinations of genes and regulatory elements that contribute to adaptive phenotypes.
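One way to picture blocks "shared in all 10 species" is to take each species' HSBs in reference (human) coordinates and intersect them. A region survives only if no species has a breakpoint inside it. The minimal sketch below assumes that all HSBs have already been projected onto a single human chromosome as sorted, non-overlapping half-open intervals. It does not reproduce the paper's pipeline. The species data are invented and the function names are hypothetical.

```python
def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out

def shared_blocks(hsbs_by_species, min_len=0):
    """Reference-genome segments covered by an HSB in every species, longer than min_len."""
    species = list(hsbs_by_species)
    shared = sorted(hsbs_by_species[species[0]])
    for sp in species[1:]:
        shared = intersect(shared, sorted(hsbs_by_species[sp]))
    return [(s, e) for s, e in shared if e - s > min_len]

# Invented HSBs on one human chromosome, in human-Mbp coordinates.
hsbs = {
    "mouse": [(0, 30), (30, 100)],
    "dog": [(0, 60), (70, 100)],
    "chicken": [(10, 100)],
}
print(shared_blocks(hsbs, min_len=25))  # [(30, 60), (70, 100)]
```

Adjacent HSBs in one species, such as mouse (0, 30) and (30, 100), stay split in the result, because their shared boundary marks a breakpoint in that lineage.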
Affiliation(s)
- Denis M Larkin
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
145
Clemente JC, Ikeo K, Valiente G, Gojobori T. Optimized ancestral state reconstruction using Sankoff parsimony. BMC Bioinformatics 2009; 10:51. [PMID: 19200389 PMCID: PMC2677398 DOI: 10.1186/1471-2105-10-51] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/27/2008] [Accepted: 02/07/2009] [Indexed: 11/24/2022]
Abstract
BACKGROUND Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required on a given phylogeny when a cost is associated with transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n²) in the number of states n, making it impractical for large values of n. RESULTS In this study we introduce an optimization of Sankoff parsimony for the reconstruction of ancestral states when ultrametric or additive cost matrices are used. We analyzed its performance for randomly generated matrices, for the Jukes-Cantor and Kimura two-parameter models of DNA evolution, and in the reconstruction of elongation factor-1alpha and ancestral metabolic states of a group of eukaryotes. In all cases the execution time is significantly less than with the original implementation. CONCLUSION The algorithms presented here provide a fast computation of Sankoff parsimony for a given phylogeny. Problems where the number of states is large, such as reconstruction of ancestral metabolism, are particularly well suited to this optimization. Since we reduce the computations required to calculate the parsimony cost of a single tree, our method can be combined with optimizations in the number of taxa that aim at finding the most parsimonious tree.
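For reference, the classic (unoptimized) Sankoff dynamic program looks like this. A post-order pass computes, for every node and state, the minimum cost of its subtree. Each parent-child edge costs O(n²) for n states, which is the cost the paper's optimization targets. A pre-order pass then recovers one optimal ancestral labelling. This is the textbook algorithm, not the authors' optimized version for ultrametric or additive matrices. The tree encoding and the function name are assumptions for illustration.

```python
import math

def sankoff(tree, leaf_states, cost, root):
    """Classic Sankoff parsimony on a rooted tree.

    tree: internal node -> list of children; leaf_states: leaf -> observed state index;
    cost[s][t]: cost of changing from state s (parent) to state t (child).
    Returns the minimum total cost and one optimal state per node.
    """
    states = range(len(cost))
    table = {}  # node -> minimum subtree cost for each possible state of that node

    def up(node):  # post-order pass: O(n^2) work per edge for n states
        if node in leaf_states:
            table[node] = [0 if s == leaf_states[node] else math.inf for s in states]
            return
        for child in tree[node]:
            up(child)
        table[node] = [sum(min(cost[s][t] + table[c][t] for t in states)
                           for c in tree[node]) for s in states]

    up(root)
    best = min(states, key=lambda s: table[root][s])
    labels = {root: best}

    def down(node):  # pre-order pass: pick a child state consistent with the optimum
        for child in tree.get(node, []):
            s = labels[node]
            labels[child] = min(states, key=lambda t: cost[s][t] + table[child][t])
            down(child)

    down(root)
    return table[root][best], labels

# Toy example: states 0..3 stand for A, C, G, T with unit cost for any change.
unit = [[0 if s == t else 1 for t in range(4)] for s in range(4)]
tree = {"root": ["x", "y"], "x": ["l1", "l2"], "y": ["l3", "l4"]}
leaves = {"l1": 0, "l2": 0, "l3": 2, "l4": 2}
total, labels = sankoff(tree, leaves, unit, "root")
print(total, labels["x"], labels["y"])  # 1 0 2
```

With a unit cost matrix the result equals the minimum number of changes. A general cost matrix (for example, one weighting transitions below transversions) changes only `cost`.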
Affiliation(s)
- José C Clemente
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
- Kazuho Ikeo
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
- Takashi Gojobori
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
146
Hachiya T, Osana Y, Popendorf K, Sakakibara Y. Accurate identification of orthologous segments among multiple genomes. Bioinformatics 2009; 25:853-60. [DOI: 10.1093/bioinformatics/btp070] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
147
Abstract
Background Reconstructing complete ancestral genomes (at least in terms of their gene inventory and arrangement) is attracting much interest due to the rapidly increasing availability of whole genome sequences. While modest successes have been reported for mammalian and even vertebrate genomes, more divergent groups continue to pose a stiff challenge, mostly because current models of genomic evolution support too many choices. Results We describe a novel type of genomic signature based on rearrangements that characterizes evolutionary changes that must be common to all minimal rearrangement scenarios. By focusing on global patterns of rearrangements, such signatures bypass individual variations and sharply restrict the search space. We present the results of extensive simulation studies demonstrating that these signatures can be used to reconstruct accurate ancestral genomes and phylogenies even for widely divergent collections. Conclusion Focusing on genome triples rather than genome pairs unleashes the full power of evolutionary analysis. Our genomic signature captures shared evolutionary events and thus can form the basis of a robust analysis and reconstruction of evolutionary history.
Affiliation(s)
- Krister M Swenson
- Laboratory for Computational Biology and Bioinformatics, EPFL (Swiss Federal Institute of Technology), EPFL-IC-LCBB, INJ 230, Station 14, CH-1014 Lausanne, Switzerland.
148
Dubchak I, Poliakov A, Kislyuk A, Brudno M. Multiple whole-genome alignments without a reference organism. Genome Res 2009; 19:682-9. [PMID: 19176791 DOI: 10.1101/gr.081778.108] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Indexed: 11/24/2022]
Abstract
Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes either rely on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, which prevents the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate higher sensitivity and specificity than the pairwise alignments previously available through the VGP. They also have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.
Affiliation(s)
- Inna Dubchak
- Genome Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
149
Tannier E. Yeast Ancestral Genome Reconstructions: The Possibilities of Computational Methods. Comparative Genomics 2009. [DOI: 10.1007/978-3-642-04744-2_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023]
150
Peng Q, Alekseyev MA, Tesler G, Pevzner PA. Decoding Synteny Blocks and Large-Scale Duplications in Mammalian and Plant Genomes. Lecture Notes in Computer Science 2009. [DOI: 10.1007/978-3-642-04241-6_19] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/26/2023]