151
Alekseyev MA. Multi-break rearrangements and breakpoint re-uses: from circular to linear genomes. J Comput Biol 2008; 15:1117-31. [PMID: 18788907 DOI: 10.1089/cmb.2008.0080] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Multi-break rearrangements break a genome into multiple fragments and further glue them together in a new order. While 2-break rearrangements represent standard reversals, fusions, fissions, and translocations, 3-break rearrangements represent a natural generalization of transpositions. Alekseyev and Pevzner (2007a, 2008a) studied multi-break rearrangements in circular genomes and further applied them to the analysis of chromosomal evolution in mammalian genomes. In this paper, we extend these results to the more difficult case of linear genomes. In particular, we give lower bounds for the rearrangement distance between linear genomes and for the breakpoint re-use rate as functions of the number and proportion of transpositions. We further use these results to analyze comparative genomic architecture of mammalian genomes.
Affiliation(s)
- Max A Alekseyev
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093, USA.
152
Ma J, Ratan A, Raney BJ, Suh BB, Zhang L, Miller W, Haussler D. DUPCAR: reconstructing contiguous ancestral regions with duplications. J Comput Biol 2008; 15:1007-27. [PMID: 18774902 DOI: 10.1089/cmb.2008.0069] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022]
Abstract
Accurately reconstructing the large-scale gene order in an ancestral genome is a critical step to better understand genome evolution. In this paper, we propose a heuristic algorithm, called DUPCAR, for reconstructing ancestral genomic orders with duplications. The method starts from the order of genes in modern genomes and predicts predecessor and successor relationships in the ancestor. Then a greedy algorithm is used to reconstruct the ancestral orders by connecting genes into contiguous regions based on predicted adjacencies. Computer simulation was used to validate the algorithm. We also applied the method to reconstruct the ancestral chromosome X of placental mammals and the ancestral genomes of the ciliate Paramecium tetraurelia.
Affiliation(s)
- Jian Ma
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA.
153
Bertrand D, Lajoie M, El-Mabrouk N. Inferring ancestral gene orders for a family of tandemly arrayed genes. J Comput Biol 2008; 15:1063-77. [PMID: 18781832 DOI: 10.1089/cmb.2008.0025] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
Tandemly arrayed genes (TAG) constitute a large fraction of most genomes and play important biological roles. They evolve through unequal recombination, which places duplicated genes next to the original ones (tandem duplications). Many algorithms have been proposed to infer a tandem duplication history for a TAG cluster. However, the presence of different transcriptional orientations in many clusters highlights the fact that processes such as inversions also contribute to their evolution. Moreover, existing algorithms are restricted to the study of TAGs evolution in a single species (only paralogous genes are considered). To circumvent these limitations, we consider an evolutionary model for TAGs involving duplication, gene loss, inversion, and speciation events. A general framework to infer ancestral gene orders that minimize the number of inversions in the whole evolutionary history is presented. At the methodological level, this paper integrates three approaches to genome evolution: the duplication tree reconstruction, the gene tree/species tree reconciliation theory, and the concept of inversion median used in order-based phylogeny reconstruction. An application on a cluster of olfactory receptor genes in four mammals is presented.
154
Chauve C, Tannier E. A methodological framework for the reconstruction of contiguous regions of ancestral genomes and its application to mammalian genomes. PLoS Comput Biol 2008; 4:e1000234. [PMID: 19043541 PMCID: PMC2580819 DOI: 10.1371/journal.pcbi.1000234] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.4] [Received: 04/03/2008] [Accepted: 10/17/2008] [Indexed: 01/07/2023]
Abstract
The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segments reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results come eventually very close to cytogenetics studies. It suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data acquisition.
No DNA molecule is preserved after a few hundred thousand years, so inferring the DNA sequence organization of ancient living organisms beyond several million years can only be achieved by computational estimations, using the similarities and differences between chromosomes of extant species. This is the scope of “paleogenomics”, and it can help to better understand how genomes have evolved until today.
We propose here a computational framework to estimate contiguous segments of ancestral chromosomes, based on techniques of physical mapping that are used to infer chromosome maps of extant species when their genome is not sequenced. This framework is not guided by possible evolutionary events such as rearrangements but only proposes ancestral genome architectures. We developed a method following this framework and applied it to mammalian genomes. We inferred ancestral chromosomal regions that are stable and well supported at different levels of resolution. These ancestral chromosomal regions agree with previous cytogenetics studies and were very probably part of the genome of the common ancestor of humans, macaca, mice, dogs, and cows, living 120 million years ago. We illustrate, through comparison with other bioinformatics methods, the importance of a formal methodological background when comparing ancestral genome architecture proposals obtained from different methods.
Affiliation(s)
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Eric Tannier
- INRIA, Rhône-Alpes, France
- Université de Lyon, Lyon, France
- Université Lyon 1, Lyon, France
- Laboratoire de Biométrie et Biologie Évolutive, CNRS, UMR5558, Villeurbanne, France
155
Ruiz-Herrera A, Robinson TJ. Evolutionary plasticity and cancer breakpoints in human chromosome 3. Bioessays 2008; 30:1126-37. [DOI: 10.1002/bies.20829] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]
156
Uchiyama I. Multiple genome alignment for identifying the core structure among moderately related microbial genomes. BMC Genomics 2008; 9:515. [PMID: 18976470 PMCID: PMC2615449 DOI: 10.1186/1471-2164-9-515] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 08/05/2008] [Accepted: 10/31/2008] [Indexed: 12/04/2022]
Abstract
Background: Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders.
Results: The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes.
Conclusion: The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.
Affiliation(s)
- Ikuo Uchiyama
- Department of Theoretical Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan.
157
Paten B, Herrero J, Beal K, Fitzgerald S, Birney E. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res 2008; 18:1814-28. [PMID: 18849524 DOI: 10.1101/gr.076554.108] [Citation(s) in RCA: 205] [Impact Index Per Article: 12.8] [Indexed: 12/19/2022]
Abstract
Pairwise whole-genome alignment involves the creation of a homology map, capable of performing a near complete transformation of one genome into another. For multiple genomes this problem is generalized to finding a set of consistent homology maps for converting each genome in the set of aligned genomes into any of the others. The problem can be divided into two principal stages. First, the partitioning of the input genomes into a set of colinear segments, a process which essentially deals with the complex processes of rearrangement. Second, the generation of a base pair level alignment map for each colinear segment. We have developed a new genome-wide segmentation program, Enredo, which produces colinear segments from extant genomes handling rearrangements, including duplications. We have then applied the new alignment program Pecan, which makes the consistency alignment methodology practical at a large scale, to create a new set of genome-wide mammalian alignments. We test both Enredo and Pecan using novel and existing assessment analyses that incorporate both real biological data and simulations, and show that both independently and in combination they outperform existing programs. Alignments from our pipeline are publicly available within the Ensembl genome browser.
Affiliation(s)
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.
158
Abstract
We formalize the problem of recovering the evolutionary history of a set of genomes that are related to an unseen common ancestor genome by operations of speciation, deletion, insertion, duplication, and rearrangement of segments of bases. The problem is examined in the limit as the number of bases in each genome goes to infinity. In this limit, the chromosomes are represented by continuous circles or line segments. For such an infinite-sites model, we present a polynomial-time algorithm to find the most parsimonious evolutionary history of any set of related present-day genomes.
159
Lehmann J, Stadler PF, Prohaska SJ. SynBlast: assisting the analysis of conserved synteny information. BMC Bioinformatics 2008; 9:351. [PMID: 18721485 PMCID: PMC2543028 DOI: 10.1186/1471-2105-9-351] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/03/2008] [Accepted: 08/24/2008] [Indexed: 01/06/2023]
Abstract
Motivation: In the last years more than 20 vertebrate genomes have been sequenced, and the rate at which genomic DNA information becomes available is rapidly accelerating. Gene duplication and gene loss events inherently limit the accuracy of orthology detection based on sequence similarity alone. Fully automated methods for orthology annotation do exist but often fail to identify individual members in cases of large gene families, or to distinguish missing data from traceable gene losses. This situation can be improved in many cases by including conserved synteny information.
Results: Here we present the SynBlast pipeline that is designed to construct and evaluate local synteny information. SynBlast uses the genomic region around a focal reference gene to retrieve candidates for homologous regions from a collection of target genomes and ranks them in accord with the available evidence for homology. The pipeline is intended as a tool to aid high quality manual annotation in particular in those cases where automatic procedures fail. We demonstrate how SynBlast is applied to retrieving orthologous and paralogous clusters using the vertebrate Hox and ParaHox clusters as examples.
Software: The SynBlast package written in Perl is available under the GNU General Public License at .
Affiliation(s)
- Jörg Lehmann
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany.
160
Abstract
The availability of 12 complete genomes of various species of genus Drosophila provides a unique opportunity to analyze genome-scale chromosomal rearrangements among a group of closely related species. This article reports on the comparison of gene order between these 12 species and on the fixed rearrangement events that disrupt gene order. Three major themes are addressed: the conservation of syntenic blocks across species, the disruption of syntenic blocks (via chromosomal inversion events) and its relationship to the phylogenetic distribution of these species, and the rate of rearrangement events over evolutionary time. Comparison of syntenic blocks across this large genomic data set confirms that genetic elements are largely (95%) localized to the same Muller element across genus Drosophila species and paracentric inversions serve as the dominant mechanism for shuffling the order of genes along a chromosome. Gene-order scrambling between species is in accordance with the estimated evolutionary distances between them and we find it to approximate a linear process over time (linear to exponential with alternate divergence time estimates). We find the distribution of synteny segment sizes to be biased by a large number of small segments with comparatively fewer large segments. Our results provide estimated chromosomal evolution rates across this set of species on the basis of whole-genome synteny analysis, which are found to be higher than those previously reported. Identification of conserved syntenic blocks across these genomes suggests a large number of conserved blocks with varying levels of embryonic expression correlation in Drosophila melanogaster. On the other hand, an analysis of the disruption of syntenic blocks between species allowed the identification of fixed inversion breakpoints and estimates of breakpoint reuse and lineage-specific breakpoint event segregation.
162
Lemaitre C, Tannier E, Gautier C, Sagot MF. Precise detection of rearrangement breakpoints in mammalian chromosomes. BMC Bioinformatics 2008; 9:286. [PMID: 18564416 PMCID: PMC2443379 DOI: 10.1186/1471-2105-9-286] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 03/25/2008] [Accepted: 06/18/2008] [Indexed: 11/14/2022]
Abstract
Background: Genomes undergo large structural changes that alter their organisation. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. We developed a method to precisely delimit rearrangement breakpoints on a genome by comparison with the genome of a related species. Contrary to current methods which search for synteny blocks and simply return what remains in the genome as breakpoints, we propose to go further and to investigate the breakpoints themselves in order to refine them.
Results: Given some reliable and non overlapping synteny blocks, the core of the method consists in refining the regions that are not contained in them. By aligning each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Since this method requires as input synteny blocks with some properties which, though they appear natural, are not verified by current methods for detecting such blocks, we further give a formal definition and provide an algorithm to compute them. The whole method is applied to delimit breakpoints on the human genome when compared to the mouse and dog genomes. Among the 355 human-mouse and 240 human-dog breakpoints, 168 and 146 respectively span less than 50 Kb. We compared the resulting breakpoints with some publicly available ones and show that we achieve a better resolution. Furthermore, we suggest that breakpoints are rarely reduced to a point, and instead consist in often large regions that can be distinguished from the sequences around in terms of segmental duplications, similarity with related species, and transposable elements.
Conclusion: Our method leads to smaller breakpoints than already published ones and allows for a better description of their internal structure. In the majority of cases, our refined regions of breakpoint exhibit specific biological properties (no similarity, presence of segmental duplications and of transposable elements). We hope that this new result may provide some insight into the mechanism and evolutionary properties of chromosomal rearrangements.
163
Levasseur A, Pontarotti P, Poch O, Thompson JD. Strategies for reliable exploitation of evolutionary concepts in high throughput biology. Evol Bioinform Online 2008; 4:121-37. [PMID: 19204813 PMCID: PMC2614184 DOI: 10.4137/ebo.s597] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
Abstract
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Affiliation(s)
- Anthony Levasseur
- Phylogenomics Laboratory, EA 3781 Evolution Biologique, Université de Provence, 13331 Marseille, France
164
Abstract
The evolution of karyotypes has been the subject of intensive study since the middle of the 20th century. This was motivated by the observation that the karyotypes of related species showed remarkable conservation. The recent emergence of whole-genome sequencing projects gives the opportunity to complement the cytogenetic approaches by addressing the conservation of karyotypes using chromosome sequence comparison. In this short review we present a description of recent advances in computational biology methods dedicated to the study of chromosome evolution and more specifically ancestral karyotype reconstruction in an attempt to provide an integrated overview of both cytogenetic and computational approaches.
165
Abstract
In 1992 the Japanese macaque was the first species for which the homology of the entire karyotype was established by cross-species chromosome painting. Today, there are chromosome painting data on more than 50 species of primates. Although chromosome painting is a rapid and economical method for tracking translocations, it has limited utility for revealing intrachromosomal rearrangements. Fortunately, the use of BAC-FISH in the last few years has allowed remarkable progress in determining marker order along primate chromosomes and there are now marker order data on an array of primate species for a good number of chromosomes. These data reveal inversions, but also show that centromeres of many orthologous chromosomes are embedded in different genomic contexts. Even if the mechanisms of neocentromere formation and progression are just beginning to be understood, it is clear that these phenomena had a significant impact on shaping the primate genome and are fundamental to our understanding of genome evolution. In this report we complete and integrate the dataset of BAC-FISH marker order for human syntenies 1, 2, 4, 5, 8, 12, 17, 18, 19, 21, 22 and the X. These results allowed us to develop hypotheses about the content, marker order and centromere position in ancestral karyotypes at five major branching points on the primate evolutionary tree: ancestral primate, ancestral anthropoid, ancestral platyrrhine, ancestral catarrhine and ancestral hominoid. Current models suggest that between-species structural rearrangements are often intimately related to speciation. Comparative primate cytogenetics has become an important tool for elucidating the phylogeny and the taxonomy of primates. It has become increasingly apparent that molecular cytogenetic data in the future can be fruitfully combined with whole-genome assemblies to advance our understanding of primate genome evolution as well as the mechanisms and processes that have led to the origin of the human genome.
166
Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat Rev Genet 2008; 9:303-13. [PMID: 18347593 DOI: 10.1038/nrg2185] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 12/23/2022]
Abstract
The comparison of genomic sequences is now a common approach to identifying and characterizing functional regions in vertebrate genomes. However, for theoretical reasons and because of practical issues, the generation of these data sets is non-trivial and can have many pitfalls. We are currently seeing an explosion of comparative sequence data, the benefits and limitations of which need to be disseminated to the scientific community. This Review provides a critical overview of the different types of sequence data that are available for analysis and of contemporary comparative sequence analysis methods, highlighting both their strengths and limitations. Approaches to determining the biological significance of constrained sequence are also explored.
167
Yu WP, Rajasegaran V, Yew K, Loh WL, Tay BH, Amemiya CT, Brenner S, Venkatesh B. Elephant shark sequence reveals unique insights into the evolutionary history of vertebrate genes: A comparative analysis of the protocadherin cluster. Proc Natl Acad Sci U S A 2008; 105:3819-3824. [PMID: 18319338 PMCID: PMC2268768 DOI: 10.1073/pnas.0800398105] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 11/07/2007] [Indexed: 09/01/2023]
Abstract
Cartilaginous fishes are the oldest living phylogenetic group of jawed vertebrates. Here, we demonstrate the value of cartilaginous fish sequences in reconstructing the evolutionary history of vertebrate genomes by sequencing the protocadherin cluster in the relatively small genome (910 Mb) of the elephant shark (Callorhinchus milii). Human and coelacanth contain a single protocadherin cluster with 53 and 49 genes, respectively, that are organized in three subclusters, Pcdhalpha, Pcdhbeta, and Pcdhgamma, whereas the duplicated protocadherin clusters in fugu and zebrafish contain >77 and 107 genes, respectively, that are organized in Pcdhalpha and Pcdhgamma subclusters. By contrast, the elephant shark contains a single protocadherin cluster with 47 genes organized in four subclusters (Pcdhdelta, Pcdhepsilon, Pcdhmu, and Pcdhnu). By comparison with elephant shark sequences, we discovered a Pcdhdelta subcluster in teleost fishes, coelacanth, Xenopus, and chicken. Our results suggest that the protocadherin cluster in the ancestral jawed vertebrate contained more subclusters than modern vertebrates, and the evolution of the protocadherin cluster is characterized by lineage-specific differential loss of entire subclusters of genes. In contrast to teleost fish and mammalian protocadherin genes that have undergone gene conversion events, elephant shark protocadherin genes have experienced very little gene conversion. The syntenic block of genes in the elephant shark protocadherin locus is well conserved in human but disrupted in fugu. Thus, the elephant shark genome appears to be less prone to rearrangements compared with teleost fish genomes. The small and "stable" genome of the elephant shark is a valuable reference for understanding the evolution of vertebrate genomes.
Affiliation(s)
- Wei-Ping Yu
- Gene Regulation Laboratory, National Neuroscience Institute, 11 Jalan Tan Tock Seng, Singapore 308433
- Vikneswari Rajasegaran
- Gene Regulation Laboratory, National Neuroscience Institute, 11 Jalan Tan Tock Seng, Singapore 308433
- Kenneth Yew
- Gene Regulation Laboratory, National Neuroscience Institute, 11 Jalan Tan Tock Seng, Singapore 308433
- Wai-lin Loh
- Gene Regulation Laboratory, National Neuroscience Institute, 11 Jalan Tan Tock Seng, Singapore 308433
- Boon-Hui Tay
- Institute of Molecular and Cell Biology, Agency for Science, Technology, and Research, Biopolis, Singapore 138673
- Chris T. Amemiya
- Benaroya Research Institute at Virginia Mason, Seattle, WA 98101
- Sydney Brenner
- Institute of Molecular and Cell Biology, Agency for Science, Technology, and Research, Biopolis, Singapore 138673
- Byrappa Venkatesh
- Institute of Molecular and Cell Biology, Agency for Science, Technology, and Research, Biopolis, Singapore 138673
168
Muffato M, Crollius HR. Paleogenomics in vertebrates, or the recovery of lost genomes from the mist of time. Bioessays 2008; 30:122-34. [DOI: 10.1002/bies.20707] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
169
|
Alekseyev MA, Pevzner PA. Are there rearrangement hotspots in the human genome? PLoS Comput Biol 2007; 3:e209. [PMID: 17997591 PMCID: PMC2065889 DOI: 10.1371/journal.pcbi.0030209] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 02/14/2007] [Accepted: 09/13/2007] [Indexed: 11/18/2022] Open
Abstract
In a landmark paper, Nadeau and Taylor [18] formulated the random breakage model (RBM) of chromosome evolution that postulates that there are no rearrangement hotspots in the human genome. In the next two decades, numerous studies with progressively increasing levels of resolution made RBM the de facto theory of chromosome evolution. Despite the fact that RBM had prophetic prediction power, it was recently refuted by Pevzner and Tesler [4], who introduced the fragile breakage model (FBM), postulating that the human genome is a mosaic of solid regions (with low propensity for rearrangements) and fragile regions (rearrangement hotspots). However, the rebuttal of RBM caused a controversy and led to a split among researchers studying genome evolution. In particular, it remains unclear whether some complex rearrangements (e.g., transpositions) can create an appearance of rearrangement hotspots. We contribute to the ongoing debate by analyzing multi-break rearrangements that break a genome into multiple fragments and further glue them together in a new order. In particular, we demonstrate that (1) even if transpositions were a dominant force in mammalian evolution, the arguments in favor of FBM still stand, and (2) the "gene deletion" argument against FBM is flawed.
Affiliation(s)
- Max A Alekseyev
- Department of Computer Science and Engineering, University of California San Diego, San Diego, California, United States of America.
|
170
|
A transgenomic cytogenetic sorghum (Sorghum propinquum) bacterial artificial chromosome fluorescence in situ hybridization map of maize (Zea mays L.) pachytene chromosome 9, evidence for regions of genome hyperexpansion. Genetics 2007; 177:1509-26. [PMID: 17947405 DOI: 10.1534/genetics.107.080846] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 12/30/2022] Open
Abstract
A cytogenetic FISH map of maize pachytene-stage chromosome 9 was produced with 32 maize marker-selected sorghum BACs as probes. The genetically mapped markers used are distributed along the linkage maps at an average spacing of 5 cM. Each locus was mapped by means of multicolor direct FISH with a fluorescently labeled probe mix containing a whole-chromosome paint, a single sorghum BAC clone, and the centromeric sequence, CentC. A maize-chromosome-addition line of oat was used for bright unambiguous identification of the maize 9 fiber within pachytene chromosome spreads. The locations of the sorghum BAC-FISH signals were determined, and each new cytogenetic locus was assigned a centiMcClintock position on the short (9S) or long (9L) arm. Nearly all of the markers appeared in the same order on linkage and cytogenetic maps but at different relative positions on the two. The CentC FISH signal was localized between cdo17 (at 9L.03) and tda66 (at 9S.03). Several regions of genome hyperexpansion on maize chromosome 9 were found by comparative analysis of relative marker spacing in maize and sorghum. This transgenomic cytogenetic FISH map creates anchors between various maps of maize and sorghum and creates additional tools and information for understanding the structure and evolution of the maize genome.
|
171
|
Bradley RK, Holmes I. Transducers: an emerging probabilistic framework for modeling indels on trees. Bioinformatics 2007; 23:3258-62. [PMID: 17804440 DOI: 10.1093/bioinformatics/btm402] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open
|
172
|
Cardone MF, Lomiento M, Teti MG, Misceo D, Roberto R, Capozzi O, D'Addabbo P, Ventura M, Rocchi M, Archidiacono N. Evolutionary history of chromosome 11 featuring four distinct centromere repositioning events in Catarrhini. Genomics 2007; 90:35-43. [PMID: 17490852 DOI: 10.1016/j.ygeno.2007.01.007] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 10/27/2006] [Revised: 01/14/2007] [Accepted: 01/18/2007] [Indexed: 11/30/2022]
Abstract
Panels of BAC clones used in FISH experiments allow a detailed definition of chromosomal marker arrangement and orientation during evolution. This approach has disclosed the centromere repositioning phenomenon, which consists of the activation of a novel, fully functional centromere in an ectopic location, concomitant with the inactivation of the old centromere. In this study, appropriate panels of BAC clones were used to track the chromosome 11 evolutionary history in primates and nonprimate boreoeutherian mammals. Chromosome 11 synteny was found to be highly conserved in both primate and boreoeutherian mammalian ancestors. Amazingly, we detected four centromere repositioning events in primates (in Old World monkeys, in gibbons, in orangutans, and in the Homo-Pan-Gorilla (H-P-G) clade ancestor), and one in Equidae. Both H-P-G and Lar gibbon novel centromeres were flanked by large duplicons with high sequence similarity. Outgroup species analysis revealed that this duplicon was absent in phylogenetically more distant primates. The chromosome 11 ancestral centromere was probably located near the HSA11q telomere. The domain of this inactivated centromere, in humans, is almost devoid of segmental duplications. An inversion occurred in chromosome 11 in the common ancestor of H-P-G. A large duplicon, again absent in outgroup species, was found adjacent to the inversion breakpoints. In Hominoidea, almost all of the five largest duplicons of this chromosome appeared to be involved in significant evolutionary architectural changes.
|
173
|
Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SMJ, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, Hore TA, Huttley GA, Kleber M, Jirtle RL, Koina E, Lee JT, Mahony S, Marra MA, Miller RD, Nicholls RD, Oda M, Papenfuss AT, Parra ZE, Pollock DD, Ray DA, Schein JE, Speed TP, Thompson K, VandeBerg JL, Wade CM, Walker JA, Waters PD, Webber C, Weidman JR, Xie X, Zody MC, Graves JAM, Ponting CP, Breen M, Samollow PB, Lander ES, Lindblad-Toh K. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 2007; 447:167-77. [PMID: 17495919 DOI: 10.1038/nature05805] [Citation(s) in RCA: 508] [Impact Index Per Article: 29.9] [Received: 12/05/2006] [Accepted: 04/03/2007] [Indexed: 12/15/2022]
Abstract
We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.
Affiliation(s)
- Tarjei S Mikkelsen
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.
|
174
|
Harris RA, Rogers J, Milosavljevic A. Human-specific changes of genome structure detected by genomic triangulation. Science 2007; 316:235-7. [PMID: 17431168 DOI: 10.1126/science.1139477] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/02/2022]
Abstract
Knowledge of the rhesus macaque genome sequence enables reconstruction of the ancestral state of the human genome before the divergence of chimpanzees. However, the draft quality of nonhuman primate genome assemblies challenges the ability of current methods to detect insertions, deletions, and copy-number variations between humans, chimpanzees, and rhesus macaques and hinders the identification of evolutionary changes between these species. Because of the abundance of segmental duplications, genome comparisons require the integration of genomic assemblies and data from large-insert clones, linkage maps, and radiation hybrid maps. With genomic triangulation, an integrative method that reconstructs ancestral states and the structural evolution of genomes, we identified 130 human-specific breakpoints in genome structure due to rearrangements at an intermediate scale (10 kilobases to 4 megabases), including 64 insertions affecting 58 genes. Comparison with a human structural polymorphism database indicates that many of the rearrangements are polymorphic.
Affiliation(s)
- R A Harris
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
175
|
Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, Dinh HH, Dugan-Rocha S, Fulton LA, Gabisi RA, Garner TT, Godfrey J, Hawes AC, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Kirkness EF, Cree A, Fowler RG, Lee S, Lewis LR, Li Z, Liu YS, Moore SM, Muzny D, Nazareth LV, Ngo DN, Okwuonu GO, Pai G, Parker D, Paul HA, Pfannkoch C, Pohl CS, Rogers YH, Ruiz SJ, Sabo A, Santibanez J, Schneider BW, Smith SM, Sodergren E, Svatek AF, Utterback TR, Vattathil S, Warren W, White CS, Chinwalla AT, Feng Y, Halpern AL, Hillier LW, Huang X, Minx P, Nelson JO, Pepin KH, Qin X, Sutton GG, Venter E, Walenz BP, Wallis JW, Worley KC, Yang SP, Jones SM, Marra MA, Rocchi M, Schein JE, Baertsch R, Clarke L, Csürös M, Glasscock J, Harris RA, Havlak P, Jackson AR, Jiang H, Liu Y, Messina DN, Shen Y, Song HXZ, Wylie T, Zhang L, Birney E, Han K, Konkel MK, Lee J, Smit AFA, Ullmer B, Wang H, Xing J, Burhans R, Cheng Z, Karro JE, Ma J, Raney B, She X, Cox MJ, Demuth JP, Dumas LJ, Han SG, Hopkins J, Karimpour-Fard A, Kim YH, Pollack JR, Vinar T, Addo-Quaye C, Degenhardt J, Denby A, Hubisz MJ, Indap A, Kosiol C, Lahn BT, Lawson HA, Marklein A, Nielsen R, Vallender EJ, Clark AG, Ferguson B, Hernandez RD, Hirani K, Kehrer-Sawatzki H, Kolb J, Patil S, Pu LL, Ren Y, Smith DG, Wheeler DA, Schenck I, Ball EV, Chen R, Cooper DN, Giardine B, Hsu F, Kent WJ, Lesk A, Nelson DL, O'brien WE, Prüfer K, Stenson PD, Wallace JC, Ke H, Liu XM, Wang P, Xiang AP, Yang F, Barber GP, Haussler D, Karolchik D, Kern AD, Kuhn RM, Smith KE, Zwieg AS. Evolutionary and biomedical insights from the rhesus macaque genome. Science 2007; 316:222-34. 
[PMID: 17431167 DOI: 10.1126/science.1139247] [Citation(s) in RCA: 989] [Impact Index Per Article: 58.2] [Indexed: 12/12/2022]
Abstract
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
|
176
|
Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W. Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res 2007; 17:413-21. [PMID: 17322288 PMCID: PMC1832088 DOI: 10.1101/gr.5918807] [Citation(s) in RCA: 316] [Impact Index Per Article: 18.6] [Received: 09/01/2006] [Accepted: 12/20/2006] [Indexed: 11/24/2022]
Abstract
The phylogeny of placental mammals is a critical framework for choosing future genome sequencing targets and for resolving the ancestral mammalian genome at the nucleotide level. Despite considerable recent progress defining superordinal relationships, several branches remain poorly resolved, including the root of the placental tree. Here we analyzed the genome sequence assemblies of human, armadillo, elephant, and opossum to identify informative coding indels that would serve as rare genomic changes to infer early events in placental mammal phylogeny. We also expanded our species sampling by including sequence data from >30 ongoing genome projects, followed by PCR and sequencing validation of each indel in additional taxa. Our data provide support for a sister-group relationship between Afrotheria and Xenarthra (the Atlantogenata hypothesis), which is in turn the sister-taxon to Boreoeutheria. We failed to recover any indels in support of a basal position for Xenarthra (Epitheria), which is suggested by morphology and a recent retroposon analysis, or a hypothesis with Afrotheria basal (Exafricoplacentalia), which is favored by phylogenetic analysis of large nuclear gene data sets. In addition, we identified two retroposon insertions that also support Atlantogenata and none for the alternative hypotheses. A revised molecular timescale based on these phylogenetic inferences suggests Afrotheria and Xenarthra diverged from other placental mammals approximately 103 (95-114) million years ago. We discuss the impacts of this topology on earlier phylogenetic reconstructions and repeat-based inferences of phylogeny.
Affiliation(s)
- William J Murphy
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843, USA.
|
177
|
Kemkemer C, Kohn M, Kehrer-Sawatzki H, Minich P, Högel J, Froenicke L, Hameister H. Reconstruction of the ancestral ferungulate karyotype by electronic chromosome painting (E-painting). Chromosome Res 2007; 14:899-907. [PMID: 17195924 DOI: 10.1007/s10577-006-1097-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/30/2006] [Revised: 10/28/2006] [Accepted: 10/28/2006] [Indexed: 12/14/2022]
Abstract
By comparing high-coverage and high-quality whole genome sequence assemblies it is now possible to reconstruct putative ancestral progenitor karyotypes, here called protokaryotypes. For this study we used the recently described electronic chromosome painting technique (E-painting) to reconstruct the karyotype of the 85 million-year-old (MYA) ferungulate ancestor. This model is primarily based on dog (Canis familiaris) and cattle (Bos taurus) genome data and is highly consistent with comparative gene mapping and chromosome painting data. The protokaryotype bears 23 autosomal chromosome pairs and the sex chromosomes and preserves most of the chromosomal associations described previously for the boreo-eutherian protokaryotype. The model indicates that five interchromosomal rearrangements occurred during the transition from the boreo-eutherian to the ferungulate ancestor. From there on 66 further interchromosomal rearrangements took place in the lineage leading to cattle and 61 further interchromosomal rearrangements in the lineage to dog.
Affiliation(s)
- Claus Kemkemer
- Institute of Human Genetics, University Ulm, Albert-Einstein-Allee 11, 89070 Ulm, Germany
|
178
|
Rocchi M, Archidiacono N, Stanyon R. Ancestral genomes reconstruction: An integrated, multi-disciplinary approach is needed. Genome Res 2006; 16:1441-4. [PMID: 17053088 DOI: 10.1101/gr.5687906] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/25/2022]
Affiliation(s)
- Mariano Rocchi
- Department of Genetics and Microbiology, University of Bari, Bari 70126, Italy.
|