101
|
Kahn CL, Mozes S, Raphael BJ. Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes. Algorithms Mol Biol 2010; 5:11. [PMID: 20047668 PMCID: PMC2820476 DOI: 10.1186/1748-7188-5-11] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/11/2009] [Accepted: 01/04/2010] [Indexed: 02/06/2023] Open
Abstract
Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
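The duplication distance described in this abstract can be illustrated with a small interval dynamic program. The sketch below assumes that each copy operation may insert a source substring at any position of the partially built target, so a copy can later be interrupted by nested copies. The function name `duplication_distance` and the helpers `f` and `g` are hypothetical; this is not claimed to be the authors' algorithm or to match its running time.

```python
from functools import lru_cache
import math


def duplication_distance(source: str, target: str) -> float:
    """Minimum number of copy operations needed to build `target` from
    substrings of `source`, where a copy may be inserted anywhere in the
    partially built target (assumed model). Returns math.inf if `target`
    contains a character that never occurs in `source`."""

    @lru_cache(maxsize=None)
    def f(i: int, j: int) -> float:
        # Cheapest way to build target[i:j] from scratch.
        if i >= j:
            return 0
        best = math.inf
        for k, ch in enumerate(source):
            if ch == target[i]:
                # Open a new copy whose first character is source[k].
                best = min(best, 1 + g(i, j, k))
        return best

    @lru_cache(maxsize=None)
    def g(i: int, j: int, k: int) -> float:
        # target[i] is already produced by an open copy as source[k];
        # return the extra cost of finishing target[i:j].
        best = f(i + 1, j)  # the open copy ends at target[i]
        if k + 1 < len(source):
            nxt = source[k + 1]
            for m in range(i + 1, j):
                if target[m] == nxt:
                    # target[i+1:m] is built by nested copies, then the
                    # open copy resumes at target[m] as source[k+1].
                    best = min(best, f(i + 1, m) + g(m, j, k + 1))
        return best

    return f(0, len(target))
```

For example, with source `abc` the target `aabcbc` needs two copies under this model: one copy of `abc`, and a second copy of `abc` inserted after its first character. The recursion is only meant for short strings; long inputs would need an iterative table.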
|
102
|
Dumas L, Sikela JM. DUF1220 domains, cognitive disease, and human brain evolution. Cold Spring Harb Symp Quant Biol 2009; 74:375-82. [PMID: 19850849 DOI: 10.1101/sqb.2009.74.025] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Indexed: 11/25/2022]
Abstract
We have established that human genome sequences encoding a novel protein domain, DUF1220, show a dramatically elevated copy number in the human lineage (>200 copies in humans vs. 1 in mouse/rat) and may be important to human evolutionary adaptation. Copy-number variations (CNVs) in the 1q21.1 region, where most DUF1220 sequences map, have now been implicated in numerous diseases associated with cognitive dysfunction, including autism, autism spectrum disorder, mental retardation, schizophrenia, microcephaly, and macrocephaly. We report here that these disease-related 1q21.1 CNVs either encompass or are directly flanked by DUF1220 sequences and exhibit a dosage-related correlation with human brain size. Microcephaly-producing 1q21.1 CNVs are deletions, whereas macrocephaly-producing 1q21.1 CNVs are duplications. Similarly, 1q21.1 deletions and smaller brain size are linked with schizophrenia, whereas 1q21.1 duplications and larger brain size are associated with autism. Interestingly, these two diseases are thought to be phenotypic opposites. These data suggest a model which proposes that (1) DUF1220 domain copy number may be involved in influencing human brain size and (2) the evolutionary advantage of rapidly increasing DUF1220 copy number in the human lineage has resulted in favoring retention of the high genomic instability of the 1q21.1 region, which, in turn, has precipitated a spectrum of recurrent human brain and developmental disorders.
Affiliation(s)
- L Dumas
- University of Colorado Denver School of Medicine, Aurora, CO 80045, USA
|
103
|
Marques-Bonet T, Ryder OA, Eichler EE. Sequencing primate genomes: what have we learned? Annu Rev Genomics Hum Genet 2009; 10:355-86. [PMID: 19630567 DOI: 10.1146/annurev.genom.9.081307.164420] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 12/13/2022]
Abstract
We summarize the progress in whole-genome sequencing and analyses of primate genomes. These emerging genome datasets have broadened our understanding of primate genome evolution, revealing unexpected and complex patterns of evolutionary change. This includes the characterization of genome structural variation, episodic changes in the repeat landscape, differences in gene expression, new models regarding speciation, and the ephemeral nature of the recombination landscape. The functional characterization of genomic differences important in primate speciation and adaptation remains a significant challenge. Limited access to biological materials, the lack of detailed phenotypic data, and the endangered status of many critical primate species have significantly attenuated research into the genetic basis of primate evolution. Next-generation sequencing technologies promise to greatly expand the number of available primate genome sequences; however, such draft genome sequences will likely miss critical genetic differences within complex genomic regions unless dedicated efforts are put forward to understand the full spectrum of genetic variation.
Affiliation(s)
- Tomas Marques-Bonet
- Department of Genome Sciences, University of Washington and the Howard Hughes Medical Institute, Seattle, Washington 98105, USA.
|
104
|
Marques-Bonet T, Girirajan S, Eichler EE. The origins and impact of primate segmental duplications. Trends Genet 2009; 25:443-54. [PMID: 19796838 PMCID: PMC2847396 DOI: 10.1016/j.tig.2009.08.002] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.0] [Received: 07/02/2009] [Revised: 08/07/2009] [Accepted: 08/10/2009] [Indexed: 12/25/2022]
Abstract
Duplicated sequences are substrates for the emergence of new genes and are an important source of genetic instability associated with rare and common diseases. Analyses of primate genomes have shown an increase in the proportion of interspersed segmental duplications (SDs) within the genomes of humans and great apes. This contrasts with other mammalian genomes that seem to have their recently duplicated sequences organized in a tandem configuration. In this review, we focus on the mechanistic origin and impact of this difference with respect to evolution, genetic diversity and primate phenotype. Although many genomes will be sequenced in the future, resolution of this aspect of genomic architecture still requires high quality sequences and detailed analyses.
Affiliation(s)
- Tomas Marques-Bonet
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
|
105
|
Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 2009; 41:1061-7. [PMID: 19718026 PMCID: PMC2875196 DOI: 10.1038/ng.437] [Citation(s) in RCA: 486] [Impact Index Per Article: 32.4] [Received: 04/16/2009] [Accepted: 07/23/2009] [Indexed: 12/18/2022]
Abstract
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable due to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads allowing for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy-number differences. We estimate that 73–87 genes will be on average copy-number variable between two human genomes and find that these genic differences overwhelmingly correspond to segmental duplications (OR=135; p<2.2e-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate census of gene content and insight into functional constraint without the limitations of array-based technology.
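The abstract does not spell out how absolute copy number is derived from mapped reads. A common read-depth approach, assumed here purely for illustration, scales the mean depth in a region by the mean depth of control regions of known copy number. The function name `estimate_copy_number` and its parameters are hypothetical, and this is not the mrFAST implementation.

```python
from statistics import mean


def estimate_copy_number(region_depths, control_depths, control_copies=2):
    """Estimate copy number of a region from per-window read depth.

    region_depths:  read depths in windows covering the region of interest
    control_depths: read depths in windows from regions assumed to be
                    present at `control_copies` copies (diploid by default)
    """
    control_mean = mean(control_depths)
    if control_mean <= 0:
        raise ValueError("control regions must have positive mean depth")
    # Depth is assumed proportional to copy number, so scale accordingly.
    return control_copies * mean(region_depths) / control_mean
```

A region with a mean depth of 30 against a diploid control depth of 20 would be estimated at three copies. Real pipelines would also correct for GC content and mappability, which this sketch ignores.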
|
106
|
The evolution of human segmental duplications and the core duplicon hypothesis. Cold Spring Harb Symp Quant Biol 2009; 74:355-62. [PMID: 19717539 DOI: 10.1101/sqb.2009.74.011] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 01/17/2023]
Abstract
Duplicated sequences are important sources of genetic instability and contribute to the evolution of new gene function within species. Hominids have a preponderance of intrachromosomal duplications organized in an interspersed fashion, as opposed to tandem duplications, which are common in other mammalian genomes such as mouse, dog, and cow. Multiple lines of evidence, including sequence divergence, comparative primate genomes, and fluorescence in situ hybridization (FISH) analyses, point to an excess of segmental duplications in the common ancestor of humans and African great apes. We find that much of the interspersed human duplication architecture within chromosomes is focused around common sequence elements referred to as "core duplicons." These cores correspond to the expansion of gene families, some of which show signatures of positive selection and lack orthologs present in other mammalian species. This genomic architecture predisposes apes and humans not only to extensive genetic diversity, but also to large-scale structural diversity mediated by nonallelic homologous recombination. In humans, many de novo large-scale genomic changes mediated by these duplications are associated with neuropsychiatric and neurodevelopmental disease. We propose that the disadvantage of a high rate of new mutations is offset by the selective advantage of newly minted genes within the cores.
|
107
|
Tian X, Pascal G, Monget P. Evolution and functional divergence of NLRP genes in mammalian reproductive systems. BMC Evol Biol 2009; 9:202. [PMID: 19682372 PMCID: PMC2735741 DOI: 10.1186/1471-2148-9-202] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.3] [Received: 04/08/2009] [Accepted: 08/14/2009] [Indexed: 12/31/2022] Open
Abstract
Background: NLRPs (Nucleotide-binding oligomerization domain, Leucine rich Repeat and Pyrin domain containing Proteins) are members of the NLR (Nod-like receptor) protein family. Recent research has shown that NLRP genes play important roles in both the mammalian innate immune system and the reproductive system. Several NLRP genes were shown to be specifically expressed in the oocyte in mammals. The aim of the present work was to study how these genes evolved and diverged after their duplication, as well as whether natural selection played a role during their evolution. Results: Using in silico methods, we have evaluated the evolution and functional divergence of NLRP genes, in particular of mouse reproduction-related Nlrp genes. We found that (1) major NLRP genes were duplicated before the divergence of mammals, with certain lineage-specific duplications in primates (NLRP7 and 11) and in rodents (Nlrp1, 4 and 9 duplicates); (2) tandem duplication events gave rise to a mammalian reproduction-related NLRP cluster including NLRP2, 4, 5, 7, 8, 9, 11, 13 and 14 genes; (3) the function of mammalian oocyte-specific NLRP genes (NLRP4, 5, 9 and 14) might have diverged during gene evolution; (4) recent segmental duplications involving Nlrp4 copies and vomeronasal 1 receptor encoding genes (V1r) have occurred in the mouse; and (5) duplicates of Nlrp4 and 9 in the mouse might have been subjected to adaptive evolution. Conclusion: This study provides novel information on the evolution of mammalian reproduction-related NLRPs. On the one hand, NLRP genes duplicated and functionally diversified in mammalian reproductive systems (such as NLRP4, 5, 9 and 14). On the other hand, during evolution, different lineages adapted to develop their own NLRP genes, particularly in reproductive function (such as the specific expansion of Nlrp4 and Nlrp9 in the mouse).
Affiliation(s)
- Xin Tian
- Physiologie de la Reproduction et des Comportements, UMR 6175 INRA-CNRS-Université François Rabelais de Tours-Haras Nationaux, 37380 Nouzilly, France.
|
108
|
Hahn MW. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered 2009; 100:605-17. [PMID: 19596713 DOI: 10.1093/jhered/esp047] [Citation(s) in RCA: 259] [Impact Index Per Article: 17.3] [Indexed: 01/22/2023] Open
Abstract
Determining the evolutionary forces responsible for the maintenance of gene duplicates is key to understanding the processes leading to evolutionary adaptation and novelty. In his highly prescient book, Susumu Ohno recognized that duplicate genes are fixed and maintained within a population with 3 distinct outcomes: neofunctionalization, subfunctionalization, and conservation of function. Subsequent researchers have proposed a multitude of population genetic models that lead to these outcomes, each differing largely in the role played by adaptive natural selection. In this paper, I present a nonmathematical review of these models, their predictions, and the evidence collected in support of each of them. Though the various outcomes of gene duplication are often strictly associated with the presence or absence of adaptive natural selection, I argue that determining the outcome of duplication is orthogonal to determining whether natural selection has acted. Despite an ever-growing field of research into the fate of gene duplicates, there is not yet clear evidence for the preponderance of one outcome over the others, much less evidence for the importance of adaptive or nonadaptive forces in maintaining these duplicates.
Affiliation(s)
- Matthew W Hahn
- Department of Biology and School of Informatics, Indiana University, Bloomington, IN 47405, USA.
|
109
|
Zeh DW, Zeh JA, Ishida Y. Transposable elements and an epigenetic basis for punctuated equilibria. Bioessays 2009; 31:715-26. [DOI: 10.1002/bies.200900026] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.9] [Indexed: 12/12/2022]
|
110
|
Schmidt J, Kirsch S, Rappold GA, Schempp W. Complex evolution of a Y-chromosomal double homeobox 4 (DUX4)-related gene family in hominoids. PLoS One 2009; 4:e5288. [PMID: 19404400 PMCID: PMC2671837 DOI: 10.1371/journal.pone.0005288] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/26/2009] [Accepted: 03/24/2009] [Indexed: 12/21/2022] Open
Abstract
The human Y chromosome carries four human Y-chromosomal euchromatin/heterochromatin transition regions, all of which are characterized by the presence of interchromosomal segmental duplications. The Yq11.1/Yq11.21 transition region harbours a peculiar segment composed of an imperfectly organized tandem-repeat structure encoding four members of the double homeobox (DUX) gene family. By comparative fluorescence in situ hybridization (FISH) analysis we have documented the primary appearance of Y-chromosomal DUX genes (DUXY) on the gibbon Y chromosome. The major amplification and dispersal of DUXY paralogs occurred after the gibbon and hominid lineages had diverged. Orthologous DUXY loci of human and chimpanzee show a highly similar structural organization. Sequence alignment survey, phylogenetic reconstruction and recombination detection analyses of human and chimpanzee DUXY genes revealed the existence of all copies in a common ancestor. Comparative analysis of the circumjacent beta-satellites indicated that DUXY genes and beta-satellites evolved in concert. However, evolutionary forces acting on DUXY genes may have induced amino acid sequence differences in the orthologous chimpanzee and human DUXY open reading frames (ORFs). The acquisition of complete ORFs in human copies might relate to evolutionarily advantageous functions, indicating neo-functionalization. We propose an evolutionary scenario in which an ancestral tandem array DUX gene cassette transposed to the hominoid Y chromosome, followed by lineage-specific chromosomal rearrangements, paved the way for a species-specific evolution of the Y-chromosomal members of a large, highly diverged homeobox gene family.
Affiliation(s)
- Julia Schmidt
- Institute of Human Genetics, University of Freiburg, Freiburg, Germany
- Stefan Kirsch
- Institute of Human Genetics, University of Freiburg, Freiburg, Germany
- Gudrun A. Rappold
- Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany
- Werner Schempp
- Institute of Human Genetics, University of Freiburg, Freiburg, Germany
|
111
|
Antonacci F, Kidd JM, Marques-Bonet T, Ventura M, Siswara P, Jiang Z, Eichler EE. Characterization of six human disease-associated inversion polymorphisms. Hum Mol Genet 2009; 18:2555-66. [PMID: 19383631 PMCID: PMC2701327 DOI: 10.1093/hmg/ddp187] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.3] [Indexed: 01/13/2023] Open
Abstract
The human genome is a highly dynamic structure that shows a wide range of genetic polymorphic variation. Unlike other types of structural variation, little is known about inversion variants within normal individuals because such events are typically balanced and are difficult to detect and analyze by standard molecular approaches. Using sequence-based, cytogenetic and genotyping approaches, we characterized six large inversion polymorphisms that map to regions associated with genomic disorders with complex segmental duplications mapping at the breakpoints. We developed a metaphase FISH-based assay to genotype inversions and analyzed the chromosomes of 27 individuals from three HapMap populations. In this subset, we find that these inversions are less frequent or absent in Asians when compared with European and Yoruban populations. Analyzing multiple individuals from outgroup species of great apes, we show that most of these large inversion polymorphisms are specific to the human lineage with two exceptions, 17q21.31 and 8p23 inversions, which are found to be similarly polymorphic in other great ape species and where the inverted allele represents the ancestral state. Investigating linkage disequilibrium relationships with genotyped SNPs, we provide evidence that most of these inversions appear to have arisen on at least two different haplotype backgrounds. In these cases, discovery and genotyping methods based on SNPs may be confounded and molecular cytogenetics remains the only method to genotype these inversions.
Affiliation(s)
- Francesca Antonacci
- Department of Genome Sciences, Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
112
|
A burst of segmental duplications in the genome of the African great ape ancestor. Nature 2009; 457:877-81. [PMID: 19212409 DOI: 10.1038/nature07744] [Citation(s) in RCA: 169] [Impact Index Per Article: 11.3] [Received: 08/29/2008] [Accepted: 12/18/2008] [Indexed: 02/02/2023]
Abstract
It is generally accepted that the extent of phenotypic change between human and great apes is dissonant with the rate of molecular change. Between these two groups, proteins are virtually identical, cytogenetically there are few rearrangements that distinguish ape-human chromosomes, and rates of single-base-pair change and retrotransposon activity have slowed particularly within hominid lineages when compared to rodents or monkeys. Studies of gene family evolution indicate that gene loss and gain are enriched within the primate lineage. Here, we perform a systematic analysis of duplication content of four primate genomes (macaque, orang-utan, chimpanzee and human) in an effort to understand the pattern and rates of genomic duplication during hominid evolution. We find that the ancestral branch leading to human and African great apes shows the most significant increase in duplication activity both in terms of base pairs and in terms of events. This duplication acceleration within the ancestral species is significant when compared to lineage-specific rate estimates even after accounting for copy-number polymorphism and homoplasy. We discover striking examples of recurrent and independent gene-containing duplications within the gorilla and chimpanzee that are absent in the human lineage. Our results suggest that the evolutionary properties of copy-number mutation differ significantly from other forms of genetic mutation and, in contrast to the hominid slowdown of single-base-pair mutations, there has been a genomic burst of duplication activity at this period during human evolution.
|
113
|
Schaschl H, Aitman TJ, Vyse TJ. Copy number variation in the human genome and its implication in autoimmunity. Clin Exp Immunol 2009; 156:12-6. [PMID: 19220326 DOI: 10.1111/j.1365-2249.2008.03865.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Indexed: 11/27/2022] Open
Abstract
The causes of autoimmune disease remain poorly defined. However, it is known that genetic factors contribute to disease susceptibility. Hitherto, studies have focused upon single nucleotide polymorphisms as both tools for mapping and as probable causal variants. Recent studies, using genome-wide analytical techniques, have revealed that, in the genome, segments of DNA ranging in size from kilobases to megabases can vary in copy number. These changes of DNA copy number represent an important element of genomic polymorphism in humans and in other species and may therefore make a substantial contribution to phenotypic variation and population differentiation. Furthermore, copy number variation (CNV) in genomic regions harbouring dosage-sensitive genes may cause or predispose to a variety of human genetic diseases. Several recent studies have reported an association between CNV and autoimmunity in humans such as systemic lupus, psoriasis, Crohn's disease, rheumatoid arthritis and type 1 diabetes. The use of novel analytical techniques facilitates the study of complex human genomic structures such as CNV, and allows new susceptibility loci for autoimmunity to be found that are not readily mappable by single nucleotide polymorphism-based association analyses alone.
Affiliation(s)
- H Schaschl
- Imperial College London, Faculty of Medicine, Section of Molecular Genetics and Rheumatology, Hammersmith Campus, London, UK
|
114
|
Zody MC, Jiang Z, Fung HC, Antonacci F, Hillier LW, Cardone MF, Graves TA, Kidd JM, Cheng Z, Abouelleil A, Chen L, Wallis J, Glasscock J, Wilson RK, Reily AD, Duckworth J, Ventura M, Hardy J, Warren WC, Eichler EE. Evolutionary toggling of the MAPT 17q21.31 inversion region. Nat Genet 2009; 40:1076-83. [PMID: 19165922 DOI: 10.1038/ng.193] [Citation(s) in RCA: 146] [Impact Index Per Article: 9.7] [Indexed: 11/10/2022]
Abstract
Using comparative sequencing approaches, we investigated the evolutionary history of the European-enriched 17q21.31 MAPT inversion polymorphism. We present a detailed, BAC-based sequence assembly of the inverted human H2 haplotype and compare it to the sequence structure and genetic variation of the corresponding 1.5-Mb region for the noninverted H1 human haplotype and that of chimpanzee and orangutan. We found that inversion of the MAPT region is similarly polymorphic in other great ape species, and we present evidence that the inversions occurred independently in chimpanzees and humans. In humans, the inversion breakpoints correspond to core duplications with the LRRC37 gene family. Our analysis favors the H2 configuration and sequence haplotype as the likely great ape and human ancestral state, with inversion recurrences during primate evolution. We show that the H2 architecture has evolved more extensive sequence homology, perhaps explaining its tendency to undergo microdeletion associated with mental retardation in European populations.
Affiliation(s)
- Michael C Zody
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA
|
115
|
Clemente JC, Ikeo K, Valiente G, Gojobori T. Optimized ancestral state reconstruction using Sankoff parsimony. BMC Bioinformatics 2009; 10:51. [PMID: 19200389 PMCID: PMC2677398 DOI: 10.1186/1471-2105-10-51] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/27/2008] [Accepted: 02/07/2009] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND: Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated with transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n^2) in the number of states, making it impractical for large values of n. RESULTS: In this study we introduce an optimization of Sankoff parsimony for the reconstruction of ancestral states when ultrametric or additive cost matrices are used. We analyzed its performance for randomly generated matrices, Jukes-Cantor and Kimura's two-parameter models of DNA evolution, and in the reconstruction of elongation factor-1alpha and ancestral metabolic states of a group of eukaryotes, showing that in all cases the execution time is significantly less than with the original implementation. CONCLUSION: The algorithms presented here provide a fast computation of Sankoff parsimony for a given phylogeny. Problems where the number of states is large, such as reconstruction of ancestral metabolism, are particularly well suited to this optimization. Since we are reducing the computations required to calculate the parsimony cost of a single tree, our method can be combined with optimizations in the number of taxa that aim at finding the most parsimonious tree.
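For orientation, the original (unoptimized) Sankoff recursion that this abstract takes as its baseline can be sketched as follows. The tree encoding (a leaf is an integer state index, an internal node is a tuple of children) and the function names `sankoff` and `parsimony_score` are assumptions made for this example. The optimization for ultrametric or additive cost matrices described in the abstract is not reproduced here.

```python
import math


def sankoff(tree, cost):
    """Return the Sankoff cost vector for `tree`.

    tree: an int (leaf with that observed state) or a tuple of subtrees
    cost: n x n matrix, cost[a][b] = cost of changing state a into b
    The result v satisfies: v[a] is the minimum total cost of the subtree
    if its root is assigned state a.
    """
    n = len(cost)
    if isinstance(tree, int):
        # A leaf is free in its observed state and impossible in any other.
        return [0 if s == tree else math.inf for s in range(n)]
    child_vectors = [sankoff(child, cost) for child in tree]
    # For each parent state a, every child picks its cheapest state b.
    # This double loop over (a, b) is the quadratic step in the number
    # of states.
    return [
        sum(min(cost[a][b] + vec[b] for b in range(n)) for vec in child_vectors)
        for a in range(n)
    ]


def parsimony_score(tree, cost):
    """Minimum total cost over all root state assignments."""
    return min(sankoff(tree, cost))
```

With two states and unit substitution cost, the tree `((0, 1), (1, 1))` has a parsimony score of 1, since a single change on the branch to the first leaf explains the data.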
Affiliation(s)
- José C Clemente
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
- Kazuho Ikeo
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
- Takashi Gojobori
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
|
116
|
Koszul R, Fischer G. A prominent role for segmental duplications in modeling Eukaryotic genomes. C R Biol 2009; 332:254-66. [DOI: 10.1016/j.crvi.2008.07.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 06/30/2008] [Accepted: 07/12/2008] [Indexed: 01/22/2023]
|
117
|
Peng Q, Alekseyev MA, Tesler G, Pevzner PA. Decoding Synteny Blocks and Large-Scale Duplications in Mammalian and Plant Genomes. Lecture Notes in Computer Science 2009. [DOI: 10.1007/978-3-642-04241-6_19] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/26/2023]
|
118
|
Abstract
Copy number variation (CNV) is a source of genetic diversity in humans. Numerous CNVs are being identified with various genome analysis platforms, including array comparative genomic hybridization (aCGH), single nucleotide polymorphism (SNP) genotyping platforms, and next-generation sequencing. CNV formation occurs by both recombination-based and replication-based mechanisms and de novo locus-specific mutation rates appear much higher for CNVs than for SNPs. By various molecular mechanisms, including gene dosage, gene disruption, gene fusion, position effects, etc., CNVs can cause Mendelian or sporadic traits, or be associated with complex diseases. However, CNV can also represent benign polymorphic variants. CNVs, especially gene duplication and exon shuffling, can be a predominant mechanism driving gene and genome evolution.
Affiliation(s)
- Feng Zhang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
|
119
|
Richard GF, Kerrest A, Dujon B. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 2008; 72:686-727. [PMID: 19052325 PMCID: PMC2593564 DOI: 10.1128/mmbr.00011-08] [Citation(s) in RCA: 335] [Impact Index Per Article: 20.9] [Indexed: 12/20/2022] Open
Abstract
Repeated elements can be widely abundant in eukaryotic genomes, composing more than 50% of the human genome, for example. It is possible to classify repeated sequences into two large families, "tandem repeats" and "dispersed repeats." Each of these two families can itself be divided into subfamilies. Dispersed repeats contain transposons, tRNA genes, and gene paralogues, whereas tandem repeats contain gene tandems, ribosomal DNA repeat arrays, and satellite DNA, itself subdivided into satellites, minisatellites, and microsatellites. Remarkably, the molecular mechanisms that create and propagate dispersed and tandem repeats are specific to each class and usually do not overlap. In the first section of the present review, we describe the nature and distribution of dispersed and tandem repeats in eukaryotic genomes in the light of complete (or nearly complete) available genome sequences. In the second part, we focus on the molecular mechanisms responsible for the fast evolution of two specific classes of tandem repeats: minisatellites and microsatellites. Given that a growing number of human neurological disorders involve the expansion of a particular class of microsatellites, called trinucleotide repeats, a large part of the recent experimental work on microsatellites has focused on these particular repeats, and thus we also review the current knowledge in this area. Finally, we propose a unified definition for mini- and microsatellites that takes into account their biological properties and try to point out new directions that should be explored in the near future on our road to understanding the genetics of repeated sequences.
Affiliation(s)
- Guy-Franck Richard
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie, UFR927, 25 rue du Dr. Roux, F-75015, Paris, France.
|
120
|
Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 2008; 19:327-35. [PMID: 19029536 DOI: 10.1101/gr.073585.107] [Citation(s) in RCA: 858] [Impact Index Per Article: 53.6] [Indexed: 01/17/2023]
Abstract
We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.
Affiliation(s)
- Albert J Vilella
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
|
121
|
Girirajan S, Chen L, Graves T, Marques-Bonet T, Ventura M, Fronick C, Fulton L, Rocchi M, Fulton RS, Wilson RK, Mardis ER, Eichler EE. Sequencing human-gibbon breakpoints of synteny reveals mosaic new insertions at rearrangement sites. Genome Res 2008; 19:178-90. [PMID: 19029537 DOI: 10.1101/gr.086041.108] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
The gibbon genome exhibits extensive karyotypic diversity with an increased rate of chromosomal rearrangements during evolution. In an effort to understand the mechanistic origin and implications of these rearrangement events, we sequenced 24 synteny breakpoint regions in the white-cheeked gibbon (Nomascus leucogenys, NLE) in the form of high-quality BAC insert sequences (4.2 Mbp). While there is a significant deficit of breakpoints in genes, we identified seven human gene structures involved in signaling pathways (DEPDC4, GNG10), phospholipid metabolism (ENPP5, PLSCR2), beta-oxidation (ECH1), cellular structure and transport (HEATR4), and transcription (ZNF461), that have been disrupted in the NLE gibbon lineage. Notably, only three of these genes show the expected evolutionary signatures of pseudogenization. Sequence analysis of the breakpoints suggested both nonclassical nonhomologous end-joining (NHEJ) and replication-based mechanisms of rearrangement. A substantial number (11/24) of human-NLE gibbon breakpoints showed new insertions of gibbon-specific repeats and mosaic structures formed from disparate sequences including segmental duplications, LINE, SINE, and LTR elements. Analysis of these sites provides a model for a replication-dependent repair mechanism for double-strand breaks (DSBs) at rearrangement sites and insights into the structure and formation of primate segmental duplications at sites of genomic rearrangements during evolution.
Affiliation(s)
- Santhosh Girirajan
- Department of Genome Sciences, Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, Washington 98195, USA
|
122
|
Woldringh G, Janssen I, Hehir-Kwa J, van den Elzen C, Kremer J, de Boer P, Schoenmakers E. Constitutional DNA copy number changes in ICSI children. Hum Reprod 2008; 24:233-40. [DOI: 10.1093/humrep/den323] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
123
|
Varki A, Geschwind DH, Eichler EE. Explaining human uniqueness: genome interactions with environment, behaviour and culture. Nat Rev Genet 2008; 9:749-63. [PMID: 18802414 DOI: 10.1038/nrg2428] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
What makes us human? Specialists in each discipline respond through the lens of their own expertise. In fact, 'anthropogeny' (explaining the origin of humans) requires a transdisciplinary approach that eschews such barriers. Here we take a genomic and genetic perspective towards molecular variation, explore systems analysis of gene expression and discuss an organ-systems approach. Rejecting any 'genes versus environment' dichotomy, we then consider genome interactions with environment, behaviour and culture, finally speculating that aspects of human uniqueness arose because of a primate evolutionary trend towards increasing and irreversible dependence on learned behaviours and culture - perhaps relaxing allowable thresholds for large-scale genomic diversity.
Affiliation(s)
- Ajit Varki
- Center for Academic Research and Training in Anthropogeny, University of California, San Diego, La Jolla, California 92093, USA.
|
124
|
Kim PM, Lam HYK, Urban AE, Korbel JO, Affourtit J, Grubert F, Chen X, Weissman S, Snyder M, Gerstein MB. Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res 2008; 18:1865-74. [PMID: 18842824 DOI: 10.1101/gr.081422.108] [Citation(s) in RCA: 118] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which is decreasing for younger SDs and absent entirely for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then by the newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.
Affiliation(s)
- Philip M Kim
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
|
125
|
Münch C, Kirsch S, Fernandes AMG, Schempp W. Evolutionary analysis of the highly dynamic CHEK2 duplicon in anthropoids. BMC Evol Biol 2008; 8:269. [PMID: 18831734 PMCID: PMC2566985 DOI: 10.1186/1471-2148-8-269] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2008] [Accepted: 10/02/2008] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Segmental duplications (SDs) are euchromatic portions of genomic DNA (≥1 kb) that occur at more than one site within the genome, and typically share a high level of sequence identity (>90%). Approximately 5% of the human genome is composed of such duplicated sequences. Here we report the detailed investigation of CHEK2 duplications. CHEK2 is a multiorgan cancer susceptibility gene encoding a cell cycle checkpoint kinase acting in the DNA-damage response signalling pathway. The continuous presence of the CHEK2 gene in all eukaryotes and its important role in maintaining genome stability prompted us to investigate the duplicative evolution and phylogeny of CHEK2 and its paralogs during anthropoid evolution. RESULTS To study CHEK2 duplicon evolution in anthropoids we applied a combination of comparative FISH and in silico analyses. Our comparative FISH results with a CHEK2 fosmid probe revealed the single-copy status of CHEK2 in New World monkeys, Old World monkeys and gibbons. Whereas a single CHEK2 duplication was detected in orangutan, a multi-site signal pattern indicated a burst of duplication in African great apes and human. Phylogenetic analysis of paralogous and ancestral CHEK2 sequences in human, chimpanzee and rhesus macaque confirmed this burst of duplication, which occurred after the radiation of orangutan and African great apes. In addition, we used inter-species quantitative PCR to determine CHEK2 copy numbers. An amplification of CHEK2 was detected in African great apes and the highest CHEK2 copy number of all analysed species was observed in the human genome. Furthermore, we detected variation in CHEK2 copy numbers within the analysed set of human samples. CONCLUSION Our detailed analysis revealed the highly dynamic nature of CHEK2 duplication during anthropoid evolution. We determined a burst of CHEK2 duplication after the radiation of orangutan and African great apes and identified the highest CHEK2 copy number in human.
In conclusion, our analysis of CHEK2 duplicon evolution revealed that SDs contribute to inter-species variation. Furthermore, our qPCR analysis led us to presume CHEK2 copy number variation in human, and molecular diagnostics of the cancer susceptibility gene CHEK2 inside the duplicated region might be hampered by the individual-specific set of duplicons.
Affiliation(s)
- Claudia Münch
- Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
- Stefan Kirsch
- Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
- António MG Fernandes
- Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
- Werner Schempp
- Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
|
126
|
Symmons O, Váradi A, Arányi T. How segmental duplications shape our genome: recent evolution of ABCC6 and PKD1 Mendelian disease genes. Mol Biol Evol 2008; 25:2601-13. [PMID: 18791038 DOI: 10.1093/molbev/msn202] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open
Abstract
The completion of the Human Genome Project has brought the understanding that our genome contains an unexpectedly large proportion of segmental duplications. This poses the challenge of elucidating the consequences of recent duplications on physiology. We have conducted an in-depth study of a subset of segmental duplications on chromosome 16. We focused on PKD1 and ABCC6 duplications because mutations affecting these genes are responsible for the Mendelian disorders autosomal dominant polycystic kidney disease and pseudoxanthoma elasticum, respectively. We establish that duplications of PKD1 and ABCC6 are associated with low-copy repeat 16a and show that such duplications have occurred several times independently in different primate species. We demonstrate that partial duplication of PKD1 and ABCC6 has numerous consequences: the pseudogenes give rise to new transcripts and mediate gene conversion, which not only results in disease-causing mutations but also serves as a reservoir for sequence variation. The duplicated segments are also involved in submicroscopic and microscopic genomic rearrangements, contributing to structural variation in human and chromosomal break points in the gibbon. In conclusion, our data shed light on the recent and ongoing evolution of chromosome 16 mediated by segmental duplication and deepen our understanding of the history of two Mendelian disorder genes.
Affiliation(s)
- Orsolya Symmons
- Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary
|
127
|
Payen C, Koszul R, Dujon B, Fischer G. Segmental duplications arise from Pol32-dependent repair of broken forks through two alternative replication-based mechanisms. PLoS Genet 2008; 4:e1000175. [PMID: 18773114 PMCID: PMC2518615 DOI: 10.1371/journal.pgen.1000175] [Citation(s) in RCA: 149] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2007] [Accepted: 07/18/2008] [Indexed: 11/18/2022] Open
Abstract
The propensity of segmental duplications (SDs) to promote genomic instability is of increasing interest since their involvement in numerous human genomic diseases and cancers was revealed. However, the mechanism(s) responsible for their appearance remain mostly speculative. Here, we show that in budding yeast, replication accidents, which are most likely transformed into broken forks, play a causal role in the formation of SDs. The Pol32 subunit of the major replicative polymerase Polδ is required for all SD formation, demonstrating that SDs result from untimely DNA synthesis rather than from unequal crossing-over. Although Pol32 is known to be required for classical (Rad52-dependent) break-induced replication, only half of the SDs can be attributed to this mechanism. The remaining SDs are generated through a Rad52-independent mechanism of template switching between microsatellites or microhomologous sequences. This new mechanism, named microhomology/microsatellite-induced replication (MMIR), differs from all known DNA double-strand break repair pathways, as MMIR-mediated duplications still occur in the combined absence of homologous recombination, microhomology-mediated, and nonhomologous end joining machineries. The interplay between these two replication-based pathways explains important features of higher eukaryotic genomes, such as the strong, but not strict, association between SDs and transposable elements, as well as the frequent formation of oncogenic fusion genes generating protein innovations at SD junctions. Duplications of long segments of chromosomes are frequently observed in multicellular organisms (∼5% of our genome, for instance). They appear as a fundamental trait of the recent genome evolution in great apes and are often associated with chromosomal instability, capable of increasing genetic polymorphism among individuals, but also having dramatic consequences as a source of diseases and cancer.
Despite their importance, the molecular mechanisms of formation of segmental duplications remain unclear. Using a specifically designed experimental system in the baker's yeast Saccharomyces cerevisiae, hundreds of naturally occurring segmental duplications encompassing dozens of genes were selected. With the help of modern molecular methods coupled to detailed genetic analysis, we show that such duplication events are frequent and result from untimely DNA synthesis accidents produced by two distinct molecular mechanisms: the well-known break-induced replication and a novel mechanism of template switching between low-complexity or microhomologous sequences. These two mechanisms, rather than unequal recombination events, contribute in comparable proportions to duplication formation, the latter being prone to create novel gene fusions at chromosomal junctions. The mechanisms identified in yeast could explain the origin of a variety of genetic diseases in human, such as hemophilia A, Pelizaeus-Merzbacher disease, or some neurological disorders.
Affiliation(s)
- Celia Payen
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie-Paris 6, UFR927, Paris, France
- Romain Koszul
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie-Paris 6, UFR927, Paris, France
- Bernard Dujon
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie-Paris 6, UFR927, Paris, France
- Gilles Fischer
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie-Paris 6, UFR927, Paris, France
|
128
|
Perry GH, Yang F, Marques-Bonet T, Murphy C, Fitzgerald T, Lee AS, Hyland C, Stone AC, Hurles ME, Tyler-Smith C, Eichler EE, Carter NP, Lee C, Redon R. Copy number variation and evolution in humans and chimpanzees. Genome Res 2008; 18:1698-710. [PMID: 18775914 DOI: 10.1101/gr.082016.108] [Citation(s) in RCA: 180] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Copy number variants (CNVs) underlie many aspects of human phenotypic diversity and provide the raw material for gene duplication and gene family expansion. However, our understanding of their evolutionary significance remains limited. We performed comparative genomic hybridization on a single human microarray platform to identify CNVs among the genomes of 30 humans and 30 chimpanzees as well as fixed copy number differences between species. We found that human and chimpanzee CNVs occur in orthologous genomic regions far more often than expected by chance and are strongly associated with the presence of highly homologous intrachromosomal segmental duplications. By adapting population genetic analyses for use with copy number data, we identified functional categories of genes that have likely evolved under purifying or positive selection for copy number changes. In particular, duplications and deletions of genes with inflammatory response and cell proliferation functions may have been fixed by positive selection and involved in the adaptive phenotypic differentiation of humans and chimpanzees.
Affiliation(s)
- George H Perry
- School of Human Evolution & Social Change, Arizona State University, Tempe, Arizona 85287, USA
|
129
|
Abstract
Comparative genomics is a powerful tool for gaining insight into genomic function and evolution. However, in plants, sequence data that would enable detailed comparisons of both coding and noncoding regions have been limited in availability. Here we report the generation and analysis of sequences for an unduplicated conserved syntenic segment (CSS) in the genomes of five members of the agriculturally important plant family Solanaceae. This CSS includes a 105-kb region of tomato chromosome 2 and orthologous regions of the potato, eggplant, pepper, and petunia genomes. With a total neutral divergence of 0.73-0.78 substitutions/site, these sequences are similar enough that most noncoding regions can be aligned, yet divergent enough to be informative about evolutionary dynamics and selective pressures. The CSS contains 17 distinct genes with generally conserved order and orientation, but with numerous small-scale differences between species. Our analysis indicates that the last common ancestor of these species lived approximately 27-36 million years ago, that more than one-third of short genomic segments (5-15 bp) are under selection, and that more than two-thirds of selected bases fall in noncoding regions. In addition, we identify genes under positive selection and analyze hundreds of conserved noncoding elements. This analysis provides a window into 30 million years of plant evolution in the absence of polyploidization.
|
130
|
Affiliation(s)
- Crystal L Kahn
- Department of Computer Science, Brown University, Providence, RI, USA.
|
131
|
Zheng D. Asymmetric histone modifications between the original and derived loci of human segmental duplications. Genome Biol 2008; 9:R105. [PMID: 18598352 PMCID: PMC2530858 DOI: 10.1186/gb-2008-9-7-r105] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2008] [Revised: 06/23/2008] [Accepted: 07/03/2008] [Indexed: 11/10/2022] Open
Abstract
A systematic analysis of histone modifications between human segmental duplications shows that two seemingly identical genomic copies have distinct epigenomic properties. Background Sequencing and annotation of several mammalian genomes have revealed that segmental duplications are a common architectural feature of primate genomes; in fact, about 5% of the human genome is composed of large blocks of interspersed segmental duplications. These segmental duplications have been implicated in genomic copy-number variation, gene novelty, and various genomic disorders. However, the molecular processes involved in the evolution and regulation of duplicated sequences remain largely unexplored. Results In this study, the profile of about 20 histone modifications within human segmental duplications was characterized using high-resolution, genome-wide data derived from a ChIP-Seq study. The analysis demonstrates that derivative loci of segmental duplications often differ significantly from the original with respect to many histone methylations. Further investigation showed that genes are present three times more frequently in the original than in the derivative, whereas pseudogenes exhibit the opposite trend. These asymmetries tend to increase with the age of segmental duplications. The uneven distribution of genes and pseudogenes does not, however, fully account for the asymmetry in the profile of histone modifications. Conclusion The first systematic analysis of histone modifications between segmental duplications demonstrates that two seemingly 'identical' genomic copies are distinct in their epigenomic properties. Results here suggest that local chromatin environments may be implicated in the discrimination of derived copies of segmental duplications from their originals, leading to a biased pseudogenization of the new duplicates. 
The data also indicate that further exploration of the interactions between histone modification and sequence degeneration is necessary in order to understand the divergence of duplicated sequences.
Affiliation(s)
- Deyou Zheng
- Institute for Brain Disorders and Neural Regeneration, The Saul R. Korey Department of Neurology, Albert Einstein College of Medicine, Rose F. Kennedy Center 915B, 1410 Pelham Parkway South, Bronx, NY 10461, USA.
|
132
|
Evolution of major histocompatibility complex by "en bloc" duplication before mammalian radiation. Immunogenetics 2008; 60:423-38. [PMID: 18560826 DOI: 10.1007/s00251-008-0301-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2008] [Accepted: 04/28/2008] [Indexed: 01/06/2023]
Abstract
Duplications are an important mechanism for the emergence of genetic novelties. Reports on duplicated genes are numerous, and mechanisms for polyploidization or local gene duplication are beginning to be understood. When a local duplication is studied, searches are usually done gene-by-gene, and the size of duplicated segments is not often investigated. Therefore, we do not know if the gene in question has duplicated alone or with other genes, implying that "en bloc" duplications are poorly studied. We propose a method for identification of "en bloc" duplication using mapping, phylogenetic and statistical analyses. We show that two segments present in the major histocompatibility complex (MHC) region of human chromosome 6 have resulted from an "en bloc" duplication that took place between divergence of amniotes and metatherian/eutherian separation. These segments contain members of the same multigenic families, namely olfactory receptor genes, genes encoding proteins containing B30.2 domain, genes encoding proteins containing immunoglobulin V domain and MHC class I genes. We will discuss the fact that olfactory receptors and MHC genes have undergone positive selection, which could have helped in fixation of the surrounding genes.
|
133
|
Korbel JO, Kim PM, Chen X, Urban AE, Weissman S, Snyder M, Gerstein MB. The current excitement about copy-number variation: how it relates to gene duplications and protein families. Curr Opin Struct Biol 2008; 18:366-74. [PMID: 18511261 DOI: 10.1016/j.sbi.2008.02.005] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2008] [Accepted: 02/13/2008] [Indexed: 01/28/2023]
Abstract
Following recent technological advances there has been an increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs)--large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered as fundamental processes underlying gene duplication and loss; duplicated genes being the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends are likely reflective of CNV formation biases and natural selection, both of which differentially influence distinct protein families.
Affiliation(s)
- Jan O Korbel
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520, USA
|
134
|
Jiang Z, Hubley R, Smit A, Eichler EE. DupMasker: a tool for annotating primate segmental duplications. Genome Res 2008; 18:1362-8. [PMID: 18502942 DOI: 10.1101/gr.078477.108] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Segmental duplications (SDs) play an important role in genome rearrangement, evolution, and the copy-number variation (CNV) of primate genomes. Such sequences are difficult to detect, a priori, because they share no defining sequence features that distinguish them from unique portions of the genome. Current sequence annotation of segmental duplications requires computationally intensive, genome-wide self-comparisons that cannot be easily implemented on new data sets. Based on the successful implementation of RepeatMasker, we developed a new genome annotation tool, DupMasker. The program uses a library of nonredundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and nonhuman primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. We predict this tool will be valuable in the annotation of large-insert sequence clones, allowing putative unique and duplicated regions of the genomes to be annotated prior to whole genome assembly comparisons.
Affiliation(s)
- Zhaoshi Jiang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
|
135
|
Kirsch S, Münch C, Jiang Z, Cheng Z, Chen L, Batz C, Eichler EE, Schempp W. Evolutionary dynamics of segmental duplications from human Y-chromosomal euchromatin/heterochromatin transition regions. Genome Res 2008; 18:1030-42. [PMID: 18445620 DOI: 10.1101/gr.076711.108] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Human chromosomal regions enriched in segmental duplications are subject to extensive genomic reorganization. Such regions are particularly informative for illuminating the evolutionary history of a given chromosome. We have analyzed 866 kb of Y-chromosomal non-palindromic segmental duplications delineating four euchromatin/heterochromatin transition regions (Yp11.2/Yp11.1, Yq11.1/Yq11.21, Yq11.23/Yq12, and Yq12/PAR2). Several computational methods were applied to decipher the segmental duplication architecture and identify the ancestral origin of the 41 different duplicons. Combining computational and comparative FISH analysis, we reconstruct the evolutionary history of these regions. Our analysis indicates a continuous process of transposition of duplicated sequences onto the evolving higher primate Y chromosome, providing unique insights into the development of species-specific Y-chromosomal and autosomal duplicons. Phylogenetic sequence comparisons show that duplicons of the human Yp11.2/Yp11.1 region were already present in the macaque-human ancestor as multiple paralogs located predominantly in subtelomeric regions. In contrast, duplicons from the Yq11.1/Yq11.21, Yq11.23/Yq12, and Yq12/PAR2 regions show no evidence of duplication in rhesus macaque, but map to the pericentromeric regions in chimpanzee and human. This suggests an evolutionary shift in the direction of duplicative transposition events from subtelomeric in Old World monkeys to pericentromeric in the human/ape lineage. Extensive chromosomal relocation of autosomal-duplicated sequences from euchromatin/heterochromatin transition regions to interstitial regions as demonstrated on the pygmy chimpanzee Y chromosome support a model in which substantial reorganization and amplification of duplicated sequences may contribute to speciation.
Affiliation(s)
- Stefan Kirsch
- Institute of Human Genetics, University of Freiburg, 79106 Freiburg, Germany
|
136
|
Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008; 18:821-9. [PMID: 18349386 DOI: 10.1101/gr.074492.107] [Citation(s) in RCA: 7066] [Impact Index Per Article: 441.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
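As an illustration of the k-mer representation mentioned in this abstract, the following is a minimal sketch of a textbook de Bruijn graph, assuming nodes are (k-1)-mers and each k-mer in a read contributes one edge. The function name `build_de_bruijn` and the example reads are hypothetical; this is not Velvet's actual data structure or code, which the abstract does not describe in detail.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer prefix to the set of
    (k-1)-mer suffixes that follow it in at least one k-mer.

    Hypothetical helper for illustration only; edge multiplicities
    (coverage) are deliberately ignored in this sketch.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    # Two overlapping toy reads; real short-read data would be 25-50 bp.
    graph = build_de_bruijn(["ACGTC", "CGTCA"], k=3)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

Overlapping reads collapse onto shared nodes, which is why the representation stays compact at high coverage; an assembler would additionally track coverage per edge and simplify linear paths into contigs.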
Affiliation(s)
- Daniel R Zerbino
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
|
137
|
Sharp AJ, Mefford HC, Li K, Baker C, Skinner C, Stevenson RE, Schroer RJ, Novara F, De Gregori M, Ciccone R, Broomer A, Casuga I, Wang Y, Xiao C, Barbacioru C, Gimelli G, Bernardina BD, Torniero C, Giorda R, Regan R, Murday V, Mansour S, Fichera M, Castiglia L, Failla P, Ventura M, Jiang Z, Cooper GM, Knight SJL, Romano C, Zuffardi O, Chen C, Schwartz CE, Eichler EE. A recurrent 15q13.3 microdeletion syndrome associated with mental retardation and seizures. Nat Genet 2008; 40:322-8. [PMID: 18278044 PMCID: PMC2365467 DOI: 10.1038/ng.93] [Citation(s) in RCA: 412] [Impact Index Per Article: 25.8] [Received: 10/02/2007] [Accepted: 01/07/2008] [Indexed: 11/09/2022]
Abstract
We report a recurrent microdeletion syndrome causing mental retardation, epilepsy and variable facial and digital dysmorphisms. We describe nine affected individuals, including six probands: two with de novo deletions, two who inherited the deletion from an affected parent and two with unknown inheritance. The proximal breakpoint of the largest deletion is contiguous with breakpoint 3 (BP3) of the Prader-Willi and Angelman syndrome region, extending 3.95 Mb distally to BP5. A smaller 1.5-Mb deletion has a proximal breakpoint within the larger deletion (BP4) and shares the same distal BP5. This recurrent 1.5-Mb deletion contains six genes, including a candidate gene for epilepsy (CHRNA7) that is probably responsible for the observed seizure phenotype. The BP4-BP5 region undergoes frequent inversion, suggesting a possible link between this inversion polymorphism and recurrent deletion. The frequency of these microdeletions in mental retardation cases is approximately 0.3% (6/2,082 tested), a prevalence comparable to that of Williams, Angelman and Prader-Willi syndromes.
Affiliation(s)
- Andrew J Sharp
- Department of Genome Sciences, University of Washington School of Medicine, 1705 NE Pacific St., Seattle, Washington 98195, USA
|
138
|
Cardone MF, Jiang Z, D'Addabbo P, Archidiacono N, Rocchi M, Eichler EE, Ventura M. Hominoid chromosomal rearrangements on 17q map to complex regions of segmental duplication. Genome Biol 2008; 9:R28. [PMID: 18257913 PMCID: PMC2374708 DOI: 10.1186/gb-2008-9-2-r28] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 12/05/2007] [Revised: 01/24/2008] [Accepted: 02/07/2008] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND Chromosomal rearrangements, such as translocations and inversions, are recurrent phenomena during evolution, and both of them are involved in reproductive isolation and speciation. To better understand the molecular basis of chromosome rearrangements and their part in karyotype evolution, we have investigated the history of human chromosome 17 by comparative fluorescence in situ hybridization (FISH) and sequence analysis. RESULTS Human bacterial artificial chromosome/p1 artificial chromosome probes spanning the length of chromosome 17 were used in FISH experiments on great apes, Old World monkeys and New World monkeys to study the evolutionary history of this chromosome. We observed that the macaque marker order represents the ancestral organization. Human, chimpanzee and gorilla homologous chromosomes differ by a paracentric inversion that occurred specifically in the Homo sapiens/Pan troglodytes/Gorilla gorilla ancestor. Detailed analyses of the paracentric inversion revealed that the breakpoints mapped to two regions syntenic to human 17q12/21 and 17q23, both rich in segmental duplications. CONCLUSION Sequence analyses of the human and macaque organization suggest that the duplication events occurred in the catarrhine ancestor with the duplication blocks continuing to duplicate or undergo gene conversion during evolution of the hominoid lineage. We propose that the presence of these duplicons has mediated the inversion in the H. sapiens/P. troglodytes/G. gorilla ancestor. Recently, the same duplication blocks have been shown to be polymorphic in the human population and to be involved in triggering microdeletion and duplication in human. These results further support a model where genomic architecture has a direct role in both rearrangement involved in karyotype evolution and genomic instability in human.
Affiliation(s)
- Maria Francesca Cardone
- Department of Genetics and Microbiology, University of Bari, Via Amendola, Bari, 70126, Italy.
|
139
|
Abstract
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short reads produced by short read technologies. We present a new Eulerian assembler that generates nearly optimal short read assemblies of bacterial genomes and describe an approach to assemble reads in the case of the popular hybrid protocol when short and long Sanger-based reads are combined.
|
140
|
|