1
Guitart X, Porubsky D, Yoo D, Dougherty ML, Dishuck PC, Munson KM, Lewis AP, Hoekzema K, Knuth J, Chang S, Pastinen T, Eichler EE. Independent expansion, selection and hypervariability of the TBC1D3 gene family in humans. bioRxiv 2024:2024.03.12.584650. [PMID: 38654825 PMCID: PMC11037872 DOI: 10.1101/2024.03.12.584650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because it is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing data from 34 humans and 11 nonhuman primate species. Our analysis shows that this particular gene family has independently duplicated in at least five primate lineages, and the duplicated loci are enriched at sites of large-scale chromosomal rearrangements on chromosome 17. We find that most humans vary along two TBC1D3 clusters where human haplotypes are highly variable in copy number, differing by as many as 20 copies, and structure (structural heterozygosity 90%). We also show evidence of positive selection, as well as a significant change in the predicted human TBC1D3 protein sequence. Lastly, we find that, despite multiple duplications, human TBC1D3 expression is limited to a subset of copies and, most notably, from a single paralog group: TBC1D3-CDKL. These observations may help explain why a gene potentially important in cortical development can be so variable in the human population.
Affiliation(s)
- Xavi Guitart
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Max L. Dougherty
  - Tisch Cancer Institute, Division of Hematology and Medical Oncology, The Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Philip C. Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M. Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P. Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jordan Knuth
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Stephen Chang
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
- Tomi Pastinen
  - Department of Pediatrics, Genomic Medicine Center, Children’s Mercy Kansas City, Kansas City, MO, USA
  - Department of Pediatrics, School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
2
Abdullaev ET, Umarova IR, Arndt PF. Modelling segmental duplications in the human genome. BMC Genomics 2021; 22:496. [PMID: 34215180 PMCID: PMC8254307 DOI: 10.1186/s12864-021-07789-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/03/2020] [Accepted: 06/10/2021] [Indexed: 11/22/2022] Open
Abstract
Background: Segmental duplications (SDs) are long DNA sequences that are repeated in a genome and have high sequence identity. In contrast to repetitive elements, they are often unique and only sometimes have multiple copies in a genome. There are several well-studied mechanisms responsible for segmental duplications: non-allelic homologous recombination, non-homologous end joining and replication slippage. Such duplications play an important role in evolution; however, we do not have a full understanding of the dynamic properties of the duplication process. Results: We study segmental duplications through a graph representation where nodes represent genomic regions and edges represent duplications between them. The resulting network (the SD network) is quite complex and has distinct features which allow us to make inferences about the evolution of segmental duplications. We propose a network growth model that explains features of the SD network, thus giving insight into the dynamics of segmental duplications in the human genome. Based on our analysis of genomes of other species, the network growth model seems to be applicable to multiple mammalian genomes. Conclusions: Our analysis suggests that duplication rates of genomic loci grow linearly with the number of copies of a duplicated region. Several scenarios explaining such preferential duplication rates are suggested. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07789-7.
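The graph representation and the copy-number-proportional duplication rate described in this abstract can be illustrated with a toy simulation. The sketch below is only an assumption-laden illustration, not the authors' model: the function names `grow_sd_network` and `copy_numbers`, the starting number of loci, and the rule that every event copies exactly one existing region are all hypothetical. Nodes are genomic regions, each edge links a source region to its new copy, and a locus is chosen for duplication with probability proportional to its current copy number.

```python
import random
from collections import Counter


def grow_sd_network(n_events, n_initial=10, seed=0):
    """Toy growth model: each duplication event copies one existing region.

    Hypothetical sketch; the published model may differ in every detail.
    """
    rng = random.Random(seed)
    # Each node is one genomic region; family_of maps it to its ancestral locus.
    family_of = {node: node for node in range(n_initial)}
    edges = []  # (source, copy) pairs, one per duplication event
    for _ in range(n_events):
        # Sampling uniformly over existing copies selects a locus family with
        # probability proportional to its current copy number.
        source = rng.randrange(len(family_of))
        copy = len(family_of)
        family_of[copy] = family_of[source]
        edges.append((source, copy))
    return family_of, edges


def copy_numbers(family_of):
    """Number of copies per ancestral locus."""
    return Counter(family_of.values())


family_of, edges = grow_sd_network(n_events=500, n_initial=20, seed=1)
# Largest families first; a few loci typically accumulate most of the copies.
print(sorted(copy_numbers(family_of).values(), reverse=True)[:5])
```

Under this rule, loci that already have many copies tend to gain more, which is the qualitative behaviour a rate that grows linearly with copy number would produce.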
Affiliation(s)
- Eldar T Abdullaev
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63/73, Berlin, 14195, Germany
- Iren R Umarova
  - Faculty of Computational Mathematics and Cybernetics, Moscow State University, Leninskiye Gory 1-52, Moscow, 119991, Russia
- Peter F Arndt
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63/73, Berlin, 14195, Germany
3
Cruz MAD, Lund D, Szekeres F, Karlsson S, Faresjö M, Larsson D. Cis-regulatory elements in conserved non-coding sequences of nuclear receptor genes indicate for crosstalk between endocrine systems. Open Med (Wars) 2021; 16:640-650. [PMID: 33954257 PMCID: PMC8051167 DOI: 10.1515/med-2021-0264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Revised: 02/01/2021] [Accepted: 03/09/2021] [Indexed: 11/16/2022] Open
Abstract
Nuclear receptors (NRs) are ligand-activated transcription factors that regulate gene expression when bound to specific DNA sequences. Crosstalk between steroid NR systems has been studied to understand the development of hormone-driven cancers, but not extensively at the genetic level. This study aimed to investigate crosstalk between steroid NRs in conserved intron and exon sequences, with a focus on steroid NRs involved in prostate cancer etiology. For this purpose, we evaluated conserved intron and exon sequences among all 49 members of the NR Superfamily (NRS) and their relevance as regulatory sequences and NR-binding sequences. Sequence conservation was found to be higher in the first intron (35%) than in downstream introns. Seventy-nine percent of the conserved regions in the NRS contained putative transcription factor binding sites (TFBS), and a large fraction of these sequences contained splicing sites (SS). Analysis of transcription factors binding to putative intronic and exonic TFBS revealed that 5 and 16%, respectively, were NRs. The present study suggests crosstalk between steroid NRs, e.g., vitamin D, estrogen, progesterone, and retinoic acid endocrine systems, through cis-regulatory elements in conserved sequences of introns and exons. This investigation gives evidence for crosstalk between steroid hormones and contributes to novel targets for steroid NR regulation.
Affiliation(s)
- Maria Araceli Diaz Cruz
  - Research School of Health and Welfare, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Dan Lund
  - Department of Natural Science and Biomedicine, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Ferenc Szekeres
  - Department of Biomedicine, School of Health Sciences, University of Skövde, Skövde, Sweden
- Sandra Karlsson
  - Department of Natural Science and Biomedicine, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Maria Faresjö
  - Department of Natural Science and Biomedicine, School of Health and Welfare, Jönköping University, Jönköping, Sweden
- Dennis Larsson
  - Sahlgrenska University Hospital, Gothia Forum for Clinical Research, Gothenburg, Sweden
4
McCartney AM, Hyland EM, Cormican P, Moran RJ, Webb AE, Lee KD, Hernandez-Rodriguez J, Prado-Martinez J, Creevey CJ, Aspden JL, McInerney JO, Marques-Bonet T, O'Connell MJ. Gene Fusions Derived by Transcriptional Readthrough are Driven by Segmental Duplication in Human. Genome Biol Evol 2020; 11:2678-2690. [PMID: 31400206 PMCID: PMC6764479 DOI: 10.1093/gbe/evz163] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 07/17/2019] [Indexed: 12/14/2022] Open
Abstract
Gene fusion occurs when two or more individual genes with independent open reading frames become juxtaposed under the same open reading frame, creating a new fused gene. A small number of gene fusions described in detail have been associated with novel functions, for example, the hominid-specific PIPSL gene, TNFSF12, and the TWE-PRIL gene family. We use Sequence Similarity Networks and species-level comparisons of great ape genomes to identify 45 new genes that have emerged by transcriptional readthrough, that is, transcription-derived gene fusion. For 35 of these putative gene fusions, we have been able to assess available RNAseq data to determine whether there are reads that map to each breakpoint. A total of 29 of the putative gene fusions had annotated transcripts (9/29 of which are human-specific). We carried out RT-qPCR in a range of human tissues (placenta, lung, liver, brain, and testes) and found that 23 of the putative gene fusion events were expressed in at least one tissue. Examining the available ribosome foot-printing data, we find evidence for translation of three of the fused genes in human. Finally, we find enrichment for transcription-derived gene fusions in regions of known segmental duplication in human. Together, our results implicate chromosomal structural variation brought about by segmental duplication in the emergence of novel transcripts and translated protein products.
Affiliation(s)
- Ann M McCartney
  - Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Ireland
  - Computational and Molecular Evolutionary Biology Group, School of Biology, Faculty of Biological Sciences, The University of Leeds, United Kingdom
- Edel M Hyland
  - Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Ireland
  - Institute for Global Food Security, Queens University Belfast, United Kingdom
- Paul Cormican
  - Teagasc Animal and Bioscience Research Department, Animal & Grassland Research and Innovation Centre, Teagasc, Grange, Dunsany, County Meath, Ireland
- Raymond J Moran
  - Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Ireland
  - Computational and Molecular Evolutionary Biology Group, School of Biology, Faculty of Biological Sciences, The University of Leeds, United Kingdom
- Andrew E Webb
  - Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Ireland
- Kate D Lee
  - Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Ireland
  - School of Biological Sciences, University of Auckland, New Zealand
  - School of Fundamental Sciences, Massey University, New Zealand
- Javier Prado-Martinez
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Christopher J Creevey
  - Institute for Global Food Security, Queens University Belfast, United Kingdom
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, United Kingdom
- Julie L Aspden
  - School of Molecular and Cellular Biology, Faculty of Biological Sciences, The University of Leeds, United Kingdom
- James O McInerney
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, M13 9PL, United Kingdom
  - School of Life Sciences, Faculty of Medicine and Health Sciences, The University of Nottingham, NG7 2RD, United Kingdom
- Tomas Marques-Bonet
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain
  - NAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallés, Barcelona, Spain
- Mary J O'Connell
  - Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Ireland
  - Computational and Molecular Evolutionary Biology Group, School of Biology, Faculty of Biological Sciences, The University of Leeds, United Kingdom
  - School of Life Sciences, Faculty of Medicine and Health Sciences, The University of Nottingham, NG7 2RD, United Kingdom
5
Vollger MR, Logsdon GA, Audano PA, Sulovari A, Porubsky D, Peluso P, Wenger AM, Concepcion GT, Kronenberg ZN, Munson KM, Baker C, Sanders AD, Spierings DCJ, Lansdorp PM, Surti U, Hunkapiller MW, Eichler EE. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Ann Hum Genet 2020; 84:125-140. [PMID: 31711268 PMCID: PMC7015760 DOI: 10.1111/ahg.12364] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 08/14/2019] [Revised: 10/17/2019] [Accepted: 10/18/2019] [Indexed: 01/14/2023]
Abstract
The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
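The NG50 statistic quoted in this abstract can be computed as sketched below, assuming the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of an estimated genome size. The function name `ng50` and the example contig lengths are hypothetical and are not taken from the paper.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50: the contig length at which the cumulative length of
    contigs, taken longest first, reaches half of the estimated genome size.

    Returns 0 when the assembly covers less than half of genome_size.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0


# Hypothetical contig lengths: 50 + 40 + 30 = 120 is the first sum >= 100.
print(ng50([50, 40, 30, 20, 10], genome_size=200))  # 30

# Fold change between the two NG50 values reported in the abstract (kbp).
print(round(480.6 / 191.5, 1))  # 2.5
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the same genome size for every assembly, so values from the HiFi and CLR assemblies can be compared directly.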
Affiliation(s)
- Mitchell R. Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - These authors contributed equally to this work
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - These authors contributed equally to this work
- Peter A. Audano
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Paul Peluso
  - Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
- Aaron M. Wenger
  - Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
- Katherine M. Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Carl Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ashley D. Sanders
  - European Molecular Biology Laboratory, Genome Biology Unit, 69117, Heidelberg, Germany
- Diana C.J. Spierings
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
- Peter M. Lansdorp
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
  - Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Urvashi Surti
  - Department of Pathology, University of Pittsburgh School of Medicine, and University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
6
Characterization and evolutionary dynamics of complex regions in eukaryotic genomes. Sci China Life Sci 2019; 62:467-488. [PMID: 30810961 DOI: 10.1007/s11427-018-9458-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 01/07/2023]
Abstract
Complex regions in eukaryotic genomes are typically characterized by duplications of chromosomal stretches that often include one or more genes repeated in a tandem array or in relatively close proximity. Nevertheless, the repetitive nature of these regions, together with the often high sequence identity among repeats, has made complex regions particularly recalcitrant to proper molecular characterization, often being misassembled or completely absent in genome assemblies. This limitation has prevented accurate functional and evolutionary analyses of these regions. This is becoming increasingly relevant as evidence continues to support a central role for complex genomic regions in explaining human disease, developmental innovations, and ecological adaptations across phyla. With the advent of long-read sequencing technologies and suitable assemblers, the development of algorithms that can accommodate sample heterozygosity, and the adoption of a pangenomic-like view of these regions, accurate reconstructions of complex regions are now within reach. These reconstructions will finally allow for accurate functional and evolutionary studies of complex genomic regions, underlying the generation of genotype-phenotype maps of unprecedented resolution.
7
Turner TN, Coe BP, Dickel DE, Hoekzema K, Nelson BJ, Zody MC, Kronenberg ZN, Hormozdiari F, Raja A, Pennacchio LA, Darnell RB, Eichler EE. Genomic Patterns of De Novo Mutation in Simplex Autism. Cell 2017; 171:710-722.e12. [PMID: 28965761 PMCID: PMC5679715 DOI: 10.1016/j.cell.2017.08.047] [Citation(s) in RCA: 228] [Impact Index Per Article: 32.6] [Received: 05/30/2017] [Revised: 08/03/2017] [Accepted: 08/25/2017] [Indexed: 12/22/2022]
Abstract
To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ∼1.5 × 10⁻⁸ SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10⁻³, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10⁻³), suggesting a path forward for genetically characterizing more complex cases of autism.
Affiliation(s)
- Tychele N Turner
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Bradley P Coe
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Diane E Dickel
  - Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Bradley J Nelson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Zev N Kronenberg
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Fereydoun Hormozdiari
  - Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, CA 95817, USA
- Archana Raja
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Len A Pennacchio
  - Functional Genomics Department, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA
- Robert B Darnell
  - New York Genome Center, New York, NY 10013, USA
  - Laboratory of Molecular Neuro-Oncology, The Rockefeller University, New York, NY 10065, USA
  - Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
9
Ebert G, Steininger A, Weißmann R, Boldt V, Lind-Thomsen A, Grune J, Badelt S, Heßler M, Peiser M, Hitzler M, Jensen LR, Müller I, Hu H, Arndt PF, Kuss AW, Tebel K, Ullmann R. Distribution of segmental duplications in the context of higher order chromatin organisation of human chromosome 7. BMC Genomics 2014; 15:537. [PMID: 24973960 PMCID: PMC4092221 DOI: 10.1186/1471-2164-15-537] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/09/2013] [Accepted: 06/17/2014] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Segmental duplications (SDs) are not evenly distributed along chromosomes. The reasons for this biased susceptibility to SD insertion are poorly understood. Accumulation of SDs is associated with increased genomic instability, which can lead to structural variants and genomic disorders such as the Williams-Beuren syndrome. Despite these adverse effects, SDs have become fixed in the human genome. Focusing on chromosome 7, which is particularly rich in interstitial SDs, we have investigated the distribution of SDs in the context of evolution and the three dimensional organisation of the chromosome in order to gain insights into the mutual relationship of SDs and chromatin topology. RESULTS Intrachromosomal SDs preferentially accumulate in those segments of chromosome 7 that are homologous to marmoset chromosome 2. Although this formerly compact segment has been re-distributed to three different sites during primate evolution, we can show by means of public data on long distance chromatin interactions that these three intervals, and consequently the paralogous SDs mapping to them, have retained their spatial proximity in the nucleus. Focusing on SD clusters implicated in the aetiology of the Williams-Beuren syndrome locus we demonstrate by cross-species comparison that these SDs have inserted at the borders of a topological domain and that they flank regions with distinct DNA conformation. CONCLUSIONS Our study suggests a link of nuclear architecture and the propagation of SDs across chromosome 7, either by promoting regional SD insertion or by contributing to the establishment of higher order chromatin organisation themselves. The latter could compensate for the high risk of structural rearrangements and thus may have contributed to their evolutionary fixation in the human genome.
Affiliation(s)
- Grit Ebert
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
  - Department of Biology, Chemistry and Pharmacy, Free University Berlin, 14195 Berlin, Germany
- Anne Steininger
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
  - Department of Biology, Chemistry and Pharmacy, Free University Berlin, 14195 Berlin, Germany
- Robert Weißmann
  - Department of Human Genetics, University Medicine Greifswald, and Interfaculty Institute of Genetics and Functional Genomics, University of Greifswald, Fleischmannstraße 42-44, 17475 Greifswald, Germany
- Vivien Boldt
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
  - Department of Biology, Chemistry and Pharmacy, Free University Berlin, 14195 Berlin, Germany
- Allan Lind-Thomsen
  - Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen, Denmark
- Jana Grune
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Stefan Badelt
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Waehringer Straße 17, A-1090 Vienna, Austria
- Melanie Heßler
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Matthias Peiser
  - Unit Experimental Research, Department of Product Safety, Federal Institute for Bundeswehr Institute of Radiobiology affiliated, the University of Ulm, Neuherbergstraße 11, 80937 Munich, Germany
- Manuel Hitzler
  - Unit Experimental Research, Department of Product Safety, Federal Institute for Bundeswehr Institute of Radiobiology affiliated, the University of Ulm, Neuherbergstraße 11, 80937 Munich, Germany
- Lars R Jensen
  - Department of Human Genetics, University Medicine Greifswald, and Interfaculty Institute of Genetics and Functional Genomics, University of Greifswald, Fleischmannstraße 42-44, 17475 Greifswald, Germany
- Ines Müller
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Hao Hu
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Peter F Arndt
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Andreas W Kuss
  - Department of Human Genetics, University Medicine Greifswald, and Interfaculty Institute of Genetics and Functional Genomics, University of Greifswald, Fleischmannstraße 42-44, 17475 Greifswald, Germany
- Katrin Tebel
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Reinhard Ullmann
  - Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
10
Human gene copy number variation and infectious disease. Hum Genet 2014; 133:1217-33. [PMID: 25110110 DOI: 10.1007/s00439-014-1457-x] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 04/11/2014] [Accepted: 05/20/2014] [Indexed: 01/05/2023]
Abstract
Variability in the susceptibility to infectious disease and its clinical manifestation can be determined by variation in the environment and by genetic variation in the pathogen and the host. Despite several successes based on candidate gene studies, defining the host variation affecting infectious disease has not been as successful as for other multifactorial diseases. Both single nucleotide variation and copy number variation (CNV) of the host contribute to the host's susceptibility to infectious disease. In this review we focus on CNV, particularly on complex multiallelic CNV that is often not well characterised either directly by hybridisation methods or indirectly by analysis of genotypes and flanking single nucleotide variants. We summarise the well-known examples, such as α-globin deletion and susceptibility to severe malaria, as well as more recent controversies, such as the extensive CNV of the chemokine gene CCL3L1 and HIV infection. We discuss the potential biological mechanisms that could underlie any genetic association and reflect on the extensive complexity and functional variation generated by a combination of CNV and sequence variation, as illustrated by the Fc gamma receptor genes FCGR3A, FCGR3B and FCGR2C. We also highlight some understudied areas that might prove fruitful for further research.
11
Light S, Basile W, Elofsson A. Orphans and new gene origination, a structural and evolutionary perspective. Curr Opin Struct Biol 2014; 26:73-83. [DOI: 10.1016/j.sbi.2014.05.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/20/2014] [Revised: 05/07/2014] [Accepted: 05/16/2014] [Indexed: 12/28/2022]
12
Aigner J, Villatoro S, Rabionet R, Roquer J, Jiménez-Conde J, Martí E, Estivill X. A common 56-kilobase deletion in a primate-specific segmental duplication creates a novel butyrophilin-like protein. BMC Genet 2013; 14:61. [PMID: 23829304 PMCID: PMC3729544 DOI: 10.1186/1471-2156-14-61] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 01/29/2013] [Accepted: 05/21/2013] [Indexed: 12/22/2022] Open
Abstract
Background: The Butyrophilin-like (BTNL) proteins are likely to play an important role in inflammation and immune response. Like the B7 protein family, many human and murine BTNL members have been shown to control the T lymphocyte response, and polymorphisms in human BTNL2 have been linked to several inflammatory diseases, such as pulmonary sarcoidosis, inflammatory bowel disease and neonatal lupus. Results: In this study we provide a comprehensive population, genomic and transcriptomic analysis of a 56-kb deletion copy number variant (CNV), located within two segmental duplications of two genes belonging to the BTNL family, namely BTNL8 and BTNL3. We confirm the presence of a novel BTNL8*3 fusion-protein product, and show an influence of the deletion variant on the expression level of several genes involved in immune function, including BTNL9, another member of the same family. Moreover, by genotyping HapMap and human diversity panel (HGDP) samples, we demonstrate a clear difference in the stratification of the BTNL8_BTNL3-del allele frequency between major continental human populations. Conclusion: Despite tremendous progress in the field of structural variation, rather few CNVs have been functionally characterized so far. Here, we show clear functional consequences of a new deletion CNV (BTNL8_BTNL3-del) with potentially important implications for the human immune system and for inflammatory and proliferative disorders. In addition, the marked population differences in BTNL8_BTNL3-del frequencies suggest that this deletion CNV might have evolved under positive selection due to environmental conditions in some populations, with potential phenotypic consequences.
Affiliation(s)
- Johanna Aigner
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Barcelona 08003, Spain
|
13
|
Sassa T. The Role of Human-Specific Gene Duplications During Brain Development and Evolution. J Neurogenet 2013; 27:86-96. [DOI: 10.3109/01677063.2013.789512] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
|
14
|
Research proceedings on primate comparative genomics. Zool Res 2013; 33:108-18. [DOI: 10.3724/sp.j.1141.2012.01108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
15
|
Marotta M, Chen X, Inoshita A, Stephens R, Budd GT, Crowe JP, Lyons J, Kondratova A, Tubbs R, Tanaka H. A common copy-number breakpoint of ERBB2 amplification in breast cancer colocalizes with a complex block of segmental duplications. Breast Cancer Res 2012. [PMID: 23181561 PMCID: PMC4053137 DOI: 10.1186/bcr3362] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/13/2022]
Abstract
Introduction Segmental duplications (low-copy repeats) are the recently duplicated genomic segments in the human genome that display nearly identical (> 90%) sequences and account for about 5% of euchromatic regions. In germline, duplicated segments mediate nonallelic homologous recombination and thus cause both non-disease-causing copy-number variants and genomic disorders. To what extent duplicated segments play a role in somatic DNA rearrangements in cancer remains elusive. Duplicated segments often cluster and form genomic blocks enriched with both direct and inverted repeats (complex genomic regions). Such complex regions could be fragile and play a mechanistic role in the amplification of the ERBB2 gene in breast tumors, because repeated sequences are known to initiate gene amplification in model systems.
Methods We conducted polymerase chain reaction (PCR)-based assays for primary breast tumors and analyzed publicly available array-comparative genomic hybridization data to map a common copy-number breakpoint in ERBB2-amplified primary breast tumors. We further used molecular, bioinformatics, and population-genetics approaches to define duplication contents, structural variants, and haplotypes within the common breakpoint.
Results We found a large (> 300-kb) block of duplicated segments that was colocalized with a common copy-number breakpoint for ERBB2 amplification. The breakpoint that potentially initiated ERBB2 amplification localized in a region 1.5 megabases (Mb) on the telomeric side of ERBB2. The region is very complex, with extensive duplications of KRTAP genes, structural variants, and, as a result, a paucity of single-nucleotide polymorphism (SNP) markers. Duplicated segments vary in size and degree of sequence homology, indicating that duplications have occurred recurrently during genome evolution.
Conclusions Amplification of the ERBB2 gene in breast tumors is potentially initiated by a complex region that has unusual genomic features and thus requires rigorous, labor-intensive investigation. The haplotypes we provide could be useful to identify the potential association between the complex region and ERBB2 amplification.
|
16
|
Giannuzzi G, Siswara P, Malig M, Marques-Bonet T, Mullikin JC, Ventura M, Eichler EE. Evolutionary dynamism of the primate LRRC37 gene family. Genome Res 2012; 23:46-59. [PMID: 23064749 PMCID: PMC3530683 DOI: 10.1101/gr.138842.112] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 12/31/2022]
Abstract
Core duplicons in the human genome represent ancestral duplication modules shared by the majority of intrachromosomal duplication blocks within a given chromosome. These cores are associated with the emergence of novel gene families in the hominoid lineage, but their genomic organization and gene characterization among other primates are largely unknown. Here, we investigate the genomic organization and expression of the core duplicon on chromosome 17 that led to the expansion of LRRC37 during primate evolution. A comparison of the LRRC37 gene family organization in human, orangutan, macaque, marmoset, and lemur genomes shows the presence of both orthologous and species-specific gene copies in all primate lineages. Expression profiling in mouse, macaque, and human tissues reveals that the ancestral expression of LRRC37 was restricted to the testis. In the hominid lineage, the pattern of LRRC37 became increasingly ubiquitous, with significantly higher levels of expression in the cerebellum and thymus, and showed a remarkable diversity of alternative splice forms. Transfection studies in HeLa cells indicate that the human FLAG-tagged recombinant LRRC37 protein is secreted after cleavage of a transmembrane precursor and its overexpression can induce filopodia formation.
Affiliation(s)
- Giuliana Giannuzzi
- Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari 70126, Italy
|
17
|
Regulatory element copy number differences shape primate expression profiles. Proc Natl Acad Sci U S A 2012; 109:12656-61. [PMID: 22797897 DOI: 10.1073/pnas.1205199109] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 12/28/2022]
Abstract
Gene expression differences are shaped by selective pressures and contribute to phenotypic differences between species. We identified 964 copy number differences (CNDs) of conserved sequences across three primate species and examined their potential effects on gene expression profiles. Samples with copy number-different genes had significantly different expression than samples with neutral copy number. Genes encoding regulatory molecules differed in copy number and were associated with significant expression differences. Additionally, we identified 127 CNDs that were processed pseudogenes, some of which were expressed. Furthermore, there were copy number-different regulatory regions such as ultraconserved elements and long intergenic noncoding RNAs with the potential to affect expression. We postulate that CNDs of these conserved sequences fine-tune developmental pathways by altering the levels of RNA.
|
18
|
Genomic structure and evolution of multigene families: "flowers" on the human genome. International Journal of Evolutionary Biology 2012; 2012:917678. [PMID: 22779033 PMCID: PMC3388347 DOI: 10.1155/2012/917678] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/29/2011] [Revised: 04/06/2012] [Accepted: 04/09/2012] [Indexed: 11/17/2022]
Abstract
We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures “Flowers” because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions; the maximum identity was often concentrated to 100% identity, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the elongation of the life span of gene families in a Flower by the resurrection of pseudogenes.
|
19
|
Iskow RC, Gokcumen O, Lee C. Exploring the role of copy number variants in human adaptation. Trends Genet 2012; 28:245-57. [PMID: 22483647 PMCID: PMC3533238 DOI: 10.1016/j.tig.2012.03.002] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Received: 11/16/2011] [Revised: 03/05/2012] [Accepted: 03/06/2012] [Indexed: 11/18/2022]
Abstract
Over the past decade, the ubiquity of copy number variants (CNVs, the gain or loss of genomic material) in the genomes of healthy humans has become apparent. Although some of these variants are associated with disorders, a handful of studies documented an adaptive advantage conferred by CNVs. In this review, we propose that CNVs are substrates for human evolution and adaptation. We discuss the possible mechanisms and evolutionary processes in which CNVs are selected, outline the current challenges in identifying these loci, and highlight that copy number variable regions allow for the creation of novel genes that may diversify the repertoire of such genes in response to rapidly changing environments. We expect that many more adaptive CNVs will be discovered in the coming years, and we believe that these new findings will contribute to our understanding of human-specific phenotypes.
Affiliation(s)
- Rebecca C Iskow
- Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA
|
20
|
Takahashi M, Saitou N. Identification and characterization of lineage-specific highly conserved noncoding sequences in Mammalian genomes. Genome Biol Evol 2012; 4:641-57. [PMID: 22505575 PMCID: PMC3381673 DOI: 10.1093/gbe/evs035] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Accepted: 03/23/2012] [Indexed: 01/12/2023]
Abstract
Vertebrate genome comparisons revealed that there are highly conserved noncoding sequences (HCNSs) among a wide range of species, many of which contain regulatory elements. However, recently emerged sequences conserved in specific lineages have not been well studied. Toward this end, we identified 8,198 primate-specific and 21,128 rodent-specific HCNSs as representative ones among mammals from human-marmoset and mouse-rat comparisons, respectively. Derived allele frequency analysis of primate-specific HCNSs showed that these HCNSs were under purifying selection, indicating that they may harbor important functions. We selected the top 1,000 largest HCNSs and compared the lineage-specific HCNS-flanking genes (LHF genes) with ultraconserved element (UCE)-flanking genes. Interestingly, the majority of LHF genes were different from UCE-flanking genes. This lineage-specific set of LHF genes was more enriched in protein-binding function. Conversely, the number of LHF genes that were also shared by UCEs was small but significantly larger than random expectation, and many of these genes were involved in anatomical development as transcriptional regulators, suggesting that certain groups of genes preferentially recruit new HCNSs in addition to old HCNSs that are conserved among vertebrates. This group of LHF genes might be involved in the various levels of lineage-specific evolution among vertebrates, mammals, primates, and rodents. If so, the emergence of HCNSs in and around these two groups of LHF genes developed lineage-specific characteristics. Our results provide new insight into lineage-specific evolution through interactions between HCNSs and their LHF genes.
Affiliation(s)
- Mahoko Takahashi
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies, Japan
- Division of Population Genetics, National Institute of Genetics, Japan
- Present address: Department of Genetics, Stanford University
- Naruya Saitou
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies, Japan
- Division of Population Genetics, National Institute of Genetics, Japan
|
21
|
Flatscher-Bader T, Foldi CJ, Chong S, Whitelaw E, Moser RJ, Burne THJ, Eyles DW, McGrath JJ. Increased de novo copy number variants in the offspring of older males. Transl Psychiatry 2011; 1:e34. [PMID: 22832608 PMCID: PMC3309504 DOI: 10.1038/tp.2011.30] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 06/23/2011] [Accepted: 07/08/2011] [Indexed: 01/26/2023]
Abstract
The offspring of older fathers have an increased risk of neurodevelopmental disorders, such as schizophrenia and autism. In light of the evidence implicating copy number variants (CNVs) in schizophrenia and autism, we used a mouse model to explore the hypothesis that the offspring of older males have an increased risk of de novo CNVs. C57BL/6J sires that were 3 and 12-16 months old were mated with 3-month-old dams to create control offspring and offspring of old sires, respectively. Applying genome-wide microarray screening technology, 7 distinct CNVs were identified in a set of 12 offspring and their parents. Competitive quantitative PCR confirmed these CNVs in the original set and also established their frequency in an independent set of 77 offspring and their parents. On the basis of the combined samples, six de novo CNVs were detected in the offspring of older sires, whereas none were detected in the control group. Two of the CNVs were associated with behavioral and/or neuroanatomical phenotypic features. One of the de novo CNVs involved Auts2 (autism susceptibility candidate 2), and other CNVs included genes linked to schizophrenia, autism and brain development. This is the first experimental demonstration that the offspring of older males have an increased risk of de novo CNVs. Our results support the hypothesis that the offspring of older fathers have an increased risk of neurodevelopmental disorders such as schizophrenia and autism by generation of de novo CNVs in the male germline.
Affiliation(s)
- T Flatscher-Bader
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- The Queensland Institute of Medical Research, Herston, QLD, Australia
- C J Foldi
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- S Chong
- The Queensland Institute of Medical Research, Herston, QLD, Australia
- E Whitelaw
- The Queensland Institute of Medical Research, Herston, QLD, Australia
- T H J Burne
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- D W Eyles
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- J J McGrath
- Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- Discipline of Psychiatry, The University of Queensland, St Lucia, QLD, Australia
|
22
|
Transcriptional variations mediated by an alternative promoter of the FPR3 gene. Mamm Genome 2011; 22:621-33. [PMID: 21717223 DOI: 10.1007/s00335-011-9341-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/21/2010] [Accepted: 05/20/2011] [Indexed: 10/18/2022]
Abstract
Formyl peptide receptor 3 (FPR3) is a potential player in innate immunity and appears with FPR2 as an FPR cluster during primate evolution. Comparative genome analyses indicate that a segmental duplication (SD) event upstream of the FPR3 gene after the divergence of New and Old World monkeys led to the emergence of an alternative promoter. In this study we combined computational and experimental approaches to identify an FPR3 gene that is controlled by an alternative promoter derived during an SD event. Its transcriptional activity was detected by quantitative reverse transcription polymerase chain reaction. Human alternative transcripts (FPR3-1 and FPR3-2) showed tissue-specific patterns with strong expression in lung or uterus, while the FPR3-1 transcript of rhesus macaque is broadly expressed in various tissues. Overall, transcriptional variations of FPR3 occur by an alternative promoter during primate evolution.
|
23
|
Kostka D, Hahn MW, Pollard KS. Noncoding sequences near duplicated genes evolve rapidly. Genome Biol Evol 2010; 2:518-33. [PMID: 20660939 PMCID: PMC2942038 DOI: 10.1093/gbe/evq037] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Accepted: 06/25/2010] [Indexed: 11/17/2022]
Abstract
Gene expression divergence and chromosomal rearrangements have been put forward as major contributors to phenotypic differences between closely related species. It has also been established that duplicated genes show enhanced rates of positive selection in their amino acid sequences. If functional divergence is largely due to changes in gene expression, it follows that regulatory sequences in duplicated loci should also evolve rapidly. To investigate this hypothesis, we performed likelihood ratio tests (LRTs) on all noncoding loci within 5 kb of every transcript in the human genome and identified sequences with increased substitution rates in the human lineage since divergence from Old World Monkeys. The fraction of rapidly evolving loci is significantly higher near genes that duplicated in the common ancestor of humans and chimps compared with nonduplicated genes. We also conducted a genome-wide scan for nucleotide substitutions predicted to affect transcription factor binding. Rates of binding site divergence are elevated in noncoding sequences of duplicated loci with accelerated substitution rates. Many of the genes associated with these fast-evolving genomic elements belong to functional categories identified in previous studies of positive selection on amino acid sequences. In addition, we find enrichment for accelerated evolution near genes involved in establishment and maintenance of pregnancy, processes that differ significantly between humans and monkeys. Our findings support the hypothesis that adaptive evolution of the regulation of duplicated genes has played a significant role in human evolution.
Affiliation(s)
- Dennis Kostka
- Gladstone Institute for Cardiovascular Disease, Gladstone Institutes, University of California-San Francisco, San Francisco, CA, USA.
|
24
|
Schwartz RS, Mueller RL. Variation in DNA substitution rates among lineages erroneously inferred from simulated clock-like data. PLoS One 2010; 5:e9649. [PMID: 20300176 PMCID: PMC2836374 DOI: 10.1371/journal.pone.0009649] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/12/2009] [Accepted: 02/12/2010] [Indexed: 01/26/2023]
Abstract
BACKGROUND The observation of variation in substitution rates among lineages has led to (1) a general rejection of the molecular clock model, and (2) the suggestion that a number of biological characteristics of organisms can cause rate variation. Accurate estimates of rate variation, and thus accurate inferences regarding the causes of rate variation, depend on accurate estimates of substitution rates. However, theory suggests that even when the substitution process is clock-like, variable numbers of substitutions can occur among lineages because the substitution process is stochastic. Furthermore, substitution rates along lineages can be misestimated, particularly when multiple substitutions occur at some sites. Although these potential causes of error in rate estimation are well understood in theory, such error has not been examined in detail; consequently, empirical studies that estimate rate variation among lineages have been unable to determine whether their results could be impacted by estimation error.
METHODOLOGY/PRINCIPAL FINDINGS To evaluate the extent to which error in rate estimation could erroneously suggest rate variation among lineages, we examined rate variation estimated for datasets simulated under a molecular clock on trees with equal and variable branch lengths. Thus, any apparent rate variation in these datasets reflects error in rate estimation rather than true differences in the underlying substitution process. We observed substantial rate variation among lineages in our simulations; however, we did not observe rate variation when average substitution rates were compared between different clades.
CONCLUSIONS/SIGNIFICANCE Our results confirm previous theoretical work suggesting that observations of among lineage rate variation in empirical data may be due to the stochastic substitution process and error in the estimation of substitution rates, rather than true differences in the underlying substitution process among lineages. However, conclusions regarding rate variation drawn from rates averaged across multiple branches are likely due to real, systematic variation in rates between groups.
Affiliation(s)
- Rachel S Schwartz
- Department of Biology, Colorado State University, Fort Collins, Colorado, United States of America.
|
25
|
Schrider DR, Hahn MW. Lower linkage disequilibrium at CNVs is due to both recurrent mutation and transposing duplications. Mol Biol Evol 2010; 27:103-11. [PMID: 19745000 DOI: 10.1093/molbev/msp210] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 01/06/2023]
Abstract
Copy number variants (CNVs) within humans can have both adaptive and deleterious effects. Because of their phenotypic significance, researchers have attempted to find single nucleotide polymorphisms (SNPs) in high linkage disequilibrium (LD) with CNVs to use in genomewide association studies. However, studies have found that CNVs are less likely to be in strong LD with flanking markers. We hypothesized that this "taggability gap" can be explained by duplication events that place paralogous sequences far apart. In support of our hypothesis, we find that duplications are significantly less likely than deletions to have a "tag" SNP, even after controlling for CNV length, allele frequency, and availability of appropriate flanking SNPs. Using a novel likelihood method, we are able to show that many complex CNVs--those due to multiple duplication or deletion polymorphisms--are made up of two loci with little LD between them. Additionally, we find that many polymorphic duplications detected in a recent clone-based study are located far from their parental loci. We also examine two other common hypotheses for the taggability gap, and find that recurrent mutation of both deletions and duplications appears to have an effect on LD, but that lower SNP density around CNVs has no effect. Overall, our results suggest that a substantial fraction of CNVs caused by duplication cannot be tagged by markers flanking the parental locus because they have changed genomic location.
|
26
|
Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biol 2010; 11:R28. [PMID: 20219098 PMCID: PMC2864568 DOI: 10.1186/gb-2010-11-3-r28] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 10/12/2009] [Revised: 12/11/2009] [Accepted: 03/10/2010] [Indexed: 11/23/2022]
Abstract
A method for determining false segmental duplications in vertebrate genomes, thus correcting mis-assemblies and providing more accurate estimates of duplications. Diploid genomes with divergent chromosomes present special problems for assembly software as two copies of especially polymorphic regions may be mistakenly constructed, creating the appearance of a recent segmental duplication. We developed a method for identifying such false duplications and applied it to four vertebrate genomes. For each genome, we corrected mis-assemblies, improved estimates of the amount of duplicated sequence, and recovered polymorphisms between the sequenced chromosomes.
|
27
|
Tandem repeats modify the structure of human genes hosted in segmental duplications. Genome Biol 2009; 10:R137. [PMID: 19954527 PMCID: PMC2812944 DOI: 10.1186/gb-2009-10-12-r137] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 07/23/2009] [Revised: 10/08/2009] [Accepted: 12/02/2009] [Indexed: 12/14/2022]
Abstract
BACKGROUND Recently duplicated genes are often subject to genomic rearrangements that can lead to the development of novel gene structures. Here we specifically investigated the effect of variations in internal tandem repeats (ITRs) on the gene structure of human paralogs located in segmental duplications.
RESULTS We found that around 7% of the primate-specific genes located within duplicated regions of the genome contain variable tandem repeats. These genes are members of large groups of recently duplicated paralogs that are often polymorphic in the human population. Half of the identified ITRs occur within coding exons and may be either kept or spliced out from the mature transcript. When ITRs reside within exons, they encode variable amino acid repeats. When located at exon-intron boundaries, ITRs can generate alternative splicing patterns through the formation of novel introns.
CONCLUSIONS Our study shows that variation in the number of ITRs impacts on recently duplicated genes by modifying their coding sequence, splicing pattern, and tissue expression. The resulting effect is the production of a variety of primate-specific proteins, which mostly differ in number and sequence of amino acid repeats.
|
28
|
Jun J, Ryvkin P, Hemphill E, Mandoiu I, Nelson C. The birth of new genes by RNA- and DNA-mediated duplication during mammalian evolution. J Comput Biol 2009; 16:1429-44. [PMID: 19803737 DOI: 10.1089/cmb.2009.0073] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Gene duplication has long been recognized as a major force in genome evolution and has recently been recognized as an important source of individual variation. For many years, the origin of functional gene duplicates was assumed to be whole or partial genome duplication events, but recently retrotransposition has also been shown to contribute new functional protein coding genes and siRNAs. In this study, we utilize pseudogenes to recreate more complete gene family histories, and compare the rates of RNA and DNA-mediated duplication and new functional gene formation in five mammalian genomes. We find that RNA-mediated duplication occurs at a much higher and more variable rate than DNA-mediated duplication, and gives rise to many more duplicated sequences over time. We show that, while the chance of RNA-mediated duplicates becoming functional is much lower than that of their DNA-mediated counterparts, the higher rate of retrotransposition leads to nearly equal contributions of new genes by each mechanism. We also find that functional RNA-mediated duplicates are closer to neighboring genes than non-functional RNA-mediated copies, consistent with co-option of regulatory elements at the site of insertion. Overall, new genes derived from DNA and RNA-mediated duplication mechanisms are under similar levels of purifying selective pressure, but have broadly different functions. RNA-mediated duplication gives rise to a diversity of genes but is dominated by the highly expressed genes of RNA metabolic pathways. DNA-mediated duplication can copy regulatory material along with the protein coding region of the gene and often gives rise to classes of genes whose functions are dependent on complex regulatory information. This mechanistic difference may in part explain why we find that mammalian protein families tend to evolve by either one mechanism or the other, but rarely by both.
Supplementary Material has been provided (see online Supplementary Material at www.liebertonline.com).
Affiliation(s)
- Jin Jun
- Computer Science & Engineering Department, University of Connecticut, Storrs, CT 06269, USA
|
29
|
Marques-Bonet T, Ryder OA, Eichler EE. Sequencing primate genomes: what have we learned? Annu Rev Genomics Hum Genet 2009; 10:355-86. [PMID: 19630567 DOI: 10.1146/annurev.genom.9.081307.164420] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 12/13/2022]
Abstract
We summarize the progress in whole-genome sequencing and analyses of primate genomes. These emerging genome datasets have broadened our understanding of primate genome evolution revealing unexpected and complex patterns of evolutionary change. This includes the characterization of genome structural variation, episodic changes in the repeat landscape, differences in gene expression, new models regarding speciation, and the ephemeral nature of the recombination landscape. The functional characterization of genomic differences important in primate speciation and adaptation remains a significant challenge. Limited access to biological materials, the lack of detailed phenotypic data and the endangered status of many critical primate species have significantly attenuated research into the genetic basis of primate evolution. Next-generation sequencing technologies promise to greatly expand the number of available primate genome sequences; however, such draft genome sequences will likely miss critical genetic differences within complex genomic regions unless dedicated efforts are put forward to understand the full spectrum of genetic variation.
Affiliation(s)
- Tomas Marques-Bonet
- Department of Genome Sciences, University of Washington and the Howard Hughes Medical Institute, Seattle, Washington 98105, USA.
|
30
|
Marques-Bonet T, Girirajan S, Eichler EE. The origins and impact of primate segmental duplications. Trends Genet 2009; 25:443-54. [PMID: 19796838 PMCID: PMC2847396 DOI: 10.1016/j.tig.2009.08.002] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.0] [Received: 07/02/2009] [Revised: 08/07/2009] [Accepted: 08/10/2009] [Indexed: 12/25/2022]
Abstract
Duplicated sequences are substrates for the emergence of new genes and are an important source of genetic instability associated with rare and common diseases. Analyses of primate genomes have shown an increase in the proportion of interspersed segmental duplications (SDs) within the genomes of humans and great apes. This contrasts with other mammalian genomes that seem to have their recently duplicated sequences organized in a tandem configuration. In this review, we focus on the mechanistic origin and impact of this difference with respect to evolution, genetic diversity and primate phenotype. Although many genomes will be sequenced in the future, resolution of this aspect of genomic architecture still requires high quality sequences and detailed analyses.
Affiliation(s)
- Tomas Marques-Bonet
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
|
31
|
Han MV, Demuth JP, McGrath CL, Casola C, Hahn MW. Adaptive evolution of young gene duplicates in mammals. Genome Res 2009; 19:859-67. [PMID: 19411603 DOI: 10.1101/gr.085951.108] [Citation(s) in RCA: 163] [Impact Index Per Article: 10.9] [Indexed: 01/09/2023]
Abstract
Duplicate genes act as a source of genetic material from which new functions arise. They exist in large numbers in every sequenced eukaryotic genome and may be responsible for many differences in phenotypes between species. However, recent work searching for the targets of positive selection in humans has largely ignored duplicated genes due to complications in orthology assignment. Here we find that a high proportion of young gene duplicates in the human, macaque, mouse, and rat genomes have experienced adaptive natural selection. Approximately 10% of all lineage-specific duplicates show evidence for positive selection on their protein sequences, larger than any reported amount of selection among single-copy genes in these lineages using similar methods. We also find that newly duplicated genes that have been transposed to new chromosomal locations are significantly more likely to have undergone positive selection than the ancestral copy. Human-specific duplicates evolving under adaptive natural selection include a surprising number of genes involved in neuronal and cognitive functions. Our results imply that genome scans for selection that ignore duplicated loci are missing a large fraction of all adaptive substitutions. The results are also in agreement with the classical model of evolution by gene duplication, supporting a common role for neofunctionalization in the long-term maintenance of gene duplicates.
Affiliation(s)
- Mira V Han
- School of Informatics, Indiana University, Bloomington, IN 47405, USA
|
32
|
Liu GE, Alkan C, Jiang L, Zhao S, Eichler EE. Comparative analysis of Alu repeats in primate genomes. Genome Res 2009; 19:876-85. [PMID: 19411604 DOI: 10.1101/gr.083972.108] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/25/2022]
Abstract
Using bacterial artificial chromosome (BAC) end sequences (16.9 Mb) and high-quality alignments of genomic sequences (17.4 Mb), we performed a global assessment of the divergence distributions, phylogenies, and consensus sequences for Alu elements in primates including lemur, marmoset, macaque, baboon, and chimpanzee as compared to human. We found that in lemurs, Alu elements show a broader and more symmetric sequence divergence distribution, suggesting a steady rate of Alu retrotransposition activity among prosimians. In contrast, Alu elements in anthropoids show a skewed distribution shifted toward more ancient elements with continually declining rates in recent Alu activity along the hominoid lineage of evolution. Using an integrated approach combining mutation profile and insertion/deletion analyses, we identified nine novel lineage-specific Alu subfamilies in lemur (seven), marmoset (one), and baboon/macaque (one) containing multiple diagnostic mutations distinct from their human counterparts (Alu J, S, and Y subfamilies, respectively). Among these primates, we show that the lemur has the lowest density of Alu repeats (55 repeats/Mb), while marmoset has the greatest abundance (188 repeats/Mb). We estimate that approximately 70% of lemur and 16% of marmoset Alu elements belong to lineage-specific subfamilies. Our analysis has provided an evolutionary framework for further classification and refinement of the Alu repeat phylogeny. The differences in the distribution and rates of Alu activity have played an important role in subtly reshaping the structure of primate genomes. The functional consequences of these changes among the diverse primate lineages over such short periods of evolutionary time are an important area of future investigation.
Affiliation(s)
- George E Liu
- USDA, ARS, ANRI, Bovine Functional Genomics Laboratory, Beltsville, MD 20705, USA.
|
33
|
Hahn MW. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered 2009; 100:605-17. [PMID: 19596713 DOI: 10.1093/jhered/esp047] [Citation(s) in RCA: 259] [Impact Index Per Article: 17.3] [Indexed: 01/22/2023]
Abstract
Determining the evolutionary forces responsible for the maintenance of gene duplicates is key to understanding the processes leading to evolutionary adaptation and novelty. In his highly prescient book, Susumu Ohno recognized that duplicate genes are fixed and maintained within a population with 3 distinct outcomes: neofunctionalization, subfunctionalization, and conservation of function. Subsequent researchers have proposed a multitude of population genetic models that lead to these outcomes, each differing largely in the role played by adaptive natural selection. In this paper, I present a nonmathematical review of these models, their predictions, and the evidence collected in support of each of them. Though the various outcomes of gene duplication are often strictly associated with the presence or absence of adaptive natural selection, I argue that determining the outcome of duplication is orthogonal to determining whether natural selection has acted. Despite an ever-growing field of research into the fate of gene duplicates, there is not yet clear evidence for the preponderance of one outcome over the others, much less evidence for the importance of adaptive or nonadaptive forces in maintaining these duplicates.
Affiliation(s)
- Matthew W Hahn
- Department of Biology and School of Informatics, Indiana University, Bloomington, IN 47405, USA.
|
34
|
Carbone L, Harris RA, Vessere GM, Mootnick AR, Humphray S, Rogers J, Kim SK, Wall JD, Martin D, Jurka J, Milosavljevic A, de Jong PJ. Evolutionary breakpoints in the gibbon suggest association between cytosine methylation and karyotype evolution. PLoS Genet 2009; 5:e1000538. [PMID: 19557196 PMCID: PMC2695003 DOI: 10.1371/journal.pgen.1000538] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 02/27/2009] [Accepted: 05/26/2009] [Indexed: 01/30/2023]
Abstract
Gibbon species have accumulated an unusually high number of chromosomal changes since diverging from the common hominoid ancestor 15-18 million years ago. The cause of this increased rate of chromosomal rearrangements is not known, nor is it known if genome architecture has a role. To address this question, we analyzed sequences spanning 57 breaks of synteny between northern white-cheeked gibbons (Nomascus l. leucogenys) and humans. We find that the breakpoint regions are enriched in segmental duplications and repeats, with Alu elements being the most abundant. Alus located near the gibbon breakpoints (<150 bp) have a higher CpG content than other Alus. Bisulphite allelic sequencing reveals that these gibbon Alus have a lower average density of methylated cytosine than their human orthologues. The finding of higher CpG content and lower average CpG methylation suggests that the gibbon Alu elements are epigenetically distinct from their human orthologues. The association between undermethylation and chromosomal rearrangement in gibbons suggests a correlation between epigenetic state and structural genome variation in evolution.
Affiliation(s)
- Lucia Carbone
- Children's Hospital and Research Center Oakland, Oakland, California, United States of America.
|
35
|
She X, Rohl CA, Castle JC, Kulkarni AV, Johnson JM, Chen R. Definition, conservation and epigenetics of housekeeping and tissue-enriched genes. BMC Genomics 2009; 10:269. [PMID: 19534766 PMCID: PMC2706266 DOI: 10.1186/1471-2164-10-269] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Received: 11/08/2008] [Accepted: 06/17/2009] [Indexed: 12/16/2022]
Abstract
BACKGROUND Housekeeping genes (HKG) are constitutively expressed in all tissues while tissue-enriched genes (TEG) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions. RESULTS Here, we generate gene expression profiles of 42 normal human tissues on custom high-density microarrays to systematically identify 1,522 HKGs and 975 TEGs and compile a small subset of 20 housekeeping genes which are highly expressed in all tissues with lower variance than many commonly used HKGs. Cross-species comparison shows that both the functions and expression patterns of HKGs are conserved. TEGs are enriched with respect to both segmental duplication and copy number variation, while no such enrichment is observed for HKGs, suggesting that the high expression of HKGs is not due to high copy numbers. Analysis of genomic and epigenetic features of HKGs and TEGs reveals that the high expression of HKGs across different tissues is associated with decreased nucleosome occupancy at the transcription start site as indicated by enhanced DNase hypersensitivity. Additionally, we systematically and quantitatively demonstrated the enrichment of CpG islands at HKG transcription start sites (TSS) and their depletion at TEG TSS. Histone methylation patterns differ significantly between HKGs and TEGs, suggesting that methylation contributes to the differential expression patterns as well. CONCLUSION We have compiled a set of high quality HKGs that should provide higher and more consistent expression when used as references in laboratory experiments than currently used HKGs. The comparison of genomic features between HKGs and TEGs shows that HKGs are more conserved than TEGs in terms of functions, expression pattern and polymorphisms. In addition, our results identify chromatin structure and epigenetic features of HKGs and TEGs that are likely to play an important role in regulating their strikingly different expression patterns.
Affiliation(s)
- Xinwei She
- Rosetta Inpharmatics LLC, Seattle, WA 98109, USA.
|
36
|
Pace JK, Sen SK, Batzer MA, Feschotte C. Repair-mediated duplication by capture of proximal chromosomal DNA has shaped vertebrate genome evolution. PLoS Genet 2009; 5:e1000469. [PMID: 19424419 PMCID: PMC2671141 DOI: 10.1371/journal.pgen.1000469] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/30/2009] [Accepted: 04/06/2009] [Indexed: 12/25/2022]
Abstract
DNA double-strand breaks (DSBs) are a common form of cellular damage that can lead to cell death if not repaired promptly. Experimental systems have shown that DSB repair in eukaryotic cells is often imperfect and may result in the insertion of extra chromosomal DNA or the duplication of existing DNA at the breakpoint. These events are thought to be a source of genomic instability and human diseases, but it is unclear whether they have contributed significantly to genome evolution. Here we developed an innovative computational pipeline that takes advantage of the repetitive structure of genomes to detect repair-mediated duplication events (RDs) that occurred in the germline and created insertions of at least 50 bp of genomic DNA. Using this pipeline we identified over 1,000 probable RDs in the human genome. Of these, 824 were intra-chromosomal, closely linked duplications of up to 619 bp bearing the hallmarks of the synthesis-dependent strand-annealing repair pathway. This mechanism has duplicated hundreds of sequences predicted to be functional in the human genome, including exons, UTRs, intron splice sites and transcription factor binding sites. Dating of the duplication events using comparative genomics and experimental validation revealed that the mechanism has operated continuously but with decreasing intensity throughout primate evolution. The mechanism has produced species-specific duplications in all primate species surveyed and is contributing to genomic variation among humans. Finally, we show that RDs have also occurred, albeit at a lower frequency, in non-primate mammals and other vertebrates, indicating that this mechanism has been an important force shaping vertebrate genome evolution.
The repair of DNA double-strand breaks (DSBs) is essential for the maintenance of genome integrity. The mechanisms by which DSBs are repaired have been the subject of intense experimental investigations. It has emerged that several imperfect repair pathways exist in eukaryotes that have the potential to result in chromosomal alterations, including genomic duplications. However, it remains unclear to what extent these imperfect repair events have contributed to shaping genomes throughout evolution. Here we introduce an innovative computational approach that takes advantage of the repetitive nature of eukaryotic genomes to identify repair-mediated duplications (RD) that occurred during evolution. We discovered over one thousand RDs in the human genome, with two-thirds resulting from the capture of a chromosomal DNA segment located in close proximity to the presumed site of the DSB, giving rise to local genomic duplications. Comparative genomic analyses reveal that the mechanism has operated continuously, but with decreasing intensity during primate evolution, generating species-specific duplications in all primates surveyed and generating genomic variation among humans. Finally, we show that RDs have also occurred in non-primate mammals and other vertebrates, indicating that this is a previously under-appreciated force shaping vertebrate genomes.
Affiliation(s)
- John K. Pace
- Department of Biology, University of Texas at Arlington, Arlington, Texas, United States of America
- Shurjo K. Sen
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Mark A. Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Cédric Feschotte
- Department of Biology, University of Texas at Arlington, Arlington, Texas, United States of America
|
37
|
Schmidt J, Kirsch S, Rappold GA, Schempp W. Complex evolution of a Y-chromosomal double homeobox 4 (DUX4)-related gene family in hominoids. PLoS One 2009; 4:e5288. [PMID: 19404400 PMCID: PMC2671837 DOI: 10.1371/journal.pone.0005288] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/26/2009] [Accepted: 03/24/2009] [Indexed: 12/21/2022]
Abstract
The human Y chromosome carries four human Y-chromosomal euchromatin/heterochromatin transition regions, all of which are characterized by the presence of interchromosomal segmental duplications. The Yq11.1/Yq11.21 transition region harbours a peculiar segment composed of an imperfectly organized tandem-repeat structure encoding four members of the double homeobox (DUX) gene family. By comparative fluorescence in situ hybridization (FISH) analysis we have documented the primary appearance of Y-chromosomal DUX genes (DUXY) on the gibbon Y chromosome. The major amplification and dispersal of DUXY paralogs occurred after the gibbon and hominid lineages had diverged. Orthologous DUXY loci of human and chimpanzee show a highly similar structural organization. Sequence alignment survey, phylogenetic reconstruction and recombination detection analyses of human and chimpanzee DUXY genes revealed the existence of all copies in a common ancestor. Comparative analysis of the circumjacent beta-satellites indicated that DUXY genes and beta-satellites evolved in concert. However, evolutionary forces acting on DUXY genes may have induced amino acid sequence differences in the orthologous chimpanzee and human DUXY open reading frames (ORFs). The acquisition of complete ORFs in human copies might relate to evolutionarily advantageous functions, indicating neo-functionalization. We propose an evolutionary scenario in which the transposition of an ancestral tandem array DUX gene cassette to the hominoid Y chromosome, followed by lineage-specific chromosomal rearrangements, paved the way for a species-specific evolution of the Y-chromosomal members of a large highly diverged homeobox gene family.
Affiliation(s)
- Julia Schmidt
- Institute of Human Genetics, University of Freiburg, Freiburg, Germany
- Stefan Kirsch
- Institute of Human Genetics, University of Freiburg, Freiburg, Germany
- Gudrun A. Rappold
- Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany
- Werner Schempp
- Institute of Human Genetics, University of Freiburg, Freiburg, Germany
|
38
|
Segmental duplications contribute to gene expression differences between humans and chimpanzees. Genetics 2009; 182:627-30. [PMID: 19332884 DOI: 10.1534/genetics.108.099960] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022]
Abstract
In addition to specific changes in cis- and trans-regulatory elements, structural changes in the genome are hypothesized to underlie a large number of differences in gene expression between species. Accordingly, we show that species-specific segmental duplications are enriched with genes that are differentially expressed between humans and chimpanzees.
|
39
|
Minimal effect of ectopic gene conversion among recent duplicates in four mammalian genomes. Genetics 2009; 182:615-22. [PMID: 19307604 DOI: 10.1534/genetics.109.101428] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Abstract
Gene conversion between duplicated genes has been implicated in homogenization of gene families and reassortment of variation among paralogs. If conversion is common, this process could lead to errors in gene tree inference and subsequent overestimation of rates of gene duplication. After performing simulations to assess our power to detect gene conversion events, we determined rates of conversion among young, lineage-specific gene duplicates in four mammal species: human, rhesus macaque, mouse, and rat. Gene conversion rates (number of conversion events/number of gene pairs) among young duplicates range from 8.3% in macaque to 18.96% in rat, including a 5% false-positive rate. For all lineages, only 1-3% of the total amount of sequence examined was converted. There is no increase in GC content in conversion tracts compared to flanking regions of the same genes nor in conversion tracts compared to the same region in nonconverted gene-family members, suggesting that ectopic gene conversion does not significantly alter nucleotide composition in these duplicates. While the majority of gene duplicate pairs reside on different chromosomes in mammalian genomes, the majority of gene conversion events occur between duplicates on the same chromosome, even after controlling for divergence between duplicates. Among intrachromosomal duplicates, however, there is no correlation between the probability of conversion and physical distance between duplicates after controlling for divergence. Finally, we use a novel method to show that at most 5-10% of all gene trees involving young duplicates are likely to be incorrect due to gene conversion. We conclude that gene conversion has had only a small effect on mammalian genomes and gene duplicate evolution in general.
|
40
|
Tanaka H, Yao MC. Palindromic gene amplification--an evolutionarily conserved role for DNA inverted repeats in the genome. Nat Rev Cancer 2009; 9:216-24. [PMID: 19212324 DOI: 10.1038/nrc2591] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
The clinical importance of gene amplification in the diagnosis and treatment of cancer has been widely recognized, as it is often evident in advanced stages of diseases. However, our knowledge of the underlying mechanisms is still limited. Gene amplification is an essential process in several organisms including the ciliate Tetrahymena thermophila, in which the initiating mechanism has been well characterized. Lessons from such simple eukaryotes may provide useful information regarding how gene amplification occurs in tumour cells.
Affiliation(s)
- Hisashi Tanaka
- Department of Molecular Genetics, Cleveland Clinic Lerner Research Institute, 9,500 Euclid Avenue, Cleveland, Ohio 44195, USA.
|
41
|
Koszul R, Fischer G. A prominent role for segmental duplications in modeling Eukaryotic genomes. C R Biol 2009; 332:254-66. [DOI: 10.1016/j.crvi.2008.07.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 06/30/2008] [Accepted: 07/12/2008] [Indexed: 01/22/2023]
|
42
|
Balasubramanian S, Zheng D, Liu YJ, Fang G, Frankish A, Carriero N, Robilotto R, Cayting P, Gerstein M. Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes. Genome Biol 2009; 10:R2. [PMID: 19123937 PMCID: PMC2687790 DOI: 10.1186/gb-2009-10-1-r2] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.5] [Received: 11/21/2008] [Accepted: 01/05/2009] [Indexed: 02/04/2023]
Abstract
BACKGROUND The availability of genome sequences of numerous organisms allows comparative study of pseudogenes in syntenic regions. Conservation of pseudogenes suggests that they might have a functional role in some instances. RESULTS We report the first large-scale comparative analysis of ribosomal protein pseudogenes in four mammalian genomes (human, chimpanzee, mouse and rat). To this end, we have assigned these pseudogenes in the four organisms using an automated pipeline and make the results available online. Each organism has a large number of ribosomal protein pseudogenes (approximately 1,400 to 2,800). The majority of them are processed (generated by retrotransposition). However, we do not see a correlation between the number of pseudogenes associated with a ribosomal protein gene and its mRNA abundance. Analysis of pseudogenes in syntenic regions between species shows that most are conserved between human and chimpanzee, but very few are conserved between primates and rodents. Interestingly, syntenic pseudogenes have a lower rate of nucleotide substitution than their surrounding intergenic DNA. Moreover, evidence from expressed sequence tags indicates that two pseudogenes conserved between human and mouse are transcribed. Detailed analysis shows that one of them, the pseudogene of RPS27, is likely to be a protein-coding gene. This is significant as previous reports indicated there are exactly 80 ribosomal protein genes encoded by the human genome. CONCLUSIONS Our analysis indicates that processed ribosomal protein pseudogenes abound in mammalian genomes, but few of these are conserved between primates and rodents. This highlights the large amount of recent retrotranspositional activity in mammals and a relatively larger amount of it in the rodent lineage.
|
43
|
Schmieder S, Darré-Toulemonde F, Arguel MJ, Delerue-Audegond A, Christen R, Nahon JL. Primate-specific spliced PMCHL RNAs are non-protein coding in human and macaque tissues. BMC Evol Biol 2008; 8:330. [PMID: 19068116 PMCID: PMC2621205 DOI: 10.1186/1471-2148-8-330] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/08/2008] [Accepted: 12/09/2008] [Indexed: 11/24/2022]
Abstract
Background Brain-expressed genes that were created in the primate lineage represent obvious candidates to investigate molecular mechanisms that contributed to neural reorganization and emergence of new behavioural functions in Homo sapiens. PMCHL1 arose from retroposition of a pro-melanin-concentrating hormone (PMCH) antisense mRNA on the ancestral human chromosome 5p14 when platyrrhines and catarrhines diverged. Mutations before divergence of hylobatidae led to creation of new exons and finally PMCHL1 duplicated in an ancestor of hominids to generate PMCHL2 at the human chromosome 5q13. A complex pattern of spliced and unspliced PMCHL RNAs was found in human brain and testis. Results Several novel spliced PMCHL transcripts have been characterized in human testis and fetal brain, identifying an additional exon and novel splice sites. Sequencing of PMCHL genes in several non-human primates allowed us to carry out phylogenetic analyses, revealing that the initial retroposition event took place within an intron of the brain cadherin (CDH12) gene, soon after platyrrhine/catarrhine divergence, i.e. 30–35 Mya, and was concomitant with the insertion of an AluSg element. Sequence analysis of the spliced PMCHL transcripts identified only short ORFs of less than 300 bp, with low (VMCH-p8 and protein variants) or no evolutionary conservation. Western blot analyses of human and macaque tissues expressing PMCHL RNA failed to reveal any protein corresponding to VMCH-p8 and protein variants encoded by spliced transcripts. Conclusion Our present results improve our knowledge of the gene structure and the evolutionary history of the primate-specific chimeric PMCHL genes. These genes produce multiple spliced transcripts, bearing short, non-conserved and apparently non-translated ORFs that may function as mRNA-like non-coding RNAs.
Affiliation(s)
- Sandra Schmieder
- Université de Nice-Sophia Antipolis, CNRS, Institut de Pharmacologie Moléculaire et Cellulaire, Valbonne, France.
|
44
|
|
45
|
Varki A, Geschwind DH, Eichler EE. Explaining human uniqueness: genome interactions with environment, behaviour and culture. Nat Rev Genet 2008; 9:749-63. [PMID: 18802414 DOI: 10.1038/nrg2428] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.7] [Indexed: 12/14/2022]
Abstract
What makes us human? Specialists in each discipline respond through the lens of their own expertise. In fact, 'anthropogeny' (explaining the origin of humans) requires a transdisciplinary approach that eschews such barriers. Here we take a genomic and genetic perspective towards molecular variation, explore systems analysis of gene expression and discuss an organ-systems approach. Rejecting any 'genes versus environment' dichotomy, we then consider genome interactions with environment, behaviour and culture, finally speculating that aspects of human uniqueness arose because of a primate evolutionary trend towards increasing and irreversible dependence on learned behaviours and culture - perhaps relaxing allowable thresholds for large-scale genomic diversity.
Affiliation(s)
- Ajit Varki
- Center for Academic Research and Training in Anthropogeny, University of California, San Diego, La Jolla, California 92093, USA.
|
46
|
Münch C, Kirsch S, Fernandes AMG, Schempp W. Evolutionary analysis of the highly dynamic CHEK2 duplicon in anthropoids. BMC Evol Biol 2008; 8:269. [PMID: 18831734 PMCID: PMC2566985 DOI: 10.1186/1471-2148-8-269] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/06/2008] [Accepted: 10/02/2008] [Indexed: 12/03/2022]
Abstract
BACKGROUND Segmental duplications (SDs) are euchromatic portions of genomic DNA (≥1 kb) that occur at more than one site within the genome, and typically share a high level of sequence identity (>90%). Approximately 5% of the human genome is composed of such duplicated sequences. Here we report the detailed investigation of CHEK2 duplications. CHEK2 is a multiorgan cancer susceptibility gene encoding a cell cycle checkpoint kinase acting in the DNA-damage response signalling pathway. The continuous presence of the CHEK2 gene in all eukaryotes and its important role in maintaining genome stability prompted us to investigate the duplicative evolution and phylogeny of CHEK2 and its paralogs during anthropoid evolution. RESULTS To study CHEK2 duplicon evolution in anthropoids we applied a combination of comparative FISH and in silico analyses. Our comparative FISH results with a CHEK2 fosmid probe revealed the single-copy status of CHEK2 in New World monkeys, Old World monkeys and gibbons. Whereas a single CHEK2 duplication was detected in orangutan, a multi-site signal pattern indicated a burst of duplication in African great apes and human. Phylogenetic analysis of paralogous and ancestral CHEK2 sequences in human, chimpanzee and rhesus macaque confirmed this burst of duplication, which occurred after the radiation of orangutan and African great apes. In addition, we used inter-species quantitative PCR to determine CHEK2 copy numbers. An amplification of CHEK2 was detected in African great apes and the highest CHEK2 copy number of all analysed species was observed in the human genome. Furthermore, we detected variation in CHEK2 copy numbers within the analysed set of human samples. CONCLUSION Our detailed analysis revealed the highly dynamic nature of CHEK2 duplication during anthropoid evolution. We determined a burst of CHEK2 duplication after the radiation of orangutan and African great apes and identified the highest CHEK2 copy number in human. In conclusion, our analysis of CHEK2 duplicon evolution revealed that SDs contribute to inter-species variation. Furthermore, our qPCR analysis led us to presume CHEK2 copy number variation in human, and molecular diagnostics of the cancer susceptibility gene CHEK2 inside the duplicated region might be hampered by the individual-specific set of duplicons.
Affiliation(s)
- Claudia Münch
  - Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
- Stefan Kirsch
  - Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
- António MG Fernandes
  - Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
- Werner Schempp
  - Institute of Human Genetics and Anthropology, University of Freiburg, Breisacher Str. 33, 79106 Freiburg, Germany
|
47
|
Marques-Bonet T, Cheng Z, She X, Eichler EE, Navarro A. The genomic distribution of intraspecific and interspecific sequence divergence of human segmental duplications relative to human/chimpanzee chromosomal rearrangements. BMC Genomics 2008; 9:384. [PMID: 18699995 PMCID: PMC2542386 DOI: 10.1186/1471-2164-9-384] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/06/2007] [Accepted: 08/12/2008] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND It has been suggested that chromosomal rearrangements harbor the molecular footprint of the biological phenomena which they induce, in the form, for instance, of changes in the sequence divergence rates of linked genes. So far, all the studies of these potential associations have focused on the relationship between structural changes and the rates of evolution of single-copy DNA and have tried to exclude segmental duplications (SDs). This is paradoxical, since SDs are one of the primary forces driving the evolution of structure and function in our genomes and have been linked not only with novel genes acquiring new functions, but also with overall higher DNA sequence divergence and major chromosomal rearrangements. RESULTS Here we take the opposite view and focus on SDs. We analyze several of the features of SDs, including the rates of intraspecific divergence between paralogous copies of human SDs and of interspecific divergence between human SDs and chimpanzee DNA. We study how divergence measures relate to chromosomal rearrangements, while considering other factors that affect evolutionary rates in single copy DNA. CONCLUSION We find that interspecific SD divergence behaves similarly to divergence of single-copy DNA. In contrast, old and recent paralogous copies of SDs do present different patterns of intraspecific divergence. Also, we show that some relatively recent SDs accumulate in regions that carry inversions in sister lineages.
Affiliation(s)
- Tomàs Marques-Bonet
  - Unitat de Biologia Evolutiva Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Ze Cheng
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Xinwei She
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Arcadi Navarro
  - Unitat de Biologia Evolutiva Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
  - Institucio Catalana de Recerca i Estudis Avancats (ICREA) and Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
  - Population Genomics Node (GNV8), National Institute for Bioinformatics (INB) Universitat Pompeu Fabra, Spain
|
48
|
Nguyen DQ, Webber C, Hehir-Kwa J, Pfundt R, Veltman J, Ponting CP. Reduced purifying selection prevails over positive selection in human copy number variant evolution. Genome Res 2008; 18:1711-23. [PMID: 18687881 DOI: 10.1101/gr.077289.108] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Indexed: 11/24/2022]
Abstract
Copy number variation is a dominant contributor to genomic variation and may frequently underlie an individual's variable susceptibilities to disease. Here we question our previous proposition that copy number variants (CNVs) are often retained in the human population because of their adaptive benefit. We show that genic biases of CNVs are best explained, not by positive selection, but by reduced efficiency of selection in eliminating deleterious changes from the human population. Of four CNV data sets examined, three exhibit significant increases in protein evolutionary rates. These increases appear to be attributable to the frequent coincidence of CNVs with segmental duplications (SDs) that recombine infrequently. Furthermore, human orthologs of mouse genes, which, when disrupted, result in pre- or postnatal lethality, are unusually depleted in CNVs. Together, these findings support a model of reduced purifying selection (Hill-Robertson interference) within copy number variable regions that are enriched in nonessential genes, allowing both the fixation of slightly deleterious substitutions and increased drift of CNV alleles. Additionally, all four CNV sets exhibited increased rates of interspecies chromosomal rearrangement and nucleotide substitution and an increased gene density. We observe that sequences with high G+C contents are most prone to copy number variation. In particular, frequently duplicated human SD sequence, or CNVs that are large and/or observed frequently, tend to be elevated in G+C content. In contrast, SD sequences that appear fixed in the human population lie more frequently within low G+C sequence. These findings provide an overarching view of how CNVs arise and segregate in the human population.
Affiliation(s)
- Duc-Quang Nguyen
  - MRC Functional Genomics Unit, University of Oxford, Department of Physiology, Anatomy and Genetics, Oxford OX1 3QX, United Kingdom
|
49
|
She X, Cheng Z, Zöllner S, Church DM, Eichler EE. Mouse segmental duplication and copy number variation. Nat Genet 2008; 40:909-14. [PMID: 18500340 PMCID: PMC2574762 DOI: 10.1038/ng.172] [Citation(s) in RCA: 179] [Impact Index Per Article: 11.2] [Received: 02/19/2008] [Accepted: 05/14/2008] [Indexed: 11/08/2022]
Abstract
Detailed analyses of the clone-based genome assembly reveal that the recent duplication content of mouse (4.94%) is now comparable to that of human (5.5%), in contrast to previous estimates from the whole-genome shotgun sequence assembly. However, the architecture of mouse and human genomes differs markedly: most mouse duplications are organized into discrete clusters of tandem duplications that show depletion of genes and transcripts and enrichment of long interspersed nuclear element (LINE) and long terminal repeat (LTR) retroposons. We assessed copy number variation of the C57BL/6J duplicated regions within 15 mouse strains previously used for genetic association studies, sequencing and the Mouse Phenome Project. We determined that over 60% of these base pairs are polymorphic among the strains (on average, there was 20 Mb of copy-number-variable DNA between different mouse strains). Our data suggest that different mouse strains show comparable, if not greater, copy number polymorphism when compared to human; however, such variation is more locally restricted. We show large and complex patterns of interstrain copy number variation restricted to large gene families associated with spermatogenesis, pregnancy, viviparity, pheromone signaling and immune response.
Affiliation(s)
- Xinwei She
  - Department of Genome Sciences, University of Washington, 1705 NE Pacific Street, Seattle, Washington 98195, USA
|
50
|
Jiang Z, Hubley R, Smit A, Eichler EE. DupMasker: a tool for annotating primate segmental duplications. Genome Res 2008; 18:1362-8. [PMID: 18502942 DOI: 10.1101/gr.078477.108] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/24/2022]
Abstract
Segmental duplications (SDs) play an important role in genome rearrangement, evolution, and the copy-number variation (CNV) of primate genomes. Such sequences are difficult to detect, a priori, because they share no defining sequence features that distinguish them from unique portions of the genome. Current sequence annotation of segmental duplications requires computationally intensive, genome-wide self-comparisons that cannot be easily implemented on new data sets. Based on the successful implementation of RepeatMasker, we developed a new genome annotation tool, DupMasker. The program uses a library of nonredundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and nonhuman primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. We predict this tool will be valuable in the annotation of large-insert sequence clones, allowing putative unique and duplicated regions of the genomes to be annotated prior to whole genome assembly comparisons.
Affiliation(s)
- Zhaoshi Jiang
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
|