1
Miga KH, Eichler EE. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes. Am J Hum Genet 2023; 110:1832-1840. [PMID: 37922882 PMCID: PMC10645551 DOI: 10.1016/j.ajhg.2023.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023] Open
Abstract
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans.
Affiliation(s)
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
2
Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Citation(s) in RCA: 247] [Impact Index Per Article: 82.3] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Affiliation(s)
- Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Pragya Sidhwani
- Department of Biochemistry, Stanford University, Stanford, CA, USA
- Sasha A. Langley
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Lev Uralsky
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT, USA
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fedor Gusev
- Vavilov Institute of General Genetics, Moscow, Russia
- Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sofie R. Salama
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
- Evgeny I. Rogaev
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Aaron Streets
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gary H. Karpen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Abby F. Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
- Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
- Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT, USA
- Jennifer L. Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Rachel J. O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Charles H. Langley
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- Ivan A. Alexandrov
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
3
Suzuki Y, Morishita S. The time is ripe to investigate human centromeres by long-read sequencing†. DNA Res 2021; 28:6381569. [PMID: 34609504 PMCID: PMC8502840 DOI: 10.1093/dnares/dsab021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/31/2021] [Accepted: 09/28/2021] [Indexed: 01/05/2023] Open
Abstract
The complete sequencing of human centromeres, which are filled with highly repetitive elements, has long been challenging. In human centromeres, α-satellite monomers of about 171 bp in length are the basic repeating units, but α-satellite monomers constitute the higher-order repeat (HOR) units, and thousands of copies of highly homologous HOR units form large arrays, which have hampered sequence assembly of human centromeres. Because most HOR unit occurrences are covered by long reads of about 10 kb, the recent availability of much longer reads is expected to enable observation of individual HOR occurrences in terms of their single-nucleotide or structural variants. The time has come to examine the complete sequence of human centromeres.
Affiliation(s)
- Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
4
Ahmad SF, Singchat W, Jehangir M, Suntronpong A, Panthum T, Malaivijitnond S, Srikulnath K. Dark Matter of Primate Genomes: Satellite DNA Repeats and Their Evolutionary Dynamics. Cells 2020; 9:E2714. [PMID: 33352976 PMCID: PMC7767330 DOI: 10.3390/cells9122714] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/27/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 12/12/2022] Open
Abstract
A substantial portion of the primate genome is composed of non-coding regions, so-called "dark matter", which includes an abundance of tandemly repeated sequences called satellite DNA. Collectively known as the satellitome, this genomic component offers exciting evolutionary insights into aspects of primate genome biology that raise new questions and challenge existing paradigms. A complete human reference genome was recently reported with telomere-to-telomere human X chromosome assembly that resolved hundreds of dark regions, encompassing a 3.1 Mb centromeric satellite array that had not been identified previously. With the recent exponential increase in the availability of primate genomes, and the development of modern genomic and bioinformatics tools, extensive growth in our knowledge concerning the structure, function, and evolution of satellite elements is expected. The current state of knowledge on this topic is summarized, highlighting various types of primate-specific satellite repeats to compare their proportions across diverse lineages. Inter- and intraspecific variation of satellite repeats in the primate genome are reviewed. The functional significance of these sequences is discussed by describing how the transcriptional activity of satellite repeats can affect gene expression during different cellular processes. Sex-linked satellites are outlined, together with their respective genomic organization. Mechanisms are proposed whereby satellite repeats might have emerged as novel sequences during different evolutionary phases. Finally, the main challenges that hinder the detection of satellite DNA are outlined and an overview of the latest methodologies to address technological limitations is presented.
Affiliation(s)
- Syed Farhan Ahmad
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Worapong Singchat
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Maryam Jehangir
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Department of Structural and Functional Biology, Institute of Bioscience at Botucatu, São Paulo State University (UNESP), Botucatu, São Paulo 18618-689, Brazil
- Aorarat Suntronpong
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Thitipong Panthum
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Suchinda Malaivijitnond
- National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand
- Department of Biology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Kornsorn Srikulnath
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand
- Center of Excellence on Agricultural Biotechnology (AG-BIO/PERDO-CHE), Bangkok 10900, Thailand
- Omics Center for Agriculture, Bioresources, Food and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand
5
Suzuki Y, Myers EW, Morishita S. Rapid and ongoing evolution of repetitive sequence structures in human centromeres. Sci Adv 2020; 6:eabd9230. [PMID: 33310858 PMCID: PMC7732198 DOI: 10.1126/sciadv.abd9230] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 07/20/2020] [Accepted: 10/30/2020] [Indexed: 06/12/2023]
Abstract
Our understanding of centromere sequence variation across human populations is limited by its extremely long nested repeat structures called higher-order repeats that are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations including a Han Chinese trio and 21 Japanese. We revealed substantial structural diversity with many previously unidentified variant higher-order repeats specific to individuals characterizing rapid, haplotype-specific evolution of human centromeric arrays, while frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.
Affiliation(s)
- Yuta Suzuki
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
- Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Shinichi Morishita
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
6
Sullivan LL, Sullivan BA. Genomic and functional variation of human centromeres. Exp Cell Res 2020; 389:111896. [PMID: 32035947 PMCID: PMC7140587 DOI: 10.1016/j.yexcr.2020.111896] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/19/2019] [Revised: 01/29/2020] [Accepted: 02/05/2020] [Indexed: 10/25/2022]
Abstract
Centromeres are central to chromosome segregation and genome stability, and thus their molecular foundations are important for understanding their function and the ways in which they go awry. Human centromeres typically form at large megabase-sized arrays of alpha satellite DNA for which there is little genomic understanding due to its repetitive nature. Consequently, it has been difficult to achieve genome assemblies at centromeres using traditional next generation sequencing approaches, so that centromeres represent gaps in the current human genome assembly. The role of alpha satellite DNA has been debated since centromeres can form, albeit rarely, on non-alpha satellite DNA. Conversely, the simple presence of alpha satellite DNA is not sufficient for centromere function since chromosomes with multiple alpha satellite arrays only exhibit a single location of centromere assembly. Here, we discuss the organization of human centromeres as well as genomic and functional variation in human centromere location, and current understanding of the genomic and epigenetic mechanisms that underlie centromere flexibility in humans.
Affiliation(s)
- Beth A Sullivan
- Department of Molecular Genetics and Microbiology, USA; Division of Human Genetics, Duke University School of Medicine, Durham, NC, 27710, USA.
7
CENP-A binding domains and recombination patterns in horse spermatocytes. Sci Rep 2019; 9:15800. [PMID: 31676881 PMCID: PMC6825197 DOI: 10.1038/s41598-019-52153-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/09/2019] [Accepted: 10/11/2019] [Indexed: 02/07/2023] Open
Abstract
Centromeres exert an inhibitory effect on meiotic recombination, but the possible contribution of satellite DNA to this "centromere effect" is under debate. In the horse, satellite DNA is present at all centromeres with the exception of the one from chromosome 11. This organization of centromeres allowed us to investigate the role of satellite DNA on recombination suppression in horse spermatocytes at the stage of pachytene. To this aim we analysed the distribution of the MLH1 protein, marker of recombination foci, relative to CENP-A, marker of centromeric function. We demonstrated that the satellite-less centromere of chromosome 11 causes crossover suppression, similarly to satellite-based centromeres. These results suggest that the centromere effect does not depend on satellite DNA. During this analysis, we observed a peculiar phenomenon: while, as expected, the centromere of the majority of meiotic bivalent chromosomes was labelled with a single immunofluorescence centromeric signal, double-spotted or extended signals were also detected. Their number varied from 0 to 7 in different cells. This observation can be explained by positional variation of the centromeric domain on the two homologs and/or misalignment of pericentromeric satellite DNA arrays during homolog pairing confirming the great plasticity of equine centromeres.
8
Miga KH. Centromeric Satellite DNAs: Hidden Sequence Variation in the Human Population. Genes (Basel) 2019; 10:E352. [PMID: 31072070 PMCID: PMC6562703 DOI: 10.3390/genes10050352] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 04/02/2019] [Revised: 05/03/2019] [Accepted: 05/03/2019] [Indexed: 12/30/2022] Open
Abstract
The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often so large that it is detected cytogenetically, yet due to the lack of a reference assembly and informatics tools to measure this variability, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease support that these regions present a substantial and biologically important fraction of human sequence variation. Therefore, there is a pressing and unmet need to detect and incorporate this uncharacterized sequence variation into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, CA 95064, USA.
9
Centromere Repeats: Hidden Gems of the Genome. Genes (Basel) 2019; 10:223. [PMID: 30884847 PMCID: PMC6471113 DOI: 10.3390/genes10030223] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 02/08/2019] [Revised: 03/07/2019] [Accepted: 03/11/2019] [Indexed: 01/08/2023] Open
Abstract
Satellite DNAs are now regarded as powerful and active contributors to genomic and chromosomal evolution. Paired with mobile transposable elements, these repetitive sequences provide a dynamic mechanism through which novel karyotypic modifications and chromosomal rearrangements may occur. In this review, we discuss the regulatory activity of satellite DNA and their neighboring transposable elements in a chromosomal context with a particular emphasis on the integral role of both in centromere function. In addition, we discuss the varied mechanisms by which centromeric repeats have endured evolutionary processes, producing a novel, species-specific centromeric landscape despite sharing a ubiquitously conserved function. Finally, we highlight the role these repetitive elements play in the establishment and functionality of de novo centromeres and chromosomal breakpoints that underpin karyotypic variation. By emphasizing these unique activities of satellite DNAs and transposable elements, we hope to disparage the conventional exemplification of repetitive DNA in the historically-associated context of ‘junk’.
10
McNulty SM, Sullivan BA. Alpha satellite DNA biology: finding function in the recesses of the genome. Chromosome Res 2018; 26:115-138. [PMID: 29974361 DOI: 10.1007/s10577-018-9582-3] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 05/03/2018] [Accepted: 06/14/2018] [Indexed: 02/05/2023]
Abstract
Repetitive DNA, formerly referred to by the misnomer "junk DNA," comprises a majority of the human genome. One class of this DNA, alpha satellite, comprises up to 10% of the genome. Alpha satellite is enriched at all human centromere regions and is competent for de novo centromere assembly. Because of the highly repetitive nature of alpha satellite, it has been difficult to achieve genome assemblies at centromeres using traditional next-generation sequencing approaches, and thus, centromeres represent gaps in the current human genome assembly. Moreover, alpha satellite DNA is transcribed into repetitive noncoding RNA and contributes to a large portion of the transcriptome. Recent efforts to characterize these transcripts and their function have uncovered pivotal roles for satellite RNA in genome stability, including silencing "selfish" DNA elements and recruiting centromere and kinetochore proteins. This review will describe the genomic and epigenetic features of alpha satellite DNA, discuss recent findings of noncoding transcripts produced from distinct alpha satellite arrays, and address current progress in the functional understanding of this oft-neglected repetitive sequence. We will discuss unique challenges of studying human satellite DNAs and RNAs and point toward new technologies that will continue to advance our understanding of this largely untapped portion of the genome.
Affiliation(s)
- Shannon M McNulty
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, 27710, USA
- Beth A Sullivan
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, 27710, USA; Division of Human Genetics, Duke University Medical Center, Durham, NC, 27710, USA.
11
Robicheau BM, Susko E, Harrigan AM, Snyder M. Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome. Genome Biol Evol 2017; 9:380-397. [PMID: 28204512 PMCID: PMC5381670 DOI: 10.1093/gbe/evw307] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Accepted: 12/30/2016] [Indexed: 12/20/2022] Open
Abstract
Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function, and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded, but near full length, ribosomal DNA (rDNA) units, including both 45S and Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays, 2) that these rDNA sequences have a propensity for being centromere proximal, and 3) that sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) can be found distributed throughout the genome and are identifiable as an “rDNA-like signal”, representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The size of sequence strings found in the rDNA-like signal intergrade into the size of sequence strings that make up the full-length degrading rDNA units found scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of a similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggests that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The concept that the production of rDNA pseudogenes is a by-product of concerted evolution represents a previously under-appreciated process; we demonstrate here its importance.
Affiliation(s)
- Brent M Robicheau
- Department of Biology, Acadia University, Wolfville, Nova Scotia, Canada
| | - Edward Susko
- Center for Comparative Genomics and Evolutionary Bioinformatics, Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Amye M Harrigan
- Department of Biology, Acadia University, Wolfville, Nova Scotia, Canada
| | - Marlene Snyder
- Department of Biology, Acadia University, Wolfville, Nova Scotia, Canada
| |
Collapse
|
12
|
Jain M, Olsen HE, Turner DJ, Stoddart D, Bulazel KV, Paten B, Haussler D, Willard HF, Akeson M, Miga KH. Linear assembly of a human centromere on the Y chromosome. Nat Biotechnol 2018; 36:321-323. [PMID: 29553574 PMCID: PMC5886786 DOI: 10.1038/nbt.4109] [Citation(s) in RCA: 150] [Impact Index Per Article: 21.4] [Received: 08/08/2017] [Accepted: 02/22/2018] [Indexed: 01/21/2023]
Abstract
The human genome reference sequence remains incomplete owing to the challenge of assembling long tracts of near-identical tandem repeats in centromeres. We implemented a nanopore sequencing strategy to generate high-quality reads that span hundreds of kilobases of highly repetitive DNA in a human Y chromosome centromere. Combining these data with short-read variant validation, we assembled and characterized the centromeric region of a human Y chromosome.
Affiliation(s)
- Miten Jain
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Hugh E Olsen
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Kira V Bulazel
  - Duke Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Huntington F Willard
  - Duke Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA
  - Geisinger National, Bethesda, Maryland, USA
- Mark Akeson
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
  - Duke Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA

13
Abstract
Genomic variation is a source of functional diversity that is typically studied in genic and non-coding regulatory regions. However, the extent of variation within noncoding portions of the human genome, particularly highly repetitive regions, and the functional consequences are not well understood. Satellite DNA, including α satellite DNA found at human centromeres, comprises up to 10% of the genome, but is difficult to study because its repetitive nature hinders contiguous sequence assemblies. We recently described variation within α satellite DNA that affects centromere function. On human chromosome 17 (HSA17), we showed that size and sequence polymorphisms within primary array D17Z1 are associated with chromosome aneuploidy and defective centromere architecture. However, HSA17 can counteract this instability by assembling the centromere at a second, "backup" array lacking variation. Here, we discuss our findings in a broader context of human centromere assembly, and highlight areas of future study to uncover links between genomic and epigenetic features of human centromeres.
Affiliation(s)
- Lori L Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA
- Kimberline Chew
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA
- Beth A Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA

14
Miga KH. The Promises and Challenges of Genomic Studies of Human Centromeres. Progress in Molecular and Subcellular Biology 2017; 56:285-304. [PMID: 28840242 DOI: 10.1007/978-3-319-58592-5_12] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Human centromeres are genomic regions that act as sites of kinetochore assembly to ensure proper chromosome segregation during mitosis and meiosis. Although the biological importance of centromeres in genome stability, and ultimately, cell viability are well understood, the complete sequence content and organization in these multi-megabase-sized regions remains unknown. The lack of a high-resolution reference assembly inhibits standard bioinformatics protocols, and as a result, sequence-based studies involving human centromeres lag far behind the advances made for the non-repetitive sequences in the human genome. In this chapter, I introduce what is known about the genomic organization in the highly repetitive regions spanning human centromeres, and discuss the challenges these sequences pose for assembly, alignment, and data interpretation. Overcoming these obstacles is expected to issue a new era for centromere genomics, which will offer new discoveries in basic cell biology and human biomedical research.
Affiliation(s)
- Karen H Miga
  - Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA, USA.

15
Cacheux L, Ponger L, Gerbault-Seureau M, Richard FA, Escudé C. Diversity and distribution of alpha satellite DNA in the genome of an Old World monkey: Cercopithecus solatus. BMC Genomics 2016; 17:916. [PMID: 27842493 PMCID: PMC5109768 DOI: 10.1186/s12864-016-3246-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/14/2016] [Accepted: 11/02/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Alpha satellite is the major repeated DNA element of primate centromeres. Evolution of these tandemly repeated sequences has led to the existence of numerous families of monomers exhibiting specific organizational patterns. The limited amount of information available in non-human primates is a restriction to the understanding of the evolutionary dynamics of alpha satellite DNA.
Results: We carried out the targeted high-throughput sequencing of alpha satellite monomers and dimers from the Cercopithecus solatus genome, an Old World monkey from the Cercopithecini tribe. Computational approaches were used to infer the existence of sequence families and to study how these families are organized with respect to each other. While previous studies had suggested that alpha satellites in Old World monkeys were poorly diversified, our analysis provides evidence for the existence of at least four distinct families of sequences within the studied species and of higher order organizational patterns. Fluorescence in situ hybridization using oligonucleotide probes that are able to target each family in a specific way showed that the different families had distinct distributions on chromosomes and were not homogeneously distributed between chromosomes.
Conclusions: Our new approach provides an unprecedented and comprehensive view of the diversity and organization of alpha satellites in a species outside the hominoid group. We consider these data with respect to previously known alpha satellite families and to potential mechanisms for satellite DNA evolution. Applying this approach to other species will open new perspectives regarding the integration of satellite DNA into comparative genomic and cytogenetic studies.
Affiliation(s)
- Lauriane Cacheux
  - Département Régulations, Développement et Diversité Moléculaire, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
  - Département Systématique et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
- Loïc Ponger
  - Département Régulations, Développement et Diversité Moléculaire, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
- Michèle Gerbault-Seureau
  - Département Systématique et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
- Florence Anne Richard
  - Département Systématique et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
  - Université Versailles St-Quentin, Montigny-le-Bretonneux, France
- Christophe Escudé
  - Département Régulations, Développement et Diversité Moléculaire, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France.

16
Aldrup-MacDonald ME, Kuo ME, Sullivan LL, Chew K, Sullivan BA. Genomic variation within alpha satellite DNA influences centromere location on human chromosomes with metastable epialleles. Genome Res 2016; 26:1301-1311. [PMID: 27510565 PMCID: PMC5052062 DOI: 10.1101/gr.206706.116] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 03/08/2016] [Accepted: 08/08/2016] [Indexed: 01/27/2023]
Abstract
Alpha satellite is a tandemly organized type of repetitive DNA that comprises 5% of the genome and is found at all human centromeres. A defined number of 171-bp monomers are organized into chromosome-specific higher-order repeats (HORs) that are reiterated thousands of times. At least half of all human chromosomes have two or more distinct HOR alpha satellite arrays within their centromere regions. We previously showed that the two alpha satellite arrays of Homo sapiens Chromosome 17 (HSA17), D17Z1 and D17Z1-B, behave as centromeric epialleles, that is, the centromere, defined by chromatin containing the centromeric histone variant CENPA and recruitment of other centromere proteins, can form at either D17Z1 or D17Z1-B. Some individuals in the human population are functional heterozygotes in that D17Z1 is the active centromere on one homolog and D17Z1-B is active on the other. In this study, we aimed to understand the molecular basis for how centromere location is determined on HSA17. Specifically, we focused on D17Z1 genomic variation as a driver of epiallele formation. We found that D17Z1 arrays that are predominantly composed of HOR size and sequence variants were functionally less competent. They either recruited decreased amounts of the centromere-specific histone variant CENPA and the HSA17 was mitotically unstable, or alternatively, the centromere was assembled at D17Z1-B and the HSA17 was stable. Our study demonstrates that genomic variation within highly repetitive, noncoding DNA of human centromere regions has a pronounced impact on genome stability and basic chromosomal function.
Affiliation(s)
- Megan E Aldrup-MacDonald
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Molly E Kuo
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Lori L Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Kimberline Chew
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Beth A Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
  - Division of Human Genetics, Duke University Medical Center, Durham, North Carolina 27710, USA

17
Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc Natl Acad Sci U S A 2015; 112:5105-10. [PMID: 25848046 DOI: 10.1073/pnas.1416531112] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/12/2022] Open
Abstract
The dietary change resulting from the domestication of plant and animal species and development of agriculture at different locations across the world was one of the most significant changes in human evolution. An increase in dietary carbohydrates caused an increase in dental caries following the development of agriculture, mediated by the cariogenic oral bacterium Streptococcus mutans. Salivary agglutinin [SAG, encoded by the deleted in malignant brain tumors 1 (DMBT1) gene] is an innate immune receptor glycoprotein that binds a variety of bacteria and viruses, and mediates attachment of S. mutans to hydroxyapatite on the surface of the tooth. In this study we show that multiallelic copy number variation (CNV) within DMBT1 is extensive across all populations and is predicted to result in between 7-20 scavenger-receptor cysteine-rich (SRCR) domains within each SAG molecule. Direct observation of de novo mutation in multigeneration families suggests these CNVs have a very high mutation rate for a protein-coding locus, with a mutation rate of up to 5% per gamete. Given that the SRCR domains bind S. mutans and hydroxyapatite in the tooth, we investigated the association of sequence diversity at the SAG-binding gene of S. mutans, and DMBT1 CNV. Furthermore, we show that DMBT1 CNV is also associated with a history of agriculture across global populations, suggesting that dietary change as a result of agriculture has shaped the pattern of CNV at DMBT1, and that the DMBT1-S. mutans interaction is a promising model of host-pathogen-culture coevolution in humans.

18
Black HA, Khan FF, Tyson J, Armour JAL. Inferring mechanisms of copy number change from haplotype structures at the human DEFA1A3 locus. BMC Genomics 2014; 15:614. [PMID: 25048054 PMCID: PMC4117965 DOI: 10.1186/1471-2164-15-614] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/29/2014] [Accepted: 07/14/2014] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: The determination of structural haplotypes at copy number variable regions can indicate the mechanisms responsible for changes in copy number, as well as explain the relationship between gene copy number and expression. However, obtaining spatial information at regions displaying extensive copy number variation, such as the DEFA1A3 locus, is complex, because of the difficulty in the phasing and assembly of these regions. The DEFA1A3 locus is intriguing in that it falls within a region of high linkage disequilibrium, despite its high variability in copy number (n = 3-16); hence, the mechanisms responsible for changes in copy number at this locus are unclear.
RESULTS: In this study, a region flanking the DEFA1A3 locus was sequenced across 120 independent haplotypes with European ancestry, identifying five common classes of DEFA1A3 haplotype. Assigning DEFA1A3 class to haplotypes within the 1000 Genomes project highlights a significant difference in DEFA1A3 class frequencies between populations with different ancestry. The features of each DEFA1A3 class, for example, the associated DEFA1A3 copy numbers, were initially assessed in a European cohort (n = 599) and replicated in the 1000 Genomes samples, showing within-class similarity, but between-class and between-population differences in the features of the DEFA1A3 locus. Emulsion haplotype fusion-PCR was used to generate 61 structural haplotypes at the DEFA1A3 locus, showing a high within-class similarity in structure.
CONCLUSIONS: Structural haplotypes across the DEFA1A3 locus indicate that intra-allelic rearrangement is the predominant mechanism responsible for changes in DEFA1A3 copy number, explaining the conservation of linkage disequilibrium across the locus. The identification of common structural haplotypes at the DEFA1A3 locus could aid studies into how DEFA1A3 copy number influences expression, which is currently unclear.
Affiliation(s)
- Holly A Black
  - School of Life Sciences, University of Nottingham, Queen’s Medical Centre, Nottingham, NG7 2UH, UK
- Fayeza F Khan
  - School of Life Sciences, University of Nottingham, Queen’s Medical Centre, Nottingham, NG7 2UH, UK
- Jess Tyson
  - School of Life Sciences, University of Nottingham, Queen’s Medical Centre, Nottingham, NG7 2UH, UK
- John AL Armour
  - School of Life Sciences, University of Nottingham, Queen’s Medical Centre, Nottingham, NG7 2UH, UK

19
Abstract
The centromere is the chromosomal locus essential for chromosome inheritance and genome stability. Human centromeres are located at repetitive alpha satellite DNA arrays that compose approximately 5% of the genome. Contiguous alpha satellite DNA sequence is absent from the assembled reference genome, limiting current understanding of centromere organization and function. Here, we review the progress in centromere genomics spanning the discovery of the sequence to its molecular characterization and the work done during the Human Genome Project era to elucidate alpha satellite structure and sequence variation. We discuss exciting recent advances in alpha satellite sequence assembly that have provided important insight into the abundance and complex organization of this sequence on human chromosomes. In light of these new findings, we offer perspectives for future studies of human centromere assembly and function.
Affiliation(s)
- Megan E. Aldrup-MacDonald
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
  - Division of Human Genetics, Duke University, Durham, NC 27710, USA
- Beth A. Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
  - Division of Human Genetics, Duke University, Durham, NC 27710, USA
  - Author to whom correspondence should be addressed; Tel.: +1-919-684-9038

20
Miga KH, Newton Y, Jain M, Altemose N, Willard HF, Kent WJ. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res 2014; 24:697-707. [PMID: 24501022 PMCID: PMC3975068 DOI: 10.1101/gr.159624.113] [Citation(s) in RCA: 165] [Impact Index Per Article: 15.0] [Indexed: 01/03/2023]
Abstract
The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.
Affiliation(s)
- Karen H Miga
  - Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA

21
Abstract
Human centromeres are defined by megabases of homogenous alpha-satellite DNA arrays that are packaged into specialized chromatin marked by the centromeric histone variant, centromeric protein A (CENP-A). Although most human chromosomes have a single higher-order repeat (HOR) array of alpha satellites, several chromosomes have more than one HOR array. Homo sapiens chromosome 17 (HSA17) has two juxtaposed HOR arrays, D17Z1 and D17Z1-B. Only D17Z1 has been linked to CENP-A chromatin assembly. Here, we use human artificial chromosome assembly assays to show that both D17Z1 and D17Z1-B can support de novo centromere assembly independently. We extend these in vitro studies and demonstrate, using immunostaining and chromatin analyses, that in human cells the centromere can be assembled at D17Z1 or D17Z1-B. Intriguingly, some humans are functional heterozygotes, meaning that CENP-A is located at a different HOR array on the two HSA17 homologs. The site of CENP-A assembly on HSA17 is stable and is transmitted through meiosis, as evidenced by inheritance of CENP-A location through multigenerational families. Differences in histone modifications are not linked clearly with active and inactive D17Z1 and D17Z1-B arrays; however, we detect a correlation between the presence of variant repeat units of D17Z1 and CENP-A assembly at the opposite array, D17Z1-B. Our studies reveal the presence of centromeric epialleles on an endogenous human chromosome and suggest genomic complexities underlying the mechanisms that determine centromere identity in humans.

22
Alkan C, Ventura M, Archidiacono N, Rocchi M, Sahinalp SC, Eichler EE. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data. PLoS Comput Biol 2007; 3:1807-18. [PMID: 17907796 PMCID: PMC1994983 DOI: 10.1371/journal.pcbi.0030181] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.9] [Received: 05/01/2007] [Accepted: 07/31/2007] [Indexed: 11/18/2022] Open
Abstract
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution. Centromeric DNA has been described as the last frontier of genomic sequencing; such regions are typically poorly assembled during the whole-genome shotgun sequence assembly process due to their repetitive complexity. This paper develops a computational algorithm to systematically extract data regarding primate centromeric DNA structure and organization from that ∼5% of sequence that is not included as part of standard genome sequence assemblies. Using this computational approach, we identify and reconstruct published human higher-order alpha satellite arrays and discover new families in human, chimpanzee, and Old World monkeys. Experimental validation confirms the utility of this computational approach to understanding the centromere organization of other nonhuman primates. 
An evolutionary analysis in diverse primate genomes supports fundamental differences in the structure and organization of centromere DNA between ape and Old World monkey lineages. The ability to extract meaningful biological data from random shotgun sequence data helps to fill an important void in large-scale sequencing of primate genomes, with implications for other genome sequencing projects.
Affiliation(s)
- Can Alkan
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, United States of America
- Mario Ventura
  - Department of Genetics and Microbiology, University of Bari, Bari, Italy
- Mariano Rocchi
  - Department of Genetics and Microbiology, University of Bari, Bari, Italy
- S. Cenk Sahinalp
  - Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, United States of America
  - Howard Hughes Medical Institute, Seattle, Washington, United States of America
  - To whom correspondence should be addressed.

23
Kejnovsky E, Hobza R, Kubat Z, Widmer A, Marais GAB, Vyskot B. High intrachromosomal similarity of retrotransposon long terminal repeats: evidence for homogenization by gene conversion on plant sex chromosomes? Gene 2006; 390:92-7. [PMID: 17134852 DOI: 10.1016/j.gene.2006.10.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 06/14/2006] [Revised: 10/03/2006] [Accepted: 10/03/2006] [Indexed: 11/25/2022]
Abstract
Retrotransposons are ubiquitous in the plant genomes and are responsible for their plasticity. Recently, we described a novel family of gypsy-like retrotransposons, named Retand, in the dioecious plant Silene latifolia possessing evolutionary young sex chromosomes of the mammalian type (XY). Here we have analyzed long terminal repeats (LTRs) of Retand that were amplified from laser microdissected X and Y sex chromosomes and autosomes of S. latifolia. A majority of X and Y-derived LTRs formed a few separate clades in phylogenetic analysis reflecting their high intrachromosomal similarity. Moreover, the LTRs localized on the Y chromosome were less divergent than the X chromosome-derived or autosomal LTRs. These data can be explained by a homogenization process, such as gene conversion, working more intensively on the Y chromosome.
Affiliation(s)
- Eduard Kejnovsky
  - Laboratory of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-612 65 Brno, Czech Republic.

24
Abstract
Centromeres are the elements of chromosomes that assemble the proteinaceous kinetochore, maintain sister chromatid cohesion, regulate chromosome attachment to the spindle, and direct chromosome movement during cell division. Although the functions of centromeres and the proteins that contribute to their complex structure and function are conserved in eukaryotes, centromeric DNA diverges rapidly. Human centromeres are particularly complicated. Here, we review studies on the organization of homogeneous arrays of chromosome-specific alpha-satellite repeats and evolutionary links among eukaryotic centromeric sequences. We also discuss epigenetic mechanisms of centromere identity that confer structural and functional features of the centromere through DNA-protein interactions and post-translational modifications, producing centromere-specific chromatin signatures. The assembly and organization of human centromeres, the contributions of satellite DNA to centromere identity and diversity, and the mechanism whereby centromeres are distinguished from the rest of the genome reflect ongoing puzzles in chromosome biology.
Affiliation(s)
- Mary G Schueler
  - Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

25
Roizès G. Human centromeric alphoid domains are periodically homogenized so that they vary substantially between homologues. Mechanism and implications for centromere functioning. Nucleic Acids Res 2006; 34:1912-24. [PMID: 16598075 PMCID: PMC1447651 DOI: 10.1093/nar/gkl137] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022] Open
Abstract
Sequence analysis of alphoid repeats from human chromosomes 17, 21 and 13 reveals recurrent diagnostic variant nucleotides. Their combinations define haplotypes, with higher order repeats (HORs) containing identical or closely-related haplotypes tandemly arranged into separate domains. The haplotypes found on homologues can be totally different, while HORs remain 99.8% homogeneous both intrachromosomally and between homologues. These results support the hypothesis, never before demonstrated, that unequal crossovers between sister chromatids accumulate to produce homogenization and amplification into tandem alphoid repeats. I propose that the molecular basis of this involves the diagnostic variant nucleotides, which enable pairing between HORs with identical or closely-related haplotypes. Domains are thus periodically renewed to maintain high intrachromosomal and interhomologue homogeneity. The capacity of a domain to form an active centromere is maintained as long as neither retrotransposons nor significant numbers of mutations affect it. In the presented model, a chromosome with an altered centromere can be transiently rescued by forming a neocentromere, until a restored, fully-competent domain is amplified de novo or rehomogenized through the accumulation of unequal crossovers.
Affiliation(s)
- Gérard Roizès
  - Institut de Génétique Humaine, UPR 1142, CNRS, 141 Rue de la Cardonille, 34396 Montpellier Cedex 5, France.

26
Abstract
Alpha-satellite is a family of tandemly repeated sequences found at all normal human centromeres. In addition to its significance for understanding centromere function, alpha-satellite is also a model for concerted evolution, as alpha-satellite repeats are more similar within a species than between species. There are two types of alpha-satellite in the human genome; while both are made up of approximately 171-bp monomers, they can be distinguished by whether monomers are arranged in extremely homogeneous higher-order, multimeric repeat units or exist as more divergent monomeric alpha-satellite that lacks any multimeric periodicity. In this study, as a model to examine the genomic and evolutionary relationships between these two types, we have focused on the chromosome 17 centromeric region that has reached both higher-order and monomeric alpha-satellite in the human genome assembly. Monomeric and higher-order alpha-satellites on chromosome 17 are phylogenetically distinct, consistent with a model in which higher-order evolved independently of monomeric alpha-satellite. Comparative analysis between human chromosome 17 and the orthologous chimpanzee chromosome indicates that monomeric alpha-satellite is evolving at approximately the same rate as the adjacent non-alpha-satellite DNA. However, higher-order alpha-satellite is less conserved, suggesting different evolutionary rates for the two types of alpha-satellite.
Affiliation(s)
- M Katharine Rudd
- Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
|
27
|
Schindelhauer D, Schwarz T. Evidence for a fast, intrachromosomal conversion mechanism from mapping of nucleotide variants within a homogeneous alpha-satellite DNA array. Genome Res 2002; 12:1815-26. [PMID: 12466285 PMCID: PMC187568 DOI: 10.1101/gr.451502] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
Abstract
Assuming that patterns of sequence variants within highly homogeneous centromeric tandem repeat arrays can tell us which molecular turnover mechanisms are presently at work, we analyzed the alpha-satellite tandem repeat array DXZ1 of one human X chromosome. Here we present accurate snapshots from this dark matter of the genome. We demonstrate stable and representative cloning of the array in a P1 artificial chromosome (PAC) library, use samples of higher-order repeats subcloned from five unmapped PACs (120-160 kb) to identify common variants, and show that such variants are presently in a fixed transition state. To characterize patterns of variant spread throughout homogeneous array segments, we use a novel partial restriction and pulsed-field gel electrophoresis mapping approach. We find an older large-scale (35-50 kb) duplication event supporting the evolutionarily important unequal crossing-over hypothesis, but generally find independent variant occurrence and a paucity of potential de novo mutations within segments of highest homogeneity (99.1%-99.3%). Within such segments, a highly nonrandom variant clustering within adjacent higher-order repeats was found in the absence of haplotypic repeats. Such variant clusters are hardly explained by interchromosomal, fixation-driving mechanisms and likely reflect a fast, localized, intrachromosomal sequence conversion mechanism.
Affiliation(s)
- Dirk Schindelhauer
- Institute of Human Genetics, Technical University of Munich, Munich, Germany.
|
28
|
Nijman IJ, Bradley DG, Hanotte O, Otsen M, Lenstra JA. Satellite DNA polymorphisms and AFLP correlate with Bos indicus-taurus hybridization. Anim Genet 1999; 30:265-73. [PMID: 10467701 DOI: 10.1046/j.1365-2052.1999.00475.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022]
Abstract
We describe satellite DNA variation that detects hybridization of Bos indicus (zebu or indicine cattle) and Bos taurus (taurine cattle) in African cattle populations. On Southern blots hybridized to a satellite III probe, relative intensities of HinfI fragments correlated with the taurine-zebu composition in hybrid animals as deduced from AFLP genotyping of the same animals and previous data on microsatellite allele frequencies. Similar results were obtained by PCR-RFLP analysis of a zebu-specific mutation in the repeat unit of satellite 1.711b. Analysis of individuals from 20 African cattle breeds indicates that the centromeric satellites of the sanga breeds are of the taurine type and that several East-African zebu breeds are hybrids between taurine and zebu. These satellite RFLP, or SFLP, markers provide a fast method to screen the genetic makeup of African cattle.
Affiliation(s)
- I J Nijman
- Department of Bacteriology, Faculty of Veterinary Medicine, Utrecht, The Netherlands
|
29
|
Laurent AM, Puechberty J, Prades C, Roizès G. Informative genetic polymorphic markers within the centromeric regions of human chromosomes 17 (D17S2205) and 11 (D11S4975). Genomics 1998; 52:166-72. [PMID: 9782082 DOI: 10.1006/geno.1998.5428] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
We have taken advantage of the presence of retrotransposed L1 elements within the centromeric alphoid sequences of the human genome to characterize polymorphic markers at the centromeres of human chromosomes 17 and 11 (D17S2205 and D11S4975, respectively). They correspond to microsatellites found at the 3' ends of L1 elements inserted within the alpha satellite sequences of the two chromosomes. They were detected after PCR by direct analysis in sequencing gels. Eight and five alleles, respectively, were found with heterozygosities of 0.67 and 0.68. They were converted into STSs by designing primers specific for each. D17S2205 and D11S4975 can be used as genuine anchor-informative genetic points for chromosomes 17 and 11. Both markers have been placed on the available genetic maps of their centromeric regions. The alphoid domain within which D17S2205 is embedded is ancestral to the canonical ones on chromosome 17 that exhibit several haplotypes in present-day human populations.
MESH Headings
- Centromere/genetics
- Chromosomes, Human, Pair 11/genetics
- Chromosomes, Human, Pair 17/genetics
- DNA, Satellite/analysis
- DNA, Satellite/chemistry
- DNA, Satellite/genetics
- Electrophoresis, Gel, Pulsed-Field
- Humans
- Microsatellite Repeats
- Molecular Sequence Data
- Pedigree
- Polymorphism, Genetic
Affiliation(s)
- A M Laurent
- Séquences Répétées et Centromères Humains, CNRS ERS 155, Institut de Biologie, 4 Boulevard Henri IV, Montpellier Cedex, 34060, France
|
30
|
Bailey AD, Pavelitz T, Weiner AM. The microsatellite sequence (CT)n x (GA)n promotes stable chromosomal integration of large tandem arrays of functional human U2 small nuclear RNA genes. Mol Cell Biol 1998; 18:2262-71. [PMID: 9528797 PMCID: PMC121475 DOI: 10.1128/mcb.18.4.2262] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Received: 11/07/1997] [Accepted: 01/20/1998] [Indexed: 02/07/2023] Open
Abstract
The multigene family encoding human U2 small nuclear RNA (snRNA) is organized as a single large tandem array containing 5 to 25 copies of a 6.1-kb repeat unit (the RNU2 locus). Remarkably, each of the repeat units within an individual U2 tandem array appears to be identical except for an irregular dinucleotide tract, known as the CT microsatellite, which exhibits minor length and sequence polymorphism. Using a somatic cell genetic assay, we previously noticed that the CT microsatellite appeared to stabilize artificial tandem arrays of U2 snRNA genes. We now demonstrate that the CT microsatellite is required to establish large tandem arrays of transcriptionally active U2 genes, increasing both the average and maximum size of the resulting arrays. In contrast, the CT microsatellite has no effect on the average or maximal size of artificial arrays containing transcriptionally inactive U2 genes that lack key promoter elements. Our data reinforce the connection between recombination and transcription. Active U2 transcription interferes with establishment or maintenance of the U2 tandem array, and the CT microsatellite opposes these effects, perhaps by binding GAGA or GAGA-related factors which alter local chromatin structure. We speculate that the mechanisms responsible for maintenance of tandem arrays containing active promoters may differ from those that maintain tandem arrays of transcriptionally inactive sequences.
Affiliation(s)
- A D Bailey
- Department of Molecular Biophysics, Yale University, New Haven, Connecticut 06520-8114, USA
|
31
|
Liao D, Pavelitz T, Kidd JR, Kidd KK, Weiner AM. Concerted evolution of the tandemly repeated genes encoding human U2 snRNA (the RNU2 locus) involves rapid intrachromosomal homogenization and rare interchromosomal gene conversion. EMBO J 1997; 16:588-98. [PMID: 9034341 PMCID: PMC1169662 DOI: 10.1093/emboj/16.3.588] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.1] [Indexed: 02/03/2023] Open
Abstract
We have surveyed the tandemly repeated genes encoding U2 snRNA in a diverse panel of humans. We found only two polymorphisms within the U2 repeat unit: a SacI polymorphism (alleles SacI+ or SacI-) and a CT microsatellite polymorphism (alleles CT+ or CT-). Surprisingly, individual U2 tandem arrays are entirely SacI+ or SacI-, and entirely CT+ or CT-, although the SacI and CT alleles can occur in any combination. We also found that polymorphisms in the left and right junction regions flanking the tandem array fall into only two haplotypes (JL+ and JL-, JR+ and JR-). Most surprisingly, JL+ is always associated with JR+, and JL- with JR-. Thus individual U2 arrays do not exchange flanking markers, despite independent assortment and subsequent homogenization of the SacI and CT alleles within the U2 repeat units. We propose that the primary driving force for concerted evolution of the tandem U2 genes is intrachromosomal homogenization; interchromosomal genetic exchanges are much rarer, and reciprocal nonsister chromatid exchange apparently does not occur. Thus concerted evolution of the U2 tandem array occurs in situ along a chromosome lineage, and linkage disequilibrium between sequences flanking the U2 array may persist for long periods of time.
Affiliation(s)
- D Liao
- Department of Molecular Biophysics and Biochemistry, Yale University School of Medicine, New Haven, CT 06510-8024, USA
|
32
|
Abstract
The centromere, recognized cytologically as the primary constriction, is essential for chromosomal attachment to the spindle and for proper segregation of mitotic and meiotic chromosomes. Considerable progress has been made in identifying both DNA and protein components of the centromere and kinetochore complex in mammalian chromosomes, including definition of specific motor proteins with demonstrable functions in chromosome movement. Searches for possible environmental influences on chromosome disjunction might logically be based on known components of the segregation apparatus, both intrinsic and extrinsic to the chromosomes themselves. This article reviews available information on both DNA and protein components of the centromere of mammalian, particularly human, chromosomes and summarizes our current understanding of their role(s) in facilitating normal chromosome behavior in mitosis and meiosis.
Affiliation(s)
- B A Sullivan
- Department of Genetics, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106-4955, USA
|