1
Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Res 2023; 33:gr.278005.123. [PMID: 37918962 PMCID: PMC10760446 DOI: 10.1101/gr.278005.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 10/18/2023] [Indexed: 11/04/2023]
Abstract
Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we show that SRF could reconstruct known satellites in human and well-studied model organisms. We also find satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
Affiliation(s)
- Yujie Zhang
  - Harvard School of Public Health, Boston, Massachusetts 02115, USA
- Justin Chu
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Haoyu Cheng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
2
Scelfo A, Fachinetti D. Centromere: A Trojan horse for genome stability. DNA Repair (Amst) 2023; 130:103569. [PMID: 37708591 DOI: 10.1016/j.dnarep.2023.103569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/22/2023] [Accepted: 09/05/2023] [Indexed: 09/16/2023]
Abstract
Centromeres play a key role in the maintenance of genome stability to prevent carcinogenesis and diseases. They are specialized chromosome loci essential to ensure faithful transmission of genomic information across cell generations by mediating the interaction with spindle microtubules. Nonetheless, while fulfilling these essential roles, their distinct repetitive composition and susceptibility to mechanical stresses during cell division render them susceptible to breakage events. In this review, we delve into the present understanding of the underlying causes of centromere fragility, from the mechanisms governing its DNA replication and repair, to the pathways acting to counteract potential challenges. We propose that the centromere represents a "Trojan horse" exerting vital functions that, at the same time, potentially threatens whole genome stability.
Affiliation(s)
- Andrea Scelfo
  - Institut Curie, CNRS, UMR 144, Sorbonne University, 26 rue d'Ulm, 75005 Paris, France.
- Daniele Fachinetti
  - Institut Curie, CNRS, UMR 144, Sorbonne University, 26 rue d'Ulm, 75005 Paris, France.
3
Chrisman B, He C, Jung JY, Stockham N, Paskov K, Washington P, Petereit J, Wall DP. Localizing unmapped sequences with families to validate the Telomere-to-Telomere assembly and identify new hotspots for genetic diversity. Genome Res 2023; 33:1734-1746. [PMID: 37879860 PMCID: PMC10691534 DOI: 10.1101/gr.277175.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 05/25/2023] [Indexed: 10/27/2023]
Abstract
Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.
Affiliation(s)
- Brianna Chrisman
  - Department of Bioengineering, Stanford University, Stanford, California 94305, USA
  - Nevada Bioinformatics Center, University of Nevada, Reno, Nevada 89557, USA
- Chloe He
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Jae-Yoon Jung
  - Department of Pediatrics (Systems Medicine), Stanford University, Stanford, California 94305, USA
- Nate Stockham
  - Department of Neuroscience, Stanford University, Stanford, California 94305, USA
- Kelley Paskov
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Peter Washington
  - Department of Bioengineering, Stanford University, Stanford, California 94305, USA
- Juli Petereit
  - Nevada Bioinformatics Center, University of Nevada, Reno, Nevada 89557, USA
- Dennis P Wall
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
  - Department of Pediatrics (Systems Medicine), Stanford University, Stanford, California 94305, USA
4
Bzikadze AV, Pevzner PA. UniAligner: a parameter-free framework for fast sequence alignment. Nat Methods 2023; 20:1346-1354. [PMID: 37580559 DOI: 10.1038/s41592-023-01970-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 07/05/2023] [Indexed: 08/16/2023]
Abstract
Even though the recent advances in 'complete genomics' revealed the previously inaccessible genomic regions, analysis of variations in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, the parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres, and quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
Affiliation(s)
- Andrey V Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
5
Ponomartsev N, Zilov D, Gushcha E, Travina A, Sergeev A, Enukashvily N. Overexpression of Pericentromeric HSAT2 DNA Increases Expression of EMT Markers in Human Epithelial Cancer Cell Lines. Int J Mol Sci 2023; 24:ijms24086918. [PMID: 37108080 PMCID: PMC10138405 DOI: 10.3390/ijms24086918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 04/02/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023] Open
Abstract
Pericentromeric tandemly repeated DNA of human satellites 1, 2, and 3 (HS1, HS2, and HS3) is actively transcribed in some cells. However, the functionality of the transcription remains obscure. Studies in this area have been hampered by the absence of a gapless genome assembly. The aim of our study was to map a transcript that we have previously described as HS2/HS3 on chromosomes using a newly published gapless genome assembly T2T-CHM13, and create a plasmid overexpressing the transcript to assess the influence of HS2/HS3 transcription on cancer cells. We report here that the sequence of the transcript is tandemly repeated on nine chromosomes (1, 2, 7, 9, 10, 16, 17, 22, and Y). A detailed analysis of its genomic localization and annotation in the T2T-CHM13 assembly revealed that the sequence belonged to HSAT2 (HS2) but not to the HS3 family of tandemly repeated DNA. The transcript was found on both strands of HSAT2 arrays. The overexpression of the HSAT2 transcript increased the transcription of the genes encoding the proteins involved in the epithelial-to-mesenchymal transition, EMT (SNAI1, ZEB1, and SNAI2), and the genes that mark cancer-associated fibroblasts (VIM, COL1A1, COL11A1, and ACTA2) in cancer cell lines A549 and HeLa. Co-transfection of the overexpression plasmid and antisense nucleotides eliminated the transcription of EMT genes observed after HSAT2 overexpression. Antisense oligonucleotides also decreased transcription of the EMT genes induced by tumor growth factor beta 1 (TGFβ1). Thus, our study suggests HSAT2 lncRNA transcribed from the pericentromeric tandemly repeated DNA is involved in EMT regulation in cancer cells.
Affiliation(s)
- Nikita Ponomartsev
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
- Danil Zilov
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
  - Applied Genomics Laboratory, SCAMT Institute, ITMO University, Saint Petersburg 191002, Russia
- Ekaterina Gushcha
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
- Alexandra Travina
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
- Alexander Sergeev
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
- Natella Enukashvily
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia
6
Šatović-Vukšić E, Plohl M. Satellite DNAs - From Localized to Highly Dispersed Genome Components. Genes (Basel) 2023; 14:genes14030742. [PMID: 36981013 PMCID: PMC10048060 DOI: 10.3390/genes14030742] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/16/2023] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/30/2023] Open
Abstract
According to the established classical view, satellite DNAs are defined as abundant non-coding DNA sequences repeated in tandem that build long arrays located in heterochromatin. Advances in sequencing methodologies and development of specialized bioinformatics tools enabled defining a collection of all repetitive DNAs and satellite DNAs in a genome, the repeatome and the satellitome, respectively, as well as their reliable annotation on sequenced genomes. Supported by various non-model species included in recent studies, the patterns of satellite DNAs and satellitomes as a whole showed much more diversity and complexity than initially thought. Differences are not only in number and abundance of satellite DNAs but also in their distribution across the genome, array length, interspersion patterns, association with transposable elements, localization in heterochromatin and/or in euchromatin. In this review, we compare characteristic organizational features of satellite DNAs and satellitomes across different animal and plant species in order to summarize organizational forms and evolutionary processes that may lead to satellitomes' diversity and revisit some basic notions regarding repetitive DNA landscapes in genomes.
Affiliation(s)
- Eva Šatović-Vukšić
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
- Miroslav Plohl
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
7
Mahlke MA, Lumerman L, Ly P, Nechemia-Arbely Y. Epigenetic centromere identity is precisely maintained through DNA replication but is uniquely specified among human cells. Life Sci Alliance 2023; 6:e202201807. [PMID: 36596606 PMCID: PMC9811134 DOI: 10.26508/lsa.202201807] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/07/2022] [Revised: 12/21/2022] [Accepted: 12/22/2022] [Indexed: 01/05/2023] Open
Abstract
Centromere identity is defined and maintained epigenetically by the presence of the histone variant CENP-A. How centromeric CENP-A position is specified and precisely maintained through DNA replication is not fully understood. The recently released Telomere-to-Telomere (T2T) genome assembly containing the first complete human centromere sequences provides a new resource for examining CENP-A position. Mapping CENP-A position in clones of the same cell line to the T2T assembly identified highly similar CENP-A position after multiple cell divisions. In contrast, centromeric CENP-A epialleles were evident at several centromeres of different human cell lines, demonstrating the location of CENP-A enrichment and the site of kinetochore recruitment vary among human cells. Across the cell cycle, CENP-A molecules deposited in G1 phase are maintained in their precise position through DNA replication. Thus, despite CENP-A dilution during DNA replication, CENP-A is precisely reloaded onto the same sequences within the daughter centromeres, maintaining unique centromere identity among human cells.
Affiliation(s)
- Megan A Mahlke
  - UPMC Hillman Cancer Center, Pittsburgh, PA, USA
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Lior Lumerman
  - UPMC Hillman Cancer Center, Pittsburgh, PA, USA
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Peter Ly
  - Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Yael Nechemia-Arbely
  - UPMC Hillman Cancer Center, Pittsburgh, PA, USA
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, USA
8
Lopes M, Louzada S, Ferreira D, Veríssimo G, Eleutério D, Gama-Carvalho M, Chaves R. Human Satellite 1A analysis provides evidence of pericentromeric transcription. BMC Biol 2023; 21:28. [PMID: 36755311 PMCID: PMC9909926 DOI: 10.1186/s12915-023-01521-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/14/2022] [Accepted: 01/19/2023] [Indexed: 02/10/2023] Open
Abstract
BACKGROUND: Pericentromeric regions of human chromosomes are composed of tandem-repeated and highly organized sequences named satellite DNAs. Human classical satellite DNAs are classified into three families named HSat1, HSat2, and HSat3, which have historically posed a challenge for the assembly of the human reference genome where they are misrepresented due to their repetitive nature. Although being known for a long time as the most AT-rich fraction of the human genome, classical satellite HSat1A has been disregarded in genomic and transcriptional studies, falling behind other human satellites in terms of functional knowledge. Here, we aim to characterize and provide an understanding on the biological relevance of HSat1A. RESULTS: The path followed herein trails with HSat1A isolation and cloning, followed by in silico analysis. Monomer copy number and expression data was obtained in a wide variety of human cell lines, with greatly varying profiles in tumoral/non-tumoral samples. HSat1A was mapped in human chromosomes and applied in in situ transcriptional assays. Additionally, it was possible to observe the nuclear organization of HSat1A transcripts and further characterize them by 3' RACE-Seq. Size-varying polyadenylated HSat1A transcripts were detected, which possibly accounts for the intricate regulation of alternative polyadenylation. CONCLUSION: As far as we know, this work pioneers HSat1A transcription studies. With the emergence of new human genome assemblies, acrocentric pericentromeres are becoming relevant characters in disease and other biological contexts. HSat1A sequences and associated noncoding RNAs will most certainly prove significant in the future of HSat research.
Affiliation(s)
- Mariana Lopes
  - CytoGenomics Lab, Department of Genetics and Biotechnology (DGB), University of Trás-Os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
  - BioISI – Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Sandra Louzada
  - CytoGenomics Lab, Department of Genetics and Biotechnology (DGB), University of Trás-Os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
  - BioISI – Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Daniela Ferreira
  - CytoGenomics Lab, Department of Genetics and Biotechnology (DGB), University of Trás-Os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
  - BioISI – Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Gabriela Veríssimo
  - CytoGenomics Lab, Department of Genetics and Biotechnology (DGB), University of Trás-Os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
  - BioISI – Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Daniel Eleutério
  - BioISI – Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Margarida Gama-Carvalho
  - BioISI – Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Raquel Chaves
  - CytoGenomics Lab, Department of Genetics and Biotechnology (DGB), University of Trás-Os-Montes and Alto Douro (UTAD), 5000-801, Vila Real, Portugal
  - BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016, Lisbon, Portugal
9
Enukashvily NI, Ponomartsev NV, Ketkar A, Suezov R, Chubar AV, Prjibelski AD, Shafranskaya DD, Elmshäuser S, Keber CU, Stefanova VN, Akopov AL, Klingmüller U, Pfefferle PI, Stiewe T, Lauth M, Brichkina AI. Pericentromeric satellite lncRNAs are induced in cancer-associated fibroblasts and regulate their functions in lung tumorigenesis. Cell Death Dis 2023; 14:19. [PMID: 36635266 PMCID: PMC9837065 DOI: 10.1038/s41419-023-05553-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 12/21/2022] [Accepted: 01/03/2023] [Indexed: 01/14/2023]
Abstract
The abnormal tumor microenvironment (TME) often dictates the therapeutic response of cancer to chemo- and immuno-therapy. Aberrant expression of pericentromeric satellite repeats has been reported for epithelial cancers, including lung cancer. However, the transcription of tandemly repetitive elements in stromal cells of the TME has been unappreciated, limiting the optimal use of satellite transcripts as biomarkers or anti-cancer targets. We found that transcription of pericentromeric satellite DNA (satDNA) in mouse and human lung adenocarcinoma was observed in cancer-associated fibroblasts (CAFs). In vivo, lung fibroblasts expressed pericentromeric satellite repeats HS2/HS3 specifically in tumors. In vitro, transcription of satDNA was induced in lung fibroblasts in response to TGFβ, IL1α, matrix stiffness, direct contact with tumor cells and treatment with chemotherapeutic drugs. Single-cell transcriptome analysis of human lung adenocarcinoma confirmed that CAFs were the cell type with the highest number of satellite transcripts. Human HS2/HS3 pericentromeric transcripts were detected in the nucleus, cytoplasm, extracellularly and co-localized with extracellular vesicles in situ in human biopsies and activated fibroblasts in vitro. The transcripts were transmitted into recipient cells and entered their nuclei. Knock-down of satellite transcripts in human lung fibroblasts attenuated cellular senescence and blocked the formation of an inflammatory CAFs phenotype which resulted in the inhibition of their pro-tumorigenic functions. In sum, our data suggest that satellite long non-coding (lnc) RNAs are induced in CAFs, regulate expression of inflammatory genes and can be secreted from the cells, which potentially might present a new element of cell-cell communication in the TME.
Affiliation(s)
- Nikita V Ponomartsev
  - Institute of Cytology, Russian Academy of Sciences, 194064, St.-Petersburg, Russia
  - Institute of Molecular and Cell Biology, A*STAR, 138673, Singapore, Singapore
- Avanee Ketkar
  - Philipps University of Marburg, Department of Gastroenterology, Center for Tumor- and Immune Biology, 35043, Marburg, Germany
  - Philipps University of Marburg, Institute of Molecular Oncology, 35043, Marburg, Germany
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany
- Roman Suezov
  - Philipps University of Marburg, Department of Gastroenterology, Center for Tumor- and Immune Biology, 35043, Marburg, Germany
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany
- Anna V Chubar
  - Institute of Cytology, Russian Academy of Sciences, 194064, St.-Petersburg, Russia
- Andrey D Prjibelski
  - Center for Algorithmic Biotechnology, St.-Petersburg State University, 199034, St.-Petersburg, Russia
- Daria D Shafranskaya
  - Center for Algorithmic Biotechnology, St.-Petersburg State University, 199034, St.-Petersburg, Russia
- Sabrina Elmshäuser
  - Philipps University of Marburg, Institute of Molecular Oncology, 35043, Marburg, Germany
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany
- Corinna U Keber
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany
  - Philipps University of Marburg, Institute of Pathology, 35043, Marburg, Germany
- Vera N Stefanova
  - Institute of Cytology, Russian Academy of Sciences, 194064, St.-Petersburg, Russia
- Andrey L Akopov
  - Pavlov First State Medical University, 197022, St.-Petersburg, Russia
- Ursula Klingmüller
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany
  - German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Petra I Pfefferle
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany
  - Philipps University of Marburg, Comprehensive Biobank Marburg CBBMR, 35043, Marburg, Germany
- Thorsten Stiewe
  - Philipps University of Marburg, Institute of Molecular Oncology, 35043, Marburg, Germany
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany
- Matthias Lauth
  - Philipps University of Marburg, Department of Gastroenterology, Center for Tumor- and Immune Biology, 35043, Marburg, Germany
- Anna I Brichkina
  - Philipps University of Marburg, Department of Gastroenterology, Center for Tumor- and Immune Biology, 35043, Marburg, Germany.
  - Philipps University of Marburg, Institute of Molecular Oncology, 35043, Marburg, Germany.
  - Member of the German Center for Lung Research (DZL), Philipps University of Marburg, Marburg, Germany.
10
Urban JA, Ranjan R, Chen X. Asymmetric Histone Inheritance: Establishment, Recognition, and Execution. Annu Rev Genet 2022; 56:113-143. [PMID: 35905975 PMCID: PMC10054593 DOI: 10.1146/annurev-genet-072920-125226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The discovery of biased histone inheritance in asymmetrically dividing Drosophila melanogaster male germline stem cells demonstrates one means to produce two distinct daughter cells with identical genetic material. This inspired further studies in different systems, which revealed that this phenomenon may be a widespread mechanism to introduce cellular diversity. While the extent of asymmetric histone inheritance could vary among systems, this phenomenon is proposed to occur in three steps: first, establishment of histone asymmetry between sister chromatids during DNA replication; second, recognition of sister chromatids carrying asymmetric histone information during mitosis; and third, execution of this asymmetry in the resulting daughter cells. By compiling the current knowledge from diverse eukaryotic systems, this review comprehensively details and compares known chromatin factors, mitotic machinery components, and cell cycle regulators that may contribute to each of these three steps. Also discussed are potential mechanisms that introduce and regulate variable histone inheritance modes and how these different modes may contribute to cell fate decisions in multicellular organisms.
Affiliation(s)
- Jennifer A Urban
  - Department of Biology, The Johns Hopkins University, Baltimore, Maryland, USA
- Rajesh Ranjan
  - Department of Biology, The Johns Hopkins University, Baltimore, Maryland, USA
  - Howard Hughes Medical Institute, The Johns Hopkins University, Baltimore, Maryland, USA
- Xin Chen
  - Department of Biology, The Johns Hopkins University, Baltimore, Maryland, USA
  - Howard Hughes Medical Institute, The Johns Hopkins University, Baltimore, Maryland, USA
11
A classical revival: Human satellite DNAs enter the genomics era. Semin Cell Dev Biol 2022; 128:2-14. [PMID: 35487859 DOI: 10.1016/j.semcdb.2022.04.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/21/2022] [Revised: 04/11/2022] [Accepted: 04/12/2022] [Indexed: 12/30/2022]
Abstract
The classical human satellite DNAs, also referred to as human satellites 1, 2 and 3 (HSat1, HSat2, HSat3, or collectively HSat1-3), occur on most human chromosomes as large, pericentromeric tandem repeat arrays, which together constitute roughly 3% of the human genome (100 megabases, on average). Even though HSat1-3 were among the first human DNA sequences to be isolated and characterized at the dawn of molecular biology, they have remained almost entirely missing from the human genome reference assembly for 20 years, hindering studies of their sequence, regulation, and potential structural roles in the nucleus. Recently, the Telomere-to-Telomere Consortium produced the first truly complete assembly of a human genome, paving the way for new studies of HSat1-3 with modern genomic tools. This review provides an account of the history and current understanding of HSat1-3, with a view towards future studies of their evolution and roles in health and disease.
12
Population Scale Analysis of Centromeric Satellite DNA Reveals Highly Dynamic Evolutionary Patterns and Genomic Organization in Long-Tailed and Rhesus Macaques. Cells 2022; 11:cells11121953. [PMID: 35741082 PMCID: PMC9221937 DOI: 10.3390/cells11121953] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/19/2022] [Revised: 06/12/2022] [Accepted: 06/14/2022] [Indexed: 02/04/2023] Open
Abstract
Centromeric satellite DNA (cen-satDNA) consists of highly divergent repeat monomers, each approximately 171 base pairs in length. Here, we investigated the genetic diversity in the centromeric region of two primate species: long-tailed (Macaca fascicularis) and rhesus (Macaca mulatta) macaques. Fluorescence in situ hybridization and bioinformatic analysis showed the chromosome-specific organization and dynamic nature of cen-satDNA sequences, and their substantial diversity, with distinct subfamilies across macaque populations, suggesting increased turnovers. Comparative genomics identified high level polymorphisms spanning a 120 bp deletion region and a remarkable interspecific variability in cen-satDNA size and structure. Population structure analysis detected admixture patterns within populations, indicating their high divergence and rapid evolution. However, differences in cen-satDNA profiles appear to not be involved in hybrid incompatibility between the two species. Our study provides a genomic landscape of centromeric repeats in wild macaques and opens new avenues for exploring their impact on the adaptive evolution and speciation of primates.
13
Gershman A, Sauria MEG, Guitart X, Vollger MR, Hook PW, Hoyt SJ, Jain M, Shumate A, Razaghi R, Koren S, Altemose N, Caldas GV, Logsdon GA, Rhie A, Eichler EE, Schatz MC, O'Neill RJ, Phillippy AM, Miga KH, Timp W. Epigenetic patterns in a complete human genome. Science 2022; 376:eabj5089. [PMID: 35357915 PMCID: PMC9170183 DOI: 10.1126/science.abj5089] [Citation(s) in RCA: 107] [Impact Index Per Article: 53.5] [Indexed: 12/11/2022]
Abstract
The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.
Affiliation(s)
- Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | - Michael E G Sauria
- Department of Biology and Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Savannah J Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Roham Razaghi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nicolas Altemose
- Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA
| | - Gina V Caldas
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Michael C Schatz
- Department of Biology and Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
14
|
Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Citation(s) in RCA: 174] [Impact Index Per Article: 87.0] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Affiliation(s)
- Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Pragya Sidhwani
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Sasha A. Langley
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Lev Uralsky
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
| | | | - Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | | | | | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | | | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | | | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Fedor Gusev
- Vavilov Institute of General Genetics, Moscow, Russia
| | - Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sofie R. Salama
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Gary H. Karpen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Abby F. Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
| | | | - Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | - Rachel J. O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Charles H. Langley
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
| | - Ivan A. Alexandrov
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
15
|
Winans S, Yu HJ, de Los Santos K, Wang GZ, KewalRamani VN, Goff SP. A point mutation in HIV-1 integrase redirects proviral integration into centromeric repeats. Nat Commun 2022; 13:1474. [PMID: 35304442 PMCID: PMC8933506 DOI: 10.1038/s41467-022-29097-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2021] [Accepted: 02/24/2022] [Indexed: 11/25/2022]
Abstract
Retroviruses utilize the viral integrase (IN) protein to integrate a DNA copy of their genome into host chromosomal DNA. HIV-1 integration sites are highly biased towards actively transcribed genes, likely mediated by binding of the IN protein to specific host factors, particularly LEDGF, located at these gene regions. We here report a substantial redirection of integration site distribution induced by a single point mutation in HIV-1 IN. Viruses carrying the K258R IN mutation exhibit a high frequency of integrations into centromeric alpha satellite repeat sequences, as assessed by deep sequencing, a more than 10-fold increase over wild-type. Quantitative PCR and in situ immunofluorescence assays confirm this bias of the K258R mutant virus for integration into centromeric DNA. Immunoprecipitation studies identify host factors binding to IN that may account for the observed bias for integration into centromeres. Centromeric integration events are known to be enriched in the latent reservoir of infected memory T cells, as well as in elite controllers who limit viral replication without intervention. The K258R point mutation in HIV-1 IN is also present in databases of latent proviruses found in patients, and may reflect an unappreciated aspect of the establishment of viral latency. HIV-1 integration sites are biased towards actively transcribed genes, likely mediated by binding of the viral integrase (IN) protein to host factors. Here, Winans et al. show that the K258R point mutation in IN redirects viral DNA integration to the centromeres of host chromosomes, which may affect HIV latency.
Affiliation(s)
- Shelby Winans
- Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York, NY, USA; Department of Microbiology and Immunology, Columbia University Medical Center, New York, NY, USA; Howard Hughes Medical Institute, Columbia University, New York, NY, USA
| | - Hyun Jae Yu
- Basic Science Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD, USA
| | - Kenia de Los Santos
- Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York, NY, USA; Department of Microbiology and Immunology, Columbia University Medical Center, New York, NY, USA; Howard Hughes Medical Institute, Columbia University, New York, NY, USA
| | - Gary Z Wang
- Department of Pathology, Columbia University Medical Center, New York, NY, USA
| | - Vineet N KewalRamani
- Basic Research Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
| | - Stephen P Goff
- Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York, NY, USA; Department of Microbiology and Immunology, Columbia University Medical Center, New York, NY, USA; Howard Hughes Medical Institute, Columbia University, New York, NY, USA
16
|
Henriksen RA, Jenjaroenpun P, Sjøstrøm IB, Jensen KR, Prada-Luengo I, Wongsurawat T, Nookaew I, Regenberg B. Circular DNA in the human germline and its association with recombination. Mol Cell 2022; 82:209-217.e7. [PMID: 34951964 PMCID: PMC10707452 DOI: 10.1016/j.molcel.2021.11.027] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/29/2021] [Revised: 08/24/2021] [Accepted: 11/23/2021] [Indexed: 12/24/2022]
Abstract
Extrachromosomal circular DNA (eccDNA) is common in somatic tissue, but its existence and effects in the human germline are unexplored. We used microscopy, long-read DNA sequencing, and new analytic methods to document thousands of eccDNAs from human sperm. EccDNAs derived from all genomic regions and mostly contained a single DNA fragment, although some consisted of multiple fragments. The generation of eccDNA inversely correlates with the meiotic recombination rate, and chromosomes with high coding-gene density and Alu element abundance form the least eccDNA. Analysis of insertions in human genomes further indicates that eccDNA can persist in the human germline when the circular molecules reinsert themselves into the chromosomes. Our results suggest that eccDNA has transient and permanent effects on the germline. They explain how differences in the physical and genetic map might arise and offer an explanation of how Alu elements coevolved with genes to protect genome integrity against deleterious mutations producing eccDNA.
Affiliation(s)
- Rasmus Amund Henriksen
- Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Piroon Jenjaroenpun
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Ida Borup Sjøstrøm
- Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | | | - Iñigo Prada-Luengo
- Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Thidathip Wongsurawat
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Intawat Nookaew
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Birgitte Regenberg
- Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
17
|
Bzikadze AV, Mikheenko A, Pevzner PA. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res 2022; 32:2107-2118. [PMID: 36379716 PMCID: PMC9808623 DOI: 10.1101/gr.276871.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/26/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Recent advancements in long-read sequencing have enabled the telomere-to-telomere (complete) assembly of a human genome and are now contributing to the haplotype-resolved complete assemblies of multiple human genomes. Because the accuracy of read mapping tools deteriorates in highly repetitive regions, there is a need to develop accurate, error-exposing (detecting potential assembly errors), and diploid-aware (distinguishing different haplotypes) tools for read mapping in complete assemblies. We describe the first accurate, error-exposing, and partially diploid-aware VerityMap tool for long-read mapping to complete assemblies.
Affiliation(s)
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, California 92093, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, 199034, Russia
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA
18
|
de Lima LG, Howe E, Singh VP, Potapova T, Li H, Xu B, Castle J, Crozier S, Harrison CJ, Clifford SC, Miga KH, Ryan SL, Gerton JL. PCR amplicons identify widespread copy number variation in human centromeric arrays and instability in cancer. Cell Genomics 2021; 1:100064. [PMID: 34993501 PMCID: PMC8730464 DOI: 10.1016/j.xgen.2021.100064] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/14/2021] [Revised: 07/13/2021] [Accepted: 08/24/2021] [Indexed: 12/13/2022]
Abstract
Centromeric α-satellite repeats represent ~6% of the human genome, but their length and repetitive nature make sequencing and analysis of those regions challenging. However, centromeres are essential for the stable propagation of chromosomes, so tools are urgently needed to monitor centromere copy number and how it influences chromosome transmission and genome stability. We developed and benchmarked droplet digital PCR (ddPCR) assays that measure copy number for five human centromeric arrays. We applied them to characterize natural variation in centromeric array size, analyzing normal tissue from 37 individuals from China and 39 individuals from the US and UK. Each chromosome-specific array varies in size up to 10-fold across individuals and up to 50-fold across chromosomes, indicating a unique complement of arrays in each individual. We also used the ddPCR assays to analyze centromere copy number in 76 matched tumor-normal samples across four cancer types, representing the most-comprehensive quantitative analysis of centromeric array stability in cancer to date. In contrast to stable transmission in cultured cells, centromeric arrays show gain and loss events in each of the cancer types, suggesting centromeric α-satellite DNA represents a new category of genome instability in cancer. Our methodology for measuring human centromeric-array copy number will advance research on centromeres and genome integrity in normal and disease states.
Affiliation(s)
| | - Edmund Howe
- The Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | - Tamara Potapova
- The Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Hua Li
- The Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Baoshan Xu
- Hospital of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Guanghua School of Stomatology, Institute of Stomatological Research, Sun Yat-sen University, Guangzhou, Guangdong Province, China
| | - Jemma Castle
- Newcastle University Centre for Cancer, Newcastle upon Tyne, UK
| | - Steve Crozier
- Newcastle University Centre for Cancer, Newcastle upon Tyne, UK
| | | | | | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sarra L. Ryan
- Newcastle University Centre for Cancer, Newcastle upon Tyne, UK
| | - Jennifer L. Gerton
- The Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, KS, USA
19
|
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
| | - Ivan A Alexandrov
- Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia; Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199004, Russia; Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia
20
|
Abstract
The centromere performs a universally conserved function, to accurately partition genetic information upon cell division. Yet, centromeres are among the most rapidly evolving regions of the genome and are bound by a varying assortment of centromere-binding factors that are themselves highly divergent at the protein-sequence level. A common thread in most species is the dependence on the centromere-specific histone variant CENP-A for the specification of the centromere site. However, CENP-A is not universally required in all species or cell types, making the identification of a general mechanism for centromere specification challenging. In this review, we examine our current understanding of the mechanisms of centromere specification in CENP-A-dependent and independent systems, focusing primarily on recent work.
Affiliation(s)
- Barbara G Mellone
- Department of Molecular and Cell Biology, and Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| | - Daniele Fachinetti
- Institut Curie, PSL Research University, CNRS, UMR 144, 26 rue d'Ulm, F-75005 Paris, France
21
|
Vojvoda Zeljko T, Ugarković Đ, Pezer Ž. Differential enrichment of H3K9me3 at annotated satellite DNA repeats in human cell lines and during fetal development in mouse. Epigenetics Chromatin 2021; 14:47. [PMID: 34663449 PMCID: PMC8524813 DOI: 10.1186/s13072-021-00423-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Accepted: 10/05/2021] [Indexed: 01/24/2023]
Abstract
BACKGROUND: Trimethylation of histone H3 on lysine 9 (H3K9me3) at satellite DNA sequences has been primarily studied at (peri)centromeric regions, where its level shows differences associated with various processes such as development and malignant transformation. However, the dynamics of H3K9me3 at distal satellite DNA repeats has not been thoroughly investigated. RESULTS: We exploit the sets of publicly available data derived from chromatin immunoprecipitation combined with massively parallel DNA sequencing (ChIP-Seq), produced by the Encyclopedia of DNA Elements (ENCODE) project, to analyze H3K9me3 at assembled satellite DNA repeats in genomes of human cell lines and during mouse fetal development. We show that annotated satellite elements are generally enriched for H3K9me3, but its level in cancer cell lines is on average lower than in normal cell lines. We find 407 satellite DNA instances with differential H3K9me3 enrichment between cancer and normal cells including a large 115-kb cluster of GSATII elements on chromosome 12. Differentially enriched regions are not limited to satellite DNA instances, but instead encompass a wider region of flanking sequences. We found no correlation between the levels of H3K9me3 and noncoding RNA at corresponding satellite DNA loci. The analysis of data derived from multiple tissues identified 864 instances of satellite DNA sequences in the mouse reference genome that are differentially enriched between fetal developmental stages. CONCLUSIONS: Our study reveals significant differences in H3K9me3 level at a subset of satellite repeats between biological states and as such contributes to understanding of the role of satellite DNA repeats in epigenetic regulation during development and carcinogenesis.
Affiliation(s)
| | | | - Željka Pezer
- Ruđer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia
22
|
Suzuki Y, Morishita S. The time is ripe to investigate human centromeres by long-read sequencing. DNA Res 2021; 28:6381569. [PMID: 34609504 PMCID: PMC8502840 DOI: 10.1093/dnares/dsab021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/31/2021] [Accepted: 09/28/2021] [Indexed: 01/05/2023]
Abstract
The complete sequencing of human centromeres, which are filled with highly repetitive elements, has long been challenging. In human centromeres, α-satellite monomers of about 171 bp in length are the basic repeating units, but α-satellite monomers constitute the higher-order repeat (HOR) units, and thousands of copies of highly homologous HOR units form large arrays, which have hampered sequence assembly of human centromeres. Because most HOR unit occurrences are covered by long reads of about 10 kb, the recent availability of much longer reads is expected to enable observation of individual HOR occurrences in terms of their single-nucleotide or structural variants. The time has come to examine the complete sequence of human centromeres.
Affiliation(s)
- Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
| | - Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
23
|
Peart CR, Williams C, Pophaly SD, Neely BA, Gulland FMD, Adams DJ, Ng BL, Cheng W, Goebel ME, Fedrigo O, Haase B, Mountcastle J, Fungtammasan A, Formenti G, Collins J, Wood J, Sims Y, Torrance J, Tracey A, Howe K, Rhie A, Hoffman JI, Johnson J, Jarvis ED, Breen M, Wolf JBW. Hi-C scaffolded short- and long-read genome assemblies of the California sea lion are broadly consistent for syntenic inference across 45 million years of evolution. Mol Ecol Resour 2021; 21:2455-2470. [PMID: 34097816 PMCID: PMC9732816 DOI: 10.1111/1755-0998.13443] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/03/2021] [Revised: 05/06/2021] [Accepted: 05/26/2021] [Indexed: 12/13/2022]
Abstract
With the advent of chromatin-interaction maps, chromosome-level genome assemblies have become a reality for a wide range of organisms. Scaffolding quality is, however, difficult to judge. To explore this gap, we generated multiple chromosome-scale genome assemblies of an emerging wild animal model for carcinogenesis, the California sea lion (Zalophus californianus). Short-read assemblies were scaffolded with two independent chromatin interaction mapping data sets (Hi-C and Chicago), and long-read assemblies with three data types (Hi-C, optical maps and 10X linked reads) following the "Vertebrate Genomes Project (VGP)" pipeline. In both approaches, 18 major scaffolds recovered the karyotype (2n = 36), with scaffold N50s of 138 and 147 Mb, respectively. Synteny relationships at the chromosome level with other pinniped genomes (2n = 32-36), ferret (2n = 34), red panda (2n = 36) and domestic dog (2n = 78) were consistent across approaches and recovered known fissions and fusions. Comparative chromosome painting and multicolour chromosome tiling with a panel of 264 genome-integrated single-locus canine bacterial artificial chromosome probes provided independent evaluation of genome organization. Broad-scale discrepancies between the approaches were observed within chromosomes, most commonly in translocations centred around centromeres and telomeres, which were better resolved in the VGP assembly. Genomic and cytological approaches agreed on near-perfect synteny of the X chromosome, and in combination allowed detailed investigation of autosomal rearrangements between dog and sea lion. This study presents high-quality genomes of an emerging cancer model and highlights that even highly fragmented short-read assemblies scaffolded with Hi-C can yield reliable chromosome-level scaffolds suitable for comparative genomic analyses.
Affiliation(s)
- Claire R. Peart
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Munchen, Germany
- Christina Williams
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
- Saurabh D. Pophaly
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Munchen, Germany; Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Benjamin A. Neely
- National Institute of Standards and Technology, NIST Charleston, Charleston, South Carolina, USA
- Frances M. D. Gulland
- Karen Dryer Wildlife Health Center, University of California Davis, Davis, California, USA
- David J. Adams
- Cytometry Core Facility, Wellcome Sanger Institute, Cambridge, UK
- Bee Ling Ng
- Cytometry Core Facility, Wellcome Sanger Institute, Cambridge, UK
- William Cheng
- Cytometry Core Facility, Wellcome Sanger Institute, Cambridge, UK
- Michael E. Goebel
- Institute of Marine Science, University of California Santa Cruz, Santa Cruz, California, USA
- Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA
- Bettina Haase
- Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA
- Giulio Formenti
- Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA; Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, New York, USA
- Joanna Collins
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Jonathan Wood
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Ying Sims
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- James Torrance
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Alan Tracey
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Kerstin Howe
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland, USA
- Joseph I. Hoffman
- Department of Animal Behaviour, Bielefeld University, Bielefeld, Germany; British Antarctic Survey, Cambridge, UK
- Jeremy Johnson
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Erich D. Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA; Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Matthew Breen
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA; Comparative Medicine Institute, North Carolina State University, Raleigh, North Carolina, USA
- Jochen B. W. Wolf
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Munchen, Germany
24
Miga KH, Sullivan BA. Expanding studies of chromosome structure and function in the era of T2T genomics. Hum Mol Genet 2021; 30:R198-R205. [PMID: 34302168 PMCID: PMC8631062 DOI: 10.1093/hmg/ddab214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Revised: 07/16/2021] [Accepted: 07/20/2021] [Indexed: 11/13/2022]
Abstract
The recent accomplishment of a truly complete human genome has afforded a new view of chromosome structure and function that was limited 30 years ago. Here, we discuss the expansion of knowledge from the early cytological studies of the genome to the current high-resolution genomic, epigenetic and functional maps that have been achieved by recent technology and computational advances. These studies have revealed unexpected complexities of genome organization and function and uncovered new views of fundamental chromosomal elements. Comprehensive genomic maps will enable accurate diagnosis of human diseases caused by altered chromosome structure and function, facilitate development of chromosome-based therapies and shape the future of preventative medicine and healthcare.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Beth A Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
25
|
Abstract
Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years including the development of graph based models that better resolve biological phenomena, and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models, algorithms associated with graphical genomes, and comparisons between similar tools. In addition we briefly discuss what these developments may mean for the future of genomics.
26
Srikulnath K, Ahmad SF, Singchat W, Panthum T. Why Do Some Vertebrates Have Microchromosomes? Cells 2021; 10:2182. [PMID: 34571831 PMCID: PMC8466491 DOI: 10.3390/cells10092182] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/22/2021] [Revised: 08/17/2021] [Accepted: 08/17/2021] [Indexed: 12/27/2022]
Abstract
With more than 70,000 living species, vertebrates have a huge impact on the field of biology and research, including karyotype evolution. One prominent aspect of many vertebrate karyotypes is the enigmatic occurrence of tiny and often cytogenetically indistinguishable microchromosomes, which possess distinctive features compared to macrochromosomes. Why certain vertebrate species carry these microchromosomes in some lineages while others do not, and how they evolve remain open questions. New studies have shown that microchromosomes exhibit certain unique characteristics of genome structure and organization, such as high gene densities, low heterochromatin levels, and high rates of recombination. Our review focuses on recent concepts to expand current knowledge on the dynamic nature of karyotype evolution in vertebrates, raising important questions regarding the evolutionary origins and ramifications of microchromosomes. We introduce the basic karyotypic features to clarify the size, shape, and morphology of macro- and microchromosomes and report their distribution across different lineages. Finally, we characterize the mechanisms of different evolutionary forces underlying the origin and evolution of microchromosomes.
Affiliation(s)
- Kornsorn Srikulnath
- Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- The International Undergraduate Program in Bioscience and Technology, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Amphibian Research Center, Hiroshima University, 1-3-1, Kagamiyama, Higashihiroshima 739-8526, Japan
- Syed Farhan Ahmad
- Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- The International Undergraduate Program in Bioscience and Technology, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Worapong Singchat
- Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Thitipong Panthum
- Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
27
Hausmann F, Kurtz S. DeepGRP: engineering a software tool for predicting genomic repetitive elements using Recurrent Neural Networks with attention. Algorithms Mol Biol 2021; 16:20. [PMID: 34425870 PMCID: PMC8381506 DOI: 10.1186/s13015-021-00199-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2021] [Accepted: 08/03/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND Repetitive elements contribute a large part of eukaryotic genomes. For example, about 40 to 50% of human, mouse and rat genomes are repetitive. So identifying and classifying repeats is an important step in genome annotation. This annotation step is traditionally performed using alignment based methods, either in a de novo approach or by aligning the genome sequence to a species specific set of repetitive sequences. Recently, Li (Bioinformatics 35:4408-4410, 2019) developed a novel software tool dna-brnn to annotate repetitive sequences using a recurrent neural network trained on sample annotations of repetitive elements. RESULTS We have developed the methods of dna-brnn further and engineered a new software tool DeepGRP. This combines the basic concepts of Li (Bioinformatics 35:4408-4410, 2019) with current techniques developed for neural machine translation, the attention mechanism, for the task of nucleotide-level annotation of repetitive elements. An evaluation on the human genome shows a 20% improvement of the Matthews correlation coefficient for the predictions delivered by DeepGRP, when compared to dna-brnn. DeepGRP predicts two additional classes of repeats (compared to dna-brnn) and is able to transfer repeat annotations, using RepeatMasker-based training data to a different species (mouse). Additionally, we could show that DeepGRP predicts repeats annotated in the Dfam database, but not annotated by RepeatMasker. DeepGRP is highly scalable due to its implementation in the TensorFlow framework. For example, the GPU-accelerated version of DeepGRP is approx. 1.8 times faster than dna-brnn, approx. 8.6 times faster than RepeatMasker and over 100 times faster than HMMER searching for models of the Dfam database. CONCLUSIONS By incorporating methods from neural machine translation, DeepGRP achieves a consistent improvement of the quality of the predictions compared to dna-brnn. Improved running times are obtained by employing TensorFlow as implementation framework and the use of GPUs. By incorporating two additional classes of repeats, DeepGRP provides more complete annotations, which were evaluated against three state-of-the-art tools for repeat annotation.
Affiliation(s)
- Fabian Hausmann
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Falkenried 94, 20251 Hamburg, Germany
- Stefan Kurtz
- ZBH - Center for Bioinformatics, MIN-Fakultät, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
28
Dvorkina T, Kunyavskaya O, Bzikadze AV, Alexandrov I, Pevzner PA. CentromereArchitect: inference and analysis of the architecture of centromeres. Bioinformatics 2021; 37:i196-i204. [PMID: 34252949 PMCID: PMC8336445 DOI: 10.1093/bioinformatics/btab265] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
Motivation Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not been yet accompanied by the development of the centromere-specific bioinformatics algorithms, even the fundamental questions (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex questions (e.g. explaining how monomers and high-order repeats evolved) about human centromeres remain open. Moreover, even though there was a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, the rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species. Results We describe the CentromereArchitect, the first tool for the centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for ‘live’ centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution. Availability and implementation CentromereArchitect is publicly available on https://github.com/ablab/stringdecomposer/tree/ismb2021 Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Ivan Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
29
Xue L, Gao Y, Wu M, Tian T, Fan H, Huang Y, Huang Z, Li D, Xu L. Telomere-to-telomere assembly of a fish Y chromosome reveals the origin of a young sex chromosome pair. Genome Biol 2021; 22:203. [PMID: 34253240 PMCID: PMC8273981 DOI: 10.1186/s13059-021-02430-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/30/2021] [Accepted: 07/01/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND The origin of sex chromosomes requires the establishment of recombination suppression between the proto-sex chromosomes. In many fish species, the sex chromosome pair is homomorphic with a recent origin, providing species for studying how and why recombination suppression evolved in the initial stages of sex chromosome differentiation, but this requires accurate sequence assembly of the X and Y (or Z and W) chromosomes, which may be difficult if they are recently diverged. RESULTS Here we produce a haplotype-resolved genome assembly of zig-zag eel (Mastacembelus armatus), an aquaculture fish, at the chromosomal scale. The diploid assembly is nearly gap-free, and in most chromosomes, we resolve the centromeric and subtelomeric heterochromatic sequences. In particular, the Y chromosome, including its highly repetitive short arm, has zero gaps. Using resequencing data, we identify a ~7 Mb fully sex-linked region (SLR), spanning the sex chromosome centromere and almost entirely embedded in the pericentromeric heterochromatin. The SLRs on the X and Y chromosomes are almost identical in sequence and gene content, but both are repetitive and heterochromatic, consistent with zero or low recombination. We further identify an HMG-domain containing gene HMGN6 in the SLR as a candidate sex-determining gene that is expressed at the onset of testis development. CONCLUSIONS Our study supports the idea that preexisting regions of low recombination, such as pericentromeric regions, can give rise to SLR in the absence of structural variations between the proto-sex chromosomes.
Affiliation(s)
- Lingzhan Xue
- College of Fisheries, Hubei Provincial Engineering Laboratory for Pond Aquaculture, Huazhong Agricultural University, Wuhan, 430070, China; Aquaculture and Genetic Breeding Laboratory, Freshwater Fisheries Research Institute of Fujian, Fuzhou, 350002, China
- Yu Gao
- College of Animal Science and Technology, Key Laboratory for Plateau Fishery Resources Conservation and Sustainable Utilization of Yunnan Province, Yunnan Agricultural University, Kunming, 650201, China
- Meiying Wu
- Aquaculture and Genetic Breeding Laboratory, Freshwater Fisheries Research Institute of Fujian, Fuzhou, 350002, China
- Tian Tian
- Aquaculture and Genetic Breeding Laboratory, Freshwater Fisheries Research Institute of Fujian, Fuzhou, 350002, China
- Haiping Fan
- Freshwater Fisheries Research Institute of Fujian, Fuzhou, 350002, China
- Yongji Huang
- Institute of Oceanography, Minjiang University, Fuzhou, 350108, China
- Zhen Huang
- Fujian Key Laboratory of Developmental and Neural Biology & Southern Center for Biomedical Research, College of Life Sciences, Fujian Normal University, Fuzhou, Fujian, China; Fujian Key Laboratory of Special Marine Bio-resources Sustainable Utilization, Fuzhou, 350117, Fujian, China
- Dapeng Li
- College of Fisheries, Hubei Provincial Engineering Laboratory for Pond Aquaculture, Huazhong Agricultural University, Wuhan, 430070, China; Freshwater Aquaculture Collaborative Innovation Center of Hubei Province, Wuhan, 430070, China
- Luohao Xu
- Department of Neurosciences and Developmental Biology, University of Vienna, 1090, Vienna, Austria
30
Abstract
DNA synthesis technology has progressed to the point that it is now practical to synthesize entire genomes. Quite a variety of methods have been developed, first to synthesize single genes but ultimately to massively edit or write from scratch entire genomes. Synthetic genomes can essentially be clones of native sequences, but this approach does not teach us much new biology. The ability to endow genomes with novel properties offers special promise for addressing questions not easily approachable with conventional gene-at-a-time methods. These include questions about evolution and about how genomes are fundamentally wired informationally, metabolically, and genetically. The techniques and technologies relating to how to design, build, and deliver big DNA at the genome scale are reviewed here. A fuller understanding of these principles may someday lead to the ability to truly design genomes from scratch.
Affiliation(s)
- Weimin Zhang
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Health, New York, NY 10016, USA
- Leslie A Mitchell
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Health, New York, NY 10016, USA
- Joel S Bader
- Department of Biomedical Engineering, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Jef D Boeke
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Health, New York, NY 10016, USA; Department of Biomedical Engineering, New York University Tandon School of Engineering, New York, NY 11201, USA
31
Tandem Repeats in Bacillus: Unique Features and Taxonomic Distribution. Int J Mol Sci 2021; 22:ijms22105373. [PMID: 34065296 PMCID: PMC8161180 DOI: 10.3390/ijms22105373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Revised: 05/14/2021] [Accepted: 05/18/2021] [Indexed: 11/16/2022]
Abstract
Little is known about DNA tandem repeats across prokaryotes. We have recently described an enigmatic group of tandem repeats in bacterial genomes with a constant repeat size but variable sequence. These findings strongly suggest that tandem repeat size in some bacteria is under strong selective constraints. Here, we extend these studies and describe tandem repeats in a large set of Bacillus. Some species have very few repeats, while other species have a large number. Most tandem repeats have repeats with a constant size (either 52 or 20-21 nt), but a variable sequence. We characterize in detail these intriguing tandem repeats. Individual species have several families of tandem repeats with the same repeat length and different sequence. This result is in strong contrast with eukaryotes, where tandem repeats of many sizes are found in any species. We discuss the possibility that they are transcribed as small RNA molecules. They may also be involved in the stabilization of the nucleoid through interaction with proteins. We also show that the distribution of tandem repeats in different species has a taxonomic significance. The data we present for all tandem repeats and their families in these bacterial species will be useful for further genomic studies.
32
Lopes M, Louzada S, Gama-Carvalho M, Chaves R. Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time. Int J Mol Sci 2021; 22:4707. [PMID: 33946766 PMCID: PMC8125562 DOI: 10.3390/ijms22094707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/31/2021] [Revised: 04/24/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022]
Abstract
(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences, constitute a major human genomic component. SatDNA sequences can vary on a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.
Affiliation(s)
- Mariana Lopes
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Sandra Louzada
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Margarida Gama-Carvalho
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Raquel Chaves
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
33
Arora UP, Charlebois C, Lawal RA, Dumont BL. Population and subspecies diversity at mouse centromere satellites. BMC Genomics 2021; 22:279. [PMID: 33865332 PMCID: PMC8052823 DOI: 10.1186/s12864-021-07591-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/05/2021] [Accepted: 04/08/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND Mammalian centromeres are satellite-rich chromatin domains that execute conserved roles in kinetochore assembly and chromosome segregation. Centromere satellites evolve rapidly between species, but little is known about population-level diversity across these loci. RESULTS We developed a k-mer based method to quantify centromere copy number and sequence variation from whole genome sequencing data. We applied this method to diverse inbred and wild house mouse (Mus musculus) genomes to profile diversity across the core centromere (minor) satellite and the pericentromeric (major) satellite repeat. We show that minor satellite copy number varies more than 10-fold among inbred mouse strains, whereas major satellite copy numbers span a 3-fold range. In contrast to widely held assumptions about the homogeneity of mouse centromere repeats, we uncover marked satellite sequence heterogeneity within single genomes, with diversity levels across the minor satellite exceeding those at the major satellite. Analyses in wild-caught mice implicate subspecies and population origin as significant determinants of variation in satellite copy number and satellite heterogeneity. Intriguingly, we also find that wild-caught mice harbor dramatically reduced minor satellite copy number and elevated satellite sequence heterogeneity compared to inbred strains, suggesting that inbreeding may reshape centromere architecture in pronounced ways. CONCLUSION Taken together, our results highlight the power of k-mer based approaches for probing variation across repetitive regions, provide an initial portrait of centromere variation across Mus musculus, and lay the groundwork for future functional studies on the consequences of natural genetic variation at these essential chromatin domains.
Affiliation(s)
- Uma P Arora
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
- Tufts University, Graduate School of Biomedical Sciences, 136 Harrison Ave, Boston, MA, 02111, USA
- Beth L Dumont
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
- Tufts University, Graduate School of Biomedical Sciences, 136 Harrison Ave, Boston, MA, 02111, USA
34
Landers CC, Rabeler CA, Ferrari EK, D'Alessandro LR, Kang DD, Malisa J, Bashir SM, Carone DM. Ectopic expression of pericentric HSATII RNA results in nuclear RNA accumulation, MeCP2 recruitment, and cell division defects. Chromosoma 2021; 130:75-90. [PMID: 33585981 PMCID: PMC7889552 DOI: 10.1007/s00412-021-00753-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/11/2020] [Revised: 01/16/2021] [Accepted: 01/19/2021] [Indexed: 12/21/2022]
Abstract
Within the pericentric regions of human chromosomes reside large arrays of tandemly repeated satellite sequences. Expression of the human pericentric satellite HSATII is prevented by extensive heterochromatin silencing in normal cells, yet in many cancer cells, HSATII RNA is aberrantly expressed and accumulates in large nuclear foci in cis. Expression and aggregation of HSATII RNA in cancer cells is concomitant with recruitment of key chromatin regulatory proteins including methyl-CpG binding protein 2 (MeCP2). While HSATII expression has been observed in a wide variety of cancer cell lines and tissues, the effect of its expression is unknown. We tested the effect of stable expression of HSATII RNA within cells that do not normally express HSATII. Ectopic HSATII expression in HeLa and primary fibroblast cells leads to focal accumulation of HSATII RNA in cis and triggers the accumulation of MeCP2 onto nuclear HSATII RNA bodies. Further, long-term expression of HSATII RNA leads to cell division defects including lagging chromosomes, chromatin bridges, and other chromatin defects. Thus, expression of HSATII RNA in normal cells phenocopies its nuclear accumulation in cancer cells and allows for the characterization of the cellular events triggered by aberrant expression of pericentric satellite RNA.
Affiliation(s)
- Catherine C Landers
- Department of Nutritional Sciences, University of Connecticut, Storrs, CT, USA
| | | | | | | | - Diana D Kang
- Division of Pharmaceutics and Pharmacology, College of Pharmacy, Ohio State University, Columbus, OH, USA
| | - Jessica Malisa
- Stanford University School of Medicine, Stanford, CA, USA
| | - Safia M Bashir
- Department of Biology, Swarthmore College, Swarthmore, PA, USA
| | - Dawn M Carone
- Department of Biology, Swarthmore College, Swarthmore, PA, USA.
|
35
|
Holley G, Beyter D, Ingimundardottir H, Møller PL, Kristmundsdottir S, Eggertsson HP, Halldorsson BV. Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly. Genome Biol 2021; 22:28. [PMID: 33419473 PMCID: PMC7792008 DOI: 10.1186/s13059-020-02244-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/17/2020] [Accepted: 12/15/2020] [Indexed: 12/20/2022]
Abstract
A major challenge to long read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average with a median error rate as low as 0.22%. SNP calls in Ratatosk corrected reads are nearly 99% accurate and indel call accuracy is increased by up to 37%. An assembly of Ratatosk corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi reads assembly.
Affiliation(s)
| | | | | | - Peter L Møller
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
| | - Snædis Kristmundsdottir
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
| | | | - Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
|
36
|
Mihìc P, Hédouin S, Francastel C. Centromeres Transcription and Transcripts for Better and for Worse. PROGRESS IN MOLECULAR AND SUBCELLULAR BIOLOGY 2021; 60:169-201. [PMID: 34386876 DOI: 10.1007/978-3-030-74889-0_7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
Centromeres are chromosomal regions that are essential for the faithful transmission of genetic material through each cell division. They represent the chromosomal platform on which assembles a protein complex, the kinetochore, which mediates attachment to the mitotic spindle. In most organisms, centromeres assemble on large arrays of tandem satellite repeats, although their DNA sequences and organization are highly divergent among species. It has become evident that centromeres are not defined by underlying DNA sequences, but are instead epigenetically defined by the deposition of the centromere-specific histone H3 variant, CENP-A. In addition, and although long regarded as silent chromosomal loci, centromeres are in fact transcriptionally competent in most species, yet at low levels in normal somatic cells, but where the resulting transcripts participate in centromere architecture, identity, and function. In this chapter, we discuss the various roles proposed for centromere transcription and their transcripts, and the potential molecular mechanisms involved. We also discuss pathological cases in which unscheduled transcription of centromeric repeats or aberrant accumulation of their transcripts are pathological signatures of chromosomal instability diseases. In sum, tight regulation of centromeric satellite repeats transcription is critical for healthy development and tissue homeostasis, and thus prevents the emergence of disease states.
Affiliation(s)
- Pia Mihìc
- Université De Paris, Epigenetics and Cell Fate, CNRS UMR7216, Paris, France
| | - Sabrine Hédouin
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Claire Francastel
- Université De Paris, Epigenetics and Cell Fate, CNRS UMR7216, Paris, France.
|
37
|
Cechova M. Probably Correct: Rescuing Repeats with Short and Long Reads. Genes (Basel) 2020; 12:48. [PMID: 33396198 PMCID: PMC7823596 DOI: 10.3390/genes12010048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/09/2020] [Revised: 12/23/2020] [Accepted: 12/24/2020] [Indexed: 02/07/2023]
Abstract
Ever since the introduction of high-throughput sequencing following the human genome project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome, an estimated 50-69%, is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, disorienting the aligners and assemblers both in accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
Affiliation(s)
- Monika Cechova
- Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic
|
38
|
Ahmad SF, Singchat W, Jehangir M, Suntronpong A, Panthum T, Malaivijitnond S, Srikulnath K. Dark Matter of Primate Genomes: Satellite DNA Repeats and Their Evolutionary Dynamics. Cells 2020; 9:E2714. [PMID: 33352976 PMCID: PMC7767330 DOI: 10.3390/cells9122714] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/27/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 12/12/2022]
Abstract
A substantial portion of the primate genome is composed of non-coding regions, so-called "dark matter", which includes an abundance of tandemly repeated sequences called satellite DNA. Collectively known as the satellitome, this genomic component offers exciting evolutionary insights into aspects of primate genome biology that raise new questions and challenge existing paradigms. A complete human reference genome was recently reported with telomere-to-telomere human X chromosome assembly that resolved hundreds of dark regions, encompassing a 3.1 Mb centromeric satellite array that had not been identified previously. With the recent exponential increase in the availability of primate genomes, and the development of modern genomic and bioinformatics tools, extensive growth in our knowledge concerning the structure, function, and evolution of satellite elements is expected. The current state of knowledge on this topic is summarized, highlighting various types of primate-specific satellite repeats to compare their proportions across diverse lineages. Inter- and intraspecific variation of satellite repeats in the primate genome are reviewed. The functional significance of these sequences is discussed by describing how the transcriptional activity of satellite repeats can affect gene expression during different cellular processes. Sex-linked satellites are outlined, together with their respective genomic organization. Mechanisms are proposed whereby satellite repeats might have emerged as novel sequences during different evolutionary phases. Finally, the main challenges that hinder the detection of satellite DNA are outlined and an overview of the latest methodologies to address technological limitations is presented.
Affiliation(s)
- Syed Farhan Ahmad
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; (S.F.A.); (W.S.); (M.J.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
| | - Worapong Singchat
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; (S.F.A.); (W.S.); (M.J.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
| | - Maryam Jehangir
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; (S.F.A.); (W.S.); (M.J.); (A.S.); (T.P.)
- Department of Structural and Functional Biology, Institute of Bioscience at Botucatu, São Paulo State University (UNESP), Botucatu, São Paulo 18618-689, Brazil
| | - Aorarat Suntronpong
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; (S.F.A.); (W.S.); (M.J.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
| | - Thitipong Panthum
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; (S.F.A.); (W.S.); (M.J.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
| | - Suchinda Malaivijitnond
- National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand;
- Department of Biology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
| | - Kornsorn Srikulnath
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; (S.F.A.); (W.S.); (M.J.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand;
- Center of Excellence on Agricultural Biotechnology (AG-BIO/PERDO-CHE), Bangkok 10900, Thailand
- Omics Center for Agriculture, Bioresources, Food and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand
|
39
|
Suzuki Y, Myers EW, Morishita S. Rapid and ongoing evolution of repetitive sequence structures in human centromeres. SCIENCE ADVANCES 2020; 6:6/50/eabd9230. [PMID: 33310858 PMCID: PMC7732198 DOI: 10.1126/sciadv.abd9230] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/20/2020] [Accepted: 10/30/2020] [Indexed: 06/12/2023]
Abstract
Our understanding of centromere sequence variation across human populations is limited by its extremely long nested repeat structures called higher-order repeats that are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations including a Han Chinese trio and 21 Japanese. We revealed substantial structural diversity with many previously unidentified variant higher-order repeats specific to individuals characterizing rapid, haplotype-specific evolution of human centromeric arrays, while frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.
Affiliation(s)
- Yuta Suzuki
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
| | - Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| | - Shinichi Morishita
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
|
40
|
The Cytogenomic "Theory of Everything": Chromohelkosis May Underlie Chromosomal Instability and Mosaicism in Disease and Aging. Int J Mol Sci 2020; 21:ijms21218328. [PMID: 33171981 PMCID: PMC7664247 DOI: 10.3390/ijms21218328] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/22/2020] [Revised: 11/03/2020] [Accepted: 11/04/2020] [Indexed: 01/28/2023]
Abstract
Mechanisms for somatic chromosomal mosaicism (SCM) and chromosomal instability (CIN) are not completely understood. During molecular karyotyping and bioinformatic analyses of children with neurodevelopmental disorders and congenital malformations (n = 612), we observed colocalization of regular chromosomal imbalances or copy number variations (CNV) with mosaic ones (n = 47 or 7.7%). Analyzing molecular karyotyping data and pathways affected by CNV burdens, we proposed a mechanism for SCM/CIN, which had been designated as “chromohelkosis” (from the Greek words chromosome ulceration/open wound). Briefly, structural chromosomal imbalances are likely to cause local instability (“wreckage”) at the breakpoints, which results either in partial/whole chromosome loss (e.g., aneuploidy) or elongation of duplicated regions. Accordingly, a function for classical/alpha satellite DNA (protection from the wreckage towards the centromere) has been hypothesized. Since SCM and CIN are ubiquitously involved in development, homeostasis and disease (e.g., prenatal development, cancer, brain diseases, aging), we have metaphorically (ironically) designated the system explaining chromohelkosis contribution to SCM/CIN as the cytogenomic “theory of everything”, similar to the homonymous theory in physics inasmuch as it might explain numerous phenomena in chromosome biology. Recognizing possible empirical and theoretical weaknesses of this “theory”, we nevertheless believe that studies of chromohelkosis-like processes are required to understand structural variability and flexibility of the genome.
|
41
|
Bzikadze AV, Pevzner PA. Automated assembly of centromeres from ultra-long error-prone reads. Nat Biotechnol 2020; 38:1309-1316. [PMID: 32665660 PMCID: PMC10718184 DOI: 10.1038/s41587-020-0582-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 08/17/2019] [Accepted: 05/29/2020] [Indexed: 12/12/2022]
Abstract
Centromeric variation has been linked to cancer and infertility, but centromere sequences contain multiple tandem repeats and can only be assembled manually from long error-prone reads. Here we describe the centroFlye algorithm for centromere assembly using long error-prone reads, and apply it to assemble human centromeres on chromosomes 6 and X. Our analyses reveal putative breakpoints in the manual reconstruction of the human X centromere, demonstrate that human X chromosome is partitioned into repeat subfamilies and provide initial insights into centromere evolution. We anticipate that centroFlye could be applied to automatically close remaining multimegabase gaps in the reference human genome.
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
|
42
|
Balzano E, Pelliccia F, Giunta S. Genome (in)stability at tandem repeats. Semin Cell Dev Biol 2020; 113:97-112. [PMID: 33109442 DOI: 10.1016/j.semcdb.2020.10.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/06/2020] [Revised: 09/26/2020] [Accepted: 10/10/2020] [Indexed: 12/12/2022]
Abstract
Repeat sequences account for over half of the human genome and represent a significant source of variation that underlies physiological and pathological states. Yet, their study has been hindered due to limitations in short-read sequencing technology and difficulties in assembly. An important category of repetitive DNA in the human genome is comprised of tandem repeats (TRs), where repetitive units are arranged in a head-to-tail pattern. Compared to other regions of the genome, TRs carry between 10- and 10,000-fold higher mutation rate. There are several mutagenic mechanisms that can give rise to this propensity toward instability, but their precise contribution remains speculative. Given the high degree of homology between these sequences and their arrangement in tandem, once damaged, TRs have an intrinsic propensity to undergo aberrant recombination with non-allelic exchange and generate harmful rearrangements that may undermine the stability of the entire genome. The dynamic mutagenesis at TRs has been found to underlie individual polymorphism associated with neurodegenerative and neuromuscular disorders, as well as complex genetic diseases like cancer and diabetes. Here, we review our current understanding of the surveillance and repair mechanisms operating within these regions, and we describe how alterations in these protective processes can readily trigger mutational signatures found at TRs, ultimately resulting in the pathological correlation between TR instability and human diseases. Finally, we provide a viewpoint to counter the detrimental effects that TRs pose in light of their selection and conservation, as important drivers of human evolution.
Affiliation(s)
- Elisa Balzano
- Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
| | - Franca Pelliccia
- Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
| | - Simona Giunta
- The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy.
|
43
|
Unique Features of Tandem Repeats in Bacteria. J Bacteriol 2020; 202:JB.00229-20. [PMID: 32839174 DOI: 10.1128/jb.00229-20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/20/2020] [Accepted: 08/17/2020] [Indexed: 02/06/2023]
Abstract
DNA tandem repeats, or satellites, are well described in eukaryotic species, but little is known about their prevalence across prokaryotes. Here, we performed the most complete characterization to date of satellites in bacteria. We identified 121,638 satellites from 12,233 fully sequenced and assembled bacterial genomes with a very uneven distribution. We also determined the families of satellites which have a related sequence. There are 85 genomes that are particularly satellite rich and contain several families of satellites of yet unknown function. Interestingly, we only found two main types of noncoding satellites, depending on their repeat sizes, 22/44 or 52 nucleotides (nt). An intriguing feature is the constant size of the repeats in the genomes of different species, whereas their sequences show no conservation. Individual species also have several families of satellites with the same repeat length and different sequences. This result is in marked contrast with previous findings in eukaryotes, where noncoding satellites of many sizes are found in any species investigated. We describe in greater detail these noncoding satellites in the spirochete Leptospira interrogans and in several bacilli. These satellites undoubtedly play a specific role in the species which have acquired them. We discuss the possibility that they represent binding sites for transcription factors not previously described or that they are involved in the stabilization of the nucleoid through interaction with proteins. IMPORTANCE: We found an enigmatic group of noncoding satellites in 85 bacterial genomes with a constant repeat size but variable sequence. This pattern of DNA organization is unique and had not been previously described in bacteria. These findings strongly suggest that satellite size in some bacteria is under strong selective constraints and thus that satellites are very likely to play a fundamental role. We also provide a list and properties of all satellites in 12,233 genomes, which may be used for further genomic analysis.
|
44
|
Abstract
The overall structure and composition of human centromeres have been well reported, but how these elements vary between individual chromosomes and influence the chromosome-specific behavior during mitosis remains untested. In our study, we discover the existence of heterogeneity of centromeric DNA features that dictates the chromosome segregation fidelity during mitosis.
Affiliation(s)
- Marie Dumont
- Institut Curie, PSL Research University, Paris, France
| | | |
|
45
|
Mahlke MA, Nechemia-Arbely Y. Guarding the Genome: CENP-A-Chromatin in Health and Cancer. Genes (Basel) 2020; 11:genes11070810. [PMID: 32708729 PMCID: PMC7397030 DOI: 10.3390/genes11070810] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/19/2020] [Revised: 07/10/2020] [Accepted: 07/15/2020] [Indexed: 02/07/2023]
Abstract
Faithful chromosome segregation is essential for the maintenance of genomic integrity and requires functional centromeres. Centromeres are epigenetically defined by the histone H3 variant, centromere protein A (CENP-A). Here we highlight current knowledge regarding CENP-A-containing chromatin structure, specification of centromere identity, regulation of CENP-A deposition and possible contribution to cancer formation and/or progression. CENP-A overexpression is common among many cancers and predicts poor prognosis. Overexpression of CENP-A increases rates of CENP-A deposition ectopically at sites of high histone turnover, occluding CCCTC-binding factor (CTCF) binding. Ectopic CENP-A deposition leads to mitotic defects, centromere dysfunction and chromosomal instability (CIN), a hallmark of cancer. CENP-A overexpression is often accompanied by overexpression of its chaperone Holliday Junction Recognition Protein (HJURP), leading to epigenetic addiction in which increased levels of HJURP and CENP-A become necessary to support rapidly dividing p53 deficient cancer cells. Alterations in CENP-A posttranslational modifications are also linked to chromosome segregation errors and CIN. Collectively, CENP-A is pivotal to genomic stability through centromere maintenance, perturbation of which can lead to tumorigenesis.
Affiliation(s)
- Megan A. Mahlke
- UPMC Hillman Cancer Center, Pittsburgh, PA 15213, USA;
- Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Yael Nechemia-Arbely
- UPMC Hillman Cancer Center, Pittsburgh, PA 15213, USA;
- Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Correspondence: ; Tel.: +1-412-623-3228; Fax: +1-412-623-7828
|
46
|
Mikheenko A, Bzikadze AV, Gurevich A, Miga KH, Pevzner PA. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 2020; 36:i75-i83. [PMID: 32657355 PMCID: PMC7355294 DOI: 10.1093/bioinformatics/btaa440] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Extra-long tandem repeats (ETRs) are widespread in eukaryotic genomes and play an important role in fundamental cellular processes, such as chromosome segregation. Although emerging long-read technologies have enabled ETR assemblies, the accuracy of such assemblies is difficult to evaluate since there are no tools for their quality assessment. Moreover, since the mapping of error-prone reads to ETRs remains an open problem, it is not clear how to polish draft ETR assemblies. RESULTS: To address these problems, we developed the TandemTools software that includes the TandemMapper tool for mapping reads to ETRs and the TandemQUAST tool for polishing ETR assemblies and their quality assessment. We demonstrate that TandemTools not only reveals errors in ETR assemblies but also improves the recently generated assemblies of human centromeres. AVAILABILITY AND IMPLEMENTATION: https://github.com/ablab/TandemTools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
|
47
|
Corless S, Höcker S, Erhardt S. Centromeric RNA and Its Function at and Beyond Centromeric Chromatin. J Mol Biol 2020; 432:4257-4269. [DOI: 10.1016/j.jmb.2020.03.027] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/30/2019] [Revised: 03/26/2020] [Accepted: 03/27/2020] [Indexed: 12/21/2022]
|
48
|
Miga KH. Centromere studies in the era of 'telomere-to-telomere' genomics. Exp Cell Res 2020; 394:112127. [PMID: 32504677 DOI: 10.1016/j.yexcr.2020.112127] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 01/08/2020] [Revised: 05/23/2020] [Accepted: 05/30/2020] [Indexed: 12/17/2022]
Abstract
We are entering into an exciting era of genomics where truly complete, high-quality assemblies of human chromosomes are available end-to-end, or from 'telomere-to-telomere' (T2T). This technological advance offers a new opportunity to include endogenous human centromeric regions in high-resolution, sequence-based studies. These emerging reference maps are expected to reveal a new functional landscape in the human genome, where centromere proteins, transcriptional regulation, and spatial organization can be examined with base-level resolution across different stages of development and disease. Such studies will depend on innovative assembly methods of extremely long tandem repeats (ETRs), or satellite DNAs, paired with the development of new, orthogonal validation methods to ensure accuracy and completeness. This review reflects the progress in centromere genomics, credited by recent advancements in long-read sequencing and assembly methods. In doing so, I will discuss the challenges that remain and the promise for a new period of scientific discovery for satellite DNA biology and centromere function.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, 95064, USA.
|
49
|
Abstract
Since the early days of the genome era, the scientific community has relied on a single 'reference' genome for each species, which is used as the basis for a wide range of genetic analyses, including studies of variation within and across species. As sequencing costs have dropped, thousands of new genomes have been sequenced, and scientists have come to realize that a single reference genome is inadequate for many purposes. By sampling a diverse set of individuals, one can begin to assemble a pan-genome: a collection of all the DNA sequences that occur in a species. Here we review efforts to create pan-genomes for a range of species, from bacteria to humans, and we further consider the computational methods that have been proposed in order to capture, interpret and compare pan-genome data. As scientists continue to survey and catalogue the genomic variation across human populations and begin to assemble a human pan-genome, these efforts will increase our power to connect variation to human diversity, disease and beyond.
Affiliation(s)
- Rachel M Sherman
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
| | - Steven L Salzberg
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
|
50
|
Chromosome-Level Assembly of Drosophila bifasciata Reveals Important Karyotypic Transition of the X Chromosome. G3-GENES GENOMES GENETICS 2020; 10:891-897. [PMID: 31969429 PMCID: PMC7056972 DOI: 10.1534/g3.119.400922] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Abstract
The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.
|