1
Gao S, Zhang Y, Bush SJ, Wang B, Yang X, Ye K. Centromere Landscapes Resolved from Hundreds of Human Genomes. Genomics, Proteomics & Bioinformatics 2024; 22:qzae071. [PMID: 39423139 DOI: 10.1093/gpbjnl/qzae071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 08/27/2024] [Accepted: 09/20/2024] [Indexed: 10/21/2024]
Abstract
High-fidelity (HiFi) sequencing has facilitated the assembly and analysis of the most repetitive region of the genome, the centromere. Nevertheless, our current understanding of human centromeres is based on a relatively small number of telomere-to-telomere assemblies, which have not yet captured their full diversity. In this study, we investigated the genomic diversity of human centromere higher order repeats (HORs) via both HiFi reads and haplotype-resolved assemblies from hundreds of samples drawn from ongoing pangenome-sequencing projects and reprocessed them via a novel HOR annotation pipeline, HiCAT-human. We used this wealth of data to provide a global survey of the centromeric HOR landscape; in particular, we found that 23 HORs presented significant copy number variability between populations. We detected three centromere genotypes with unbalanced population frequencies on chromosomes 5, 8, and 17. An inter-assembly comparison of HOR loci further revealed that while HOR array structures are diverse, they nevertheless tend to form a number of specific landscapes, each exhibiting different levels of HOR subunit expansion and possibly reflecting a cyclical evolutionary transition from homogeneous to nested structures and back.
Affiliation(s)
- Shenghan Gao
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Yimeng Zhang
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Stephen J Bush
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Bo Wang
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Xiaofei Yang
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Kai Ye
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Center for Mathematical Medical, The First Affiliated Hospital, Xi'an Jiaotong University, Xi'an 710061, China
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
  - Faculty of Science, Leiden University, Leiden 2311 EZ, The Netherlands

2
Courret C, Hemmer LW, Wei X, Patel PD, Chabot BJ, Fuda NJ, Geng X, Chang CH, Mellone BG, Larracuente AM. Turnover of retroelements and satellite DNA drives centromere reorganization over short evolutionary timescales in Drosophila. PLoS Biol 2024; 22:e3002911. [PMID: 39570997 PMCID: PMC11620609 DOI: 10.1371/journal.pbio.3002911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Revised: 12/05/2024] [Accepted: 10/22/2024] [Indexed: 12/07/2024] Open
Abstract
Centromeres reside in rapidly evolving, repeat-rich genomic regions, despite their essential function in chromosome segregation. Across organisms, centromeres are rich in selfish genetic elements such as transposable elements and satellite DNAs that can bias their transmission through meiosis. However, these elements still need to cooperate at some level and contribute to, or avoid interfering with, centromere function. To gain insight into the balance between conflict and cooperation at centromeric DNA, we take advantage of the close evolutionary relationships within the Drosophila simulans clade (D. simulans, D. sechellia, and D. mauritiana) and their relative, D. melanogaster. Using chromatin profiling combined with high-resolution fluorescence in situ hybridization on stretched chromatin fibers, we characterize all centromeres across these species. We discovered dramatic centromere reorganization involving recurrent shifts between retroelements and satellite DNAs over short evolutionary timescales. We also reveal the recent origin (<240 Kya) of telocentric chromosomes in D. sechellia, where the X and fourth centromeres now sit on telomere-specific retroelements. Finally, the Y chromosome centromeres, which are the only chromosomes that do not experience female meiosis, do not show dynamic cycling between satDNA and TEs. The patterns of rapid centromere turnover in these species are consistent with genetic conflicts in the female germline and have implications for centromeric DNA function and karyotype evolution. Regardless of the evolutionary forces driving this turnover, the rapid reorganization of centromeric sequences over short evolutionary timescales highlights their potential as hotspots for evolutionary innovation.
Affiliation(s)
- Cécile Courret
  - Department of Biology, University of Rochester, Rochester, New York, United States of America
- Lucas W. Hemmer
  - Department of Biology, University of Rochester, Rochester, New York, United States of America
- Xiaolu Wei
  - Department of Biology, University of Rochester, Rochester, New York, United States of America
- Prachi D. Patel
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Bryce J. Chabot
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Nicholas J. Fuda
  - Department of Biology, University of Rochester, Rochester, New York, United States of America
- Xuewen Geng
  - Department of Biology, University of Rochester, Rochester, New York, United States of America
- Ching-Ho Chang
  - Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- Barbara G. Mellone
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
  - Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, United States of America
- Amanda M. Larracuente
  - Department of Biology, University of Rochester, Rochester, New York, United States of America

3
Said I, Barbash DA, Clark AG. The Structure of Simple Satellite Variation in the Human Genome and Its Correlation With Centromere Ancestry. Genome Biol Evol 2024; 16:evae153. [PMID: 39018452 PMCID: PMC11305138 DOI: 10.1093/gbe/evae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 06/21/2024] [Accepted: 07/12/2024] [Indexed: 07/19/2024] Open
Abstract
Although repetitive DNA forms much of the human genome, its study is challenging due to limitations in assembly and alignment of repetitive short reads. We have deployed k-Seek, software that detects tandem repeats embedded in single reads, on 2,504 human genomes from the 1,000 Genomes Project to quantify the variation and abundance of simple satellites (repeat units <20 bp). We find that the ancestral monomer of Human Satellite 3 makes up the largest portion of simple satellite content in humans (mean of ∼8 Mb). We discovered ∼50,000 rare tandem repeats that are not detected in the T2T-CHM13v2.0 assembly, including undescribed variants of telomeric and pericentromeric repeats. We find broad homogeneity of the most abundant repeats across populations, except for AG-rich repeats, which are more abundant in African individuals. We also find cliques of highly similar AG- and AT-rich satellites that are interspersed and form higher-order structures that covary in copy number across individuals, likely through concerted amplification via unequal exchange. Finally, we use pericentromeric polymorphisms to estimate centromeric genetic relatedness between individuals and find a strong predictive relationship between centromeric lineages and pericentromeric simple satellite abundances. In particular, the abundances of the ancestral monomers of Human Satellite 2 and Human Satellite 3 correlate with clusters of centromeric ancestry on chromosome 16 and chromosome 9, with some clusters structured by population. These results provide new descriptions of the population dynamics that underlie the evolution of simple satellites in humans.
Affiliation(s)
- Iskander Said
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Daniel A Barbash
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Andrew G Clark
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA

4
Chen YL, Jones AN, Crawford A, Sattler M, Ettinger A, Torres-Padilla ME. Determinants of minor satellite RNA function in chromosome segregation in mouse embryonic stem cells. J Cell Biol 2024; 223:e202309027. [PMID: 38625077 PMCID: PMC11022885 DOI: 10.1083/jcb.202309027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 03/06/2024] [Accepted: 03/29/2024] [Indexed: 04/17/2024] Open
Abstract
The centromere is a fundamental higher-order structure in chromosomes ensuring their faithful segregation upon cell division. Centromeric transcripts have been described in several species and suggested to participate in centromere function. However, low sequence conservation of centromeric repeats appears inconsistent with a role in recruiting highly conserved centromeric proteins. Here, we hypothesized that centromeric transcripts may function through a secondary structure rather than sequence conservation. Using mouse embryonic stem cells (ESCs), we show that an imbalance in the levels of forward or reverse minor satellite (MinSat) transcripts leads to severe chromosome segregation defects. We further show that MinSat RNA adopts a stem-loop secondary structure, which is conserved in human α-satellite transcripts. We identify an RNA binding region in CENPC and demonstrate that MinSat transcripts function through the structured region of the RNA. Importantly, mutants that disrupt MinSat secondary structure do not cause segregation defects. We propose that the conserved role of centromeric transcripts relies on their secondary RNA structure.
Affiliation(s)
- Yung-Li Chen
  - Institute of Epigenetics and Stem Cells (IES), Helmholtz Munich, München, Germany
- Alisha N. Jones
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich, Neuherberg, Germany
- Amy Crawford
  - Department of Chemistry, New York University, New York, NY, USA
- Michael Sattler
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich, Neuherberg, Germany
  - Department of Bioscience, Bavarian NMR Center, School of Natural Sciences, Technical University of Munich, Garching, Germany
- Andreas Ettinger
  - Institute of Epigenetics and Stem Cells (IES), Helmholtz Munich, München, Germany
- Maria-Elena Torres-Padilla
  - Institute of Epigenetics and Stem Cells (IES), Helmholtz Munich, München, Germany
  - Faculty of Biology, Ludwig-Maximilians Universität, München, Germany

5
Shukla HG, Chakraborty M, Emerson J. Genetic variation in recalcitrant repetitive regions of the Drosophila melanogaster genome. bioRxiv 2024:2024.06.11.598575. [PMID: 38915508 PMCID: PMC11195212 DOI: 10.1101/2024.06.11.598575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Many essential functions of organisms are encoded in highly repetitive genomic regions, including histones involved in DNA packaging, centromeres that are core components of chromosome segregation, ribosomal RNA comprising the protein translation machinery, telomeres that ensure chromosome integrity, piRNA clusters encoding host defenses against selfish elements, and virtually the entire Y chromosome. These regions, formed by highly similar tandem arrays, pose significant challenges for experimental and informatic study, impeding sequence-level descriptions essential for understanding genetic variation. Here, we report the assembly and variation analysis of such repetitive regions in Drosophila melanogaster, offering significant improvements to the existing community reference assembly. Our work successfully recovers previously elusive segments, including complete reconstructions of the histone locus and the pericentric heterochromatin of the X chromosome, spanning the Stellate locus to the distal flank of the rDNA cluster. To infer structural changes in these regions where alignments are often not practicable, we introduce landmark anchors based on unique variants that are putatively orthologous. These regions display considerable structural variation between different D. melanogaster strains, exhibiting differences in copy number and organization of homologous repeat units between haplotypes. In the histone cluster, although we observe minimal genetic exchange indicative of crossing over, the variation patterns suggest mechanisms such as unequal sister chromatid exchange. We also examine the prevalence and scale of concerted evolution in the histone and Stellate clusters and discuss the mechanisms underlying these observed patterns.
Affiliation(s)
- Harsh G. Shukla
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
  - Graduate Program in Mathematical, Computational and Systems Biology, University of California Irvine, Irvine, California 92697, USA
- Mahul Chakraborty
  - Department of Biology, Texas A&M University, College Station, Texas 77843, USA
- J.J. Emerson
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, California 92697, USA

6
Wang B, Jia Y, Dang N, Yu J, Bush SJ, Gao S, He W, Wang S, Guo H, Yang X, Ma W, Ye K. Near telomere-to-telomere genome assemblies of two Chlorella species unveil the composition and evolution of centromeres in green algae. BMC Genomics 2024; 25:356. [PMID: 38600443 PMCID: PMC11005252 DOI: 10.1186/s12864-024-10280-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 04/02/2024] [Indexed: 04/12/2024] Open
Abstract
BACKGROUND: Centromeres play a crucial and conserved role in cell division, although their composition and evolutionary history in green algae, the evolutionary ancestors of land plants, remain largely unknown. RESULTS: We constructed near telomere-to-telomere (T2T) assemblies for two Trebouxiophyceae species, Chlorella sorokiniana NS4-2 and Chlorella pyrenoidosa DBH, with chromosome numbers of 12 and 13, and genome sizes of 58.11 Mb and 53.41 Mb, respectively. We identified and validated their centromere sequences using CENH3 ChIP-seq and found that, similar to humans and higher plants, the centromeric CENH3 signals of green algae display a pattern of hypomethylation. Interestingly, the centromeres of both species largely comprised transposable elements, although they differed significantly in their composition. Species within the Chlorella genus display a more diverse centromere composition, with major constituents including members of the LTR/Copia, LINE/L1, and LINE/RTEX families. This is in contrast to green algae including Chlamydomonas reinhardtii, Coccomyxa subellipsoidea, and Chromochloris zofingiensis, in which centromeres instead have a pronounced single-element composition. Moreover, we observed significant differences in the composition and structure of centromeres among chromosomes with strong collinearity within the Chlorella genus, suggesting that centromeric sequence evolves more rapidly than sequence in non-centromeric regions. CONCLUSIONS: This study not only provides high-quality genome data for comparative genomics of green algae but also gives insight into the composition and evolutionary history of centromeres in early plants, laying an important foundation for further research on their evolution.
Affiliation(s)
- Bo Wang
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yanyan Jia
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Ningxin Dang
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Jie Yu
  - College of Life Sciences, Shanghai Normal University, Shanghai, China
- Stephen J Bush
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Shenghan Gao
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Wenxi He
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Sirui Wang
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Hongtao Guo
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Xiaofei Yang
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Weimin Ma
  - College of Life Sciences, Shanghai Normal University, Shanghai, China
- Kai Ye
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
  - Faculty of Science, Leiden University, Leiden, The Netherlands

7
Chen C, Wu S, Sun Y, Zhou J, Chen Y, Zhang J, Birchler JA, Han F, Yang N, Su H. Three near-complete genome assemblies reveal substantial centromere dynamics from diploid to tetraploid in Brachypodium genus. Genome Biol 2024; 25:63. [PMID: 38439049 PMCID: PMC10910784 DOI: 10.1186/s13059-024-03206-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 02/26/2024] [Indexed: 03/06/2024] Open
Abstract
BACKGROUND: Centromeres are critical for maintaining genomic stability in eukaryotes, and their turnover shapes genome architectures and drives karyotype evolution. However, the co-evolution of centromeres from different species in allopolyploids over millions of years remains largely unknown. RESULTS: Here, we generate three near-complete genome assemblies: a tetraploid, Brachypodium hybridum, and its two diploid ancestors, Brachypodium distachyon and Brachypodium stacei. We detect high degrees of sequence, structural, and epigenetic variation in centromeres at base-pair resolution between closely related Brachypodium genomes, indicating the appearance and accumulation of species-specific centromere repeats from a common origin during evolution. We also find that centromere homogenization is accompanied by local bursts of satellite repeats and purging of retrotransposons, and that the frequency of retrotransposon invasions drives the degree of interspecies centromere diversification. We further investigate the dynamics of centromeres during the alloploidization process and find that dramatic variations in genetic and epigenetic architecture are associated with the turnover of centromeres between homologous chromosomal pairs from diploid to tetraploid. Additionally, our pangenome analysis reveals ongoing variation in satellite repeats and stable evolutionary homeostasis within centromeres among individuals of each Brachypodium genome with different polyploidy levels. CONCLUSIONS: Our results provide unprecedented information on the genomic, epigenomic, and functional diversity of highly repetitive DNA between closely related species and their allopolyploid genomes at both coarse and fine scales.
Affiliation(s)
- Chuanye Chen
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China
- Siying Wu
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China
- Yishuang Sun
  - State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, 100101, China
  - University of the Chinese Academy of Sciences, Beijing, 100049, China
- Jingwei Zhou
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China
- Yiqian Chen
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China
- Jing Zhang
  - State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, 100101, China
- James A Birchler
  - Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
- Fangpu Han
  - State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, 100101, China
  - University of the Chinese Academy of Sciences, Beijing, 100049, China
- Ning Yang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China
- Handong Su
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China

8
Shiraishi Y, Koya J, Chiba K, Okada A, Arai Y, Saito Y, Shibata T, Kataoka K. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. Nucleic Acids Res 2023; 51:e74. [PMID: 37336583 PMCID: PMC10415145 DOI: 10.1093/nar/gkad526] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/14/2023] [Revised: 05/23/2023] [Accepted: 06/07/2023] [Indexed: 06/21/2023] Open
Abstract
We present our novel software, nanomonsv, for detecting somatic structural variations (SVs) using tumor and matched control long-read sequencing data with a single-base resolution. The current version of nanomonsv includes two detection modules, Canonical SV module, and Single breakend SV module. Using tumor/control paired long-read sequencing data from three cancer and their matched lymphoblastoid lines, we demonstrate that Canonical SV module can identify somatic SVs that can be captured by short-read technologies with higher precision and recall than existing methods. In addition, we have developed a workflow to classify mobile element insertions while elucidating their in-depth properties, such as 5' truncations, internal inversions, as well as source sites for 3' transductions. Furthermore, Single breakend SV module enables the detection of complex SVs that can only be identified by long-reads, such as SVs involving highly-repetitive centromeric sequences, and LINE1- and virus-mediated rearrangements. In summary, our approaches applied to cancer long-read sequencing data can reveal various features of somatic SVs and will lead to a better understanding of mutational processes and functional consequences of somatic SVs.
Affiliation(s)
- Yuichi Shiraishi
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Junji Koya
  - Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
- Kenichi Chiba
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Ai Okada
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Yasuhito Arai
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Yuki Saito
  - Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Gastroenterology, Keio University School of Medicine, Tokyo, Japan
- Tatsuhiro Shibata
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
  - Laboratory of Molecular Medicine, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Keisuke Kataoka
  - Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Hematology, Keio University School of Medicine, Tokyo, Japan

9
Ma H, Ding W, Chen Y, Zhou J, Chen W, Lan C, Mao H, Li Q, Yan W, Su H. Centromere Plasticity With Evolutionary Conservation and Divergence Uncovered by Wheat 10+ Genomes. Mol Biol Evol 2023; 40:msad176. [PMID: 37541261 PMCID: PMC10422864 DOI: 10.1093/molbev/msad176] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/09/2023] [Revised: 06/26/2023] [Accepted: 07/28/2023] [Indexed: 08/06/2023] Open
Abstract
Centromeres (CEN) are the chromosomal regions that play a crucial role in maintaining genomic stability. The underlying highly repetitive DNA sequences can evolve quickly in most eukaryotes and promote karyotype evolution. Despite their variability, it is not fully understood how these widely variable sequences ensure the homeostasis of centromere function. In this study, we investigated the genetics and epigenetics of CEN in a population of wheat lines from global breeding programs. We captured a high degree of sequence, positioning, and epigenetic variation in the large and complex wheat CEN. We found that most CENH3-associated repeats are Cereba retrotransposon elements and exhibit phylogenetic homogenization across different wheat lines, whereas the less-associated repeat sequences diverge independently in each wheat line, implying specific mechanisms for selecting certain repeat types as functional core CEN. Furthermore, we observed that CENH3 nucleosome structures display looser wrapping of DNA termini on complex centromeric repeats, including the repositioned CEN. We also found that strict CENH3 nucleosome positioning and intrinsic DNA features play a role in determining centromere identity among different lines. Specific non-B form DNAs were substantially associated with CENH3 nucleosomes for the repositioned centromeres. These findings suggest that multiple mechanisms were involved in the adaptation of CENH3 nucleosomes that can stabilize CEN. Ultimately, we propose a remarkable epigenetic plasticity of centromere chromatin within the diverse genomic context, and the high robustness is crucial for maintaining centromere function and genome stability in wheat 10+ lines as a result of past breeding selections.
Collapse
Affiliation(s)
- Huan Ma
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
| | - Wentao Ding
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
| | - Yiqian Chen
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
| | - Jingwei Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
| | - Wei Chen
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
| | - Caixia Lan
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
| | - Hailiang Mao
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Qiang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Wenhao Yan
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Handong Su
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
10
Gao S, Yang X, Guo H, Zhao X, Wang B, Ye K. HiCAT: a tool for automatic annotation of centromere structure. Genome Biol 2023; 24:58. [PMID: 36978122 PMCID: PMC10053651 DOI: 10.1186/s13059-023-02900-5] [Received: 07/12/2022] [Accepted: 03/17/2023] [Indexed: 03/30/2023]
Abstract
Significant improvements in long-read sequencing technologies have unlocked complex genomic areas, such as centromeres, in the genome and introduced the centromere annotation problem. Currently, centromeres are annotated in a semi-manual way. Here, we propose HiCAT, a generalizable automatic centromere annotation tool, based on hierarchical tandem repeat mining to facilitate decoding of centromere architecture. We apply HiCAT to simulated datasets, human CHM13-T2T and gapless Arabidopsis thaliana genomes. Our results are generally consistent with previous inferences but also greatly improve annotation continuity and reveal additional fine structures, demonstrating HiCAT's performance and general applicability.
Affiliation(s)
- Shenghan Gao
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xiaofei Yang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Hongtao Guo
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xixi Zhao
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Bo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Faculty of Science, Leiden University, Leiden, The Netherlands
11
Silva JM, Qi W, Pinho AJ, Pratas D. AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data. Gigascience 2022; 12:giad101. [PMID: 38091509 PMCID: PMC10716826 DOI: 10.1093/gigascience/giad101] [Received: 06/28/2023] [Revised: 09/29/2023] [Accepted: 11/07/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND Low-complexity data analysis is the area that addresses the search and quantification of regions in sequences of elements that contain low-complexity or repetitive elements. For example, these can be tandem repeats, inverted repeats, homopolymer tails, GC-biased regions, similar genes, and hairpins, among many others. Identifying these regions is crucial because of their association with regulatory and structural characteristics. Moreover, their identification provides positional and quantity information where standard assembly methodologies face significant difficulties because of substantially higher depth coverage (mountains), ambiguous read mapping, or where sequencing or reconstruction defects may occur. However, the capability to distinguish low-complexity regions (LCRs) in genomic and proteomic sequences is a challenge that depends on the model's ability to find them automatically. Low-complexity patterns can be implicit through specific or combined sources, such as algorithmic or probabilistic, and can involve different spatial distances, namely local, medium, or distant associations. FINDINGS This article addresses the challenge of automatically modeling and distinguishing LCRs, providing a new method and tool (AlcoR) for efficient and accurate segmentation and visualization of these regions in genomic and proteomic sequences. The method enables the use of models with different memories, providing the ability to distinguish local from distant low-complexity patterns. The method is reference and alignment free, providing additional methodologies for testing, including a highly flexible simulation method for generating biological sequences (DNA or protein) with different complexity levels, sequence masking, and a visualization tool for automatic computation of the LCR maps into an ideogram style. We provide illustrative demonstrations using synthetic, nearly synthetic, and natural sequences showing the high efficiency and accuracy of AlcoR. As large-scale results, we use AlcoR to provide an unprecedented whole-chromosome low-complexity map of a recent complete human genome and the haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar. CONCLUSIONS The AlcoR method provides fast sequence characterization through data complexity analysis, ideal for scenarios involving new or unknown sequences. AlcoR is implemented in C language using multithreading to increase the computational speed, is flexible for multiple applications, and does not contain external dependencies. The tool accepts any sequence in FASTA format. The source code is freely provided at https://github.com/cobilab/alcor.
Affiliation(s)
- Jorge M Silva
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Weihong Qi
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse, 190, 8057, Zurich, Switzerland
- SIB, Swiss Institute of Bioinformatics, 1202, Geneva, Switzerland
- Armando J Pinho
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Diogo Pratas
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Department of Virology, University of Helsinki, Haartmaninkatu, 3, 00014 Helsinki, Finland
12
Logsdon GA, Eichler EE. The Dynamic Structure and Rapid Evolution of Human Centromeric Satellite DNA. Genes (Basel) 2022; 14:92. [PMID: 36672831 PMCID: PMC9859433 DOI: 10.3390/genes14010092] [Received: 12/15/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/31/2022]
Abstract
The complete sequence of a human genome provided our first comprehensive view of the organization of satellite DNA associated with heterochromatin. We review how our understanding of the genetic architecture and epigenetic properties of human centromeric DNA has advanced as a result. Preliminary studies of human and nonhuman ape centromeres reveal complex, saltatory mutational changes organized around distinct evolutionary layers. Pockets of regional hypomethylation within higher-order α-satellite DNA, termed centromere dip regions, appear to define the site of kinetochore attachment in all human chromosomes, although such epigenetic features can vary even within the same chromosome. Sequence resolution of satellite DNA is providing new insights into centromeric function with potential implications for improving our understanding of human biology and health.
Affiliation(s)
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
13
Kunyavskaya O, Dvorkina T, Bzikadze AV, Alexandrov I, Pevzner PA. Automated annotation of human centromeres with HORmon. Genome Res 2022; 32:1137-1151. [PMID: 35545449 DOI: 10.1101/gr.276362.121] [Received: 11/03/2021] [Accepted: 05/06/2022] [Indexed: 11/24/2022]
Abstract
Recent advances in long-read sequencing opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. They also emphasized the need for centromere annotation (partitioning human centromeres into monomers and higher-order repeats (HORs)). Even though there was a half-century-long series of semi-manual studies of centromere architecture, a rigorous centromere annotation algorithm is still lacking. Moreover, an automated centromere annotation is a prerequisite for studies of genetic diseases associated with centromeres, and evolutionary studies of centromeres across multiple species. Although the monomer decomposition (transforming a centromere into a monocentromere written in the monomer alphabet) and the HOR decomposition (representing a monocentromere in the alphabet of HORs) are currently viewed as two separate problems, we demonstrate that they should be integrated into a single framework in such a way that HOR (monomer) inference affects monomer (HOR) inference. We thus developed the HORmon algorithm that integrates the monomer/HOR inference and automatically generates the human monomers/HORs that are largely consistent with the previous semi-manual inference.
Affiliation(s)
- Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
- Ivan Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
14
Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Affiliation(s)
- Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Pragya Sidhwani
- Department of Biochemistry, Stanford University, Stanford, CA, USA
- Sasha A. Langley
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Lev Uralsky
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT, USA
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fedor Gusev
- Vavilov Institute of General Genetics, Moscow, Russia
- Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sofie R. Salama
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
- Evgeny I. Rogaev
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Aaron Streets
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gary H. Karpen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Abby F. Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
- Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
- Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT, USA
- Jennifer L. Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Rachel J. O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Charles H. Langley
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- Ivan A. Alexandrov
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
15
de Lima LG, Howe E, Singh VP, Potapova T, Li H, Xu B, Castle J, Crozier S, Harrison CJ, Clifford SC, Miga KH, Ryan SL, Gerton JL. PCR amplicons identify widespread copy number variation in human centromeric arrays and instability in cancer. CELL GENOMICS 2021; 1:100064. [PMID: 34993501 PMCID: PMC8730464 DOI: 10.1016/j.xgen.2021.100064] [Received: 01/14/2021] [Revised: 07/13/2021] [Accepted: 08/24/2021] [Indexed: 12/13/2022]
Abstract
Centromeric α-satellite repeats represent ~6% of the human genome, but their length and repetitive nature make sequencing and analysis of those regions challenging. However, centromeres are essential for the stable propagation of chromosomes, so tools are urgently needed to monitor centromere copy number and how it influences chromosome transmission and genome stability. We developed and benchmarked droplet digital PCR (ddPCR) assays that measure copy number for five human centromeric arrays. We applied them to characterize natural variation in centromeric array size, analyzing normal tissue from 37 individuals from China and 39 individuals from the US and UK. Each chromosome-specific array varies in size up to 10-fold across individuals and up to 50-fold across chromosomes, indicating a unique complement of arrays in each individual. We also used the ddPCR assays to analyze centromere copy number in 76 matched tumor-normal samples across four cancer types, representing the most-comprehensive quantitative analysis of centromeric array stability in cancer to date. In contrast to stable transmission in cultured cells, centromeric arrays show gain and loss events in each of the cancer types, suggesting centromeric α-satellite DNA represents a new category of genome instability in cancer. Our methodology for measuring human centromeric-array copy number will advance research on centromeres and genome integrity in normal and disease states.
Affiliation(s)
- Edmund Howe
- The Stowers Institute for Medical Research, Kansas City, MO, USA
- Tamara Potapova
- The Stowers Institute for Medical Research, Kansas City, MO, USA
- Hua Li
- The Stowers Institute for Medical Research, Kansas City, MO, USA
- Baoshan Xu
- Hospital of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Guanghua School of Stomatology, Institute of Stomatological Research, Sun Yat-sen University, Guangzhou, Guangdong Province, China
- Jemma Castle
- Newcastle University Centre for Cancer, Newcastle upon Tyne, UK
- Steve Crozier
- Newcastle University Centre for Cancer, Newcastle upon Tyne, UK
- Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Sarra L. Ryan
- Newcastle University Centre for Cancer, Newcastle upon Tyne, UK
- Jennifer L. Gerton
- The Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, KS, USA
16
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Ivan A Alexandrov
- Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199004, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia
17
Suzuki Y, Morishita S. The time is ripe to investigate human centromeres by long-read sequencing. DNA Res 2021; 28:6381569. [PMID: 34609504 PMCID: PMC8502840 DOI: 10.1093/dnares/dsab021] [Received: 03/31/2021] [Accepted: 09/28/2021] [Indexed: 01/05/2023]
Abstract
The complete sequencing of human centromeres, which are filled with highly repetitive elements, has long been challenging. In human centromeres, α-satellite monomers of about 171 bp in length are the basic repeating units, but α-satellite monomers constitute the higher-order repeat (HOR) units, and thousands of copies of highly homologous HOR units form large arrays, which have hampered sequence assembly of human centromeres. Because most HOR unit occurrences are covered by long reads of about 10 kb, the recent availability of much longer reads is expected to enable observation of individual HOR occurrences in terms of their single-nucleotide or structural variants. The time has come to examine the complete sequence of human centromeres.
Affiliation(s)
- Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
18
Nikaido M, Kakiuchi N, Miyamoto S, Hirano T, Takeuchi Y, Funakoshi T, Yokoyama A, Ogasawara T, Yamamoto Y, Yamada A, Setoyama T, Shimizu T, Kato Y, Uose S, Sakurai T, Minamiguchi S, Obama K, Sakai Y, Muto M, Chiba T, Ogawa S, Seno H. Indolent feature of Helicobacter pylori-uninfected intramucosal signet ring cell carcinomas with CDH1 mutations. Gastric Cancer 2021; 24:1102-1114. [PMID: 33961152 DOI: 10.1007/s10120-021-01191-8] [Received: 01/27/2021] [Accepted: 04/09/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND In Helicobacter pylori (Hp)-uninfected individuals, diffuse-type gastric cancer (DGC) was reported as the most common type of cancer. However, the carcinogenic mechanism of Hp-uninfected sporadic DGC is largely unknown. METHODS We performed whole-exome sequencing of Hp-uninfected DGCs and Hp-uninfected normal gastric mucosa. For advanced DGCs, external datasets were also analyzed. RESULTS Eighteen patients (aged 29-78 years) with DGCs and nine normal subjects (28-77 years) were examined. The mutation burden in intramucosal DGCs (10-66 mutations per exome) from individuals aged 29-73 years was not very different from that in the normal gastric glands, which showed a constant mutation accumulation rate (0.33 mutations/exome/year). Unbiased dN/dS analysis showed that CDH1 somatic mutation was a driver mutation for intramucosal DGC. CDH1 mutation was more frequent in intramucosal DGCs (67%) than in advanced DGCs (27%). In contrast, TP53 mutation was more frequent in advanced DGCs (52%) than in intramucosal DGCs (0%). This discrepancy in mutations suggests that CDH1-mutated intramucosal DGCs make a relatively small contribution to advanced DGC formation. Among the 16 intramucosal DGCs (median size, 6.5 mm), 15 DGCs were pure signet ring cell carcinoma (SRCC) with reduced E-cadherin expression and a low proliferative capacity (median Ki-67 index, 2.4%). Five SRCCs reviewed endoscopically over 2-5 years showed no progression. CONCLUSIONS Impaired E-cadherin function due to CDH1 mutation was considered as an early carcinogenic event of Hp-uninfected intramucosal SRCC. Genetic and clinical analyses suggest that Hp-uninfected intramucosal SRCCs may be less likely to develop into advanced DGCs.
Affiliation(s)
- Mitsuhiro Nikaido
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Nobuyuki Kakiuchi
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Department of Pathology and Tumor Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto, Japan
- Shin'ichi Miyamoto
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Department of Gastroenterology, National Hospital Organization Kyoto Medical Center, 1-1 Fukakusa-Mukaihata-Cho, Fushimi, Kyoto, 612-8555, Japan
| | - Tomonori Hirano
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Department of Pathology and Tumor Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto, Japan
| | - Yasuhide Takeuchi
- Department of Pathology and Tumor Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Department of Diagnostic Pathology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Taro Funakoshi
- Department of Therapeutic Oncology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Akira Yokoyama
- Department of Therapeutic Oncology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Tatsuki Ogasawara
- Department of Pathology and Tumor Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto, Japan
| | - Yoshihiro Yamamoto
- Department of Therapeutic Oncology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Atsushi Yamada
- Department of Therapeutic Oncology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Takeshi Setoyama
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Department of Gastroenterology, Osaka Red Cross Hospital, Osaka, Japan
| | - Takahiro Shimizu
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Yukari Kato
- Department of Gastroenterology and Hepatology, Kansai Electric Power Hospital, Osaka, Japan
| | - Suguru Uose
- Department of Gastroenterology and Hepatology, Kansai Electric Power Hospital, Osaka, Japan
| | - Takaki Sakurai
- Department of Diagnostic Pathology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Department of Pathology, Kansai Electric Power Hospital, Osaka, Japan
| | - Sachiko Minamiguchi
- Department of Diagnostic Pathology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Kazutaka Obama
- Department of Surgery, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Yoshiharu Sakai
- Department of Surgery, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Department of Surgery, Osaka Red Cross Hospital, Osaka, Japan
| | - Manabu Muto
- Department of Therapeutic Oncology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Tsutomu Chiba
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Department of Gastroenterology and Hepatology, Kansai Electric Power Hospital, Osaka, Japan
| | - Seishi Ogawa
- Department of Pathology and Tumor Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan.,Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto, Japan.,Department of Medicine, Center for Hematology and Regenerative Medicine, Karolinska Institute, Stockholm, Sweden
| | - Hiroshi Seno
- Department of Gastroenterology and Hepatology, Kyoto University Graduate School of Medicine, Kyoto, Japan
|
19
|
Dvorkina T, Kunyavskaya O, Bzikadze AV, Alexandrov I, Pevzner PA. CentromereArchitect: inference and analysis of the architecture of centromeres. Bioinformatics 2021; 37:i196-i204. [PMID: 34252949 PMCID: PMC8336445 DOI: 10.1093/bioinformatics/btab265] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
Abstract
Motivation: Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not yet been accompanied by the development of centromere-specific bioinformatics algorithms, even the fundamental questions (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex questions (e.g. explaining how monomers and high-order repeats evolved) about human centromeres remain open. Moreover, even though there was a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species.
Results: We describe CentromereArchitect, the first tool for centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for ‘live’ centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution.
Availability and implementation: CentromereArchitect is publicly available at https://github.com/ablab/stringdecomposer/tree/ismb2021
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Ivan Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
|
20
|
|