1
|
Glunčić M, Vlahović I, Rosandić M, Paar V. Precise identification of cascading alpha satellite higher order repeats in T2T-CHM13 assembly of human chromosome 3. Croat Med J 2024; 65:209-219. [PMID: 38868967 PMCID: PMC11157248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024] Open
Abstract
AIM To precisely identify and analyze alpha-satellite higher-order repeats (HORs) in T2T-CHM13 assembly of human chromosome 3. METHODS From the recently sequenced complete T2T-CHM13 assembly of human chromosome 3, the precise alpha satellite HOR structure was computed by using the novel high-precision GRM2023 algorithm with global repeat map (GRM) and monomer distance (MD) diagrams. RESULTS The major alpha satellite HOR array in chromosome 3 revealed a novel cascading HOR, housing 17mer HOR copies with subfragments of periods 15 and 2. Within each row in the cascading HOR, the monomers were of different types, but different rows within the same cascading 17mer HOR contained more than one monomer of the same type. Each canonical 17mer HOR copy comprised 17 monomers belonging to 16 different monomer types. Another pronounced 10mer HOR array was of the regular Willard's type. CONCLUSION Our findings emphasize the complexity within the chromosome 3 centromere as well as deviations from expected highly regular patterns.
Collapse
Affiliation(s)
- Matko Glunčić
- Matko Glunčić, Department of Physics, Faculty of Science, University of Zagreb, Bijenička cesta 32, 10000 Zagreb, Croatia,
| | | | | | | |
Collapse
|
2
|
Chen W, Wang X, Sun J, Wang X, Zhu Z, Ayhan DH, Yi S, Yan M, Zhang L, Meng T, Mu Y, Li J, Meng D, Bian J, Wang K, Wang L, Chen S, Chen R, Jin J, Li B, Zhang X, Deng XW, He H, Guo L. Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis. Nat Commun 2024; 15:4295. [PMID: 38769327 PMCID: PMC11106260 DOI: 10.1038/s41467-024-48643-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Accepted: 05/08/2024] [Indexed: 05/22/2024] Open
Abstract
Chili pepper (Capsicum) is known for its unique fruit pungency due to the presence of capsaicinoids. The evolutionary history of capsaicinoid biosynthesis and the mechanism of their tissue specificity remain obscure due to the lack of high-quality Capsicum genomes. Here, we report two telomere-to-telomere (T2T) gap-free genomes of C. annuum and its wild nonpungent relative C. rhomboideum to investigate the evolution of fruit pungency in chili peppers. We precisely delineate Capsicum centromeres, which lack high-copy tandem repeats but are extensively invaded by CRM retrotransposons. Through phylogenomic analyses, we estimate the evolutionary timing of capsaicinoid biosynthesis. We reveal disrupted coding and regulatory regions of key biosynthesis genes in nonpungent species. We also find conserved placenta-specific accessible chromatin regions, which likely allow for tissue-specific biosynthetic gene coregulation and capsaicinoid accumulation. These T2T genomic resources will accelerate chili pepper genetic improvement and help to understand Capsicum genome evolution.
Collapse
Affiliation(s)
- Weikai Chen
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Xiangfeng Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Jie Sun
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Xinrui Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Zhangsheng Zhu
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- College of Horticulture, South China Agricultural University, Guangzhou, 510642, China
| | - Dilay Hazal Ayhan
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Shu Yi
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Ming Yan
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Lili Zhang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- College of Modern Agriculture and Environment, Weifang Institute of Technology, Weifang, 262500, China
| | - Tan Meng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Yu Mu
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Jun Li
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Dian Meng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Jianxin Bian
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Ke Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- College of Life Sciences, Shandong Agricultural University, Tai'an, 271018, China
| | - Lu Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Shaoying Chen
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Ruidong Chen
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Jingyun Jin
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Bosheng Li
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
| | - Xing Wang Deng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
| | - Hang He
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China.
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China.
| | - Li Guo
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China.
| |
Collapse
|
3
|
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15. Int J Mol Sci 2024; 25:4395. [PMID: 38673983 PMCID: PMC11050224 DOI: 10.3390/ijms25084395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024] Open
Abstract
Unraveling the intricate centromere structure of human chromosomes holds profound implications, illuminating fundamental genetic mechanisms and potentially advancing our comprehension of genetic disorders and therapeutic interventions. This study rigorously identified and structurally analyzed alpha satellite higher-order repeats (HORs) within the centromere of human chromosome 15 in the complete T2T-CHM13 assembly using the high-precision GRM2023 algorithm. The most extensive alpha satellite HOR array in chromosome 15 reveals a novel cascading HOR, housing 429 15mer HOR copies, containing 4-, 7- and 11-monomer subfragments. Within each row of cascading HORs, all alpha satellite monomers are of distinct types, as in regular Willard's HORs. However, different HOR copies within the same cascading 15mer HOR contain more than one monomer of the same type. Each canonical 15mer HOR copy comprises 15 monomers belonging to only 9 different monomer types. Notably, 65% of the 429 15mer cascading HOR copies exhibit canonical structures, while 35% display variant configurations. Identified as the second most extensive alpha satellite HOR, another novel cascading HOR within human chromosome 15 encompasses 164 20mer HOR copies, each featuring two subfragments. Moreover, a distinct pattern emerges as interspersed 25mer/26mer structures differing from regular Willard's HORs and giving rise to a 34-monomer subfragment. Only a minor 18mer HOR array of 12 HOR copies is of the regular Willard's type. These revelations highlight the complexity within the chromosome 15 centromeric region, accentuating deviations from anticipated highly regular patterns and hinting at profound information encoding and functional potential within the human centromere.
Collapse
Affiliation(s)
- Matko Glunčić
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia;
| | - Ines Vlahović
- Algebra LAB, Algebra University College, 10000 Zagreb, Croatia;
| | - Marija Rosandić
- Department of Internal Medicine, University Hospital Centre Zagreb, 10000 Zagreb, Croatia;
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
| | - Vladimir Paar
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia;
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
| |
Collapse
|
4
|
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023] Open
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27[Formula: see text]95%) and 20% (4[Formula: see text]61%). Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Collapse
Affiliation(s)
- Xiongbin Kang
- College of Biology, Hunan University, Changsha, China
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Jialu Xu
- College of Biology, Hunan University, Changsha, China
| | - Xiao Luo
- College of Biology, Hunan University, Changsha, China.
| | - Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
| |
Collapse
|
5
|
Bzikadze AV, Pevzner PA. UniAligner: a parameter-free framework for fast sequence alignment. Nat Methods 2023; 20:1346-1354. [PMID: 37580559 DOI: 10.1038/s41592-023-01970-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Accepted: 07/05/2023] [Indexed: 08/16/2023]
Abstract
Even though the recent advances in 'complete genomics' revealed the previously inaccessible genomic regions, analysis of variations in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner-the parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres, and quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
Collapse
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
| |
Collapse
|
6
|
Gao S, Yang X, Guo H, Zhao X, Wang B, Ye K. HiCAT: a tool for automatic annotation of centromere structure. Genome Biol 2023; 24:58. [PMID: 36978122 PMCID: PMC10053651 DOI: 10.1186/s13059-023-02900-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open
Abstract
Significant improvements in long-read sequencing technologies have unlocked complex genomic areas, such as centromeres, in the genome and introduced the centromere annotation problem. Currently, centromeres are annotated in a semi-manual way. Here, we propose HiCAT, a generalizable automatic centromere annotation tool, based on hierarchical tandem repeat mining to facilitate decoding of centromere architecture. We apply HiCAT to simulated datasets, human CHM13-T2T and gapless Arabidopsis thaliana genomes. Our results are generally consistent with previous inferences but also greatly improve annotation continuity and reveal additional fine structures, demonstrating HiCAT's performance and general applicability.
Collapse
Affiliation(s)
- Shenghan Gao
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
| | - Xiaofei Yang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China.
| | - Hongtao Guo
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
| | - Xixi Zhao
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
| | - Bo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
- Faculty of Science, Leiden University, Leiden, The Netherlands.
| |
Collapse
|
7
|
GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads. Nat Commun 2023; 14:204. [PMID: 36639368 PMCID: PMC9839709 DOI: 10.1038/s41467-022-35670-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 12/16/2022] [Indexed: 01/15/2023] Open
Abstract
High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows for long-read platforms. Here we report on GALA (Gap-free long-read Assembly tool), a computational framework for chromosome-based sequencing data separation and de novo assembly implemented through a multi-layer graph that identifies discordances within preliminary assemblies and partitions the data into chromosome-scale scaffolding groups. The subsequent independent assembly of each scaffolding group generates a gap-free assembly likely free from the mis-assembly errors which usually hamper existing workflows. This flexible framework also allows us to integrate data from various technologies, such as Hi-C, genetic maps, and even motif analyses to generate gap-free chromosome-scale assemblies. As a proof of principle we de novo assemble the C. elegans genome using combined PacBio and Nanopore sequencing data and a rice cultivar genome using Nanopore sequencing data from publicly available datasets. We also demonstrate the proposed method's applicability with a gap-free assembly of the human genome using PacBio high-fidelity (HiFi) long reads. Thus, our method enables straightforward assembly of genomes with multiple data sources and overcomes barriers that at present restrict the application of de novo genome assembly technology.
Collapse
|
8
|
Zhu H, Guo S, Zhao J, Arbab Sakandar H, Lv R, Wen Q, Chen X. Whole Genome Sequence Analysis of Lactiplantibacillus plantarum Bacteriophage P2. Pol J Microbiol 2022; 71:421-428. [PMID: 36185020 PMCID: PMC9608156 DOI: 10.33073/pjm-2022-037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Accepted: 07/22/2022] [Indexed: 11/24/2022] Open
Abstract
Phage P2 was isolated from failed fermentation broth carried out by Lactiplantibacillus plantarum IMAU10120. A previous study in our laboratory showed that this phage belonged to the Siphoviridae family. In this study, this phage’s genomic characteristics were analyzed using whole-genome sequencing. It was revealed that phage P2 was 77.9 kb in length and had 39.28% G + C content. Its genome included 96 coding sequences (CDS) and two tRNA genes involved in the function of the structure, DNA replication, packaging, and regulation. Phage P2 had higher host specificity; many tested strains were not infected. Cell wall adsorption experiments showed that the adsorption receptor component of phage P2 might be a part of the cell wall peptidoglycan. This research might enrich the knowledge about genomic information of lactobacillus phages and provide some primary data to establish phage control measures.
Collapse
Affiliation(s)
- Hanfang Zhu
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P.R.China,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P.R.China,Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot, P.R.China
| | - She Guo
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P.R.China,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P.R.China,Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot, P.R.China
| | - Jie Zhao
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P.R.China,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P.R.China,Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot, P.R.China
| | - Hafiz Arbab Sakandar
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P.R.China,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P.R.China,Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot, P.R.China
| | - Ruirui Lv
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P.R.China,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P.R.China,Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot, P.R.China
| | - Qiannan Wen
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P.R.China,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P.R.China,Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot, P.R.China
| | - Xia Chen
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P.R.China,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P.R.China,Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot, P.R.China, X. Chen, Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China # Hanfang Zhu and She Guo have contributed equally to this study.
| |
Collapse
|
9
|
Jain C, Rhie A, Hansen NF, Koren S, Phillippy AM. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat Methods 2022; 19:705-710. [PMID: 35365778 PMCID: PMC10510034 DOI: 10.1038/s41592-022-01457-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Accepted: 03/17/2022] [Indexed: 01/10/2023]
Abstract
Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.
Collapse
Affiliation(s)
- Chirag Jain
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India.
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA.
| | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Nancy F Hansen
- Comparative Genomics Analysis Unit, National Human Genome Research Institute, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| |
Collapse
|
10
|
Xu D, Song Y, Zhao X, Gong D, Yang Y, Pan W. RAviz: a visualization tool for detecting false-positive alignments in repetitive genomic regions. HORTICULTURE RESEARCH 2022; 9:uhac161. [PMID: 36204211 PMCID: PMC9531338 DOI: 10.1093/hr/uhac161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 08/01/2022] [Accepted: 07/08/2022] [Indexed: 06/16/2023]
Affiliation(s)
| | | | - Xianjia Zhao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Henan Zhengzhou 450001, China
| | - Desheng Gong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | | | | |
Collapse
|
11
|
Kunyavskaya O, Dvorkina T, Bzikadze AV, Alexandrov I, Pevzner PA. Automated annotation of human centromeres with HORmon. Genome Res 2022; 32:1137-1151. [PMID: 35545449 DOI: 10.1101/gr.276362.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2021] [Accepted: 05/06/2022] [Indexed: 11/24/2022]
Abstract
Recent advances in long-read sequencing opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. They also emphasized the need for centromere annotation (partitioning human centromeres into monomers and higher-order repeats (HORs)). Even though there was a half-century-long series of semi-manual studies of centromere architecture, a rigorous centromere annotation algorithm is still lacking. Moreover, an automated centromere annotation is a prerequisite for studies of genetic diseases associated with centromeres, and evolutionary studies of centromeres across multiple species. Although the monomer decomposition (transforming a centromere into a monocentromere written in the monomer alphabet) and the HOR decomposition (representing a monocentromere in the alphabet of HORs) are currently viewed as two separate problems, we demonstrate that they should be integrated into a single framework in such a way that HOR (monomer) inference affects monomer (HOR) inference. We thus developed the HORmon algorithm that integrates the monomer/HOR inference and automatically generates the human monomers/HORs that are largely consistent with the previous semi-manual inference.
Collapse
Affiliation(s)
- Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
| | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
| | | | - Ivan Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
| | | |
Collapse
|
12
|
Altemose N, Glennis A, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Citation(s) in RCA: 174] [Impact Index Per Article: 87.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Collapse
Affiliation(s)
- Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - A. Glennis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Pragya Sidhwani
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Sasha A. Langley
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Lev Uralsky
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
| | | | - Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | | | | | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | | | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT. USA
| | | | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Fedor Gusev
- Vavilov Institute of General Genetics, Moscow, Russia
| | - Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sofie R. Salama
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Gary H. Karpen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Abby F. Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
| | | | - Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT. USA
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | - Rachel J. O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Charles H. Langley
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
| | - Ivan A. Alexandrov
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| |
Collapse
|
13
|
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022; 376:44-53. [PMID: 35357919 PMCID: PMC9186530 DOI: 10.1126/science.abj6987] [Citation(s) in RCA: 987] [Impact Index Per Article: 493.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Collapse
Affiliation(s)
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego; La Jolla, CA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
| | - Lev Uralsky
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Michael Alonge
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | | | | | - Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
| | | | | | | | - Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Richard Durbin
- Wellcome Sanger Institute; Cambridge, UK
- Department of Genetics, University of Cambridge; Cambridge, UK
| | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | | | - Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Robert S. Fulton
- Department of Genetics, Washington University School of Medicine; St. Louis, MO, USA
| | | | - Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- University of Tennessee Health Science Center; Memphis, TN, USA
| | - Patrick G.S. Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | | | - Ira M. Hall
- Department of Genetics, Yale University School of Medicine; New Haven, CT, USA
| | - Nancy F. Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | | | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Department of Computational and Data Sciences, Indian Institute of Science; Bangalore KA, India
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Erich D. Jarvis
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | | | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | | | - Milinn Kremitzki
- McDonnell Genome Institute, Washington University in St. Louis; St. Louis, MO, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
| | - Valerie V. Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics; Düsseldorf, Germany
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Danny E. Miller
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital; Seattle, WA, USA
| | - James C. Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Eugene W. Myers
- Max-Planck Institute of Molecular Cell Biology and Genetics; Dresden, Germany
| | - Nathan D. Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research; Kansas City, MO, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School; Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University; Moscow, Russia
| | | | - Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine; Houston TX, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Colin J. Shew
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Ying Sims
- Wellcome Sanger Institute; Cambridge, UK
| | | | - Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Ivan Sović
- Pacific Biosciences; Menlo Park, CA, USA
- Digital BioLogic d.o.o.; Ivanić-Grad, Croatia
| | | | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Chan Zuckerberg Biohub; San Francisco, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine; Durham, NC, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | | | - Justin Wagner
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | | | | | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Stephanie M. Yan
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Alice C. Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh; Pittsburgh, PA, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Ivan A. Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences; Moscow, Russia
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Department of Biochemistry and Molecular Biology, University of Kansas Medical School; Kansas City, MO, USA
| | - Rachel J. O’Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| |
Collapse
|
14
|
Bankevich A, Bzikadze AV, Kolmogorov M, Antipov D, Pevzner PA. Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads. Nat Biotechnol 2022; 40:1075-1081. [PMID: 35228706 DOI: 10.1038/s41587-022-01220-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2021] [Accepted: 01/11/2022] [Indexed: 11/09/2022]
Abstract
Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
Collapse
Affiliation(s)
- Anton Bankevich
- Department of Computer Science and Engineering, University of California, San Diego, San Diego CA, USA.
| | - Andrey V Bzikadze
- Program in Bioinformatics and Systems Biology, University of California, San Diego, San Diego CA, USA
| | - Mikhail Kolmogorov
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz CA, USA
| | - Dmitry Antipov
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, San Diego CA, USA.
| |
Collapse
|
15
|
Pan W, Ruan J. En Route to Completion: What Is An Ideal Reference Genome? GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:1-3. [PMID: 34509700 PMCID: PMC9510861 DOI: 10.1016/j.gpb.2021.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Accepted: 09/02/2021] [Indexed: 11/12/2022]
|
16
|
Gao D, Nascimento EFMB, Leal-Bertioli SCM, Abernathy B, Jackson SA, Araujo ACG, Bertioli DJ. TAR30, a homolog of the canonical plant TTTAGGG telomeric repeat, is enriched in the proximal chromosome regions of peanut (Arachis hypogaea L.). Chromosome Res 2022; 30:77-90. [DOI: 10.1007/s10577-022-09684-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2021] [Revised: 01/07/2022] [Accepted: 01/11/2022] [Indexed: 11/03/2022]
|
17
|
Bzikadze AV, Mikheenko A, Pevzner PA. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res 2022; 32:2107-2118. [PMID: 36379716 PMCID: PMC9808623 DOI: 10.1101/gr.276871.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Recent advancements in long-read sequencing have enabled the telomere-to-telomere (complete) assembly of a human genome and are now contributing to the haplotype-resolved complete assemblies of multiple human genomes. Because the accuracy of read mapping tools deteriorates in highly repetitive regions, there is a need to develop accurate, error-exposing (detecting potential assembly errors), and diploid-aware (distinguishing different haplotypes) tools for read mapping in complete assemblies. We describe the first accurate, error-exposing, and partially diploid-aware VerityMap tool for long-read mapping to complete assemblies.
Collapse
Affiliation(s)
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, California 92093, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, 199034, Russia
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA
| |
Collapse
|
18
|
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Collapse
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA; .,Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
| | - Ivan A Alexandrov
- Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia; .,Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199004, Russia.,Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia
| |
Collapse
|
19
|
Suzuki Y, Morishita S. The time is ripe to investigate human centromeres by long-read sequencing†. DNA Res 2021; 28:6381569. [PMID: 34609504 PMCID: PMC8502840 DOI: 10.1093/dnares/dsab021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Accepted: 09/28/2021] [Indexed: 01/05/2023] Open
Abstract
The complete sequencing of human centromeres, which are filled with highly repetitive elements, has long been challenging. In human centromeres, α-satellite monomers of about 171 bp in length are the basic repeating units, but α-satellite monomers constitute the higher-order repeat (HOR) units, and thousands of copies of highly homologous HOR units form large arrays, which have hampered sequence assembly of human centromeres. Because most HOR unit occurrences are covered by long reads of about 10 kb, the recent availability of much longer reads is expected to enable observation of individual HOR occurrences in terms of their single-nucleotide or structural variants. The time has come to examine the complete sequence of human centromeres.
Collapse
Affiliation(s)
- Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
| | - Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
| |
Collapse
|
20
|
Abstract
The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome.
Collapse
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute and Department of Biomedical Engineering, University of California, Santa Cruz, California 95064, USA;
| | - Ting Wang
- Department of Genetics, Edison Family Center for Genome Sciences and Systems Biology, and McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
| |
Collapse
|
21
|
Dvorkina T, Kunyavskaya O, Bzikadze AV, Alexandrov I, Pevzner PA. CentromereArchitect: inference and analysis of the architecture of centromeres. Bioinformatics 2021; 37:i196-i204. [PMID: 34252949 PMCID: PMC8336445 DOI: 10.1093/bioinformatics/btab265] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Motivation Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not been yet accompanied by the development of the centromere-specific bioinformatics algorithms, even the fundamental questions (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex questions (e.g. explaining how monomers and high-order repeats evolved) about human centromeres remain open. Moreover, even though there was a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, the rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species. Results We describe the CentromereArchitect, the first tool for the centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for ‘live’ centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution. Availability and implementation CentromereArchitect is publicly available on https://github.com/ablab/stringdecomposer/tree/ismb2021 Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
| | - Ivan Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
| |
Collapse
|
22
|
Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nat Commun 2021; 12:3836. [PMID: 34158502 PMCID: PMC8219666 DOI: 10.1038/s41467-021-24041-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Accepted: 05/27/2021] [Indexed: 02/05/2023] Open
Abstract
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea .
Collapse
|
23
|
Khorsand P, Denti L, Bonizzoni P, Chikhi R, Hormozdiari F. Comparative genome analysis using sample-specific string detection in accurate long reads. BIOINFORMATICS ADVANCES 2021; 1:vbab005. [PMID: 36700094 PMCID: PMC9710709 DOI: 10.1093/bioadv/vbab005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Motivation Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can dive into studying repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('samples-specific' strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach will give us the ability to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach is capable of accurately finding sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). Availability and implementation Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Collapse
Affiliation(s)
| | - Luca Denti
- Department of Computational Biology, Institut Pasteur, Paris 75015, France
| | | | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, 20126, Italy,To whom correspondence should be addressed. or or
| | - Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, Paris 75015, France,To whom correspondence should be addressed. or or
| | - Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA 95616, USA,UC Davis MIND Institute, Sacramento, CA 95817, USA,Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA 95817, USA,To whom correspondence should be addressed. or or
| |
Collapse
|
24
|
Abstract
The first gapless, telomere-to-telomere sequence of a human autosome, chromosome 8, is complete. Sequencing and assembly of the corresponding centromere in the chimpanzee, orangutan and macaque reveals details of its rapid evolution over the past 25 million years.
Collapse
Affiliation(s)
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
Collapse
|
25
|
Kim E, Kim J, Kim C, Lee J. Long-read sequencing and de novo genome assemblies reveal complex chromosome end structures caused by telomere dysfunction at the single nucleotide level. Nucleic Acids Res 2021; 49:3338-3353. [PMID: 33693840 PMCID: PMC8034613 DOI: 10.1093/nar/gkab141] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2020] [Revised: 01/28/2021] [Accepted: 02/20/2021] [Indexed: 02/06/2023] Open
Abstract
Karyotype change and subsequent evolution is triggered by chromosome fusion and rearrangement events, which often occur when telomeres become dysfunctional. Telomeres protect linear chromosome ends from DNA damage responses (DDRs), and telomere dysfunction may result in genome instability. However, the complex chromosome end structures and the other possible consequences of telomere dysfunction have rarely been resolved at the nucleotide level due to the lack of the high-throughput methods needed to analyse these highly repetitive regions. Here we applied long-read sequencing technology to Caenorhabditis elegans survivor lines that emerged after telomere dysfunction. The survivors have preserved traces of DDRs in their genomes and our data revealed that variants generated by telomere dysfunction are accumulated along all chromosomes. The reconstruction of the chromosome end structures through de novo genome assemblies revealed diverse types of telomere damage processing at the nucleotide level. When telomeric repeats were totally eroded by telomere dysfunction, DDRs were mostly terminated by chromosome fusion events. We also partially reconstructed the most complex end structure and its DDR signatures, which would have been accumulated via multiple cell divisions. These finely resolved chromosome end structures suggest possible mechanisms regarding the repair processes after telomere dysfunction, providing insights into chromosome evolution in nature.
Collapse
Affiliation(s)
- Eunkyeong Kim
- Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea.,Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea
| | - Jun Kim
- Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea.,Research Institute of Basic Sciences, Seoul National University, Seoul 08826, Korea
| | - Chuna Kim
- Aging Research Center, Korea Research Institute of Bioscience and Biotechnology, Gwahak-ro 125, Daejeon 34141, Korea
| | - Junho Lee
- Department of Biological Sciences, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Korea.,Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, Korea.,Research Institute of Basic Sciences, Seoul National University, Seoul 08826, Korea
| |
Collapse
|
26
|
|
27
|
Ivanova NG, Ostromyshenskii D, Podgornaya O. Tandem Repeat-Based Probes Support the Loop Model of Pericentromere Packing. Cytogenet Genome Res 2021; 161:93-102. [PMID: 33601374 DOI: 10.1159/000513228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2020] [Accepted: 08/18/2020] [Indexed: 11/19/2022] Open
Abstract
Constitutive heterochromatin is the most mysterious part of the eukaryotic genome. It forms vital chromosome regions such as the centromeric and the pericentromeric ones. The main component of heterochromatic regions are tandem repeats (TR), and their specific organization complicates assembly, annotation, and mapping of these regions. Unannotated and unmapped TR arrays are still present in database contigs. In this study, we used a set of TR in the genomes of the pig (Sus scrofa) and the Chinese hamster (Cricetulus griseus) identified with the help of bioinformatics techniques and determined the specificity of the designed probes. The signal of the 4 pig TR probes in spermatogenic cells was often ring-shaped, especially in primary spermatocytes. The rings were located in the regions relatively weakly stained with DAPI. The unique assembly of the centromeric region was traced using the hamster meiotic chromosomes. The probe specific to chromosome 5 was used. Two signals, arranged as rings, were seen at the pachytene stage, similar to those in the pig spermatogenic cells. In the spermatogenic cells of both pig and hamster, the rings appeared on the chromosomes with pericentromeric TR probes. Our observations support the loop model of the centromeric region, the size of the loops being about 50 kb.
Collapse
Affiliation(s)
- Nadezhda G Ivanova
- Laboratory of Non-coding DNA, Institute of Cytology RAS, St. Petersburg, Russian Federation,
| | | | - Olga Podgornaya
- Laboratory of Non-coding DNA, Institute of Cytology RAS, St. Petersburg, Russian Federation.,Department of Cytology and Histology, St. Petersburg State University, St. Petersburg, Russian Federation
| |
Collapse
|
28
|
Abstract
Since the human genome was published in 2001, many of the gaps in the original sequence have been filled in, offering a more detailed understanding of genome regulation, structure and function.
Collapse
|
29
|
Suzuki Y, Myers EW, Morishita S. Rapid and ongoing evolution of repetitive sequence structures in human centromeres. SCIENCE ADVANCES 2020; 6:6/50/eabd9230. [PMID: 33310858 PMCID: PMC7732198 DOI: 10.1126/sciadv.abd9230] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Accepted: 10/30/2020] [Indexed: 06/12/2023]
Abstract
Our understanding of centromere sequence variation across human populations is limited by its extremely long nested repeat structures called higher-order repeats that are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations including a Han Chinese trio and 21 Japanese. We revealed substantial structural diversity with many previously unidentified variant higher-order repeats specific to individuals characterizing rapid, haplotype-specific evolution of human centromeric arrays, while frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.
Collapse
Affiliation(s)
- Yuta Suzuki
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
| | - Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| | - Shinichi Morishita
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
| |
Collapse
|
30
|
Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA, Schneider VA, Potapova T, Wood J, Chow W, Armstrong J, Fredrickson J, Pak E, Tigyi K, Kremitzki M, Markovic C, Maduro V, Dutra A, Bouffard GG, Chang AM, Hansen NF, Wilfert AB, Thibaud-Nissen F, Schmitt AD, Belton JM, Selvaraj S, Dennis MY, Soto DC, Sahasrabudhe R, Kaya G, Quick J, Loman NJ, Holmes N, Loose M, Surti U, Risques RA, Graves Lindsay TA, Fulton R, Hall I, Paten B, Howe K, Timp W, Young A, Mullikin JC, Pevzner PA, Gerton JL, Sullivan BA, Eichler EE, Phillippy AM. Telomere-to-telomere assembly of a complete human X chromosome. Nature 2020; 585:79-84. [PMID: 32663838 PMCID: PMC7484160 DOI: 10.1038/s41586-020-2547-7] [Citation(s) in RCA: 401] [Impact Index Per Article: 100.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Accepted: 05/29/2020] [Indexed: 12/15/2022]
Abstract
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist1,2. Here we present a human genome assembly that surpasses the continuity of GRCh382, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome3, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
Collapse
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Andrey Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, San Diego, CA, USA
| | - Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Edmund Howe
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | | | - Joel Armstrong
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- Cytogenetic and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Milinn Kremitzki
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | | | - Valerie Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Amalia Dutra
- Cytogenetic and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Alexander M Chang
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nancy F Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Amy B Wilfert
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | | | - Megan Y Dennis
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Daniela C Soto
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Ruta Sahasrabudhe
- DNA Technologies Core, Genome Center, University of California Davis, Davis, CA, USA
| | - Gulhan Kaya
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Josh Quick
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Nicholas J Loman
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Nadine Holmes
- DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK
| | - Matthew Loose
- DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Rosa Ana Risques
- Department of Pathology, University of Washington, Seattle, WA, USA
| | | | - Robert Fulton
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | - Ira Hall
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Winston Timp
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - James C Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | | | - Beth A Sullivan
- Department of Molecular Genetics and Microbiology, Division of Human Genetics, Duke University Medical Center, Durham, NC, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
31
|
Arunkumar G, Melters DP. Centromeric Transcription: A Conserved Swiss-Army Knife. Genes (Basel) 2020; 11:E911. [PMID: 32784923 PMCID: PMC7463856 DOI: 10.3390/genes11080911] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Revised: 08/05/2020] [Accepted: 08/07/2020] [Indexed: 12/11/2022] Open
Abstract
In most species, the centromere is comprised of repetitive DNA sequences, which rapidly evolve. Paradoxically, centromeres fulfill an essential function during mitosis, as they are the chromosomal sites wherein, through the kinetochore, the mitotic spindles bind. It is now generally accepted that centromeres are transcribed, and that such transcription is associated with a broad range of functions. More than a decade of work on this topic has shown that centromeric transcripts are found across the eukaryotic tree and associate with heterochromatin formation, chromatin structure, kinetochore structure, centromeric protein loading, and inner centromere signaling. In this review, we discuss the conservation of small and long non-coding centromeric RNAs, their associations with various centromeric functions, and their potential roles in disease.
Collapse
Affiliation(s)
| | - Daniël P. Melters
- Chromatin Structure and Epigenetic Mechanisms, Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, NCI, NIH, Bethesda, MD 20892, USA;
| |
Collapse
|
32
|
Kirov I, Odintsov S, Omarov M, Gvaramiya S, Merkulov P, Dudnikov M, Ermolaev A, Van Laere K, Soloviev A, Khrustaleva L. Functional Allium fistulosum Centromeres Comprise Arrays of a Long Satellite Repeat, Insertions of Retrotransposons and Chloroplast DNA. FRONTIERS IN PLANT SCIENCE 2020; 11:562001. [PMID: 33193489 PMCID: PMC7644871 DOI: 10.3389/fpls.2020.562001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Accepted: 10/07/2020] [Indexed: 05/08/2023]
Abstract
The centromere is a unique part of the chromosome combining a conserved function with an extreme variability in its DNA sequence. Most of our knowledge about the functional centromere organization is obtained from species with small and medium genome/chromosome sizes while the progress in plants with big genomes and large chromosomes is lagging behind. Here, we studied the genomic organization of the functional centromere in Allium fistulosum and A. cepa, both species with a large genome (13 Gb and 16 Gb/1C, 2n = 2x = 16) and large-sized chromosomes. Using low-depth DNA sequencing for these two species and previously obtained CENH3 immunoprecipitation data we identified two long (1.2 Kb) and high-copy repeats, AfCen1K and AcCen1K. FISH experiments showed that AfCen1K is located in all centromeres of A. fistulosum chromosomes while no AcCen1K FISH signals were identified on A. cepa chromosomes. Our molecular cytogenetic and bioinformatics survey demonstrated that these repeats are partially similar but differ in chromosomal location, sequence structure and genomic organization. In addition, we could conclude that the repeats are transcribed and their RNAs are not polyadenylated. We also observed that these repeats are associated with insertions of retrotransposons and plastidic DNA and the landscape of A. cepa and A. fistulosum centromeric regions possess insertions of plastidic DNA. Finally, we carried out detailed comparative satellitome analysis of A. cepa and A. fistulosum genomes and identified a new chromosome- and A. cepa-specific tandem repeat, TR2CL137, located in the centromeric region. Our results shed light on the Allium centromere organization and provide unique data for future application in Allium genome annotation.
Collapse
Affiliation(s)
- Ilya Kirov
- Laboratory of Marker-assisted and genomic selection of plants, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
- *Correspondence: Ilya Kirov,
| | - Sergey Odintsov
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Moscow, Russia
| | - Murad Omarov
- Laboratory of Marker-assisted and genomic selection of plants, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
- National Research University Higher School of Economics, Moscow, Russia
| | - Sofya Gvaramiya
- Laboratory of Marker-assisted and genomic selection of plants, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
| | - Pavel Merkulov
- Laboratory of Marker-assisted and genomic selection of plants, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
| | - Maxim Dudnikov
- Laboratory of Marker-assisted and genomic selection of plants, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
| | - Alexey Ermolaev
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Moscow, Russia
| | - Katrijn Van Laere
- Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Plant Sciences Unit, Melle, Belgium
| | - Alexander Soloviev
- Laboratory of Marker-assisted and genomic selection of plants, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
| | - Ludmila Khrustaleva
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Moscow, Russia
- Plant Cell Engineering Laboratory, All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
| |
Collapse
|