1. Zhang F, Li C, Yang D, Liu B, Zhou Y, Zhou Z, Zhong H, Wang Z, Chen D. Label-Free and Sequence-Independent Isothermal Amplification Strategy for the Simultaneous Detection of Genomic 5-Methylcytosine and 5-Hydroxymethylcytosine. Anal Chem 2025. [PMID: 39869504 DOI: 10.1021/acs.analchem.4c06200] [Indexed: 01/29/2025]
Abstract
5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are crucial epigenetic modifications in eukaryotic genomic DNA that regulate gene expression and are associated with the occurrence of various cancers. Here, we combined bisulfite conversion with 4-acetamido-2,2,6,6-tetramethyl-1-oxopiperidinium tetrafluoroborate (ACT+BF4-, TCI) oxidation to develop a label-free and sequence-independent isothermal amplification (BTIA) assay for genome-wide 5mC and 5hmC analysis. The BTIA strategy can distinguish 5mC and 5hmC signatures from other bases with high sensitivity and good specificity, avoiding sophisticated chemical modifications and expensive protein labeling. Moreover, the use of terminal deoxynucleotidyl transferase (TdT) enables the proposed strategy to detect global 5mC and 5hmC without sequence dependence. With only 78 ng of input genomic DNA, global 5mC and 5hmC levels were accurately quantified in cells (including the cancer cell lines A549, T47D, and K562 and the normal cell lines HEK-293T, CHO, and NRK-52E) and clinical whole blood samples (including healthy control, precancerous cervical cancer, and confirmed cervical cancer) within 18 h. The detection results suggested that 5mC was highly expressed in cancer cells. More importantly, a significant increase in 5mC was observed in precancerous cervical cancer, with further upregulation in confirmed cervical cancer, suggesting a correlation between 5mC and cancer occurrence and development. However, 5hmC showed the reverse result in these tested cells and clinical samples. Collectively, the BTIA strategy can be easily performed on ordinary heating apparatus in almost all research and medical laboratories, showing significant potential for application in the early clinical screening of cervical cancer.
Affiliation(s)
- Feng Zhang
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Chengpeng Li
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Di Yang
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Bingqian Liu
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Yue Zhou
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Zhixu Zhou
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Hang Zhong
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Zhenchao Wang
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Danping Chen
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
2. Kixmoeller K, Tarasovetc EV, Mer E, Chang YW, Black BE. Centromeric chromatin clearings demarcate the site of kinetochore formation. Cell 2025:S0092-8674(24)01467-3. [PMID: 39855195 DOI: 10.1016/j.cell.2024.12.025] [Received: 05/06/2024] [Revised: 11/24/2024] [Accepted: 12/18/2024] [Indexed: 01/27/2025]
Abstract
The centromere is the chromosomal locus that recruits the kinetochore, directing faithful propagation of the genome during cell division. Using cryo-ET on human mitotic chromosomes, we reveal a distinctive architecture at the centromere: clustered 20- to 25-nm nucleosome-associated complexes within chromatin clearings that delineate them from surrounding chromatin. Centromere components CENP-C and CENP-N are each required for the integrity of the complexes, while CENP-C is also required to maintain the chromatin clearing. We find that CENP-C is required in mitosis, not just for kinetochore assembly, likely reflecting its role in organizing the inner kinetochore during chromosome segregation. We further visualize the scaffold of the fibrous corona, a structure amplified at unattached kinetochores, revealing crescent-shaped parallel arrays of fibrils extending >1 μm. Thus, we reveal how the organization of centromeric chromatin creates a clearing at the site of kinetochore formation as well as the nature of kinetochore amplification mediated by corona fibrils.
Affiliation(s)
- Kathryn Kixmoeller
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ekaterina V Tarasovetc
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Elie Mer
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yi-Wei Chang
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Ben E Black
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
3. Wang M, Duan S, Sun Q, Liu K, Liu Y, Wang Z, Li X, Wei L, Liu Y, Nie S, Zhou K, Ma Y, Yuan H, Liu B, Hu L, Liu C, He G. YHSeqY3000 panel captures all founding lineages in the Chinese paternal genomic diversity database. BMC Biol 2025; 23:18. [PMID: 39838386 PMCID: PMC11752814 DOI: 10.1186/s12915-025-02122-0] [Received: 02/01/2024] [Accepted: 01/07/2025] [Indexed: 01/23/2025] [Open access]
Abstract
BACKGROUND: Advances in second- and third-generation sequencing technologies, alongside computational innovations, have significantly enhanced our understanding of the genomic structure of Y-chromosomes and their unique phylogenetic characteristics. Despite the challenges posed by the lack of population-scale genomic databases, this research has the potential to revolutionize our approach to high-resolution, population-specific Y-chromosome panels and databases for anthropological and forensic applications. OBJECTIVES: This study aimed to develop the highest-resolution Y-targeted sequencing panel, utilizing time-stamped, core phylogenetically informative mutations identified from high-coverage sequences in the YanHuang cohort. This panel is intended to provide a new tool for forensic complex pedigree search and paternal biogeographical ancestry inference, as well as to explore the general patterns of the fine-scale paternal evolutionary history of ethnolinguistically diverse Chinese populations. RESULTS: The sequencing performance of the East Asian-specific Y-chromosomal panel, including 2999 core SNP variants, was found to be robust and reliable. The YHSeqY3000 panel was designed to capture the genetic diversity of Chinese paternal lineages from 3500 years ago, identifying 408 terminal lineages in 2097 individuals across 41 genetically and geographically distinct populations. We identified a fine-scale paternal substructure that correlated with ancient population migrations and expansions. New evidence was provided for extensive gene flow events between minority ethnic groups and Han Chinese people, based on the integrative Chinese Paternal Genomic Diversity Database. CONCLUSIONS: This work successfully integrated Y-chromosome-related basic genomic science with forensic and anthropological translational applications, emphasizing the necessity of comprehensively characterizing Y-chromosome genomic diversity from genomically under-represented populations. This is particularly important in the second phase of our population-specific medical or anthropological genomic cohorts, where dense sampling strategies are employed.
Affiliation(s)
- Mengge Wang
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China.
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China.
- Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China.
- Department of Oto-Rhino-Laryngology, West China Hospital of Sichuan University, Chengdu, 610000, China.
- Shuhan Duan
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- School of Basic Medical Sciences, North Sichuan Medical College, Nanchong, 637100, China
- Department of Oto-Rhino-Laryngology, West China Hospital of Sichuan University, Chengdu, 610000, China
- Qiuxia Sun
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Kaijun Liu
- School of International Tourism and Culture, Guizhou Normal University, Guiyang, 550025, China
- MoFang Human Genome Research Institute, Tianfu Software Park, Chengdu, 610042, Sichuan, China
- Yan Liu
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
- School of Basic Medical Sciences, North Sichuan Medical College, Nanchong, 637100, China
- Zhiyong Wang
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- School of Forensic Medicine, Kunming Medical University, Kunming, 650500, China
- Xiangping Li
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- School of Forensic Medicine, Kunming Medical University, Kunming, 650500, China
- Lanhai Wei
- School of Ethnology and Anthropology, Inner Mongolia Normal University, Hohhot, 010028, Inner Mongolia, China
- Yunhui Liu
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Shengjie Nie
- School of Forensic Medicine, Kunming Medical University, Kunming, 650500, China
- Kun Zhou
- MoFang Human Genome Research Institute, Tianfu Software Park, Chengdu, 610042, Sichuan, China
- Yongxin Ma
- Department of Medical Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Huijun Yuan
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- Bing Liu
- Institute of Forensic Science, Ministry of Public Security, Beijing, 100038, China
- Lan Hu
- Institute of Forensic Science, Ministry of Public Security, Beijing, 100038, China
- Chao Liu
- Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China.
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, China.
- Guanglin He
- Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China.
- Center for Archaeological Science, Sichuan University, Chengdu, 610000, China.
- Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China.
4. Sobral AF, Dinis-Oliveira RJ, Barbosa DJ. CRISPR-Cas technology in forensic investigations: Principles, applications, and ethical considerations. Forensic Sci Int Genet 2025; 74:103163. [PMID: 39437497 DOI: 10.1016/j.fsigen.2024.103163] [Received: 08/20/2024] [Revised: 10/08/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024]
Abstract
CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated proteins) systems are adaptive immune systems originally present in bacteria, where they are essential to protect against external genetic elements, including viruses and plasmids. Taking advantage of this system, CRISPR-Cas-based technologies have emerged as incredible tools for precise genome editing, thus significantly advancing several research fields. Forensic sciences represent a multidisciplinary field that explores scientific methods to investigate and resolve legal issues, particularly criminal investigations and subject identification. Consequently, it plays a critical role in the justice system, providing scientific evidence to support judicial investigations. Although less explored, CRISPR-Cas-based methodologies demonstrate strong potential in the field of forensic sciences due to their high accuracy and sensitivity, including DNA profiling and identification, interpretation of crime scene investigations, detection of food contamination or fraud, and other aspects related to environmental forensics. However, using CRISPR-Cas-based methodologies in human samples raises several ethical issues and concerns regarding the potential misuse of individual genetic information. In this manuscript, we provide an overview of potential applications of CRISPR-Cas-based methodologies in several areas of forensic sciences and discuss the legal implications that challenge their routine implementation in this research field.
Affiliation(s)
- Ana Filipa Sobral
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Toxicologic Pathology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal.
- Ricardo Jorge Dinis-Oliveira
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Translational Toxicology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal; Department of Public Health and Forensic Sciences and Medical Education, Faculty of Medicine, University of Porto, Porto 4200-319, Portugal; FOREN - Forensic Science Experts, Dr. Mário Moutinho Avenue, No. 33-A, Lisbon 1400-136, Portugal.
- Daniel José Barbosa
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Translational Toxicology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal.
5. Ferreira MR, Carratto TMT, Frontanilla TS, Bonadio RS, Jain M, de Oliveira SF, Castelli EC, Mendes-Junior CT. Advances in forensic genetics: Exploring the potential of long read sequencing. Forensic Sci Int Genet 2025; 74:103156. [PMID: 39427416 DOI: 10.1016/j.fsigen.2024.103156] [Received: 05/03/2024] [Revised: 10/04/2024] [Accepted: 10/06/2024] [Indexed: 10/22/2024]
Abstract
DNA-based technologies have been used in forensic practice since the mid-1980s. While PCR-based STR genotyping using Capillary Electrophoresis remains the gold standard for generating DNA profiles in routine casework worldwide, the research community is continually seeking alternative methods capable of providing additional information to enhance discrimination power or contribute with new investigative leads. Oxford Nanopore Technologies (ONT) and PacBio third-generation sequencing have revolutionized the field, offering real-time capabilities, single-molecule resolution, and long-read sequencing (LRS). ONT, the pioneer of nanopore sequencing, uses biological nanopores to analyze nucleic acids in real-time. Its devices have revolutionized sequencing and may represent an interesting alternative for forensic research and routine casework, given that it offers unparalleled flexibility in a portable size: it enables sequencing approaches that range widely from PCR-amplified short target regions (e.g., CODIS STRs) to PCR-free whole transcriptome or even ultra-long whole genome sequencing. Despite its higher error rate compared to Illumina sequencing, it can significantly improve accuracy in read alignment against a reference genome or de novo genome assembly. This is achieved by generating long contiguous sequences that correctly assemble repetitive sections and regions with structural variation. Moreover, it allows real-time determination of DNA methylation status from native DNA without the need for bisulfite conversion. LRS enables the analysis of thousands of markers at once, providing phasing information and eliminating the need for multiple assays. This maximizes the information retrieved from a single invaluable sample. In this review, we explore the potential use of LRS in different forensic genetics approaches.
Affiliation(s)
- Marcel Rodrigues Ferreira
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
- Thássia Mayra Telles Carratto
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil
- Tamara Soledad Frontanilla
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14049-900, Brazil
- Raphael Severino Bonadio
- Depto Genética e Morfologia, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil
- Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
- Erick C Castelli
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil; Pathology Department, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
- Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil.
6. Luo LY, Wu H, Zhao LM, Zhang YH, Huang JH, Liu QY, Wang HT, Mo DX, EEr HH, Zhang LQ, Chen HL, Jia SG, Wang WM, Li MH. Telomere-to-telomere sheep genome assembly identifies variants associated with wool fineness. Nat Genet 2025; 57:218-230. [PMID: 39779954 DOI: 10.1038/s41588-024-02037-6] [Received: 03/14/2024] [Accepted: 11/19/2024] [Indexed: 01/11/2025]
Abstract
Ongoing efforts to improve sheep reference genome assemblies still leave many gaps and incomplete regions, resulting in a few common failures and errors in genomic studies. Here, we report a 2.85-Gb gap-free telomere-to-telomere genome of a ram (T2T-sheep1.0), including all autosomes and the X and Y chromosomes. This genome adds 220.05 Mb of previously unresolved regions and 754 new genes to the most updated reference assembly ARS-UI_Ramb_v3.0; it contains four types of repeat units (SatI, SatII, SatIII and CenY) in centromeric regions. T2T-sheep1.0 has a base accuracy of more than 99.999%, corrects several structural errors in previous reference assemblies and improves structural variant detection in repetitive sequences. Alignment of whole-genome short-read sequences of global domestic and wild sheep against T2T-sheep1.0 identifies 2,664,979 new single-nucleotide polymorphisms in previously unresolved regions, which improves the population genetic analyses and detection of selective signals for domestication (for example, ABCC4) and wool fineness (for example, FOXQ1).
Affiliation(s)
- Ling-Yun Luo
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Hui Wu
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Li-Ming Zhao
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems; Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs; Engineering Research Center of Grassland Industry, Ministry of Education; College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Ya-Hui Zhang
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Jia-Hui Huang
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Qiu-Yue Liu
- Institute of Genetics and Developmental Biology, The Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Hai-Tao Wang
- Institute of Genetics and Developmental Biology, The Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Dong-Xin Mo
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- He-Hua EEr
- Institute of Animal Science, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan, China
- Lian-Quan Zhang
- Ningxia Shuomuyanchi Tan Sheep Breeding Co. Ltd., Wuzhong, China
- Shan-Gang Jia
- College of Grassland Science and Technology, China Agricultural University, Beijing, China.
- Wei-Min Wang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems; Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs; Engineering Research Center of Grassland Industry, Ministry of Education; College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China.
- Meng-Hua Li
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China.
7. Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. bioRxiv 2024:2024.09.02.610891. [PMID: 39713403 PMCID: PMC11661062 DOI: 10.1101/2024.09.02.610891] [Indexed: 12/24/2024]
Abstract
Non-canonical (non-B) DNA structures (e.g., bent DNA, hairpins, G-quadruplexes, and Z-DNA), which form at certain sequence motifs (e.g., A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies, and occupy 9-15%, 9-11%, and 12-38% of autosomes, chromosome X, and chromosome Y, respectively. Functional regions (e.g., promoters and enhancers) and repetitive sequences are enriched in non-B DNA motifs. Non-B DNA motifs concentrate at short arms of acrocentric chromosomes in a pattern reflecting their satellite repeat content and might contribute to satellite dynamics in these regions. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
Affiliation(s)
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Iva Kejnovská
- Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Eduard Kejnovský
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802 USA
- L'EMbeDS, Sant'Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802 USA
8. Mastrorosa FK, Oshima KK, Rozanski AN, Harvey WT, Eichler EE, Logsdon GA. Identification and annotation of centromeric hypomethylated regions with CDR-Finder. Bioinformatics 2024; 40:btae733. [PMID: 39657946 PMCID: PMC11663805 DOI: 10.1093/bioinformatics/btae733] [Received: 06/07/2024] [Revised: 11/26/2024] [Accepted: 12/06/2024] [Indexed: 12/12/2024] [Open access]
Abstract
MOTIVATION Centromeres are chromosomal regions historically understudied with sequencing technologies due to their repetitive nature and short-read mapping limitations. However, recent improvements in long-read sequencing allow for the investigation of complex regions of the genome at the sequence and epigenetic levels. RESULTS Here, we present Centromere Dip Region (CDR)-Finder: a tool to identify regions of hypomethylation within the centromeres of high-quality, contiguous genome assemblies. These regions are typically associated with a unique type of chromatin containing the histone H3 variant CENP-A, which marks the location of the kinetochore. CDR-Finder identifies the CDRs in large and short centromeres and generates a BED file indicating the location of the CDRs within the centromere. It also outputs a plot for visualization, validation, and downstream analysis. AVAILABILITY AND IMPLEMENTATION CDR-Finder is available at https://github.com/EichlerLab/CDR-Finder.
Affiliation(s)
- Francesco Kumara Mastrorosa
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- Keisuke K Oshima
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, United States
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
9. Volarić M, Meštrović N, Despot-Slade E. SatXplor-a comprehensive pipeline for satellite DNA analyses in complex genome assemblies. Brief Bioinform 2024; 26:bbae660. [PMID: 39708839 DOI: 10.1093/bib/bbae660] [Received: 08/12/2024] [Revised: 10/31/2024] [Accepted: 12/04/2024] [Indexed: 12/23/2024] [Open access]
Abstract
Satellite DNAs (satDNAs) are tandemly repeated sequences that make up a significant portion of almost all eukaryotic genomes. Although satDNAs have been shown to play an important role in genome organization and evolution, they are relatively poorly analyzed, even in model organisms. One of the main reasons for the current lack of in-depth studies on satDNAs is their underrepresentation in genome assemblies. Due to the complexity, abundance, and highly repetitive nature of satDNAs, their analysis is challenging, requiring efficient tools that ensure accurate annotation and comprehensive genome-wide analysis. We present a novel pipeline, named satellite DNA Exploration (SatXplor), designed to robustly characterize satDNA elements and analyze their arrays and flanking regions. SatXplor is benchmarked against other tools, and curated satDNA datasets from diverse species, including mice and humans, showcase its versatility across genomes with varying complexities and satDNA profiles. Component algorithms excel in the identification of tandemly repeated sequences and, for the first time, enable evaluation of satDNA variation and array annotation with the addition of information about the surrounding genomic landscape. SatXplor is an innovative pipeline for satDNA analysis that can be paired with any tool used for satDNA detection, offering insights into the structural characteristics, array determination, and genomic context of satDNA elements. By integrating various computational techniques, from sequence analysis and homology investigation to advanced clustering and graph-based methods, it provides a versatile and comprehensive approach to explore the complexity of satDNA organization and understand the underlying mechanisms and evolutionary aspects. It is open-source and freely accessible at https://github.com/mvolar/SatXplor.
Affiliation(s)
- Marin Volarić
  - Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
|
10
|
Iyer SV, Goodwin S, McCombie WR. Leveraging the power of long reads for targeted sequencing. Genome Res 2024; 34:1701-1718. [PMID: 39567237 PMCID: PMC11610587 DOI: 10.1101/gr.279168.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 10/01/2024] [Indexed: 11/22/2024]
Abstract
Long-read sequencing technologies have improved the contiguity and, as a result, the quality of genome assemblies by generating reads long enough to span and resolve complex or repetitive regions of the genome. Several groups have shown the power of long reads in detecting thousands of genomic and epigenomic features that were previously missed by short-read sequencing approaches. While these studies demonstrate how long reads can help resolve repetitive and complex regions of the genome, they also highlight the throughput and coverage requirements needed to accurately resolve variant alleles across large populations using these platforms. At the time of this review, whole-genome long-read sequencing is more expensive than short-read sequencing on the highest throughput short-read instruments; thus, achieving sufficient coverage to detect low-frequency variants (such as somatic variation) in heterogeneous samples remains challenging. Targeted sequencing, on the other hand, provides the depth necessary to detect these low-frequency variants in heterogeneous populations. Here, we review currently used and recently developed targeted sequencing strategies that leverage existing long-read technologies to increase the resolution with which we can look at nucleic acids in a variety of biological contexts.
Affiliation(s)
- Shruti V Iyer
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sara Goodwin
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
|
11
|
Koren S, Bao Z, Guarracino A, Ou S, Goodwin S, Jenike KM, Lucas J, McNulty B, Park J, Rautiainen M, Rhie A, Roelofs D, Schneiders H, Vrijenhoek I, Nijbroek K, Nordesjo O, Nurk S, Vella M, Lawrence KR, Ware D, Schatz MC, Garrison E, Huang S, McCombie WR, Miga KH, Wittenberg AHJ, Phillippy AM. Gapless assembly of complete human and plant chromosomes using only nanopore sequencing. Genome Res 2024; 34:1919-1930. [PMID: 39505490 PMCID: PMC11610574 DOI: 10.1101/gr.279334.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 10/08/2024] [Indexed: 11/08/2024]
Abstract
The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Biosciences (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the UL reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and provides a multirun single-instrument solution for the reconstruction of complete genomes.
Affiliation(s)
- Sergey Koren
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Zhigui Bao
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Baden-Württemberg, Germany
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Shujun Ou
  - Department of Molecular Genetics, Ohio State University, Columbus, Ohio 43210, USA
- Sara Goodwin
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Katharine M Jenike
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Julian Lucas
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Brandy McNulty
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Jimin Park
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Mikko Rautiainen
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Arang Rhie
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Olle Nordesjo
  - Oxford Nanopore Technologies, Oxford OX4 4DQ, United Kingdom
- Sergey Nurk
  - Oxford Nanopore Technologies, Oxford OX4 4DQ, United Kingdom
- Mike Vella
  - Oxford Nanopore Technologies, Oxford OX4 4DQ, United Kingdom
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
  - USDA ARS NEA Plant, Soil and Nutrition Laboratory Research Unit, Ithaca, New York 14853, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Sanwen Huang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  - State Key Laboratory of Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou, Hainan 571101, China
- Karen H Miga
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Adam M Phillippy
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
|
12
|
Kukla-Bartoszek M, Głombik K. Train and Reprogram Your Brain: Effects of Physical Exercise at Different Stages of Life on Brain Functions Saved in Epigenetic Modifications. Int J Mol Sci 2024; 25:12043. [PMID: 39596111 PMCID: PMC11593723 DOI: 10.3390/ijms252212043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2024] [Revised: 11/05/2024] [Accepted: 11/07/2024] [Indexed: 11/28/2024] Open
Abstract
Multiple studies have demonstrated the significant effects of physical exercise on brain plasticity, the enhancement of memory and cognition, and mood improvement. Although the beneficial impact of exercise on brain functions and mental health is well established, the exact mechanisms underlying this phenomenon are currently under thorough investigation. Several hypotheses have emerged suggesting various possible mechanisms, including the effects of hormones, neurotrophins, neurotransmitters, and, more recently, other compounds such as lactate or irisin, which are released during exercise and act locally and/or on distant tissues, triggering systemic body reactions. Nevertheless, none of these actually explains the long-lasting effect of exercise, which can persist for years or even be passed on to subsequent generations. It is believed that these long-lasting effects are mediated through epigenetic modifications, influencing the expression of particular genes and the translation and modification of specific proteins. This review explores the impact of regular physical exercise on brain function and brain plasticity and the associated occurrence of epigenetic modifications. It examines how these changes contribute to the prevention and treatment of neuropsychiatric and neurological disorders, as well as their influence on the natural aging process and mental health.
Affiliation(s)
- Katarzyna Głombik
  - Laboratory of Immunoendocrinology, Department of Experimental Neuroendocrinology, Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343 Kraków, Poland
|
13
|
Mohanty SK, Chiaromonte F, Makova KD. Evolutionary Dynamics of G-Quadruplexes in Human and Other Great Ape Telomere-to-Telomere Genomes. bioRxiv 2024:2024.11.05.621973. [PMID: 39574740 PMCID: PMC11580976 DOI: 10.1101/2024.11.05.621973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2024]
Abstract
G-quadruplexes (G4s) are non-canonical DNA structures that can form at approximately 1% of the human genome. G4s contribute to point mutations and structural variation and thus facilitate genomic instability. They play important roles in regulating replication, transcription, and telomere maintenance, and some of them evolve under purifying selection. Nevertheless, the evolutionary dynamics of G4s has remained underexplored. Here we conducted a comprehensive analysis of predicted G4s (pG4s) in the recently released, telomere-to-telomere (T2T) genomes of human and other great apes: bonobo, chimpanzee, gorilla, Bornean orangutan, and Sumatran orangutan. We annotated tens of thousands of new pG4s in T2T compared to previous ape genome assemblies, including 41,236 in the human genome. Analyzing species alignments, we found approximately one-third of pG4s shared by all apes studied and identified thousands of species- and genus-specific pG4s. pG4s accumulated and diverged at rates consistent with divergence times between the studied species. We observed a significant enrichment and hypomethylation of pG4s shared across species at regulatory regions, including promoters, 5' and 3' UTRs, and origins of replication, strongly suggesting their formation and functional role in these regions. pG4s shared among great apes displayed lower methylation levels compared to species-specific pG4s, suggesting evolutionary conservation of functional roles of the former. Many species-specific pG4s were located in the repetitive and satellite regions deciphered in the T2T genomes. Our findings illuminate the evolutionary dynamics of G4s, their role in gene regulation, and their potential contribution to species-specific adaptations in great apes, emphasizing the utility of high-resolution T2T genomes in uncovering previously elusive genomic features.
Affiliation(s)
- Saswat K. Mohanty
  - Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
  - Department of Biology, Penn State University, University Park, PA 16802, USA
- Francesca Chiaromonte
  - Department of Statistics, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
  - EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D. Makova
  - Department of Biology, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
|
14
|
Kumara Mastrorosa F, Oshima KK, Rozanski AN, Harvey WT, Eichler EE, Logsdon GA. Identification and annotation of centromeric hypomethylated regions with Centromere Dip Region (CDR)-Finder. bioRxiv 2024:2024.11.01.621587. [PMID: 39574726 PMCID: PMC11580854 DOI: 10.1101/2024.11.01.621587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
Abstract
Centromeres are chromosomal regions historically understudied with sequencing technologies due to their repetitive nature and short-read mapping limitations. However, recent improvements in long-read sequencing allowed for the investigation of complex regions of the genome at the sequence and epigenetic levels. Here, we present Centromere Dip Region (CDR)-Finder: a tool to identify regions of hypomethylation within the centromeres of high-quality, contiguous genome assemblies. These regions are typically associated with a unique type of chromatin containing the histone H3 variant CENP-A, which marks the location of the kinetochore. CDR-Finder identifies the CDRs in large and short centromeres and generates a BED file indicating the location of the CDRs within the centromere. It also outputs a plot for visualization, validation, and downstream analysis. CDR-Finder is available at https://github.com/EichlerLab/CDR-Finder.
Affiliation(s)
- F. Kumara Mastrorosa
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Keisuke K. Oshima
  - Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Allison N. Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William T. Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Present address: Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
15
|
Xia Y, Li D, Chen T, Pan S, Huang H, Zhang W, Liang Y, Fu Y, Peng Z, Zhang H, Zhang L, Peng S, Shi R, He X, Zhou S, Jiao W, Zhao X, Wu X, Zhou L, Zhou J, Ouyang Q, Tian Y, Jiang X, Zhou Y, Tang S, Shen J, Ohshima K, Tan Z. Microsatellite density landscapes illustrate short tandem repeats aggregation in the complete reference human genome. BMC Genomics 2024; 25:960. [PMID: 39402450 PMCID: PMC11477012 DOI: 10.1186/s12864-024-10843-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2024] [Accepted: 09/26/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND: Microsatellites have increasingly been recognized over the past decades as biologically significant for the human genome and human health. The complete assembled reference sequence of the human genome, T2T-CHM13, greatly facilitates a comprehensive study of short tandem repeats in the human genome. RESULTS: Microsatellite density landscapes of all 24 chromosomes were built for T2T-CHM13, the first complete reference sequence of the human genome. These landscapes showed that short tandem repeats (STRs) tend to aggregate, forming a large number of STR density peaks. We classified 8,823 High Microsatellites Density Peaks (HMDPs), 35,257 Middle Microsatellites Density Peaks (MMDPs), and 199,649 Low Microsatellites Density Peaks (LMDPs) on the 24 chromosomes, and also classified the motif types of every microsatellite density peak. These STR density peaks are mainly composed of a single motif; AT is the most dominant motif, followed by AATGG and CCATT. In addition, 514 genomic regions were characterized by microsatellite density features in the full T2T-CHM13 genome. CONCLUSIONS: These landscape maps show that microsatellites aggregate at many genomic positions to form a large number of microsatellite density peaks, each composed mainly of a single motif type, in the complete reference genome, indicating that local microsatellite density varies enormously along every chromosome of T2T-CHM13.
Affiliation(s)
- Yun Xia
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Douyue Li
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Tingyi Chen
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Saichao Pan
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Hanrou Huang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Wenxiang Zhang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Yulin Liang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Yongzhuo Fu
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Zhuli Peng
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Hongxi Zhang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Liang Zhang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Shan Peng
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Ruixue Shi
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Xingxin He
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Siqian Zhou
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Weili Jiao
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Xiangyan Zhao
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Xiaolong Wu
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Lan Zhou
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Jingyu Zhou
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Qingjian Ouyang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- You Tian
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Xiaoping Jiang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Yi Zhou
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Shiying Tang
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Junxiong Shen
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Zhongyang Tan
  - Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
|
16
|
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Anez GM, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Roha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O’Neill RJ, Koren S, Makova KD, Phillippy AM, Eichler EE. Complete sequencing of ape genomes. bioRxiv 2024:2024.07.31.605654. [PMID: 39131277 PMCID: PMC11312596 DOI: 10.1101/2024.07.31.605654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
Affiliation(s)
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Prajna Hebbar
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Francesca Antonacci
  - Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19103, USA
- Steven J. Solar
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Dmitry Antipov
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Brandon D. Pickett
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Yana Safonova
  - Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- Francesco Montinaro
  - Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
  - Institute of Genomics, University of Tartu, Tartu, Estonia
- Yanting Luo
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Joanna Malukiewicz
  - Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
- Jessica M. Storer
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Jiadong Lin
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Abigail N. Sequeira
  - Department of Biology, Penn State University, University Park, PA 16802, USA
- Riley J. Mangan
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Genetics Training Program, Harvard Medical School, Boston, MA 02115, USA
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Anton Bankevich
  - Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- Christine R. Beck
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Arjun Biddanda
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Matthew Borchers
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Gerard G. Bouffard
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Emry Brannan
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Shelise Y. Brooks
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lucia Carbone
  - Department of Medicine, KCVI, Oregon Health Sciences University, Portland, OR, USA
  - Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
- Laura Carrel
  - PSU Medical School, Penn State University School of Medicine, Hershey, PA, USA
- Agnes P. Chan
  - The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Juyun Crawford
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Eric Engelbrecht
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Cedric Feschotte
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10021, USA
- Gage H. Garcia
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Luciana de Gennaro
  - Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- David Gilbert
  - San Diego Biomedical Research Institute, San Diego, CA, USA
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Ishaan Gupta
  - Department of Computer Science and Engineering, University of California San Diego, CA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Junmin Han
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Robert S. Harris
  - Department of Biology, Penn State University, University Park, PA 16802, USA
- Gabrielle A. Hartley
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- William T. Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Michael Hiller
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Research Institute, Goethe University, Frankfurt, Germany
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marlys L. Houck
  - San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
- Hyeonsoo Jeong
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Kaivan Kamali
  - Department of Biology, Penn State University, University Park, PA 16802, USA
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Bryce Kille
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Chul Lee
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Youngho Lee
  - Laboratory of bioinformatics and population genetics, Interdisciplinary program in bioinformatics, Seoul National University, Republic of Korea
- William Lees
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Alexandra P. Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Mark Loftus
  - Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Yong Hwee Eddie Loh
  - Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Hailey Loucks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Center for Genomic Research, International Institutes of Medicine, Fourth Affiliated Hospital, Zhejiang University, Yiwu, Zhejiang, China
- Shanghai Jiao Tong University Chongqing Research Institute, Chongqing, China
| | - Juan F. I. Martinez
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Barbara McGrath
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Britta S. Meyer
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Saswat K. Mohanty
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Karol Pal
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Francisca R. Ringeling
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
| | - Joana L. Roha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
| | - Oliver A. Ryder
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Swati Saha
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Nicholas J. Schork
- The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
| | - Cole Shanks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Dongmin R. Son
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Cynthia Steiner
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Alexander P. Sweeten
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Michael G. Tassia
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Mihir Trivedi
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Wenjie Wei
- School of Life Sciences, Westlake University, Hangzhou 310024, China
- National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, 430070, Wuhan, China
| | - Julie Wertz
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Muyu Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Panpan Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Zhenmiao Zhang
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - Sarah A. Zhao
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Yixin Zhu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Erich D. Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | - Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Zachary A. Szpiech
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Christian D. Huber
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Tobias L. Lenz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Miriam K. Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Soojin V. Yi
- Department of Ecology, Evolution and Marine Biology, Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Stefan Canzar
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
| | - Corey T. Watson
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Peter H. Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, USA
| | - Erin Molloy
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Craig B. Lowe
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
| | - Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
| | - Rachel J. O’Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Departments of Molecular and Cell Biology, UConn Storrs, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Kateryna D. Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
17
|
Gafurov A, Vinař T, Medvedev P, Brejová B. Fast Context-Aware Analysis of Genome Annotation Colocalization. J Comput Biol 2024; 31:946-964. [PMID: 39381845 PMCID: PMC11698669 DOI: 10.1089/cmb.2024.0667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes or their exons, sequence repeats, regions with a particular epigenetic state, and copy number variants. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing random unrelated annotations. To incorporate more background information into such analyses, we propose a new null model based on a Markov chain that differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or assembly gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistic and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. Moreover, the use of genomic contexts to correct for GC bias resulted in the reversal of some previously published findings.
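The final step described in the abstract, turning an exact expectation and variance into a p-value through a normal approximation, can be sketched as follows. This is only an illustration of that one step under stated assumptions: the function name `normal_approx_p_value`, its parameters, and the `tail` option are hypothetical and are not taken from the authors' software, and the computation of the expectation and variance under the Markov chain null model is not shown.

```python
import math


def normal_approx_p_value(observed, mean, variance, tail="upper"):
    """Approximate a p-value by treating the test statistic as normal.

    `mean` and `variance` are assumed to be the expectation and variance
    of the statistic under the null model (hypothetical inputs here).
    """
    if variance <= 0:
        raise ValueError("variance must be positive")

    # Standardize the observed statistic.
    z = (observed - mean) / math.sqrt(variance)

    # Tail probabilities of the standard normal via the complementary
    # error function, which is more accurate in the tails than 1 - CDF.
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))

    if tail == "upper":      # enrichment: statistic larger than expected
        return upper
    if tail == "lower":      # depletion: statistic smaller than expected
        return lower
    if tail == "two-sided":
        return min(1.0, 2.0 * min(upper, lower))
    raise ValueError("tail must be 'upper', 'lower', or 'two-sided'")
```

For example, `normal_approx_p_value(13.92, 10.0, 4.0)` standardizes the observation to z = 1.96 and returns an upper-tail probability of roughly 0.025.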
Affiliation(s)
- Askar Gafurov
- Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
- LIRMM, University of Montpellier, Montpellier, France
| | - Tomáš Vinař
- Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Broňa Brejová
- Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
| |
|
18
|
Karageorgiou C, Gokcumen O, Dennis MY. Deciphering the role of structural variation in human evolution: a functional perspective. Curr Opin Genet Dev 2024; 88:102240. [PMID: 39121701 PMCID: PMC11485010 DOI: 10.1016/j.gde.2024.102240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 08/12/2024]
Abstract
Advances in sequencing technologies have enabled the comparison of high-quality genomes of diverse primate species, revealing vast amounts of divergence due to structural variation. Given their large size, structural variants (SVs) can simultaneously alter the function and regulation of multiple genes. Studies estimate that collectively more than 3.5% of the genome is divergent in humans versus other great apes, impacting thousands of genes. Functional genomics and gene-editing tools in various model systems recently emerged as an exciting frontier - investigating the wide-ranging impacts of SVs on molecular, cellular, and systems-level phenotypes. This review examines existing research and identifies future directions to broaden our understanding of the functional roles of SVs on phenotypic innovations and diversity impacting uniquely human features, ranging from cognition to metabolic adaptations.
Affiliation(s)
- Charikleia Karageorgiou
- Department of Biological Sciences, University at Buffalo, 109 Cooke Hall, Buffalo, NY 14260, USA. https://twitter.com/@evobioclio
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, 109 Cooke Hall, Buffalo, NY 14260, USA
| | - Megan Y Dennis
- Department of Biochemistry & Molecular Medicine, Genome Center, and MIND Institute, University of California, Davis, CA 95616, USA.
| |
|
19
|
Olagunju TA, Rosen BD, Neibergs HL, Becker GM, Davenport KM, Elsik CG, Hadfield TS, Koren S, Kuhn KL, Rhie A, Shira KA, Skibiel AL, Stegemiller MR, Thorne JW, Villamediana P, Cockett NE, Murdoch BM, Smith TPL. Telomere-to-telomere assemblies of cattle and sheep Y-chromosomes uncover divergent structure and gene content. Nat Commun 2024; 15:8277. [PMID: 39333471 PMCID: PMC11436988 DOI: 10.1038/s41467-024-52384-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 09/05/2024] [Indexed: 09/29/2024]
Abstract
Reference genomes of cattle and sheep have lacked contiguous assemblies of the sex-determining Y chromosome. Here, we assemble complete and gapless telomere-to-telomere (T2T) Y chromosomes for these species. We find that the pseudo-autosomal regions are similar in length, but the total chromosome size is substantially different, with the cattle Y more than twice the length of the sheep Y. The length disparity is accounted for by an expanded ampliconic region in cattle. The genic amplification in cattle contrasts with pseudogenization in sheep, suggesting opposite evolutionary mechanisms since their divergence 19 MYA. The centromeres also differ dramatically despite the close relationship between these species at the overall genome sequence level. These Y chromosomes have been added to the current reference assemblies in GenBank, opening new opportunities for the study of evolution and variation while supporting efforts to improve sustainability in these important livestock species that generally use sire-driven genetic improvement strategies.
Affiliation(s)
- Temitayo A Olagunju
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory (AGIL), ARS, USDA, Beltsville, MD, USA
| | - Holly L Neibergs
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
| | - Gabrielle M Becker
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | | | - Christine G Elsik
- Divisions of Animal Sciences and Plant Science & Technology, University of Missouri, Columbia, MO, USA
| | - Tracy S Hadfield
- Animal, Dairy and Veterinary Sciences (ADVS), Utah State University, Logan, UT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kristen L Kuhn
- U.S. Meat Animal Research Center (USMARC), ARS, USDA, Clay Center, NE, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Katie A Shira
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | - Amy L Skibiel
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | - Morgan R Stegemiller
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | | | - Patricia Villamediana
- Department of Dairy and Food Science, South Dakota State University, Brookings, SD, USA
| | - Noelle E Cockett
- Animal, Dairy and Veterinary Sciences (ADVS), Utah State University, Logan, UT, USA
| | - Brenda M Murdoch
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA.
| | - Timothy P L Smith
- U.S. Meat Animal Research Center (USMARC), ARS, USDA, Clay Center, NE, USA.
| |
|
20
|
de Lima LG, Guarracino A, Koren S, Potapova T, McKinney S, Rhie A, Solar SJ, Seidel C, Fagen B, Walenz BP, Bouffard GG, Brooks SY, Peterson M, Hall K, Crawford J, Young AC, Pickett BD, Garrison E, Phillippy AM, Gerton JL. The formation and propagation of human Robertsonian chromosomes. bioRxiv 2024:2024.09.24.614821. [PMID: 39386535 PMCID: PMC11463614 DOI: 10.1101/2024.09.24.614821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Robertsonian chromosomes are a type of variant chromosome found commonly in nature. Present in one in 800 humans, these chromosomes can underlie infertility, trisomies, and increased cancer incidence. Recognized cytogenetically for more than a century, their origins have remained mysterious. Recent advances in genomics allowed us to assemble three human Robertsonian chromosomes completely. We identify a common breakpoint and epigenetic changes in centromeres that provide insight into the formation and propagation of common Robertsonian translocations. Further investigation of the assembled genomes of chimpanzee and bonobo highlights the structural features of the human genome that uniquely enable the specific crossover event that creates these chromosomes. Resolving the structure and epigenetic features of human Robertsonian chromosomes at a molecular level paves the way to understanding how chromosomal structural variation occurs more generally, and how chromosomes evolve.
Affiliation(s)
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Steven J Solar
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Chris Seidel
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Brandon Fagen
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Brian P Walenz
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Kate Hall
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Juyun Crawford
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice C Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Brandon D Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Adam M Phillippy
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | |
|
21
|
Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D, Paisie CA, Harvey WT, Zhao X, Martino GV, Henglin M, Munson KM, Rabbani K, Chin CS, Gu B, Ashraf H, Austine-Orimoloye O, Balachandran P, Bonder MJ, Cheng H, Chong Z, Crabtree J, Gerstein M, Guethlein LA, Hasenfeld P, Hickey G, Hoekzema K, Hunt SE, Jensen M, Jiang Y, Koren S, Kwon Y, Li C, Li H, Li J, Norman PJ, Oshima KK, Paten B, Phillippy AM, Pollock NR, Rausch T, Rautiainen M, Scholz S, Song Y, Söylev A, Sulovari A, Surapaneni L, Tsapalou V, Zhou W, Zhou Y, Zhu Q, Zody MC, Mills RE, Devine SE, Shi X, Talkowski ME, Chaisson MJP, Dilthey AT, Konkel MK, Korbel JO, Lee C, Beck CR, Eichler EE, Marschall T. Complex genetic variation in nearly complete human genomes. bioRxiv 2024:2024.09.24.614721. [PMID: 39372794 PMCID: PMC11451754 DOI: 10.1101/2024.09.24.614721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
Affiliation(s)
- Glennis A Logsdon
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Mark Loftus
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Carolyn A Paisie
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Gianni V Martino
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Medical University of South Carolina, College of Graduate Studies, Charleston, SC, USA
| | - Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Keon Rabbani
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | - Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | | | - Marc Jan Bonder
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Oncode Institute, Utrecht, The Netherlands
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
| | - Haoyu Cheng
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
| | - Zechen Chong
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
| | - Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Lisbeth A Guethlein
- Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthew Jensen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Yunzhe Jiang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Youngjun Kwon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Chong Li
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Jiaqi Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Paul J Norman
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
| | - Keisuke K Oshima
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nicholas R Pollock
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Mikko Rautiainen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Stephan Scholz
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Yuwei Song
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
| | - Arda Söylev
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Vasiliki Tsapalou
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Weichen Zhou
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Ying Zhou
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Stanford Health Care, Palo Alto, CA, USA
| | | | - Ryan E Mills
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Xinghua Shi
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Mike E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Alexander T Dilthey
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Miriam K Konkel
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| |
|
22
|
Wen W, Zhong J, Zhang Z, Jia L, Chu T, Wang N, Danko CG, Wang Z. dHICA: a deep transformer-based model enables accurate histone imputation from chromatin accessibility. Brief Bioinform 2024; 25:bbae459. [PMID: 39316943 PMCID: PMC11421843 DOI: 10.1093/bib/bbae459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/13/2024] [Accepted: 09/04/2024] [Indexed: 09/26/2024] Open
Abstract
Histone modifications (HMs) are pivotal in various biological processes, including transcription, replication, and DNA repair, significantly impacting chromatin structure. These modifications underpin the molecular mechanisms of cell-type-specific gene expression and complex diseases. However, annotating HMs across different cell types solely using experimental approaches is impractical due to cost and time constraints. Herein, we present dHICA (deep histone imputation using chromatin accessibility), a novel deep learning framework that integrates DNA sequences and chromatin accessibility data to predict multiple HM tracks. Employing the transformer architecture alongside dilated convolutions, dHICA boasts an extensive receptive field and captures more cell-type-specific information. dHICA outperforms state-of-the-art baselines and achieves superior performance in cell-type-specific loci and gene elements, aligning with biological expectations. Furthermore, dHICA's imputations hold significant potential for downstream applications, including chromatin state segmentation and elucidating the functional implications of SNPs (Single Nucleotide Polymorphisms). In conclusion, dHICA serves as a valuable tool for advancing the understanding of chromatin dynamics, offering enhanced predictive capabilities and interpretability.
Affiliation(s)
- Wen Wen
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
| | - Jiaxin Zhong
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
| | - Zhaoxi Zhang
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
| | - Lijuan Jia
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
| | - Tinyi Chu
- Meinig School of Biomedical Engineering, Cornell University, Weill Hall, Ithaca, NY 14853, United States
| | - Nating Wang
- Department of Molecular Biology and Genetics, Cornell University, Biotechnology Building, Ithaca, NY 14853, United States
| | - Charles G Danko
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Hungerford Hill Rd, Ithaca, NY 14853, United States
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Tower Rd, Ithaca, NY 14853, United States
| | - Zhong Wang
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
| |
|
23
|
Ma Z, Zuo T, Frey N, Rangrez AY. A systematic framework for understanding the microbiome in human health and disease: from basic principles to clinical translation. Signal Transduct Target Ther 2024; 9:237. [PMID: 39307902 PMCID: PMC11418828 DOI: 10.1038/s41392-024-01946-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 07/03/2024] [Accepted: 08/01/2024] [Indexed: 09/26/2024] Open
Abstract
The human microbiome is a complex and dynamic system that plays important roles in human health and disease. However, there remain limitations and theoretical gaps in our current understanding of the intricate relationship between microbes and humans. In this narrative review, we integrate the knowledge and insights from various fields, including anatomy, physiology, immunology, histology, genetics, and evolution, to propose a systematic framework. It introduces key concepts such as the 'innate and adaptive genomes', which enhance genetic and evolutionary comprehension of the human genome. The 'germ-free syndrome' challenges the traditional 'microbes as pathogens' view, advocating for the necessity of microbes for health. The 'slave tissue' concept underscores the symbiotic intricacies between human tissues and their microbial counterparts, highlighting the dynamic health implications of microbial interactions. 'Acquired microbial immunity' positions the microbiome as an adjunct to human immune systems, providing a rationale for probiotic therapies and prudent antibiotic use. The 'homeostatic reprogramming hypothesis' integrates the microbiome into the internal environment theory, potentially explaining the change in homeostatic indicators post-industrialization. The 'cell-microbe co-ecology model' elucidates the symbiotic regulation affecting cellular balance, while the 'meta-host model' broadens the host definition to include symbiotic microbes. The 'health-illness conversion model' encapsulates the innate and adaptive genomes' interplay and dysbiosis patterns. The aim here is to provide a more focused and coherent understanding of microbiome and highlight future research avenues that could lead to a more effective and efficient healthcare system.
Affiliation(s)
- Ziqi Ma
- Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
| | - Tao Zuo
- Key Laboratory of Human Microbiome and Chronic Diseases (Sun Yat-sen University), Ministry of Education, Guangzhou, China
- Guangdong Institute of Gastroenterology, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Norbert Frey
- Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
| | - Ashraf Yusuf Rangrez
- Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
| |
|
24
|
Engelbrecht E, Rodriguez OL, Watson CT. Addressing Technical Pitfalls in Pursuit of Molecular Factors That Mediate Immunoglobulin Gene Regulation. J Immunol 2024; 213:651-662. [PMID: 39007649 PMCID: PMC11333172 DOI: 10.4049/jimmunol.2400131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 06/13/2024] [Indexed: 07/16/2024]
Abstract
The expressed Ab repertoire is a critical determinant of immune-related phenotypes. Ab-encoding transcripts are distinct from other expressed genes because they are transcribed from somatically rearranged gene segments. Human Abs are composed of two identical H and L chain polypeptides derived from genes in IGH locus and one of two L chain loci. The combinatorial diversity that results from Ab gene rearrangement and the pairing of different H and L chains contributes to the immense diversity of the baseline Ab repertoire. During rearrangement, Ab gene selection is mediated by factors that influence chromatin architecture, promoter/enhancer activity, and V(D)J recombination. Interindividual variation in the composition of the Ab repertoire associates with germline variation in IGH, implicating polymorphism in Ab gene regulation. Determining how IGH variants directly mediate gene regulation will require integration of these variants with other functional genomic datasets. In this study, we argue that standard approaches using short reads have limited utility for characterizing regulatory regions in IGH at haplotype resolution. Using simulated and chromatin immunoprecipitation sequencing reads, we define features of IGH that limit use of short reads and a single reference genome, namely 1) the highly duplicated nature of the DNA sequence in IGH and 2) structural polymorphisms that are frequent in the population. We demonstrate that personalized diploid references enhance performance of short-read data for characterizing mappable portions of the locus, while also showing that long-read profiling tools will ultimately be needed to fully resolve functional impacts of IGH germline variation on expressed Ab repertoires.
Affiliation(s)
- Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| | - Oscar L Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| | - Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| |
|
25
|
Hartley GA, Okhovat M, Hoyt SJ, Fuller E, Pauloski N, Alexandre N, Alexandrov I, Drennan R, Dubocanin D, Gilbert DM, Mao Y, McCann C, Neph S, Ryabov F, Sasaki T, Storer JM, Svendsen D, Troy W, Wells J, Core L, Stergachis A, Carbone L, O’Neill RJ. Centromeric transposable elements and epigenetic status drive karyotypic variation in the eastern hoolock gibbon. bioRxiv 2024:2024.08.29.610280. [PMID: 39257810 PMCID: PMC11384015 DOI: 10.1101/2024.08.29.610280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Great apes have maintained a stable karyotype with few large-scale rearrangements; in contrast, gibbons have undergone a high rate of chromosomal rearrangements coincident with rapid centromere turnover. Here we characterize assembled centromeres in the Eastern hoolock gibbon, Hoolock leuconedys (HLE), finding a diverse group of transposable elements (TEs) that differ from the canonical alpha satellites found across centromeres of other apes. We find that HLE centromeres contain a CpG methylation centromere dip region, providing evidence this epigenetic feature is conserved in the absence of satellite arrays; nevertheless, we report a variety of atypical centromeric features, including protein-coding genes and mismatched replication timing. Further, large structural variations define HLE centromeres and distinguish them from other gibbons. Combined with differentially methylated TEs, topologically associated domain boundaries, and segmental duplications at chromosomal breakpoints, we propose that a "perfect storm" of multiple genomic attributes with propensities for chromosome instability shaped gibbon centromere evolution.
Affiliation(s)
- Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Mariam Okhovat
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Emily Fuller
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Nicole Pauloski
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Nicolas Alexandre
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Ivan Alexandrov
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Israel
| | - Ryan Drennan
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Danilo Dubocanin
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - David M. Gilbert
- San Diego Biomedical Research Institute, San Diego, CA 92121, USA
| | - Yizi Mao
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Christine McCann
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Shane Neph
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Fedor Ryabov
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA 92121, USA
| | - Jessica M. Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Derek Svendsen
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | | | - Jackson Wells
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
| | - Leighton Core
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrew Stergachis
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Portland, OR, USA
| | - Rachel J. O’Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| |
|
26
|
Pandiloski N, Horváth V, Karlsson O, Koutounidou S, Dorazehi F, Christoforidou G, Matas-Fuentes J, Gerdes P, Garza R, Jönsson ME, Adami A, Atacho DAM, Johansson JG, Englund E, Kokaia Z, Jakobsson J, Douse CH. DNA methylation governs the sensitivity of repeats to restriction by the HUSH-MORC2 corepressor. Nat Commun 2024; 15:7534. [PMID: 39214989 PMCID: PMC11364546 DOI: 10.1038/s41467-024-50765-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 07/18/2024] [Indexed: 09/04/2024] Open
Abstract
The human silencing hub (HUSH) complex binds to transcripts of LINE-1 retrotransposons (L1s) and other genomic repeats, recruiting MORC2 and other effectors to remodel chromatin. How HUSH and MORC2 operate alongside DNA methylation, a central epigenetic regulator of repeat transcription, remains largely unknown. Here we interrogate this relationship in human neural progenitor cells (hNPCs), a somatic model of brain development that tolerates removal of DNA methyltransferase DNMT1. Upon loss of MORC2 or HUSH subunit TASOR in hNPCs, L1s remain silenced by robust promoter methylation. However, genome demethylation and activation of evolutionarily-young L1s attracts MORC2 binding, and simultaneous depletion of DNMT1 and MORC2 causes massive accumulation of L1 transcripts. We identify the same mechanistic hierarchy at pericentromeric α-satellites and clustered protocadherin genes, repetitive elements important for chromosome structure and neurodevelopment respectively. Our data delineate the epigenetic control of repeats in somatic cells, with implications for understanding the vital functions of HUSH-MORC2 in hypomethylated contexts throughout human development.
Affiliation(s)
- Ninoslav Pandiloski
- Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Vivien Horváth
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Ofelia Karlsson
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Symela Koutounidou
- Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Fereshteh Dorazehi
- Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Georgia Christoforidou
- Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Jon Matas-Fuentes
- Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
| | - Patricia Gerdes
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Raquel Garza
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | | | - Anita Adami
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Diahann A M Atacho
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Jenny G Johansson
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
| | - Elisabet Englund
- Division of Pathology, Department of Clinical Sciences, Lund University, Lund, Sweden
| | - Zaal Kokaia
- Lund Stem Cell Center, Lund University, Lund, Sweden
- Laboratory of Stem Cells and Restorative Neurology, Department of Clinical Sciences, BMC B10, Lund University, Lund, Sweden
| | - Johan Jakobsson
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Christopher H Douse
- Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden.
- Lund Stem Cell Center, Lund University, Lund, Sweden.
| |
|
27
|
Hardikar S, Ren R, Ying Z, Zhou J, Horton JR, Bramble MD, Liu B, Lu Y, Liu B, Coletta LD, Shen J, Dan J, Zhang X, Cheng X, Chen T. The ICF syndrome protein CDCA7 harbors a unique DNA binding domain that recognizes a CpG dyad in the context of a non-B DNA. Sci Adv 2024; 10:eadr0036. [PMID: 39178265 PMCID: PMC11343032 DOI: 10.1126/sciadv.adr0036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Accepted: 07/18/2024] [Indexed: 08/25/2024]
Abstract
CDCA7, encoding a protein with a carboxyl-terminal cysteine-rich domain (CRD), is mutated in immunodeficiency, centromeric instability, and facial anomalies (ICF) syndrome, a disease related to hypomethylation of juxtacentromeric satellite DNA. How CDCA7 directs DNA methylation to juxtacentromeric regions is unknown. Here, we show that the CDCA7 CRD adopts a unique zinc-binding structure that recognizes a CpG dyad in a non-B DNA formed by two sequence motifs. CDCA7, but not ICF mutants, preferentially binds the non-B DNA with strand-specific CpG hemi-methylation. The unmethylated sequence motif is highly enriched at centromeres of human chromosomes, whereas the methylated motif is distributed throughout the genome. At S phase, CDCA7, but not ICF mutants, is concentrated in constitutive heterochromatin foci, and the formation of such foci can be inhibited by exogenous hemi-methylated non-B DNA bound by the CRD. Binding of the non-B DNA formed in juxtacentromeric regions during DNA replication provides a mechanism by which CDCA7 controls the specificity of DNA methylation.
Affiliation(s)
- Swanand Hardikar
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Ren Ren
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Zhengzhou Ying
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Jujun Zhou
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - John R. Horton
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Matthew D. Bramble
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Bin Liu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Program in Genetics and Epigenetics, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
| | - Yue Lu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Bigang Liu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Luis Della Coletta
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Jianjun Shen
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Jiameng Dan
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Xing Zhang
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Xiaodong Cheng
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Program in Genetics and Epigenetics, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
| | - Taiping Chen
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Program in Genetics and Epigenetics, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
| |
|
28
|
Feng H, Wu L, Zhao B, Huff C, Zhang J, Wu J, Lin L, Wei P, Wu C. Benchmarking DNA Foundation Models for Genomic Sequence Classification. bioRxiv 2024:2024.08.16.608288. [PMID: 39185205 PMCID: PMC11343214 DOI: 10.1101/2024.08.16.608288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
The rapid advancement of DNA foundation language models has revolutionized the field of genomics, enabling the decoding of complex patterns and regulatory mechanisms within DNA sequences. However, the current evaluation of these models often relies on fine-tuning and limited datasets, which introduces biases and limits the assessment of their true potential. Here, we present a benchmarking study of three recent DNA foundation language models, including DNABERT-2, Nucleotide Transformer version-2 (NT-v2), and HyenaDNA, focusing on the quality of their zero-shot embeddings across a diverse range of genomic tasks and species through analyses of 57 real datasets. We found that DNABERT-2 exhibits the most consistent performance across human genome-related tasks, while NT-v2 excels in epigenetic modification detection. HyenaDNA stands out for its exceptional runtime scalability and ability to handle long input sequences. Importantly, we demonstrate that using mean token embedding consistently improves the performance of all three models compared to the default setting of sentence-level summary token embedding, with average AUC improvements ranging from 4.3% to 9.7% for different DNA foundation models. Furthermore, the performance differences between these models are significantly reduced when using mean token embedding. Our findings provide a framework for selecting and optimizing DNA language models, guiding researchers in applying these tools effectively in genomic studies.
Affiliation(s)
- Haonan Feng
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, 96813, USA
- Bingxin Zhao
- Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Chad Huff
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jianjun Zhang
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jia Wu
- Department of Imaging Physics, Division of Diagnostic Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Lifeng Lin
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, 85724, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Institute for Data Science in Oncology, The UT MD Anderson Cancer Center, Houston, TX, 77030, USA
|
29
|
Porubsky D, Dashnow H, Sasani TA, Logsdon GA, Hallast P, Noyes MD, Kronenberg ZN, Mokveld T, Koundinya N, Nolan C, Steely CJ, Guarracino A, Dolzhenko E, Harvey WT, Rowell WJ, Grigorev K, Nicholas TJ, Oshima KK, Lin J, Ebert P, Watkins WS, Leung TY, Hanlon VCT, McGee S, Pedersen BS, Goldberg ME, Happ HC, Jeong H, Munson KM, Hoekzema K, Chan DD, Wang Y, Knuth J, Garcia GH, Fanslow C, Lambert C, Lee C, Smith JD, Levy S, Mason CE, Garrison E, Lansdorp PM, Neklason DW, Jorde LB, Quinlan AR, Eberle MA, Eichler EE. A familial, telomere-to-telomere reference for human de novo mutation and recombination from a four-generation pedigree. bioRxiv 2024:2024.08.05.606142. [PMID: 39149261 PMCID: PMC11326147 DOI: 10.1101/2024.08.05.606142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463) allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 de novo single-nucleotide variants (SNVs), 7.4 non-tandem repeat indels, 79.6 de novo indels or structural variants (SVs) originating from tandem repeats, 7.7 centromeric de novo SVs and SNVs, and 12.4 de novo Y chromosome events per generation. STRs and VNTRs are the most mutable with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations, documenting de novo SVs, and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length, and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 17% of de novo SNVs are postzygotic in origin with no paternal bias. We place all this variation in the context of a high-resolution recombination map (~3.5 kbp breakpoint resolution). We observe a strong maternal recombination bias (1.36 maternal:paternal ratio) with a consistent reduction in the number of crossovers with increasing paternal (r=0.85) and maternal (r=0.65) age. However, we observe no correlation between meiotic crossover locations and de novo SVs, arguing against non-allelic homologous recombination as a predominant mechanism. The use of multiple orthogonal technologies, near-telomere-to-telomere phased genome assemblies, and a multi-generation family to assess transmission has created the most comprehensive, publicly available "truth set" of all classes of genomic variants. The resource can be used to test and benchmark new algorithms and technologies to understand the most fundamental processes underlying human genetic variation.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Thomas A Sasani
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Present address: Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Michelle D Noyes
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Nidhi Koundinya
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Cody J Steely
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY, USA
- Andrea Guarracino
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William J Rowell
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY, USA
- Kirill Grigorev
- Blue Marble Space Institute of Science, Seattle, WA, USA
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Thomas J Nicholas
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Keisuke K Oshima
- Present address: Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- W Scott Watkins
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Tiffany Y Leung
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
- Sean McGee
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Michael E Goldberg
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Hannah C Happ
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Present address: Altos Labs, San Diego, CA, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Daniel D Chan
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
- Yanni Wang
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
- Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Gage H Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joshua D Smith
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Shawn Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
- Erik Garrison
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Deborah W Neklason
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
30
|
Kalbfleisch TS, McKay SD, Murdoch BM, Adelson DL, Almansa-Villa D, Becker G, Beckett LM, Benítez-Galeano MJ, Biase F, Casey T, Chuong E, Clark E, Clarke S, Cockett N, Couldrey C, Davis BW, Elsik CG, Faraut T, Gao Y, Genet C, Grady P, Green J, Green R, Guan D, Hagen D, Hartley GA, Heaton M, Hoyt SJ, Huang W, Jarvis E, Kalleberg J, Khatib H, Koepfli KP, Koltes J, Koren S, Kuehn C, Leeb T, Leonard A, Liu GE, Low WY, McConnell H, McRae K, Miga K, Mousel M, Neibergs H, Olagunju T, Pennell M, Petry B, Pewsner M, Phillippy AM, Pickett BD, Pineda P, Potapova T, Rachagani S, Rhie A, Rijnkels M, Robic A, Rodriguez Osorio N, Safonova Y, Schettini G, Schnabel RD, Sirpu Natesh N, Stegemiller M, Storer J, Stothard P, Stull C, Tosser-Klopp G, Traglia GM, Tuggle CK, Van Tassell CP, Watson C, Weikard R, Wimmers K, Xie S, Yang L, Smith TPL, O'Neill RJ, Rosen BD. The Ruminant Telomere-to-Telomere (RT2T) Consortium. Nat Genet 2024; 56:1566-1573. [PMID: 39103649 DOI: 10.1038/s41588-024-01835-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 06/14/2024] [Indexed: 08/07/2024]
Abstract
Telomere-to-telomere (T2T) assemblies reveal new insights into the structure and function of the previously 'invisible' parts of the genome and allow comparative analyses of complete genomes across entire clades. We present here an open collaborative effort, termed the 'Ruminant T2T Consortium' (RT2T), that aims to generate complete diploid assemblies for numerous species of the Artiodactyla suborder Ruminantia to examine chromosomal evolution in the context of natural selection and domestication of species used as livestock.
Affiliation(s)
- Stephanie D McKay
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Brenda M Murdoch
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, USA
- David L Adelson
- School of Biological Sciences, the University of Adelaide, North Terrace, Adelaide, South Australia, Australia
- Diego Almansa-Villa
- Genomics and Bioinformatics Unit, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay
- Gabrielle Becker
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, USA
- Linda M Beckett
- Department of Animal Sciences, Purdue University, West Lafayette, IN, USA
- María José Benítez-Galeano
- Genomics and Bioinformatics Unit, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay
- Fernando Biase
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Theresa Casey
- Department of Animal Sciences, Purdue University, West Lafayette, IN, USA
- Edward Chuong
- BioFrontiers Institute, Department of Molecular Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Emily Clark
- The Roslin Institute, University of Edinburgh, Edinburgh, UK
- Shannon Clarke
- Invermay Agricultural Centre, AgResearch Ltd, Mosgiel, New Zealand
- Noelle Cockett
- Department of Animal, Dairy and Veterinary Sciences, Utah State University, Logan, UT, USA
- Brian W Davis
- Department of Veterinary Pathobiology, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, USA
- Christine G Elsik
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Thomas Faraut
- GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Yahui Gao
- Animal Genomics and Improvement Laboratory, USDA ARS, Beltsville, MD, USA
- Carine Genet
- GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Patrick Grady
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Jonathan Green
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Richard Green
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Dailu Guan
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Darren Hagen
- Department of Animal and Food Sciences, Oklahoma State University, Stillwater, OK, USA
- Mike Heaton
- U.S. Meat Animal Research Center, USDA ARS, Clay Center, NE, USA
- Savannah J Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Wen Huang
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
- Erich Jarvis
- Vertebrate Genome Laboratory, the Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Jenna Kalleberg
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Hasan Khatib
- Department of Animal and Dairy Sciences, the University of Wisconsin-Madison, Madison, WI, USA
- Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA, USA
- Center for Species Survival, Smithsonian's National Zoo and Conservation Biology Institute, Front Royal, VA, USA
- James Koltes
- Department of Animal Science, Iowa State University, Ames, IA, USA
- Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Christa Kuehn
- Friedrich-Loeffler-Institute (German Federal Research Institute for Animal Health), Greifswald-Insel Riems, Germany
- Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- George E Liu
- Animal Genomics and Improvement Laboratory, USDA ARS, Beltsville, MD, USA
- Wai Yee Low
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, South Australia, Australia
- Hunter McConnell
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Kathryn McRae
- Invermay Agricultural Centre, AgResearch Ltd, Mosgiel, New Zealand
- Karen Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Biomolecular Engineering Department, University of California, Santa Cruz, Santa Cruz, CA, USA
- Michelle Mousel
- Animal Disease Research Unit, USDA ARS, Pullman, WA, USA
- School for Global Animal Health, Washington State University, Pullman, WA, USA
- Holly Neibergs
- Department of Animal Science, Washington State University, Pullman, WA, USA
- Temitayo Olagunju
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, USA
- Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Bruna Petry
- Department of Animal Science, Iowa State University, Ames, IA, USA
- Mirjam Pewsner
- Institute of Fish and Wildlife Health, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Brandon D Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Paulene Pineda
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, South Australia, Australia
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Satyanarayana Rachagani
- Veterinary Medicine and Surgery, NextGen Precision Health Institute, University of Missouri, Columbia, MO, USA
- Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monique Rijnkels
- Department of Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, USA
- Annie Robic
- GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Nelida Rodriguez Osorio
- Genomics and Bioinformatics Unit, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay
- Yana Safonova
- Computer Science and Engineering Department, Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA, USA
- Gustavo Schettini
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Morgan Stegemiller
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, USA
- Jessica Storer
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Paul Stothard
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada
- Caleb Stull
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Germán M Traglia
- Genomics and Bioinformatics Unit, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay
- Corey Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- Rosemarie Weikard
- Institute of Genome Biology, Research Institute for Farm Animal Biology (FBN), Dummerstorf, Germany
- Klaus Wimmers
- Institute of Genome Biology, Research Institute for Farm Animal Biology (FBN), Dummerstorf, Germany
- Shangqian Xie
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, USA
- Liu Yang
- Animal Genomics and Improvement Laboratory, USDA ARS, Beltsville, MD, USA
- Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, USDA ARS, Beltsville, MD, USA
|
31
|
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA
- Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
- Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
|
32
|
Xiong HY, Wyns A, Van Campenhout J, Hendrix J, De Bruyne E, Godderis L, Schabrun S, Nijs J, Polli A. Epigenetic Landscapes of Pain: DNA Methylation Dynamics in Chronic Pain. Int J Mol Sci 2024; 25:8324. [PMID: 39125894 PMCID: PMC11312850 DOI: 10.3390/ijms25158324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 07/19/2024] [Accepted: 07/23/2024] [Indexed: 08/12/2024] [Open Access]
Abstract
Chronic pain is a prevalent condition with a multifaceted pathogenesis, where epigenetic modifications, particularly DNA methylation, might play an important role. This review delves into the intricate mechanisms by which DNA methylation and demethylation regulate genes associated with nociception and pain perception in nociceptive pathways. We explore the dynamic nature of these epigenetic processes, mediated by DNA methyltransferases (DNMTs) and ten-eleven translocation (TET) enzymes, which modulate the expression of pro- and anti-nociceptive genes. Aberrant DNA methylation profiles have been observed in patients with various chronic pain syndromes, correlating with hypersensitivity to painful stimuli, neuronal hyperexcitability, and inflammatory responses. Genome-wide analyses shed light on differentially methylated regions and genes that could serve as potential biomarkers for chronic pain in the epigenetic landscape. The transition from acute to chronic pain is marked by rapid DNA methylation reprogramming, suggesting its potential role in pain chronicity. This review highlights the importance of understanding the temporal dynamics of DNA methylation during this transition to develop targeted therapeutic interventions. Reversing pathological DNA methylation patterns through epigenetic therapies emerges as a promising strategy for pain management.
Affiliation(s)
- Huan-Yu Xiong
- Pain in Motion Research Group (PAIN), Department of Physiotherapy, Human Physiology and Anatomy, Faculty of Physical Education & Physiotherapy, Vrije Universiteit Brussel, 1090 Brussels, Belgium
- Arne Wyns
- Pain in Motion Research Group (PAIN), Department of Physiotherapy, Human Physiology and Anatomy, Faculty of Physical Education & Physiotherapy, Vrije Universiteit Brussel, 1090 Brussels, Belgium
- Jente Van Campenhout
- Pain in Motion Research Group (PAIN), Department of Physiotherapy, Human Physiology and Anatomy, Faculty of Physical Education & Physiotherapy, Vrije Universiteit Brussel, 1090 Brussels, Belgium
- Jolien Hendrix
- Pain in Motion Research Group (PAIN), Department of Physiotherapy, Human Physiology and Anatomy, Faculty of Physical Education & Physiotherapy, Vrije Universiteit Brussel, 1090 Brussels, Belgium
- Department of Public Health and Primary Care, Centre for Environment & Health, KU Leuven, 3000 Leuven, Belgium
- Research Foundation - Flanders (FWO), 1000 Brussels, Belgium
- Elke De Bruyne
- Translational Oncology Research Center (TORC), Team Hematology and Immunology (HEIM), Vrije Universiteit Brussel, 1090 Brussels, Belgium
- Lode Godderis
- Department of Public Health and Primary Care, Centre for Environment & Health, KU Leuven, 3000 Leuven, Belgium
- Siobhan Schabrun
- The School of Physical Therapy, University of Western Ontario, London, ON N6A 3K7, Canada
- The Gray Centre for Mobility and Activity, Parkwood Institute, St. Joseph’s Healthcare, London, ON N6A 4V2, Canada
- Jo Nijs
- Pain in Motion Research Group (PAIN), Department of Physiotherapy, Human Physiology and Anatomy, Faculty of Physical Education & Physiotherapy, Vrije Universiteit Brussel, 1090 Brussels, Belgium
- Chronic Pain Rehabilitation, Department of Physical Medicine and Physiotherapy, University Hospital Brussels, 1090 Brussels, Belgium
- Department of Health and Rehabilitation, Unit of Physiotherapy, Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, 41390 Göteborg, Sweden
- Andrea Polli
- Pain in Motion Research Group (PAIN), Department of Physiotherapy, Human Physiology and Anatomy, Faculty of Physical Education & Physiotherapy, Vrije Universiteit Brussel, 1090 Brussels, Belgium
- Department of Public Health and Primary Care, Centre for Environment & Health, KU Leuven, 3000 Leuven, Belgium
- Research Foundation - Flanders (FWO), 1000 Brussels, Belgium
|
33
|
Jayakrishnan M, Havlová M, Veverka V, Regnard C, Becker PB. Genomic context-dependent histone H3K36 methylation by three Drosophila methyltransferases and implications for dedicated chromatin readers. Nucleic Acids Res 2024; 52:7627-7649. [PMID: 38813825 PMCID: PMC11260483 DOI: 10.1093/nar/gkae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 05/03/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] [Open Access]
Abstract
Methylation of histone H3 at lysine 36 (H3K36me3) marks active chromatin. The mark is interpreted by epigenetic readers that assist transcription and safeguard the integrity of the chromatin fiber. The chromodomain protein MSL3 binds H3K36me3 to target X-chromosomal genes in male Drosophila for dosage compensation. The PWWP-domain protein JASPer recruits the JIL1 kinase to active chromatin on all chromosomes. Unexpectedly, depletion of K36me3 had variable, locus-specific effects on the interactions of those readers. This observation motivated a systematic and comprehensive study of K36 methylation in a defined cellular model. Contrasting prevailing models, we found that K36me1, K36me2 and K36me3 each contribute to distinct chromatin states. A gene-centric view of the changing K36 methylation landscape upon depletion of the three methyltransferases Set2, NSD and Ash1 revealed local, context-specific methylation signatures. Set2 catalyzes K36me3 predominantly at transcriptionally active euchromatin. NSD places K36me2/3 at defined loci within pericentric heterochromatin and on weakly transcribed euchromatic genes. Ash1 deposits K36me1 at regions with enhancer signatures. The genome-wide mapping of MSL3 and JASPer suggested that they bind K36me2 in addition to K36me3, which was confirmed by direct affinity measurement. This dual specificity attracts the readers to a broader range of chromosomal locations and increases the robustness of their actions.
Affiliation(s)
- Muhunden Jayakrishnan
- Biomedical Center, Molecular Biology Division, Ludwig-Maximilians-Universität, Munich, Germany
- Magdalena Havlová
- Institute of Organic Chemistry and Biochemistry (IOCB) of the Czech Academy of Sciences, Prague, Czech Republic
- Václav Veverka
- Institute of Organic Chemistry and Biochemistry (IOCB) of the Czech Academy of Sciences, Prague, Czech Republic
- Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Catherine Regnard
- Biomedical Center, Molecular Biology Division, Ludwig-Maximilians-Universität, Munich, Germany
- Peter B Becker
- Biomedical Center, Molecular Biology Division, Ludwig-Maximilians-Universität, Munich, Germany
|
34
|
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Cascade Alpha Satellite HORs in Orangutan Chromosome 13 Assembly: Discovery of the 59mer HOR-The largest Unit in Primates-And the Missing Triplet 45/27/18 HOR in Human T2T-CHM13v2.0 Assembly. Int J Mol Sci 2024; 25:7596. [PMID: 39062839 PMCID: PMC11276891 DOI: 10.3390/ijms25147596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 07/05/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024] [Open Access]
Abstract
From the recent genome assembly NHGRI_mPonAbe1-v2.0_NCBI (GCF_028885655.2) of orangutan chromosome 13, we computed the precise alpha satellite higher-order repeat (HOR) structure using the novel high-precision GRM2023 algorithm with Global Repeat Map (GRM) and Monomer Distance (MD) diagrams. This study rigorously identified alpha satellite HORs in the centromere of orangutan chromosome 13, discovering a novel 59mer HOR, the longest HOR unit identified in any primate to date. Additionally, it revealed the first intertwined sequence of three HORs, 18mer/27mer/45mer HORs, with a common aligned "backbone" across all HOR copies. The major 7mer HOR exhibits a Willard's-type canonical copy, although some segments of the array display significant irregularities. In contrast, the 14mer HOR forms a regular Willard's-type HOR array. Surprisingly, the GRM2023 high-precision analysis of chromosome 13 of human genome assembly T2T-CHM13v2.0 reveals the presence of only a 7mer HOR, despite both the orangutan and human genome assemblies being derived from whole genome shotgun sequences.
Affiliation(s)
- Matko Glunčić
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Ines Vlahović
- Department of Interdisciplinary Sciences, Algebra University College, 10000 Zagreb, Croatia
- Marija Rosandić
- University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
35
Xiao M, Wei R, Yu J, Gao C, Yang F, Zhang L. CpG Island Definition and Methylation Mapping of the T2T-YAO Genome. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae009. [PMID: 39142816 DOI: 10.1093/gpbjnl/qzae009] [Received: 11/06/2023] [Revised: 12/05/2023] [Accepted: 12/08/2023] [Indexed: 08/16/2024]
Abstract
Precisely defining and mapping all cytosine (C) positions and their clusters, known as CpG islands (CGIs), as well as their methylation status, are pivotal for genome-wide epigenetic studies, especially when population-centric reference genomes are ready for timely application. Here, we first align the two high-quality reference genomes, T2T-YAO and T2T-CHM13, from different ethnic backgrounds in a base-by-base fashion and compute their genome-wide density-defined and position-defined CGIs. Second, by mapping some representative genome-wide methylation data from selected organs onto the two genomes, we find that there are about 4.7%-5.8% sequence divergency of variable categories depending on quality cutoffs. Genes among the divergent sequences are mostly associated with neurological functions. Moreover, CGIs associated with the divergent sequences are significantly different with respect to CpG density and observed CpG/expected CpG (O/E) ratio between the two genomes. Finally, we find that the T2T-YAO genome not only has a greater CpG coverage than that of the T2T-CHM13 genome when whole-genome bisulfite sequencing (WGBS) data from the European and American populations are mapped to each reference, but also shows more hyper-methylated CpG sites as compared to the T2T-CHM13 genome. Our study suggests that future genome-wide epigenetic studies of the Chinese populations rely on both acquisition of high-quality methylation data and subsequent precision CGI mapping based on the Chinese T2T reference.
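The abstract above compares CGIs by CpG density and by the observed/expected (O/E) CpG ratio. As a minimal sketch, assuming the widely used Gardiner-Garden and Frommer form O/E = (number of CpG x length) / (number of C x number of G), the ratio for one sequence window could be computed as follows. The function name `cpg_stats` and the returned keys are illustrative choices, and the paper's own density-defined and position-defined CGI criteria are not given in the abstract and are not reproduced here.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content, CpG count and CpG O/E ratio for one sequence.

    Assumes the common O/E definition (CpG count * length) / (C count * G count);
    this is an illustrative convention, not necessarily the one used in the paper.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Guard against division by zero when the window has no C or no G.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_content": gc_content,
            "cpg_count": cpg, "obs_exp": obs_exp}


if __name__ == "__main__":
    print(cpg_stats("ACGTTCGCGA"))
```

Counting on a single strand is sufficient here because a CpG dinucleotide reads as CpG on the reverse strand as well.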
Affiliation(s)
- Ming Xiao
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Rui Wei
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Chujie Gao
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Fengyi Yang
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
36
Andrade Ruiz L, Kops GJPL, Sacristan C. Vertebrate centromere architecture: from chromatin threads to functional structures. Chromosoma 2024; 133:169-181. [PMID: 38856923 PMCID: PMC11266386 DOI: 10.1007/s00412-024-00823-z] [Received: 01/06/2024] [Revised: 05/21/2024] [Accepted: 05/27/2024] [Indexed: 06/11/2024]
Abstract
Centromeres are chromatin structures specialized in sister chromatid cohesion, kinetochore assembly, and microtubule attachment during chromosome segregation. The regional centromere of vertebrates consists of long regions of highly repetitive sequences occupied by the Histone H3 variant CENP-A, and which are flanked by pericentromeres. The three-dimensional organization of centromeric chromatin is paramount for its functionality and its ability to withstand spindle forces. Alongside CENP-A, key contributors to the folding of this structure include components of the Constitutive Centromere-Associated Network (CCAN), the protein CENP-B, and condensin and cohesin complexes. Despite its importance, the intricate architecture of the regional centromere of vertebrates remains largely unknown. Recent advancements in long-read sequencing, super-resolution and cryo-electron microscopy, and chromosome conformation capture techniques have significantly improved our understanding of this structure at various levels, from the linear arrangement of centromeric sequences and their epigenetic landscape to their higher-order compaction. In this review, we discuss the latest insights on centromere organization and place them in the context of recent findings describing a bipartite higher-order organization of the centromere.
Affiliation(s)
- Lorena Andrade Ruiz
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, Netherlands
- University Medical Center Utrecht, Utrecht, Netherlands
- Oncode Institute, Utrecht, Netherlands
- Geert J P L Kops
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, Netherlands
- University Medical Center Utrecht, Utrecht, Netherlands
- Oncode Institute, Utrecht, Netherlands
- Carlos Sacristan
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, Netherlands.
- University Medical Center Utrecht, Utrecht, Netherlands.
- Oncode Institute, Utrecht, Netherlands.
37
Fu Y, Aganezov S, Mahmoud M, Beaulaurier J, Juul S, Treangen TJ, Sedlazeck FJ. MethPhaser: methylation-based long-read haplotype phasing of human genomes. Nat Commun 2024; 15:5327. [PMID: 38909018 PMCID: PMC11193733 DOI: 10.1038/s41467-024-49588-0] [Received: 09/12/2023] [Accepted: 06/11/2024] [Indexed: 06/24/2024] Open
Abstract
The assignment of variants across haplotypes, phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies to extend Single Nucleotide Variation (SNV)-based phasing. We demonstrate that haplotype-specific methylations extensively exist in human genomes and that the advent of long-read technologies has enabled direct reporting of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4%-98.7%. To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser on blood samples from 4 patients, still showing improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes. MethPhaser is available at https://github.com/treangenlab/methphaser.
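The phase length N50 quoted in the abstract above is a length-weighted summary of phase block sizes. As a minimal sketch, assuming the usual definition (the block length L such that blocks of length at least L together cover at least half of the total phased length), it can be computed as below. The function names `n50` and `percent_increase` are illustrative, the example block lengths are made up, and some tools normalize by genome length rather than by total phased length.

```python
def n50(block_lengths):
    """Length L such that blocks of length >= L cover at least half the total.

    Assumes normalization by the summed block length (an illustrative choice).
    """
    ordered = sorted(block_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def percent_increase(before, after):
    """Relative change of `after` with respect to `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical phase block lengths (in bp) before and after merging blocks.
    snv_only = [100_000, 200_000, 300_000, 400_000]
    extended = [300_000, 700_000]
    print(n50(snv_only), n50(extended))
    print(percent_increase(n50(snv_only), n50(extended)))
```

Merging neighbouring blocks leaves the total phased length unchanged but shifts it into fewer, longer blocks, which is why N50 rises.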
Affiliation(s)
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Sissel Juul
- Oxford Nanopore Technologies Inc, New York, NY, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
- Department of Bioengineering, Rice University, Houston, TX, USA.
- Fritz J Sedlazeck
- Department of Computer Science, Rice University, Houston, TX, USA.
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
38
Glunčić M, Vlahović I, Rosandić M, Paar V. Precise identification of cascading alpha satellite higher order repeats in T2T-CHM13 assembly of human chromosome 3. Croat Med J 2024; 65:209-219. [PMID: 38868967 PMCID: PMC11157248] [Received: 04/10/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024] Open
Abstract
AIM: To precisely identify and analyze alpha-satellite higher-order repeats (HORs) in T2T-CHM13 assembly of human chromosome 3. METHODS: From the recently sequenced complete T2T-CHM13 assembly of human chromosome 3, the precise alpha satellite HOR structure was computed by using the novel high-precision GRM2023 algorithm with global repeat map (GRM) and monomer distance (MD) diagrams. RESULTS: The major alpha satellite HOR array in chromosome 3 revealed a novel cascading HOR, housing 17mer HOR copies with subfragments of periods 15 and 2. Within each row in the cascading HOR, the monomers were of different types, but different rows within the same cascading 17mer HOR contained more than one monomer of the same type. Each canonical 17mer HOR copy comprised 17 monomers belonging to 16 different monomer types. Another pronounced 10mer HOR array was of the regular Willard's type. CONCLUSION: Our findings emphasize the complexity within the chromosome 3 centromere as well as deviations from expected highly regular patterns.
Affiliation(s)
- Matko Glunčić
- Department of Physics, Faculty of Science, University of Zagreb, Bijenička cesta 32, 10000 Zagreb, Croatia
39
Sacristan C, Samejima K, Ruiz LA, Deb M, Lambers MLA, Buckle A, Brackley CA, Robertson D, Hori T, Webb S, Kiewisz R, Bepler T, van Kwawegen E, Risteski P, Vukušić K, Tolić IM, Müller-Reichert T, Fukagawa T, Gilbert N, Marenduzzo D, Earnshaw WC, Kops GJPL. Vertebrate centromeres in mitosis are functionally bipartite structures stabilized by cohesin. Cell 2024; 187:3006-3023.e26. [PMID: 38744280 PMCID: PMC11164432 DOI: 10.1016/j.cell.2024.04.014] [Received: 01/25/2023] [Revised: 01/30/2024] [Accepted: 04/14/2024] [Indexed: 05/16/2024]
Abstract
Centromeres are scaffolds for the assembly of kinetochores that ensure chromosome segregation during cell division. How vertebrate centromeres obtain a three-dimensional structure to accomplish their primary function is unclear. Using super-resolution imaging, capture-C, and polymer modeling, we show that vertebrate centromeres are partitioned by condensins into two subdomains during mitosis. The bipartite structure is found in human, mouse, and chicken cells and is therefore a fundamental feature of vertebrate centromeres. Super-resolution imaging and electron tomography reveal that bipartite centromeres assemble bipartite kinetochores, with each subdomain binding a distinct microtubule bundle. Cohesin links the centromere subdomains, limiting their separation in response to spindle forces and avoiding merotelic kinetochore-spindle attachments. Lagging chromosomes during cancer cell divisions frequently have merotelic attachments in which the centromere subdomains are separated and bioriented. Our work reveals a fundamental aspect of vertebrate centromere biology with implications for understanding the mechanisms that guarantee faithful chromosome segregation.
Affiliation(s)
- Carlos Sacristan
- Oncode Institute, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), and University Medical Center Utrecht, Utrecht, the Netherlands.
- Kumiko Samejima
- Wellcome Centre for Cell Biology, Institute of Cell Biology, University of Edinburgh, Edinburgh, UK.
- Lorena Andrade Ruiz
- Oncode Institute, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), and University Medical Center Utrecht, Utrecht, the Netherlands
- Moonmoon Deb
- Wellcome Centre for Cell Biology, Institute of Cell Biology, University of Edinburgh, Edinburgh, UK
- Maaike L A Lambers
- Oncode Institute, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), and University Medical Center Utrecht, Utrecht, the Netherlands
- Adam Buckle
- MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Chris A Brackley
- SUPA School of Physics and Astronomy, University of Edinburgh, Edinburgh, UK
- Daniel Robertson
- Wellcome Centre for Cell Biology, Institute of Cell Biology, University of Edinburgh, Edinburgh, UK
- Tetsuya Hori
- Laboratory of Chromosome Biology, Graduate School of Frontier Biosciences, Osaka University, Suita, Osaka, Japan
- Shaun Webb
- Wellcome Centre for Cell Biology, Institute of Cell Biology, University of Edinburgh, Edinburgh, UK
- Robert Kiewisz
- Simons Machine Learning Center, New York Structural Biology Center, New York, NY 10027, USA; Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Darwin, 3, Campus Universidad Autonoma, Cantoblanco, Madrid 28049, Spain
- Tristan Bepler
- Simons Machine Learning Center, New York Structural Biology Center, New York, NY 10027, USA
- Eloïse van Kwawegen
- Oncode Institute, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), and University Medical Center Utrecht, Utrecht, the Netherlands
- Thomas Müller-Reichert
- Experimental Center, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Tatsuo Fukagawa
- Laboratory of Chromosome Biology, Graduate School of Frontier Biosciences, Osaka University, Suita, Osaka, Japan
- Nick Gilbert
- MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Davide Marenduzzo
- SUPA School of Physics and Astronomy, University of Edinburgh, Edinburgh, UK
- William C Earnshaw
- Wellcome Centre for Cell Biology, Institute of Cell Biology, University of Edinburgh, Edinburgh, UK.
- Geert J P L Kops
- Oncode Institute, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), and University Medical Center Utrecht, Utrecht, the Netherlands.
40
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, Phillippy AM. The complete sequence and comparative analysis of ape sex chromosomes. Nature 2024; 630:401-411. [PMID: 38811727 PMCID: PMC11168930 DOI: 10.1038/s41586-024-07473-2] [Received: 11/17/2023] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Affiliation(s)
- Brandon D Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
- Penn State University, University Park, PA, USA
- Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bornberg-Bauer
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
- Gerard G Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
- Oregon Health and Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H Garcia
- University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
- David A Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Yong-Hwee E Loh
- University of California Santa Barbara, Santa Barbara, CA, USA
- Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Kelly M McGarvey
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Joana L Rocha
- University of California Berkeley, Berkeley, CA, USA
- Fedor Ryabov
- Masters Program in National Research, University Higher School of Economics, Moscow, Russia
- Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
- Penn State University, University Park, PA, USA
- Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
- Università degli Studi di Bari Aldo Moro, Bari, Italy
- Alice C Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
- Penn State University, University Park, PA, USA
- Soojin V Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
- University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
- Adam M Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
41
Chen W, Wang X, Sun J, Wang X, Zhu Z, Ayhan DH, Yi S, Yan M, Zhang L, Meng T, Mu Y, Li J, Meng D, Bian J, Wang K, Wang L, Chen S, Chen R, Jin J, Li B, Zhang X, Deng XW, He H, Guo L. Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis. Nat Commun 2024; 15:4295. [PMID: 38769327 PMCID: PMC11106260 DOI: 10.1038/s41467-024-48643-0] [Received: 10/11/2023] [Accepted: 05/08/2024] [Indexed: 05/22/2024] Open
Abstract
Chili pepper (Capsicum) is known for its unique fruit pungency due to the presence of capsaicinoids. The evolutionary history of capsaicinoid biosynthesis and the mechanism of their tissue specificity remain obscure due to the lack of high-quality Capsicum genomes. Here, we report two telomere-to-telomere (T2T) gap-free genomes of C. annuum and its wild nonpungent relative C. rhomboideum to investigate the evolution of fruit pungency in chili peppers. We precisely delineate Capsicum centromeres, which lack high-copy tandem repeats but are extensively invaded by CRM retrotransposons. Through phylogenomic analyses, we estimate the evolutionary timing of capsaicinoid biosynthesis. We reveal disrupted coding and regulatory regions of key biosynthesis genes in nonpungent species. We also find conserved placenta-specific accessible chromatin regions, which likely allow for tissue-specific biosynthetic gene coregulation and capsaicinoid accumulation. These T2T genomic resources will accelerate chili pepper genetic improvement and help to understand Capsicum genome evolution.
Affiliation(s)
- Weikai Chen
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Xiangfeng Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Jie Sun
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Xinrui Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Zhangsheng Zhu
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- College of Horticulture, South China Agricultural University, Guangzhou, 510642, China
- Dilay Hazal Ayhan
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Shu Yi
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Ming Yan
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Lili Zhang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- College of Modern Agriculture and Environment, Weifang Institute of Technology, Weifang, 262500, China
- Tan Meng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Yu Mu
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Jun Li
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Dian Meng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Jianxin Bian
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Ke Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- College of Life Sciences, Shandong Agricultural University, Tai'an, 271018, China
- Lu Wang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Shaoying Chen
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Ruidong Chen
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Jingyun Jin
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Bosheng Li
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Xing Wang Deng
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Hang He
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China.
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China.
- Li Guo
- Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, 261325, China.
42
Gafurov A, Vinar T, Medvedev P, Brejova B. Efficient Analysis of Annotation Colocalization Accounting for Genomic Contexts. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.22.568259. [PMID: 38045397 PMCID: PMC10690252 DOI: 10.1101/2023.11.22.568259] [Indexed: 12/05/2023]
Abstract
An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes or their exons, evolutionarily conserved elements, and regions with a particular epigenetic state. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing two random unrelated annotations. To incorporate more background information into such analyses, we propose a new null model based on a Markov chain which differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or assembly gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistics and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. Moreover, the use of genomic contexts to correct for GC bias resulted in the reversal of some previously published findings.
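The last step described in the abstract above, turning an exact expectation and variance into a p-value through a normal approximation, can be sketched as follows. This is only an illustration of that final step: the expectation and variance are taken as given inputs (computing them under the Markov chain null model is the paper's contribution and is not reproduced here), and the function name `normal_approx_pvalues` and the example numbers are hypothetical.

```python
import math


def normal_approx_pvalues(observed, expected, variance):
    """Return (z, p_depletion, p_enrichment, p_two_sided) under N(expected, variance).

    `expected` and `variance` are assumed to come from some null model;
    they are plain inputs in this sketch.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    z = (observed - expected) / math.sqrt(variance)
    # Tail probabilities of the standard normal via the complementary error function.
    p_enrichment = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    p_depletion = 0.5 * math.erfc(-z / math.sqrt(2))   # P(Z <= z)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))     # P(|Z| >= |z|)
    return z, p_depletion, p_enrichment, p_two_sided


if __name__ == "__main__":
    # Hypothetical overlap statistic: 1200 observed, null mean 1000, null variance 2500.
    print(normal_approx_pvalues(1200, 1000, 2500))
```

Using `math.erfc` rather than `1 - cdf` avoids loss of precision for very small tail probabilities, which matters when many annotation pairs are tested.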
43
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Catacchio CR, Porubsky D, Mao Y, Yoo D, Rautiainen M, Koren S, Nurk S, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Ventura M, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. Nature 2024; 629:136-145. [PMID: 38570684 PMCID: PMC11062924 DOI: 10.1038/s41586-024-07278-3] [Received: 05/29/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Allison N Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Claudia R Catacchio
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Yafei Mao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Oxford Nanopore Technologies, Oxford, United Kingdom
- Julian K Lucas
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- Ivan A Alexandrov
  - Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
  - Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
44
Zhao N, Lai C, Wang Y, Dai S, Gu H. Understanding the role of DNA methylation in colorectal cancer: Mechanisms, detection, and clinical significance. Biochim Biophys Acta Rev Cancer 2024; 1879:189096. [PMID: 38499079 DOI: 10.1016/j.bbcan.2024.189096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 02/18/2024] [Accepted: 03/13/2024] [Indexed: 03/20/2024]
Abstract
Colorectal cancer (CRC) is one of the deadliest malignancies worldwide, ranking third in incidence and second in mortality. Remarkably, early stage localized CRC has a 5-year survival rate of over 90%; in stark contrast, the corresponding 5-year survival rate for metastatic CRC (mCRC) is only 14%. Compounding this problem is the staggering lack of effective therapeutic strategies. Beyond genetic mutations, which have been identified as critical instigators of CRC initiation and progression, the importance of epigenetic modifications, particularly DNA methylation (DNAm), should not be underestimated, given that DNAm can be used for diagnosis, treatment monitoring and prognostic evaluation. This review addresses the intricate mechanisms governing aberrant DNAm in CRC and its profound impact on critical oncogenic pathways. In addition, a comprehensive review of the various techniques used to detect DNAm alterations in CRC is provided, along with an exploration of the clinical utility of cancer-specific DNAm alterations.
Affiliation(s)
- Ningning Zhao
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei 230031, China
- Chuanxi Lai
  - Division of Colorectal Surgery, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China
- Yunfei Wang
  - Zhejiang ShengTing Biotech. Ltd, Hangzhou 310000, China
- Sheng Dai
  - Division of Colorectal Surgery, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China
- Hongcang Gu
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei 230031, China
45
Parl FF. Analysis of CENP-B Boxes as Anchor of Kinetochores in Centromeres of Human Chromosomes. Bioinform Biol Insights 2024; 18:11779322241248913. [PMID: 38690324 PMCID: PMC11060027 DOI: 10.1177/11779322241248913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024] Open
Abstract
The kinetochore is a multiprotein structure that attaches at one end to DNA in the centromere and at the other end to microtubules in the mitotic spindle. By connecting centromere and spindle, the kinetochore controls the migration of chromosomes during cell division. The exact position where the kinetochore assembles on each centromere was uncertain because large sections of centromeric DNA had not been sequenced due to highly repetitive alpha-satellite arrays. Embedded in the arrays is a 17 bp consensus sequence, the so-called CENP-B box, which binds the CENP-B protein, the only protein that binds directly to centromeric DNA. Recently, the Telomere-to-Telomere Consortium published the complete centromeric DNA sequences of all chromosomes including their epigenetic modifications in the T2T-CHM13 map. I used data from the T2T-CHM13 map to locate the CENP-B boxes in the centromeres as anchor of kinetochores. Most of the CENP-B boxes in centromeric DNA are methylated with the exception of the so-called centromere dip region (CDR), where CENP-B protein dimers bind to adjacent unmethylated CENP-B boxes and interact with CENP-A and CENP-C proteins to assemble the kinetochore. The centromeres of all chromosomes combined have a size of 407 Mb of which the kinetochores account for 5.0 Mb or 1.2%. There is no correlation between centromere and kinetochore size (P = .77). While the number of CENP-B boxes varies 4-fold between chromosomes, their density (number/Kb) varies less than 2-fold with a mean of 2.61 ± 0.33. The narrow range ensures a uniform pull of the spindle on the centromeres. I illustrate the findings in a model of the human kinetochore anchored at unmethylated CENP-B boxes in the CDR and present circos plots of chromosomes to show the location of kinetochores in their respective centromeres.
Affiliation(s)
- Fritz F Parl
  - Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
46
Kixmoeller K, Chang YW, Black BE. Centromeric chromatin clearings demarcate the site of kinetochore formation. bioRxiv 2024:2024.04.26.591177. [PMID: 38712116 PMCID: PMC11071481 DOI: 10.1101/2024.04.26.591177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
The centromere is the chromosomal locus that recruits the kinetochore, directing faithful propagation of the genome during cell division. The kinetochore has been interrogated by electron microscopy since the middle of the last century, but with methodologies that compromised fine structure. Using cryo-ET on human mitotic chromosomes, we reveal a distinctive architecture at the centromere: clustered 20-25 nm nucleosome-associated complexes within chromatin clearings that delineate them from surrounding chromatin. Centromere components CENP-C and CENP-N are each required for the integrity of the complexes, while CENP-C is also required to maintain the chromatin clearing. We further visualize the scaffold of the fibrous corona, a structure amplified at unattached kinetochores, revealing crescent-shaped parallel arrays of fibrils that extend >1 μm. Thus, we reveal how the organization of centromeric chromatin creates a clearing at the site of kinetochore formation as well as the nature of kinetochore amplification mediated by corona fibrils.
Affiliation(s)
- Kathryn Kixmoeller
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Biochemistry Biophysics Chemical Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, PA, USA
- Yi-Wei Chang
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Biochemistry Biophysics Chemical Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, PA, USA
- Ben E. Black
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Biochemistry Biophysics Chemical Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, PA, USA
  - Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, PA, USA
47
Teschendorff AE. On epigenetic stochasticity, entropy and cancer risk. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230054. [PMID: 38432318 PMCID: PMC10909509 DOI: 10.1098/rstb.2023.0054] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/01/2023] [Accepted: 09/26/2023] [Indexed: 03/05/2024] Open
Abstract
Epigenetic changes are known to accrue in normal cells as a result of ageing and cumulative exposure to cancer risk factors. Increasing evidence points towards age-related epigenetic changes being acquired in a quasi-stochastic manner, and that they may play a causal role in cancer development. Here, I describe the quasi-stochastic nature of DNA methylation (DNAm) changes in ageing cells as well as in normal cells at risk of neoplastic transformation, discussing the implications of this stochasticity for developing cancer risk prediction strategies, and in particular, how it may require a conceptual paradigm shift in how we select cancer risk markers. I also describe the mounting evidence that a significant proportion of DNAm changes in ageing and cancer development are related to cell proliferation, reflecting tissue-turnover and the opportunity this offers for predicting cancer risk via the development of epigenetic mitotic-like clocks. Finally, I describe how age-associated DNAm changes may be causally implicated in cancer development via an irreversible suppression of tissue-specific transcription factors that increases epigenetic and transcriptomic entropy, promoting a more plastic yet aberrant cancer stem-cell state. This article is part of a discussion meeting issue 'Causes and consequences of stochastic processes in development and disease'.
Affiliation(s)
- Andrew E. Teschendorff
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, People's Republic of China
48
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15. Int J Mol Sci 2024; 25:4395. [PMID: 38673983 PMCID: PMC11050224 DOI: 10.3390/ijms25084395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024] Open
Abstract
Unraveling the intricate centromere structure of human chromosomes holds profound implications, illuminating fundamental genetic mechanisms and potentially advancing our comprehension of genetic disorders and therapeutic interventions. This study rigorously identified and structurally analyzed alpha satellite higher-order repeats (HORs) within the centromere of human chromosome 15 in the complete T2T-CHM13 assembly using the high-precision GRM2023 algorithm. The most extensive alpha satellite HOR array in chromosome 15 reveals a novel cascading HOR, housing 429 15mer HOR copies, containing 4-, 7- and 11-monomer subfragments. Within each row of cascading HORs, all alpha satellite monomers are of distinct types, as in regular Willard's HORs. However, different HOR copies within the same cascading 15mer HOR contain more than one monomer of the same type. Each canonical 15mer HOR copy comprises 15 monomers belonging to only 9 different monomer types. Notably, 65% of the 429 15mer cascading HOR copies exhibit canonical structures, while 35% display variant configurations. Identified as the second most extensive alpha satellite HOR, another novel cascading HOR within human chromosome 15 encompasses 164 20mer HOR copies, each featuring two subfragments. Moreover, a distinct pattern emerges as interspersed 25mer/26mer structures differing from regular Willard's HORs and giving rise to a 34-monomer subfragment. Only a minor 18mer HOR array of 12 HOR copies is of the regular Willard's type. These revelations highlight the complexity within the chromosome 15 centromeric region, accentuating deviations from anticipated highly regular patterns and hinting at profound information encoding and functional potential within the human centromere.
Affiliation(s)
- Matko Glunčić
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Ines Vlahović
  - Algebra LAB, Algebra University College, 10000 Zagreb, Croatia
- Marija Rosandić
  - Department of Internal Medicine, University Hospital Centre Zagreb, 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
49
Hogan MP, Holding ML, Nystrom GS, Colston TJ, Bartlett DA, Mason AJ, Ellsworth SA, Rautsaw RM, Lawrence KC, Strickland JL, He B, Fraser P, Margres MJ, Gilbert DM, Gibbs HL, Parkinson CL, Rokyta DR. The genetic regulatory architecture and epigenomic basis for age-related changes in rattlesnake venom. Proc Natl Acad Sci U S A 2024; 121:e2313440121. [PMID: 38578985 PMCID: PMC11032440 DOI: 10.1073/pnas.2313440121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 03/13/2024] [Indexed: 04/07/2024] Open
Abstract
Developmental phenotypic changes can evolve under selection imposed by age- and size-related ecological differences. Many of these changes occur through programmed alterations to gene expression patterns, but the molecular mechanisms and gene-regulatory networks underlying these adaptive changes remain poorly understood. Many venomous snakes, including the eastern diamondback rattlesnake (Crotalus adamanteus), undergo correlated changes in diet and venom expression as snakes grow larger with age, providing models for identifying mechanisms of timed expression changes that underlie adaptive life history traits. By combining a highly contiguous, chromosome-level genome assembly with measures of expression, chromatin accessibility, and histone modifications, we identified cis-regulatory elements and trans-regulatory factors controlling venom ontogeny in the venom glands of C. adamanteus. Ontogenetic expression changes were significantly correlated with epigenomic changes within genes, immediately adjacent to genes (e.g., promoters), and more distant from genes (e.g., enhancers). We identified 37 candidate transcription factors (TFs), with the vast majority being up-regulated in adults. The ontogenetic change is largely driven by an increase in the expression of TFs associated with growth signaling, transcriptional activation, and circadian rhythm/biological timing systems in adults with corresponding epigenomic changes near the differentially expressed venom genes. However, both expression activation and repression contributed to the composition of both adult and juvenile venoms, demonstrating the complexity and potential evolvability of gene regulation for this trait. Overall, given that age-based trait variation is common across the tree of life, we provide a framework for understanding gene-regulatory-network-driven life-history evolution more broadly.
Affiliation(s)
- Michael P. Hogan
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Matthew L. Holding
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
  - Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109
- Gunnar S. Nystrom
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Timothy J. Colston
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
  - Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, PR 00681
- Daniel A. Bartlett
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Andrew J. Mason
  - Department of Biological Sciences, Clemson University, Clemson, SC 29634
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43210
- Schyler A. Ellsworth
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Rhett M. Rautsaw
  - Department of Biological Sciences, Clemson University, Clemson, SC 29634
  - Department of Integrative Biology, University of South Florida, Tampa, FL 33620
  - School of Biological Sciences, Washington State University, Pullman, WA 99164
- Kylie C. Lawrence
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Jason L. Strickland
  - Department of Biological Sciences, Clemson University, Clemson, SC 29634
  - Department of Biology, University of South Alabama, Mobile, AL 36688
- Bing He
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Peter Fraser
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Mark J. Margres
  - Department of Integrative Biology, University of South Florida, Tampa, FL 33620
- David M. Gilbert
  - Laboratory of Chromosome Replication and Epigenome Regulation, San Diego Biomedical Research Institute, San Diego, CA 92121
- H. Lisle Gibbs
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43210
- Christopher L. Parkinson
  - Department of Biological Sciences, Clemson University, Clemson, SC 29634
  - Department of Forestry and Environmental Conservation, Clemson University, Clemson, SC 29634
- Darin R. Rokyta
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306
50
Bell CG. Epigenomic insights into common human disease pathology. Cell Mol Life Sci 2024; 81:178. [PMID: 38602535 PMCID: PMC11008083 DOI: 10.1007/s00018-024-05206-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024]
Abstract
The epigenome, the chemical modifications and chromatin-related packaging of the genome, enables the same genetic template to be activated or repressed in different cellular settings. This multi-layered mechanism facilitates cell-type specific function by setting the local sequence and 3D interactive activity level. Gene transcription is further modulated through the interplay with transcription factors and co-regulators. The human body requires this epigenomic apparatus to be precisely installed throughout development and then adequately maintained during the lifespan. The causal role of the epigenome in human pathology, beyond imprinting disorders and specific tumour suppressor genes, was further brought into the spotlight by large-scale sequencing projects identifying that mutations in epigenomic machinery genes could be critical drivers in both cancer and developmental disorders. Abrogation of this cellular mechanism is providing new molecular insights into pathogenesis. However, deciphering the full breadth and implications of these epigenomic changes remains challenging. Knowledge is accruing regarding disease mechanisms and clinical biomarkers, through pathogenically relevant and surrogate tissue analyses, respectively. Advances include consortia-generated cell-type specific reference epigenomes, high-throughput DNA methylome association studies, as well as insights into ageing-related diseases from biological 'clocks' constructed by machine learning algorithms. Also, 3rd-generation sequencing is beginning to disentangle the complexity of genetic and DNA modification haplotypes. Cell-free DNA methylation as a cancer biomarker has clear clinical utility and further potential to assess organ damage across many disorders. Finally, molecular understanding of disease aetiology brings with it the opportunity for exact therapeutic alteration of the epigenome through CRISPR-activation or inhibition.
Affiliation(s)
- Christopher G Bell
  - William Harvey Research Institute, Barts & The London Faculty of Medicine, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK