1. Khamse S, Alizadeh S, Khorshid HRK, Delbari A, Tajeddin N, Ohadi M. A Hypermutable Region in the DISP2 Gene Links to Natural Selection and Late-Onset Neurocognitive Disorders in Humans. Mol Neurobiol 2024; 61:8777-8786. [PMID: 38565786 DOI: 10.1007/s12035-024-04155-y] [Received: 05/28/2023] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
(CCG) short tandem repeats (STRs) are predominantly enriched in genic regions, mutation hotspots for C to T truncating substitutions, and involved in various neurological and neurodevelopmental disorders. However, intact blocks of this class of STRs are widely overlooked with respect to their link with natural selection. The human neuron-specific gene, DISP2 (dispatched RND transporter family member 2), contains a (CCG) repeat in its 5' untranslated region. Here, we sequenced this STR in a sample of 448 Iranian individuals, consisting of late-onset neurocognitive disorder (NCD) (N = 203) and controls (N = 245). We found that the region spanning the (CCG) repeat was highly mutated, resulting in several flanking (CCG) residues. However, an 8-repeat of the (CCG) repeat was predominantly abundant (frequency = 0.92) across the two groups. While the overall distribution of genotypes was not different between the two groups (p > 0.05), we detected four genotypes in the NCD group only (2% of the NCD genotypes, Mid-p = 0.02), consisting of extreme short alleles, 5- and 6-repeats, that were not detected in the control group. The patients harboring those genotypes received the diagnoses of probable Alzheimer's disease and vascular dementia. We also found six genotypes in the control group only (2.5% of the control genotypes, Mid-p = 0.01) that consisted of the 8-repeat and extreme long alleles, 9- and 10-repeats, of which the 10-repeat was not detected in the NCD group. The (CCG) repeat specifically expanded in primates. In conclusion, we report an indication of natural selection at a novel hypermutable region in the human genome and divergent alleles and genotypes in late-onset NCDs and controls. These findings reinforce the hypothesis that a collection of rare alleles and genotypes in a number of genes may unambiguously contribute to the cognition impairment component of late-onset NCDs.
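The Mid-p values quoted in this abstract can be illustrated with a short sketch of a one-sided mid-p exact test on a 2x2 table. This is not the authors' analysis code. It assumes that each group-specific genotype corresponds to one individual (4 of 203 NCD cases versus 0 of 245 controls, and 6 of 245 controls versus 0 of 203 cases), which the abstract does not state explicitly; the function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(x, n1, n2, m):
    """P(X = x): x of the m carriers fall in group 1 (size n1); group 2 has size n2."""
    return comb(n1, x) * comb(n2, m - x) / comb(n1 + n2, m)


def mid_p_upper(a, n1, b, n2):
    """One-sided mid-p: half the probability of the observed table plus
    the probability of every more extreme table (more carriers in group 1)."""
    m = a + b                       # total carriers across both groups
    upper = min(m, n1)              # group 1 cannot hold more than m carriers
    tail = sum(hypergeom_pmf(x, n1, n2, m) for x in range(a + 1, upper + 1))
    return 0.5 * hypergeom_pmf(a, n1, n2, m) + tail


# Assumed counts: one individual per group-specific genotype.
ncd_only = mid_p_upper(4, 203, 0, 245)       # short-allele genotypes, NCD only
control_only = mid_p_upper(6, 245, 0, 203)   # long-allele genotypes, controls only

print(f"NCD-only genotypes:     mid-p = {ncd_only:.3f}")
print(f"Control-only genotypes: mid-p = {control_only:.3f}")
```

Under this reading, both values round to the reported 0.02 and 0.01, which suggests (but does not prove) that the abstract's figures come from a comparable one-sided test.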
Affiliation(s)
- S Khamse
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Alizadeh
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
  - Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- A Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- N Tajeddin
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
  - Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
2. Alizadeh S, Khamse S, Vafadar S, Bernhart SH, Afshar H, Vahedi M, Rezaei O, Delbari A, Ohadi M. The human SMAD9 (GCC) repeat links to natural selection and late-onset neurocognitive disorders. Neurol Sci 2024; 45:5241-5251. [PMID: 38877206 DOI: 10.1007/s10072-024-07637-y] [Received: 09/20/2023] [Accepted: 06/05/2024] [Indexed: 06/16/2024]
Abstract
INTRODUCTION: Whereas (GCC)-repeats are overrepresented in genic regions and are mutation hotspots, they are largely unexplored with regard to their link with natural selection. Across numerous primate species and tissues, SMAD9 (SMAD Family Member 9) reaches its highest level of expression in the human brain. This gene contains a (GCC)-repeat in the interval between +1 and +60 of the transcription start site, which ranks high among (GCC)-repeats with respect to length. METHODS: Here we sequenced this (GCC)-repeat in 396 Iranian individuals, consisting of late-onset neurocognitive disorder (NCD) (N = 181) and controls (N = 215). RESULTS: We detected two predominantly abundant alleles of 7 and 9 repeats, forming 96.2% of the allele pool. The (GCC)7/(GCC)9 ratio was in the reverse order in the NCD group versus controls (p = 0.005), resulting from excess of (GCC)7 in the NCD group (p = 0.003) and (GCC)9 in the controls (p = 0.01). Five genotypes, predominantly consisting of (GCC)7 and lacking (GCC)9, were detected in the NCD group only (p = 0.008). The patients harboring those genotypes received the diagnoses of Alzheimer's disease (AD) and vascular dementia (VD). Five genotypes consisting of (GCC)9 and lacking (GCC)7 were detected in the control group only (p = 0.002). The group-specific genotypes formed approximately 4% of the genotype pool in the human samples studied. CONCLUSION: We propose natural selection and a novel locus for late-onset AD and VD at the SMAD9 (GCC)-repeat in humans.
Affiliation(s)
- Samira Alizadeh
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Daneshjoo Blvd. Koodakyar St, Tehran, 1985713871, Iran
- Safoura Khamse
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Daneshjoo Blvd. Koodakyar St, Tehran, 1985713871, Iran
- Sara Vafadar
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Daneshjoo Blvd. Koodakyar St, Tehran, 1985713871, Iran
- Stephan H Bernhart
  - IZBI, Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16-18, 04107, Leipzig, Germany
- Hossein Afshar
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Daneshjoo Blvd. Koodakyar St, Tehran, 1985713871, Iran
- Mohsen Vahedi
  - Department of Biostatistics and Epidemiology, Paediatric Neurorehabilitation Research Centre, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Omid Rezaei
  - Department of Psychiatry, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Ahmad Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Daneshjoo Blvd. Koodakyar St, Tehran, 1985713871, Iran
- Mina Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Daneshjoo Blvd. Koodakyar St, Tehran, 1985713871, Iran
3. Dwarshuis N, Kalra D, McDaniel J, Sanio P, Alvarez Jerez P, Jadhav B, Huang WE, Mondal R, Busby B, Olson ND, Sedlazeck FJ, Wagner J, Majidian S, Zook JM. The GIAB genomic stratifications resource for human reference genomes. Nat Commun 2024; 15:9029. [PMID: 39424793 PMCID: PMC11489684 DOI: 10.1038/s41467-024-53260-y] [Received: 11/07/2023] [Accepted: 10/07/2024] [Indexed: 10/21/2024]
Abstract
Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of "stratifications," which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions which are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a snakemake pipeline at https://github.com/usnistgov/giab-stratifications . We anticipate this being useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly-used reference genomes.
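As a rough illustration of how a BED-defined stratification can be used, the sketch below assigns variant calls to a stratum and reports accuracy inside and outside it. It is not part of the GIAB pipeline: the function names, the toy intervals, and the toy calls are all hypothetical, and it assumes the intervals within one stratum do not overlap. BED coordinates are 0-based and half-open, while the call positions are treated as 1-based.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_bed(lines):
    """Return {chrom: sorted [(start, end), ...]} from BED lines (0-based, half-open)."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    for chrom in regions:
        regions[chrom].sort()
    return dict(regions)


def in_stratum(regions, chrom, pos_1based):
    """True if a 1-based position lies in an interval (assumes no overlapping intervals)."""
    intervals = regions.get(chrom, [])
    pos0 = pos_1based - 1  # convert to the 0-based BED coordinate system
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def stratified_accuracy(regions, calls):
    """calls: iterable of (chrom, pos_1based, is_correct). Returns accuracy per context."""
    tally = {"inside": [0, 0], "outside": [0, 0]}  # [correct, total]
    for chrom, pos, correct in calls:
        key = "inside" if in_stratum(regions, chrom, pos) else "outside"
        tally[key][0] += int(correct)
        tally[key][1] += 1
    return {k: (c / t if t else None) for k, (c, t) in tally.items()}


# Hypothetical "hard-to-map" stratum and hypothetical benchmark calls.
bed_lines = ["chr1\t100\t200", "chr1\t500\t600"]
calls = [
    ("chr1", 101, True),
    ("chr1", 150, False),
    ("chr1", 201, True),   # just past the half-open end, so outside
    ("chr1", 550, False),
    ("chr2", 10, True),
]
regions = parse_bed(bed_lines)
result = stratified_accuracy(regions, calls)
print(result)
```

Comparing the two accuracies is the simplest form of the context-dependent performance the abstract describes; a real benchmark would separate false positives from false negatives and handle overlapping intervals.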
Affiliation(s)
- Nathan Dwarshuis
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Divya Kalra
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Philippe Sanio
  - University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
- Pilar Alvarez Jerez
  - Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Bharati Jadhav
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, NY, USA
- Wenyu Eddy Huang
  - Department of Computer Science, College of Engineering, Rice University, Houston, TX, USA
- Rajarshi Mondal
  - Department of Bioinformatics, Pondicherry University, Pondicherry, India
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, College of Engineering, Rice University, Houston, TX, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Sina Majidian
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
4. Uguen K, Michaud JL, Génin E. Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases. Eur J Hum Genet 2024; 32:1037-1044. [PMID: 38982300 PMCID: PMC11369099 DOI: 10.1038/s41431-024-01666-z] [Received: 02/22/2024] [Revised: 06/20/2024] [Accepted: 06/27/2024] [Indexed: 07/11/2024]
Abstract
In this study, we explore the landscape of short tandem repeats (STRs) within the human genome through the lens of evolving technologies to detect genomic variations. STRs, which encompass approximately 3% of our genomic DNA, are crucial for understanding human genetic diversity, disease mechanisms, and evolutionary biology. The advent of high-throughput sequencing methods has revolutionized our ability to accurately map and analyze STRs, highlighting their significance in genetic disorders, forensic science, and population genetics. We review the current available methodologies for STR analysis, the challenges in interpreting STR variations across different populations, and the implications of STRs in medical genetics. Our findings underscore the urgent need for comprehensive STR databases that reflect the genetic diversity of global populations, facilitating the interpretation of STR data in clinical diagnostics, genetic research, and forensic applications. This work sets the stage for future studies aimed at harnessing STR variations to elucidate complex genetic traits and diseases, reinforcing the importance of integrating STRs into genetic research and clinical practice.
Affiliation(s)
- Kevin Uguen
  - Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
  - Service de Génétique Médicale et Biologie de la Reproduction, CHU de Brest, Brest, France
  - CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
- Jacques L Michaud
  - CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
  - Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
  - Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
5. Pandiloski N, Horváth V, Karlsson O, Koutounidou S, Dorazehi F, Christoforidou G, Matas-Fuentes J, Gerdes P, Garza R, Jönsson ME, Adami A, Atacho DAM, Johansson JG, Englund E, Kokaia Z, Jakobsson J, Douse CH. DNA methylation governs the sensitivity of repeats to restriction by the HUSH-MORC2 corepressor. Nat Commun 2024; 15:7534. [PMID: 39214989 PMCID: PMC11364546 DOI: 10.1038/s41467-024-50765-4] [Received: 08/14/2023] [Accepted: 07/18/2024] [Indexed: 09/04/2024]
Abstract
The human silencing hub (HUSH) complex binds to transcripts of LINE-1 retrotransposons (L1s) and other genomic repeats, recruiting MORC2 and other effectors to remodel chromatin. How HUSH and MORC2 operate alongside DNA methylation, a central epigenetic regulator of repeat transcription, remains largely unknown. Here we interrogate this relationship in human neural progenitor cells (hNPCs), a somatic model of brain development that tolerates removal of DNA methyltransferase DNMT1. Upon loss of MORC2 or HUSH subunit TASOR in hNPCs, L1s remain silenced by robust promoter methylation. However, genome demethylation and activation of evolutionarily-young L1s attracts MORC2 binding, and simultaneous depletion of DNMT1 and MORC2 causes massive accumulation of L1 transcripts. We identify the same mechanistic hierarchy at pericentromeric α-satellites and clustered protocadherin genes, repetitive elements important for chromosome structure and neurodevelopment respectively. Our data delineate the epigenetic control of repeats in somatic cells, with implications for understanding the vital functions of HUSH-MORC2 in hypomethylated contexts throughout human development.
Affiliation(s)
- Ninoslav Pandiloski
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Vivien Horváth
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Ofelia Karlsson
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Symela Koutounidou
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Fereshteh Dorazehi
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Georgia Christoforidou
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Jon Matas-Fuentes
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
- Patricia Gerdes
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Raquel Garza
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Anita Adami
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Diahann A M Atacho
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Jenny G Johansson
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Elisabet Englund
  - Division of Pathology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Zaal Kokaia
  - Lund Stem Cell Center, Lund University, Lund, Sweden
  - Laboratory of Stem Cells and Restorative Neurology, Department of Clinical Sciences, BMC B10, Lund University, Lund, Sweden
- Johan Jakobsson
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Christopher H Douse
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
6. He H, Leng Y, Cao X, Zhu Y, Li X, Yuan Q, Zhang B, He W, Wei H, Liu X, Xu Q, Guo M, Zhang H, Yang L, Lv Y, Wang X, Shi C, Zhang Z, Chen W, Zhang B, Wang T, Yu X, Qian H, Zhang Q, Dai X, Liu C, Cui Y, Wang Y, Zheng X, Xiong G, Zhou Y, Qian Q, Shang L. The pan-tandem repeat map highlights multiallelic variants underlying gene expression and agronomic traits in rice. Nat Commun 2024; 15:7291. [PMID: 39181885 PMCID: PMC11344853 DOI: 10.1038/s41467-024-51854-0] [Received: 06/23/2023] [Accepted: 08/20/2024] [Indexed: 08/27/2024]
Abstract
Tandem repeats (TRs) are genomic regions that vary in repeat number and are often multiallelic. Their characteristics and contributions to gene expression and quantitative traits in rice are largely unknown. Here, we survey rice TR variations based on 231 genome assemblies and the rice pan-genome graph. We identify 227,391 multiallelic TR loci, including 54,416 TR variations that are absent from the Nipponbare reference genome. Only one-third of TR variations show strong linkage with nearby bi-allelic variants (SNPs, Indels and PAVs). Using 193 panicle and 202 leaf transcriptomic datasets, we show that 485 and 511 TRs, respectively, act as QTLs for nearby gene expression independently of other bi-allelic variations. Using plant height and grain width as examples, we identify and validate TR contributions to rice agronomic trait variations. These findings enhance our understanding of the functions of multiallelic variants and facilitate rice molecular breeding.
Affiliation(s)
- Huiying He
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yue Leng
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xinglan Cao
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, 475004, China
  - Shenzhen Research Institute of Henan University, Shenzhen, 518000, China
- Yiwang Zhu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Institute of Biotechnology, Fujian Academy of Agricultural Sciences/Fujian Provincial Key Laboratory of Genetic Engineering for Agriculture, Fuzhou, 350003, China
- Xiaoxia Li
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiaoling Yuan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bin Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Yazhouwan National Laboratory, Sanya, 572024, China
- Wenchuang He
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hua Wei
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiangpei Liu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiang Xu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Mingliang Guo
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hong Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Longbo Yang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yang Lv
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xianmeng Wang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Chuanlin Shi
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Zhipeng Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Wu Chen
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bintao Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Tianyi Wang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaoman Yu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hongge Qian
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qianqian Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaofan Dai
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Congcong Liu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yan Cui
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yuexing Wang
  - State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Xiaoming Zheng
  - National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
- Guosheng Xiong
  - Academy for Advanced Interdisciplinary Studies, Plant Phenomics Research Center, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Yongfeng Zhou
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qian Qian
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Yazhouwan National Laboratory, Sanya, 572024, China
  - State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Lianguang Shang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Yazhouwan National Laboratory, Sanya, 572024, China
7. Ranathunge C, Welch ME. Clinal Variation in Short Tandem Repeats Linked to Gene Expression in Sunflower (Helianthus annuus L.). Biomolecules 2024; 14:944. [PMID: 39199332 PMCID: PMC11352406 DOI: 10.3390/biom14080944] [Received: 06/24/2024] [Revised: 07/25/2024] [Accepted: 08/01/2024] [Indexed: 09/01/2024]
Abstract
Short tandem repeat (STR) variation is rarely explored as a contributor to adaptive evolution. An intriguing mechanism involving STRs suggests that STRs function as "tuning knobs" of adaptation whereby stepwise changes in STR allele length have stepwise effects on phenotypes. Previously, we tested the predictions of the "tuning knob" model at the gene expression level by conducting an RNA-Seq experiment on natural populations of common sunflower (Helianthus annuus L.) transecting a well-defined cline from Kansas to Oklahoma. We identified 479 STRs with significant allele length effects on gene expression (eSTRs). In this study, we expanded the range to populations further north and south of the focal populations and used a targeted approach to study the relationship between STR allele length and gene expression in five selected eSTRs. Seeds from 96 individuals from six natural populations of sunflower from Nebraska and Texas were grown in a common garden. The individuals were genotyped at the five eSTRs, and gene expression was quantified with qRT-PCR. Linear regression models identified that eSTR length in comp26672 was significantly correlated with gene expression. Further, the length of comp26672 eSTR was significantly correlated with latitude across the range from Nebraska to Texas. The eSTR locus comp26672 was located in the CHUP1 gene, a gene associated with chloroplast movement in response to light intensity, which suggests a potential adaptive role for the eSTR locus. Collectively, our results from this targeted study show a consistent relationship between allele length and gene expression in some eSTRs across a broad geographical range in sunflower and suggest that some eSTRs may contribute to adaptive traits in common sunflower.
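The "tuning knob" prediction, that expression changes stepwise with STR allele length, amounts to testing for a non-zero slope in a regression of expression on allele length. The sketch below fits an ordinary least-squares line with the standard library only. It is not the authors' model: the function name and the data points are hypothetical, made up solely to show the calculation, and the study's actual covariates and allele-length coding are not given in the abstract.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (slope, intercept, pearson_r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical data: STR allele length (repeat units) per individual and
# relative expression of the linked gene (e.g. from qRT-PCR).
allele_length = [8, 8, 9, 10, 10, 11, 12, 12]
expression = [1.0, 1.2, 1.5, 1.9, 2.1, 2.4, 3.0, 2.9]

slope, intercept, r = ols_fit(allele_length, expression)
print(f"slope = {slope:.3f} expression units per repeat, r = {r:.3f}")
```

A positive slope with a strong correlation is what the model predicts for an eSTR; the same fit with latitude in place of expression would correspond to the clinal test mentioned in the abstract.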
|
8
|
Plavskin Y, de Biase MS, Ziv N, Janská L, Zhu YO, Hall DW, Schwarz RF, Tranchina D, Siegal ML. Spontaneous single-nucleotide substitutions and microsatellite mutations have distinct distributions of fitness effects. PLoS Biol 2024; 22:e3002698. [PMID: 38950062 PMCID: PMC11244821 DOI: 10.1371/journal.pbio.3002698] [Received: 07/04/2023] [Revised: 07/12/2024] [Accepted: 06/04/2024] [Indexed: 07/03/2024]
Abstract
The fitness effects of new mutations determine key properties of evolutionary processes. Beneficial mutations drive evolution, yet selection is also shaped by the frequency of small-effect deleterious mutations, whose combined effect can burden otherwise adaptive lineages and alter evolutionary trajectories and outcomes in clonally evolving organisms such as viruses, microbes, and tumors. The small effect sizes of these important mutations have made accurate measurements of their rates difficult. In microbes, assessing the effect of mutations on growth can be especially instructive, as this complex phenotype is closely linked to fitness in clonally evolving organisms. Here, we perform high-throughput time-lapse microscopy on cells from mutation-accumulation strains to precisely infer the distribution of mutational effects on growth rate in the budding yeast, Saccharomyces cerevisiae. We show that mutational effects on growth rate are overwhelmingly negative, highly skewed towards very small effect sizes, and frequent enough to suggest that deleterious hitchhikers may impose a significant burden on evolving lineages. By using lines that accumulated mutations in either wild-type or slippage repair-defective backgrounds, we further disentangle the effects of 2 common types of mutations, single-nucleotide substitutions and simple sequence repeat indels, and show that they have distinct effects on yeast growth rate. Although the average effect of a simple sequence repeat mutation is very small (approximately 0.3%), many do alter growth rate, implying that this class of frequent mutations has an important evolutionary impact.
Affiliation(s)
- Yevgeniy Plavskin
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America
- Maria Stella de Biase
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - Humboldt-Universität zu Berlin, Department of Biology, Berlin, Germany
- Naomi Ziv
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America
- Libuše Janská
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America
- Yuan O. Zhu
  - Department of Genetics, Stanford University, Stanford, California, United States of America
  - Department of Biology, Stanford University, Stanford, California, United States of America
- David W. Hall
  - Department of Genetics, University of Georgia, Athens, Georgia, United States of America
- Roland F. Schwarz
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - Institute for Computational Cancer Biology, Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Daniel Tranchina
  - Department of Biology, New York University, New York, New York, United States of America
  - Courant Math Institute, New York University, New York, New York, United States of America
- Mark L. Siegal
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America
|
9
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
|
10
|
Rajan-Babu IS, Dolzhenko E, Eberle MA, Friedman JM. Sequence composition changes in short tandem repeats: heterogeneity, detection, mechanisms and clinical implications. Nat Rev Genet 2024; 25:476-499. [PMID: 38467784 DOI: 10.1038/s41576-024-00696-z] [Accepted: 01/19/2024] [Indexed: 03/13/2024]
Abstract
Short tandem repeats (STRs) are a class of repetitive elements, composed of tandem arrays of 1-6 base pair sequence motifs, that comprise a substantial fraction of the human genome. STR expansions can cause a wide range of neurological and neuromuscular conditions, known as repeat expansion disorders, whose age of onset, severity, penetrance and/or clinical phenotype are influenced by the length of the repeats and their sequence composition. The presence of non-canonical motifs, depending on the type, frequency and position within the repeat tract, can alter clinical outcomes by modifying somatic and intergenerational repeat stability, gene expression and mutant transcript-mediated and/or protein-mediated toxicities. Here, we review the diverse structural conformations of repeat expansions, technological advances for the characterization of changes in sequence composition, their clinical correlations and the impact on disease mechanisms.
Affiliation(s)
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
- Jan M Friedman
  - Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
  - BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada
|
11
|
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. A comprehensive tandem repeat catalog of the human genome. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.06.19.24309173. [PMID: 38947075 PMCID: PMC11213036 DOI: 10.1101/2024.06.19.24309173] [Indexed: 07/02/2024]
Abstract
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many of them reside within coding sequences.
Affiliation(s)
- Readman Chiu
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Jan M Friedman
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
  - BC Children's Hospital Research Institute, Vancouver, BC V5Z 4H4, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
|
12
|
Plavskin Y, de Biase MS, Ziv N, Janská L, Zhu YO, Hall DW, Schwarz RF, Tranchina D, Siegal ML. Spontaneous single-nucleotide substitutions and microsatellite mutations have distinct distributions of fitness effects. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.04.547687. [PMID: 37461506 PMCID: PMC10349969 DOI: 10.1101/2023.07.04.547687] [Indexed: 07/28/2023]
Abstract
The fitness effects of new mutations determine key properties of evolutionary processes. Beneficial mutations drive evolution, yet selection is also shaped by the frequency of small-effect deleterious mutations, whose combined effect can burden otherwise adaptive lineages and alter evolutionary trajectories and outcomes in clonally evolving organisms such as viruses, microbes, and tumors. The small effect sizes of these important mutations have made accurate measurements of their rates difficult. In microbes, assessing the effect of mutations on growth can be especially instructive, as this complex phenotype is closely linked to fitness in clonally evolving organisms. Here, we perform high-throughput time-lapse microscopy on cells from mutation-accumulation strains to precisely infer the distribution of mutational effects on growth rate in the budding yeast, Saccharomyces cerevisiae. We show that mutational effects on growth rate are overwhelmingly negative, highly skewed towards very small effect sizes, and frequent enough to suggest that deleterious hitchhikers may impose a significant burden on evolving lineages. By using lines that accumulated mutations in either wild-type or slippage repair-defective backgrounds, we further disentangle the effects of two common types of mutations, single-nucleotide substitutions and simple sequence repeat indels, and show that they have distinct effects on yeast growth rate. Although the average effect of a simple sequence repeat mutation is very small (~0.3%), many do alter growth rate, implying that this class of frequent mutations has an important evolutionary impact.
|
13
|
Liang Y, Hao J, Wang J, Zhang G, Su Y, Liu Z, Wang T. Statistical Genomics Analysis of Simple Sequence Repeats from the Paphiopedilum Malipoense Transcriptome Reveals Control Knob Motifs Modulating Gene Expression. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2304848. [PMID: 38647414 PMCID: PMC11200097 DOI: 10.1002/advs.202304848] [Received: 07/17/2023] [Revised: 02/26/2024] [Indexed: 04/25/2024]
Abstract
Simple sequence repeats (SSRs) are found in nonrandom distributions in genomes and are thought to impact gene expression. The distribution patterns of 48 295 SSRs of Paphiopedilum malipoense are mined and characterized based on the first full-length transcriptome and comprehensive transcriptome dataset from 12 organs. Statistical genomics analyses are used to investigate how SSRs in transcripts affect gene expression. The results demonstrate the correlations between SSR distributions, characteristics, and expression level. Nine expression-modulating motifs (expMotifs) are identified and a model is proposed to explain the effect of their key features, potency, and gene function on an intra-transcribed region scale. The expMotif-transcribed region combination is the most predominant contributor to the expression-modulating effect of SSRs, and some intra-transcribed regions are critical for this effect. Genes containing the same type of expMotif-SSR elements in the same transcribed region are likely linked in function, regulation, or evolution aspects. This study offers novel evidence to understand how SSRs regulate gene expression and provides potential regulatory elements for plant genetic engineering.
Affiliation(s)
- Yingyi Liang
  - College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jing Hao
  - College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jieyu Wang
  - College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guoqiang Zhang
  - Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yingjuan Su
  - School of Life Sciences, Sun Yat‐sen University, Guangzhou 510275, China
  - Research Institute of Sun Yat‐sen University in Shenzhen, Shenzhen 518107, China
- Zhong‐Jian Liu
  - Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ting Wang
  - College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
|
14
|
Chang KH, Chen CM. The Role of NRF2 in Trinucleotide Repeat Expansion Disorders. Antioxidants (Basel) 2024; 13:649. [PMID: 38929088 PMCID: PMC11200942 DOI: 10.3390/antiox13060649] [Received: 04/10/2024] [Revised: 05/20/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024]
Abstract
Trinucleotide repeat expansion disorders, a diverse group of neurodegenerative diseases, are caused by abnormal expansions within specific genes. These expansions trigger a cascade of cellular damage, including protein aggregation and abnormal RNA binding. A key contributor to this damage is oxidative stress, an imbalance of reactive oxygen species that harms cellular components. This review explores the interplay between oxidative stress and the NRF2 pathway in these disorders. NRF2 acts as the master regulator of the cellular antioxidant response, orchestrating the expression of enzymes that combat oxidative stress. Trinucleotide repeat expansion disorders often exhibit impaired NRF2 signaling, resulting in inadequate responses to excessive ROS production. NRF2 activation has been shown to upregulate antioxidative gene expression, effectively alleviating oxidative stress damage. NRF2 activators, such as omaveloxolone, vatiquinone, curcumin, sulforaphane, dimethyl fumarate, and resveratrol, demonstrate neuroprotective effects by reducing oxidative stress in experimental cell and animal models of these diseases. However, translating these findings into successful clinical applications requires further research. In this article, we review the literature supporting the role of NRF2 in the pathogenesis of these diseases and the potential therapeutics of NRF2 activators.
Affiliation(s)
- Kuo-Hsuan Chang
  - Department of Neurology, Chang Gung Memorial Hospital, Linkou Medical Center, Kueishan, Taoyuan 333, Taiwan
  - College of Medicine, Chang Gung University, Taoyuan 333, Taiwan
- Chiung-Mei Chen
  - Department of Neurology, Chang Gung Memorial Hospital, Linkou Medical Center, Kueishan, Taoyuan 333, Taiwan
  - College of Medicine, Chang Gung University, Taoyuan 333, Taiwan
|
15
|
Hiatt L, Weisburd B, Dolzhenko E, VanNoy GE, Kurtas EN, Rehm HL, Quinlan A, Dashnow H. STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.21.24307682. [PMID: 38826469 PMCID: PMC11142282 DOI: 10.1101/2024.05.21.24307682] [Indexed: 06/04/2024]
Abstract
Approximately 3% of the human genome consists of repetitive elements called tandem repeats (TRs), which include short tandem repeats (STRs) of 1-6 bp motifs and variable number tandem repeats (VNTRs) of 7+ bp motifs. TR variants contribute to several dozen mono- and polygenic diseases but remain understudied and "enigmatic," particularly relative to single nucleotide variants. It remains comparatively challenging to interpret the clinical significance of TR variants. Although existing resources provide portions of necessary data for interpretation at disease-associated loci, it is currently difficult or impossible to efficiently invoke the additional details critical to proper interpretation, such as motif pathogenicity, disease penetrance, and age of onset distributions. It is also often unclear how to apply population information to analyses. We present STRchive (S-T-archive, http://strchive.org/), a dynamic resource consolidating information on TR disease loci in humans from research literature, up-to-date clinical resources, and large-scale genomic databases, with the goal of streamlining TR variant interpretation at disease-associated loci. We apply STRchive, including pathogenic thresholds, motif classification, and clinical phenotypes, to a gnomAD cohort of ∼18.5k individuals genotyped at 60 disease-associated loci. Through detailed literature curation, we demonstrate that the majority of TR diseases affect children despite being thought of as adult diseases. Additionally, we show that pathogenic genotypes can be found within gnomAD which do not necessarily overlap with known disease prevalence, and leverage STRchive to interpret locus-specific findings therein. We apply a diagnostic blueprint empowered by STRchive to relevant clinical vignettes, highlighting possible pitfalls in TR variant interpretation. As a living resource, STRchive is maintained by experts, takes community contributions, and will evolve as understanding of TR diseases progresses.
|
16
|
Maciocha F, Suchanecka A, Chmielowiec K, Chmielowiec J, Ciechanowicz A, Boroń A. Correlations of the CNR1 Gene with Personality Traits in Women with Alcohol Use Disorder. Int J Mol Sci 2024; 25:5174. [PMID: 38791212 PMCID: PMC11121729 DOI: 10.3390/ijms25105174] [Received: 04/08/2024] [Revised: 05/02/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Alcohol use disorder (AUD) is a significant issue affecting women, with severe consequences for society, the economy, and most importantly, health. Both personality and alcohol use disorders are phenotypically very complex, and elucidating their shared heritability is a challenge for medical genetics. Therefore, our study investigated the correlations between the microsatellite polymorphism (AAT)n of the Cannabinoid Receptor 1 (CNR1) gene and personality traits in women with AUD. The study group included 187 female subjects. Of these, 93 were diagnosed with alcohol use disorder, and 94 were controls. Repeat length polymorphism of microsatellite regions (AAT)n in the CNR1 gene was identified with PCR. All participants were assessed with the Mini-International Neuropsychiatric Interview and completed the NEO Five-Factor and State-Trait Anxiety Inventories. In the group of AUD subjects, significantly fewer (AAT)n repeats were present when compared with controls (p = 0.0380). While comparing the alcohol use disorder subjects (AUD) and the controls, we observed significantly higher scores on the STAI trait (p < 0.00001) and state scales (p = 0.0001) and on the NEO Five-Factor Inventory Neuroticism (p < 0.00001) and Openness (p = 0.0237; insignificant after Bonferroni correction) scales. Significantly lower results were obtained on the NEO-FFI Extraversion (p = 0.00003), Agreeability (p < 0.00001) and Conscientiousness (p < 0.00001) scales by the AUD subjects when compared to controls. There was no statistically significant Pearson's linear correlation between the number of (AAT)n repeats in the CNR1 gene and the STAI and NEO Five-Factor Inventory scores in the group of AUD subjects. 
In contrast, Pearson's linear correlation analysis in controls showed a positive correlation between the number of the (AAT)n repeats and the STAI state scale (r = 0.184; p = 0.011; insignificant after Bonferroni correction) and a negative correlation with the NEO-FFI Openness scale (r = -0.241; p = 0.001). Interestingly, our study provided data on two separate complex issues, i.e., (1) the association of (AAT)n CNR1 repeats with the AUD in females; (2) the correlation of (AAT)n CNR1 repeats with anxiety as a state and Openness in non-alcohol dependent subjects. In conclusion, our study provided a plethora of valuable data for improving our understanding of alcohol use disorder and anxiety.
Affiliation(s)
- Filip Maciocha
  - Department of Clinical and Molecular Biochemistry, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland
- Aleksandra Suchanecka
  - Independent Laboratory of Behavioral Genetics and Epigenetics, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland
- Krzysztof Chmielowiec
  - Department of Hygiene and Epidemiology, Collegium Medicum, University of Zielona Góra, 28 Zyty St., 65-046 Zielona Góra, Poland
- Jolanta Chmielowiec
  - Department of Hygiene and Epidemiology, Collegium Medicum, University of Zielona Góra, 28 Zyty St., 65-046 Zielona Góra, Poland
- Andrzej Ciechanowicz
  - Department of Clinical and Molecular Biochemistry, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland
- Agnieszka Boroń
  - Department of Clinical and Molecular Biochemistry, Pomeranian Medical University in Szczecin, Powstańców Wielkopolskich 72 St., 70-111 Szczecin, Poland
|
17
|
Hamilton F, Mitchell R, Ghazal P, Timpson N. Phenotypic Associations With the HMOX1 GT(n) Repeat in European Populations. Am J Epidemiol 2024; 193:718-726. [PMID: 37414746 PMCID: PMC11074708 DOI: 10.1093/aje/kwad154] [Received: 04/29/2022] [Revised: 12/21/2023] [Accepted: 07/03/2023] [Indexed: 07/08/2023]
Abstract
Heme oxygenase 1 is a key enzyme in the management of heme in humans. A GT(n) repeat length in the heme oxygenase 1 gene (HMOX1) has been widely associated with a variety of phenotypes, including susceptibility to and outcomes in diabetes, cancer, infections, and neonatal jaundice. However, studies have generally been small and results inconsistent. In this study, we imputed the GT(n) repeat length in participants from 2 UK cohort studies (the UK Biobank study (n = 463,005; recruited in 2006-2010) and the Avon Longitudinal Study of Parents and Children (ALSPAC; n = 937; recruited in 1990-1991)), with the reliability of imputation tested in other cohorts (1000 Genomes Project, Human Genome Diversity Project, and Personal Genome Project UK). Subsequently, we measured the relationship between repeat length and previously identified associations (diabetes, chronic obstructive pulmonary disease, pneumonia, and infection-related mortality in the UK Biobank; neonatal jaundice in ALSPAC) and performed a phenomewide association study in the UK Biobank. Despite high-quality imputation (correlation between true repeat length and imputed repeat length > 0.9 in test cohorts), clinical associations were not identified in either the phenomewide association study or specific association studies. These findings were robust to definitions of repeat length and sensitivity analyses. Despite multiple smaller studies identifying associations across a variety of clinical settings, we could not replicate or identify any relevant phenotypic associations with the HMOX1 GT(n) repeat.
Affiliation(s)
- Fergus Hamilton
  - Correspondence to Dr. Fergus Hamilton, MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, United Kingdom
|
18
|
Tajeddin N, Arabfard M, Alizadeh S, Salesi M, Khamse S, Delbari A, Ohadi M. Novel islands of GGC and GCC repeats coincide with human evolution. Gene 2024; 902:148194. [PMID: 38262548 DOI: 10.1016/j.gene.2024.148194] [Received: 08/21/2023] [Revised: 10/29/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
BACKGROUND: Because of high mutation rate, overrepresentation in genic regions, and link with various neurological, neurodegenerative, and movement disorders, GGC and GCC short tandem repeats (STRs) are prone to natural selection. Among a number of lacking data, the 3-repeats of these STRs remain widely unexplored. RESULTS: In a genome-wide search in human, here we mapped GGC and GCC STRs of ≥3-repeats, and found novel islands of up to 45 of those STRs, populating spans of 1 to 2 kb of genomic DNA. RGPD4 and NOC4L harbored the densest (GGC)3 (probability 3.09061E-71) and (GCC)3 (probability 1.72376E-61) islands, respectively, and were human-specific. We also found prime instances of directional incremented density of STRs at specific loci in human versus other species, including the FOXK2 and SKI GGC islands. The genes containing those islands significantly diverged in expression in human versus other species, and the proteins encoded by those genes interact closely in a physical interaction network, consequence of which may be human-specific characteristics such as higher order brain functions. CONCLUSION: We report novel islands of GGC and GCC STRs of evolutionary relevance to human. The density, and in some instances, periodicity of these islands support them as a novel genomic entity, which need to be further explored in evolutionary, mechanistic, and functional platforms.
Affiliation(s)
- N Tajeddin
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Arabfard
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- S Alizadeh
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Salesi
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- S Khamse
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- A Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
|
19
|
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. Genetics 2024; 226:iyae013. [PMID: 38298127 PMCID: PMC10990422 DOI: 10.1093/genetics/iyae013] [Received: 08/11/2023] [Revised: 08/11/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than polymerase slippage in replicating progenitor cells. These results echo the recent finding that DNA damage in oocytes is a significant source of de novo single nucleotide variants and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to known hotspots of oocyte mutagenesis, nor are postzygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at G/C-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and contradict prior attribution of replication slippage as the primary mechanism of STR mutagenesis.
Affiliation(s)
- Michael E Goldberg
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Michelle D Noyes
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
|
20
|
Fang Y, Bansal K, Mostafavi S, Benoist C, Mathis D. AIRE relies on Z-DNA to flag gene targets for thymic T cell tolerization. Nature 2024; 628:400-407. [PMID: 38480882 PMCID: PMC11091860 DOI: 10.1038/s41586-024-07169-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 02/06/2024] [Indexed: 03/18/2024]
Abstract
AIRE is an unconventional transcription factor that enhances the expression of thousands of genes in medullary thymic epithelial cells and promotes clonal deletion or phenotypic diversion of self-reactive T cells [1-4]. The biological logic of AIRE's target specificity remains largely unclear as, in contrast to many transcription factors, it does not bind to a particular DNA sequence motif. Here we implemented two orthogonal approaches to investigate AIRE's cis-regulatory mechanisms: construction of a convolutional neural network and leveraging natural genetic variation through analysis of F1 hybrid mice [5]. Both approaches nominated Z-DNA and NFE2-MAF as putative positive influences on AIRE's target choices. Genome-wide mapping studies revealed that Z-DNA-forming and NFE2L2-binding motifs were positively associated with the inherent ability of a gene's promoter to generate DNA double-stranded breaks, and promoters showing strong double-stranded break generation were more likely to enter a poised state with accessible chromatin and already-assembled transcriptional machinery. Consequently, AIRE preferentially targets genes with poised promoters. We propose a model in which Z-DNA anchors the AIRE-mediated transcriptional program by enhancing double-stranded break generation and promoter poising. Beyond resolving a long-standing mechanistic conundrum, these findings suggest routes for manipulating T cell tolerance.
Affiliation(s)
- Yuan Fang
- Department of Immunology, Harvard Medical School, Boston, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Kushagra Bansal
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
- Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Diane Mathis
- Department of Immunology, Harvard Medical School, Boston, MA, USA.
|
21
|
Oketch JW, Wain LV, Hollox EJ. A comparison of software for analysis of rare and common short tandem repeat (STR) variation using human genome sequences from clinical and population-based samples. PLoS One 2024; 19:e0300545. [PMID: 38558075 PMCID: PMC10984476 DOI: 10.1371/journal.pone.0300545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 02/27/2024] [Indexed: 04/04/2024] Open
Abstract
Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs comprise about 3% of the human genome and are highly polymorphic. Some cause Mendelian disease, and others affect gene expression. Their contribution to common disease is not well-understood, but recent software tools designed to genotype STRs using short read sequencing data will help address this. Here, we compare software that genotypes common STRs and rarer STR expansions genome-wide, with the aim of applying them to population-scale genomes. By using the Genome-In-A-Bottle (GIAB) consortium and 1000 Genomes Project short-read sequencing data, we compare performance in terms of sequence length, depth, computing resources needed, genotyping accuracy and number of STRs genotyped. To ensure broad applicability of our findings, we also measure genotyping performance against a set of genomes from clinical samples with known STR expansions, and a set of STRs commonly used for forensic identification. We find that HipSTR, ExpansionHunter and GangSTR perform well in genotyping common STRs, including the CODIS 13 core STRs used for forensic analysis. GangSTR and ExpansionHunter outperform HipSTR for genotyping call rate and memory usage. ExpansionHunter denovo (EHdn), STRling and GangSTR outperformed STRetch for detecting expanded STRs, and EHdn and STRling used considerably less processor time compared to GangSTR. Analysis on shared genomic sequence data provided by the GIAB consortium allows future performance comparisons of new software approaches on a common set of data, facilitating comparisons and allowing researchers to choose the best software that fulfils their needs.
Affiliation(s)
- John W. Oketch
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Louise V. Wain
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- National Institute for Health Research, Leicester Respiratory Biomedical Research Centre, Glenfield Hospital, Leicester, United Kingdom
- Edward J. Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
|
22
|
Fazzari V, Moo-Choy A, Panoyan MA, Abbatangelo CL, Polimanti R, Novroski NM, Wendt FR. Multi-ancestry tandem repeat association study of hair colour using exome-wide sequencing. bioRxiv 2024:2024.02.24.581865. [PMID: 38464141 PMCID: PMC10925195 DOI: 10.1101/2024.02.24.581865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Hair colour variation is influenced by hundreds of positions across the human genome but this genetic contribution has only been narrowly explored. Genome-wide association studies identified single nucleotide polymorphisms (SNPs) influencing hair colour but the biology underlying these associations is challenging to interpret. We report 16 tandem repeats (TRs) with effects on different models of hair colour plus two TRs associated with hair colour in diverse ancestry groups. Several of these TRs expand or contract amino acid coding regions of their localized protein such that structure, and by extension function, may be altered. We also demonstrate that independent of SNP variation, these TRs can be used to create an additive polygenic score that predicts darker hair colour. This work adds to the growing body of evidence regarding TR influence on human traits with relatively large and independent effects relative to surrounding SNP variation.
|
23
|
Timmaraju VA, Finkelstein SD, Levine JA. Analytical Validation of Loss of Heterozygosity and Mutation Detection in Pancreatic Fine-Needle Aspirates by Capillary Electrophoresis and Sanger Sequencing. Diagnostics (Basel) 2024; 14:514. [PMID: 38472986 DOI: 10.3390/diagnostics14050514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/15/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024] Open
Abstract
Pancreatic cystic disease, including duct dilation, represents precursor states towards the development of pancreatic cancer, a form of malignancy with relatively low incidence but high mortality. While most of these cysts (>85%) are benign, the remainder can progress over time, leading to malignant transformation, invasion, and metastasis. Cytologic diagnosis is challenging, limited by the paucity or complete absence of cells representative of cystic lesions and fibrosis. Molecular analysis of fluids collected from endoscopic-guided fine-needle aspiration of pancreatic cysts and dilated duct lesions can be used to evaluate the risk of progression to malignancy. The basis for the enhanced diagnostic utility of molecular approaches is the ability to interrogate cell-free nucleic acid of the cyst/duct and/or extracellular fluid. The allelic imbalances at tumor suppressor loci and the selective oncogenic drivers are used clinically to help differentiate benign stable pancreatic cysts from those progressing toward high-grade dysplasia. Methods are discussed and used to determine the efficacy for diagnostic implementation. Here, we report the analytical validation of methods to detect causally associated molecular changes integral to the pathogenesis of pancreatic cancer from pancreatic cyst fluids.
|
24
|
Arabfard M, Tajeddin N, Alizadeh S, Salesi M, Bayat H, Khorram Khorshid HR, Khamse S, Delbari A, Ohadi M. Dyads of GGC and GCC form hotspot colonies that coincide with the evolution of human and other great apes. BMC Genom Data 2024; 25:21. [PMID: 38383300 PMCID: PMC10880355 DOI: 10.1186/s12863-024-01207-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 02/11/2024] [Indexed: 02/23/2024] Open
Abstract
BACKGROUND: GGC and GCC short tandem repeats (STRs) are of various evolutionary, biological, and pathological implications. However, the fundamental two-repeats (dyads) of these STRs are widely unexplored. RESULTS: On a genome-wide scale, we mapped (GGC)2 and (GCC)2 dyads in human, and found monumental colonies (distance between each dyad < 500 bp) of extraordinary density, and in some instances periodicity. The largest (GCC)2 and (GGC)2 colonies were intergenic, homogeneous, and human-specific, consisting of 219 (GCC)2 on chromosome 2 (probability < 1.545E-219) and 70 (GGC)2 on chromosome 9 (probability = 1.809E-148). We also found that several colonies were shared in other great apes, and directionally increased in density and complexity in human, such as a colony of 99 (GCC)2 on chromosome 20, that specifically expanded in great apes, and reached maximum complexity in human (probability 1.545E-220). Numerous other colonies of evolutionary relevance in human were detected in other largely overlooked regions of the genome, such as chromosome Y and pseudogenes. Several of the genes containing or nearest to those colonies were divergently expressed in human. CONCLUSION: In conclusion, (GCC)2 and (GGC)2 form unprecedented genomic colonies that coincide with the evolution of human and other great apes. The extent of the genomic rearrangements leading to those colonies support overlooked recombination hotspots, shared across great apes. The identified colonies deserve to be studied in mechanistic, evolutionary, and functional platforms.
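The colony definition given in this abstract (dyads separated by less than 500 bp) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `find_dyads` and `group_colonies`, the start-to-start distance measure, the counting of overlapping matches, and the `min_size` cutoff are all assumptions made for illustration.

```python
import re


def find_dyads(seq, unit="GCC"):
    """Return 0-based start positions of two-unit dyads such as GCCGCC.

    Assumption: overlapping matches are counted, so GCCGCCGCC yields
    two dyads (at offsets 0 and 3).
    """
    dyad = unit * 2
    # A zero-width lookahead reports every start position, including overlaps.
    return [m.start() for m in re.finditer(f"(?={dyad})", seq.upper())]


def group_colonies(positions, max_gap=500, min_size=2):
    """Group dyad positions into colonies.

    Consecutive dyads stay in the same colony while the distance between
    their start positions is below max_gap (500 bp in the abstract).
    min_size is a hypothetical cutoff for reporting a colony.
    """
    colonies, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] >= max_gap:
            if len(current) >= min_size:
                colonies.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        colonies.append(current)
    return colonies
```

On a real chromosome the same two calls would be applied per contig; whether distance should be measured start-to-start or end-to-start is a choice the abstract does not specify.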
Affiliation(s)
- M Arabfard
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- N Tajeddin
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- S Alizadeh
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Salesi
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Research Center for Prevention of Oral and Dental Diseases, Baqiyatallah University of Medical Sciences, Tehran, Iran
- H Bayat
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
- Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- S Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
|
25
|
Verbiest MA, Lundström O, Xia F, Baudis M, Bilgin Sonay T, Anisimova M. Short tandem repeat mutations regulate gene expression in colorectal cancer. Sci Rep 2024; 14:3331. [PMID: 38336885 PMCID: PMC10858039 DOI: 10.1038/s41598-024-53739-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/04/2024] [Indexed: 02/12/2024] Open
Abstract
Short tandem repeat (STR) mutations are prevalent in colorectal cancer (CRC), especially in tumours with the microsatellite instability (MSI) phenotype. While STR length variations are known to regulate gene expression under physiological conditions, the functional impact of STR mutations in CRC remains unclear. Here, we integrate STR mutation data with clinical information and gene expression data to study the gene regulatory effects of STR mutations in CRC. We confirm that STR mutability in CRC highly depends on the MSI status, repeat unit size, and repeat length. Furthermore, we present a set of 1244 putative expression STRs (eSTRs) for which the STR length is associated with gene expression levels in CRC tumours. The length of 73 eSTRs is associated with expression levels of cancer-related genes, nine of which are CRC-specific genes. We show that linear models describing eSTR-gene expression relationships allow for predictions of gene expression changes in response to eSTR mutations. Moreover, we found an increased mutability of eSTRs in MSI tumours. Our evidence of gene regulatory roles for eSTRs in CRC highlights a mostly overlooked way through which tumours may modulate their phenotypes. Future extensions of these findings could uncover new STR-based targets in the treatment of cancer.
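The idea of a linear model relating eSTR length to expression, and using its slope to predict the effect of a repeat-length mutation, can be sketched as follows. This is a toy ordinary least-squares fit, not the study's model: the function names `fit_line` and `predicted_change` are hypothetical, and covariates, normalisation, and significance testing used in the paper are not represented.

```python
def fit_line(lengths, expression):
    """Ordinary least-squares fit of expression = slope * length + intercept.

    lengths: STR lengths (repeat units) per tumour sample.
    expression: expression level of the linked gene in the same samples.
    """
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    if sxx == 0:
        raise ValueError("STR length does not vary; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predicted_change(slope, delta_units):
    """Predicted expression change for a gain (+) or loss (-) of repeat units."""
    return slope * delta_units
```

For example, a slope of 0.5 expression units per repeat would predict a change of -1.0 for a two-unit contraction under this simplified model.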
Affiliation(s)
- Max A Verbiest
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland.
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Oxana Lundström
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Feifei Xia
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michael Baudis
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tugce Bilgin Sonay
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, USA
- Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
|
26
|
Fazal S, Danzi MC, Xu I, Kobren SN, Sunyaev S, Reuter C, Marwaha S, Wheeler M, Dolzhenko E, Lucas F, Wuchty S, Tekin M, Züchner S, Aguiar-Pulido V. RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci. Genome Biol 2024; 25:39. [PMID: 38297326 PMCID: PMC10832122 DOI: 10.1186/s13059-024-03171-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 01/10/2024] [Indexed: 02/02/2024] Open
Abstract
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
Affiliation(s)
- Sarah Fazal
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Isaac Xu
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Shamil Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02155, USA
- Chloe Reuter
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, 94305, USA
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Shruti Marwaha
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, 94305, USA
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Matthew Wheeler
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, 94305, USA
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Francesca Lucas
- Department of Computer Science, Delft University of Technology, Delft, The Netherlands
- Stefan Wuchty
- Department of Computer Science, University of Miami, Miami, FL, USA
- Department of Biology, University of Miami, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
- Mustafa Tekin
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Stephan Züchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.
|
27
|
Alizadeh S, Khamse S, Tajeddin N, Khorram Khorshid HR, Delbari A, Ohadi M. A GCC repeat in RAB26 undergoes natural selection in human and harbors divergent genotypes in late-onset Alzheimer's disease. Gene 2024; 893:147968. [PMID: 37931854 DOI: 10.1016/j.gene.2023.147968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 10/28/2023] [Accepted: 11/03/2023] [Indexed: 11/08/2023]
Abstract
Although mainly located in genic regions and being mutation hotspots, intact blocks of CG-rich trinucleotide short tandem repeats (STRs) are largely overlooked with respect to their link with natural selection. The human RAB26 (member RAS oncogene family) directs synaptic and secretory vesicles into preautophagosomal structures, inhibition of which specifically disrupts axonal transport of degradative organelles and leads to an axonal dystrophy, resembling Alzheimer's disease (AD). Human RAB26 contains a GCC repeat in the top 1st percent in respect of length. Here we sequenced this STR in 441 Iranian individuals, consisting of late-onset neurocognitive disorder (NCD) (N = 216) and controls (N = 225). In both groups, the 12-repeat allele and the 12/12 genotype were predominantly abundant. We found excess of homozygosity for non-12 alleles in the NCD group (Mid-P exact = 0.027). Furthermore, divergent genotypes were detected that were specific to the NCD group (2.8% of genotypes) (Mid-P exact = 0.006) or controls (3.1% of genotypes) (Mid-P exact = 0.004). The patients harboring divergent genotypes received the diagnosis of AD. Based on the predominant abundance of the 12-repeat and 12/12 genotype in both groups, excess of non-12 homozygosity in the NCD group, and divergent genotypes across the NCD and control groups, we propose natural selection at this locus and link with late-onset AD. Our findings strengthen the hypothesis that a collection of rare genotypes unambiguously contribute to the pathogenesis of late-onset NCDs, such as AD.
Affiliation(s)
- S Alizadeh
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- N Tajeddin
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
- Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
- M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
|
28
|
Hong EP, Ramos EM, Aziz NA, Massey TH, McAllister B, Lobanov S, Jones L, Holmans P, Kwak S, Orth M, Ciosi M, Lomeikaite V, Monckton DG, Long JD, Lucente D, Wheeler VC, Gillis T, MacDonald ME, Sequeiros J, Gusella JF, Lee JM. Modification of Huntington's disease by short tandem repeats. Brain Commun 2024; 6:fcae016. [PMID: 38449714 PMCID: PMC10917446 DOI: 10.1093/braincomms/fcae016] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2023] [Revised: 12/20/2023] [Accepted: 01/22/2024] [Indexed: 03/08/2024] Open
Abstract
Expansions of glutamine-coding CAG trinucleotide repeats cause a number of neurodegenerative diseases, including Huntington's disease and several of the spinocerebellar ataxias. In general, age-at-onset of the polyglutamine diseases is inversely correlated with the size of the respective inherited expanded CAG repeat. Expanded CAG repeats are also somatically unstable in certain tissues, and age-at-onset of Huntington's disease corrected for individual HTT CAG repeat length (i.e. residual age-at-onset) is modified by repeat instability-related DNA maintenance/repair genes, as demonstrated by recent genome-wide association studies. Modification of one polyglutamine disease (e.g. Huntington's disease) by the repeat length of another (e.g. ATXN3, CAG expansions in which cause spinocerebellar ataxia 3) has also been hypothesized. Consequently, we determined whether age-at-onset in Huntington's disease is modified by the CAG repeats of other polyglutamine disease genes. We found that the measured CAG repeat sizes of other polyglutamine disease genes were polymorphic in Huntington's disease participants but did not influence Huntington's disease age-at-onset. Additional analysis focusing specifically on ATXN3 in a larger sample set (n = 1388) confirmed the lack of association between Huntington's disease residual age-at-onset and ATXN3 CAG repeat length. Additionally, neither our Huntington's disease onset modifier genome-wide association study single nucleotide polymorphism data nor imputed short tandem repeat data supported the involvement of other polyglutamine disease genes in modifying Huntington's disease. By contrast, our genome-wide association studies based on imputed short tandem repeats revealed significant modification signals for other genomic regions. Together, our short tandem repeat genome-wide association studies show that modification of Huntington's disease is associated with short tandem repeats that do not involve other polyglutamine disease-causing genes, refining the landscape of Huntington's disease modification and highlighting the importance of rigorous data analysis, especially in genetic studies testing candidate modifiers.
Affiliation(s)
- Eun Pyo Hong
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
- Eliana Marisa Ramos
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- N Ahmad Aziz
- Population & Clinical Neuroepidemiology, German Center for Neurodegenerative Diseases, 53127 Bonn, Germany
- Department of Neurology, Faculty of Medicine, University of Bonn, Bonn D-53113, Germany
- Thomas H Massey
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Branduff McAllister
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Sergey Lobanov
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Lesley Jones
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Peter Holmans
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Seung Kwak
- Molecular System Biology, CHDI Foundation, Princeton, NJ 08540, USA
- Michael Orth
- University Hospital of Old Age Psychiatry and Psychotherapy, Bern University, CH-3000 Bern 60, Switzerland
- Marc Ciosi
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Vilija Lomeikaite
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Darren G Monckton
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Jeffrey D Long
- Department of Psychiatry, Carver College of Medicine and Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA 52242, USA
- Diane Lucente
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Vanessa C Wheeler
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Tammy Gillis
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Marcy E MacDonald
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
- Jorge Sequeiros
- UnIGENe, IBMC—Institute for Molecular and Cell Biology, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto 420-135, Portugal
- ICBAS School of Medicine and Biomedical Sciences, University of Porto, Porto 420-135, Portugal
- James F Gusella
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Jong-Min Lee
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
|
29
|
Manigbas CA, Jadhav B, Garg P, Shadrina M, Lee W, Martin-Trujillo A, Sharp AJ. A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank. medRxiv 2024:2024.01.22.24301630. [PMID: 38343850 PMCID: PMC10854328 DOI: 10.1101/2024.01.22.24301630] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2024]
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we performed direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing causal associations with 73 traits. We replicated 23 of 31 (74%) of these causal associations in the All of Us cohort. While this set included several known repeat expansion disorders, novel associations we found were attributable to common polymorphic variation in TR length rather than rare expansions and include e.g. a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5'UTR of GNB2 influencing heart rate. Causal TRs were strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the "missing heritability" of the human genome.
|
30
|
Zhang J, Zhu B. Short, but matters: short tandem repeats confer variation in transcription factor-DNA binding. Sci Bull (Beijing) 2024; 69:9-10. [PMID: 38042705 DOI: 10.1016/j.scib.2023.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2023]
Affiliation(s)
- Jing Zhang
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Key Laboratory of Epigenetic Regulation and Intervention, Chinese Academy of Sciences, Beijing 100101, China; New Cornerstone Science Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Bing Zhu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Key Laboratory of Epigenetic Regulation and Intervention, Chinese Academy of Sciences, Beijing 100101, China; New Cornerstone Science Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
| |
|
31
|
Parikh K, Quintero Reis A, Wendt FR. Association between suicidal ideation and tandem repeats in contactins. Front Psychiatry 2024; 14:1236540. [PMID: 38239902 PMCID: PMC10794671 DOI: 10.3389/fpsyt.2023.1236540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 12/13/2023] [Indexed: 01/22/2024] Open
Abstract
Background: Death by suicide is one of the leading causes of death among adolescents. Genome-wide association studies (GWAS) have identified loci that associate with suicidal ideation and related behaviours. One such group of loci are the six contactin genes (CNTN1-6) that are critical to neurodevelopment through regulating neurite structure. Because single nucleotide polymorphisms (SNPs) detected by GWAS often map to non-coding intergenic regions, we investigated whether repetitive variants in CNTNs associated with suicidality in a young cohort aged 8 to 21. Understanding the genetic liability of suicidal thought and behavior in this age group will promote early intervention and treatment.
Methods: Genotypic and phenotypic data were obtained from the Philadelphia Neurodevelopment Cohort (PNC). Across six CNTNs, 232 short tandem repeats (STRs) were analyzed in up to 4,595 individuals of European ancestry who expressed current, previous, or no suicidal ideation. STRs were imputed into SNP arrays using a phased SNP-STR haplotype reference panel from the 1000 Genomes Project. We tested several additive and interactive models of locus-level burden (i.e., sum of STR alleles) with respect to suicidal ideation. Additive models included sex, birth year, developmental stage ("DevStage"), and the first 10 principal components of ancestry as covariates; interactive models assessed the effect of STR-by-DevStage considering all other covariates.
Results: CNTN1-[T]N interacted with DevStage to increase risk for current suicidal ideation (CNTN1-[T]N-by-DevStage; p = 0.00035). Compared to the youngest age group, the middle (OR = 1.80, p = 0.0514) and oldest (OR = 3.82, p = 0.0002) participant groups had significantly higher odds of suicidal ideation as their STR length expanded; this result was independent of polygenic scores for suicidal ideation.
Discussion: These findings highlight diversity in the genetic effects (i.e., SNP and STR) acting on suicidal thoughts and behavior and advance our understanding of suicidal ideation across childhood and adolescence.
Affiliation(s)
- Kairavi Parikh
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
| | - Andrea Quintero Reis
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| | - Frank R. Wendt
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| |
|
32
|
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.22.573131. [PMID: 38187618 PMCID: PMC10769404 DOI: 10.1101/2023.12.22.573131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than the classical mechanism of polymerase slippage in replicating progenitor cells. These results also echo the recent finding that DNA damage in quiescent oocytes is a significant source of de novo SNVs and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to previously discovered hotspots of oocyte mutagenesis, nor are post-zygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on DNM rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at GC-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and are especially surprising considering the prior belief in replication slippage as the dominant mechanism of STR mutagenesis.
Affiliation(s)
- Michael E. Goldberg
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
| | - Michelle D. Noyes
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Howard Hughes Medical Institute, 3720 15 Ave NE, University of Washington, Seattle, WA, 98195
| | - Aaron R. Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- These authors contributed equally to this work
| | - Kelley Harris
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Computational Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109
- These authors contributed equally to this work
| |
|
33
|
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of the genome, and often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as pertains to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for TR genome-wide identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
|
34
|
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| | - Frank R Wendt
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
| |
|
35
|
Loh PR. Uncovering complex trait heritability hidden in the repeatome. CELL GENOMICS 2023; 3:100461. [PMID: 38116125 PMCID: PMC10726486 DOI: 10.1016/j.xgen.2023.100461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
Short tandem repeats (STRs) account for a substantial fraction of human genetic variation, but their contribution to complex human phenotypes is largely unknown. Margoliash et al. perform detailed genome-wide association analysis and fine-mapping of STRs in UK Biobank, identifying many STRs likely to influence variation in blood and serum traits.
Affiliation(s)
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
36
|
Margoliash J, Fuchs S, Li Y, Zhang X, Massarat A, Goren A, Gymrek M. Polymorphic short tandem repeats make widespread contributions to blood and serum traits. CELL GENOMICS 2023; 3:100458. [PMID: 38116119 PMCID: PMC10726533 DOI: 10.1016/j.xgen.2023.100458] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/21/2022] [Revised: 09/09/2023] [Accepted: 11/07/2023] [Indexed: 12/21/2023]
Abstract
Short tandem repeats (STRs) are genomic regions consisting of repeated sequences of 1-6 bp in succession. Single-nucleotide polymorphism (SNP)-based genome-wide association studies (GWASs) do not fully capture STR effects. To study these effects, we imputed 445,720 STRs into genotype arrays from 408,153 White British UK Biobank participants and tested for association with 44 blood phenotypes. Using two fine-mapping methods, we identify 119 candidate causal STR-trait associations and estimate that STRs account for 5.2%-7.6% of causal variants identifiable from GWASs for these traits. These are among the strongest associations for multiple phenotypes, including a coding CTG repeat associated with apolipoprotein B levels, a promoter CGG repeat with platelet traits, and an intronic poly(A) repeat with mean platelet volume. Our study suggests that STRs make widespread contributions to complex traits, provides stringently selected candidate causal STRs, and demonstrates the need to consider a more complete view of genetic variation in GWASs.
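The definition in the abstract above (motifs of 1-6 bp repeated in succession) can be illustrated with a minimal sketch. This is not the genotyping or imputation method used in the study; the function name `find_strs` and the `min_copies` cutoff are hypothetical choices made only for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re

def find_strs(seq, min_copies=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats with 1-6 bp motifs.

    min_copies is an illustrative threshold, not a value taken from the source.
    """
    hits = []
    for k in range(1, max_motif + 1):
        # A k-bp motif followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit (e.g. "AA", "CACA"),
            # since those are already reported under the shorter motif.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

print(find_strs("TTCAGCAGCAGCAGGA"))  # [(2, 'CAG', 4)]
```

Real STR callers also handle interrupted repeats and sequencing error, which this sketch deliberately ignores.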
Affiliation(s)
- Jonathan Margoliash
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Shai Fuchs
- Pediatric Endocrine and Diabetes Unit, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Ramat Gan, Israel
| | - Yang Li
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Xuan Zhang
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Arya Massarat
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Alon Goren
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
| | - Melissa Gymrek
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
| |
|
37
|
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
| |
|
38
|
Panoyan MA, Shi Y, Abbatangelo CL, Adler N, Moo-Choy A, Parra EJ, Polimanti R, Hu P, Wendt FR. Exome-wide tandem repeats confer large effects on subcortical volumes in UK Biobank participants. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.11.23299818. [PMID: 38168307 PMCID: PMC10760277 DOI: 10.1101/2023.12.11.23299818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
The human subcortex is involved in memory and cognition. Structural and functional changes in subcortical regions are implicated in psychiatric conditions. We performed an association study of subcortical volumes using 15,941 tandem repeats (TRs) derived from whole exome sequencing (WES) data in 16,527 unrelated European ancestry participants. We identified 17 loci, most of which were associated with accumbens volume, and nine of which had fine-mapping probability supporting their causal effect on subcortical volume independent of surrounding variation. The most significant association involved NTN1-[GCGG]N and increased accumbens volume (β=5.93, P=8.16×10⁻⁹). Three exonic TRs had large effects on thalamus volume (LAT2-[CATC]N β=-949, P=3.84×10⁻⁶ and SLC39A4-[CAG]N β=-1599, P=2.42×10⁻⁸) and pallidum volume (MCM2-[AGG]N β=-404.9, P=147×10⁻⁷). These genetic effects were consistent measurements of per-repeat expansion/contraction effects on organism fitness. With 3-dimensional modeling, we reinforced these effects to show that the expanded and contracted LAT2-[CATC]N repeat causes a frameshift mutation that prevents appropriate protein folding. These TRs also exhibited independent effects on several psychiatric symptoms, including LAT2-[CATC]N and the tiredness/low energy symptom of depression (β=0.340, P=0.003). These findings link genetic variation to tractable biology in the brain and relevant psychiatric symptoms. We also chart one pathway for TR prioritization in future complex trait genetic studies.
|
39
|
Felício D, du Mérac TR, Amorim A, Martins S. Functional implications of paralog genes in polyglutamine spinocerebellar ataxias. Hum Genet 2023; 142:1651-1676. [PMID: 37845370 PMCID: PMC10676324 DOI: 10.1007/s00439-023-02607-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 09/22/2023] [Indexed: 10/18/2023]
Abstract
Polyglutamine (polyQ) spinocerebellar ataxias (SCAs) comprise a group of autosomal dominant neurodegenerative disorders caused by (CAG/CAA)n expansions. The elongated stretches of adjacent glutamines alter the conformation of the native proteins inducing neurotoxicity, and subsequent motor and neurological symptoms. Although the etiology and neuropathology of most polyQ SCAs have been extensively studied, only a limited selection of therapies is available. Previous studies on SCA1 demonstrated that ATXN1L, a human duplicated gene of the disease-associated ATXN1, alleviated neuropathology in mice models. Other SCA-associated genes have paralogs (i.e., copies at different chromosomal locations derived from duplication of the parental gene), but their functional relevance and potential role in disease pathogenesis remain unexplored. Here, we review the protein homology, expression pattern, and molecular functions of paralogs in seven polyQ dominant ataxias-SCA1, SCA2, MJD/SCA3, SCA6, SCA7, SCA17, and DRPLA. Besides ATXN1L, we highlight ATXN2L, ATXN3L, CACNA1B, ATXN7L1, ATXN7L2, TBPL2, and RERE as promising functional candidates to play a role in the neuropathology of the respective SCA, along with the parental gene. Although most of these duplicates lack the (CAG/CAA)n region, if functionally redundant, they may compensate for a partial loss-of-function or dysfunction of the wild-type genes in SCAs. We aim to draw attention to the hypothesis that paralogs of disease-associated genes may underlie the complex neuropathology of dominant ataxias and potentiate new therapeutic strategies.
Affiliation(s)
- Daniela Felício
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal
- Instituto Ciências Biomédicas Abel Salazar (ICBAS), Universidade do Porto, 4050-313, Porto, Portugal
| | - Tanguy Rubat du Mérac
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal
- Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
| | - António Amorim
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, 4169-007, Porto, Portugal
| | - Sandra Martins
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135, Porto, Portugal.
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), 4200-135, Porto, Portugal.
| |
|
40
|
Guo MH, Lee WP, Vardarajan B, Schellenberg GD, Phillips-Cremins J. Polygenic burden of short tandem repeat expansions promote risk for Alzheimer's disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.16.23298623. [PMID: 38014121 PMCID: PMC10680900 DOI: 10.1101/2023.11.16.23298623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Studies of the genetics of Alzheimer's disease (AD) have largely focused on single nucleotide variants and short insertions/deletions. However, most of the disease heritability has yet to be uncovered, suggesting that there is substantial genetic risk conferred by other forms of genetic variation. There are over one million short tandem repeats (STRs) in the genome, and their link to AD risk has not been assessed. As pathogenic expansions of STRs cause over 30 neurologic diseases, it is important to ascertain whether STRs may also be implicated in AD risk. Here, we genotyped 321,742 polymorphic STR tracts genome-wide using PCR-free whole genome sequencing data from 2,981 individuals (1,489 AD case and 1,492 control individuals). We implemented an approach to identify STR expansions as STRs with tract lengths that are outliers from the population. We then tested for differences in aggregate burden of expansions in case versus control individuals. AD patients had a 1.19-fold increase of STR expansions compared to healthy elderly controls (p = 8.27×10⁻³, two-sided Mann-Whitney test). Individuals carrying > 30 STR expansions had 3.62-fold higher odds of having AD and had more severe AD neuropathology. AD STR expansions were highly enriched within active promoters in post-mortem hippocampal brain tissues and particularly within SINE-VNTR-Alu (SVA) retrotransposons. Together, these results demonstrate that expanded STRs within active promoter regions of the genome promote risk of AD.
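The outlier-then-burden idea described in this abstract can be sketched roughly as follows. The abstract does not state how outliers were defined, so the median-plus-k-times-MAD rule, the `k=3.0` default, the floor of 1.0 on the MAD, and the function name `expansion_burden` are all assumptions made for illustration rather than the authors' procedure.

```python
from statistics import median

def expansion_burden(tract_lengths, k=3.0):
    """Count, per individual, the loci where their tract length is an upper outlier.

    tract_lengths: {locus: {individual: tract_length}}
    Outlier rule (assumed, not from the source): length > median + k * MAD.
    """
    burden = {}
    for lengths in tract_lengths.values():
        values = list(lengths.values())
        med = median(values)
        mad = median([abs(v - med) for v in values])
        # Floor the MAD so that monomorphic loci do not flag tiny deviations.
        threshold = med + k * max(mad, 1.0)
        for person, length in lengths.items():
            burden.setdefault(person, 0)
            if length > threshold:
                burden[person] += 1
    return burden

# Toy data: individual "d" carries one unusually long allele at locus L1.
toy = {
    "L1": {"a": 10, "b": 10, "c": 11, "d": 30},
    "L2": {"a": 5, "b": 5, "c": 5, "d": 5},
}
print(expansion_burden(toy))  # {'a': 0, 'b': 0, 'c': 0, 'd': 1}
```

The per-individual counts for cases and controls could then be compared with a two-sided rank test such as `scipy.stats.mannwhitneyu`, which matches the test named in the abstract but is left out here to keep the sketch dependency-free.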
Affiliation(s)
- Michael H Guo
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Badri Vardarajan
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
| | - Jennifer Phillips-Cremins
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| |
|
41
|
Bhati M, Mapel XM, Lloret-Villas A, Pausch H. Structural variants and short tandem repeats impact gene expression and splicing in bovine testis tissue. Genetics 2023; 225:iyad161. [PMID: 37655920 PMCID: PMC10627265 DOI: 10.1093/genetics/iyad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 06/05/2023] [Accepted: 08/24/2023] [Indexed: 09/02/2023] Open
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are significant sources of genetic variation. However, the impacts of these variants on gene regulation have not been investigated in cattle. Here, we genotyped and characterized 19,408 SVs and 374,821 STRs in 183 bovine genomes and investigated their impact on molecular phenotypes derived from testis transcriptomes. We found that 71% of STRs were multiallelic. The vast majority (95%) of STRs and SVs were in intergenic and intronic regions. Only 37% of SVs and 40% of STRs were in high linkage disequilibrium (LD) (R² > 0.8) with surrounding SNPs/insertions and deletions (Indels), indicating that SNP-based association testing and genomic prediction are blind to a nonnegligible portion of genetic variation. We showed that both SVs and STRs were more than 2-fold enriched among expression and splicing QTL (e/sQTL) relative to SNPs/Indels and were often associated with differential expression and splicing of multiple genes. Deletions and duplications had larger impacts on splicing and expression than any other type of SV. Exonic duplications predominantly increased gene expression either through alternative splicing or other mechanisms, whereas expression- and splicing-associated STRs primarily resided in intronic regions and exhibited bimodal effects on the molecular phenotypes investigated. Most e/sQTL resided within 100 kb of the affected genes or splicing junctions. We pinpoint candidate causal STRs and SVs associated with the expression of SLC13A4 and TTC7B and alternative splicing of a lncRNA and CAPP1. We provide a catalog of STRs and SVs for taurine cattle and show that these variants contribute substantially to gene expression and splicing variation.
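The linkage disequilibrium threshold quoted in this abstract can be made concrete with a small sketch. The abstract does not say how LD was computed for multiallelic STRs; the version below assumes each STR genotype is summarised as the summed repeat length of both alleles and each SNP genotype as an alternate-allele count (0, 1, or 2), then takes the squared Pearson correlation. The function name `r_squared` and the toy genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic variant carries no information about the other one.
        return 0.0
    return (sxy * sxy) / (sxx * syy)

snp_dosage = [0, 1, 2, 1]          # alternate-allele counts per animal (toy data)
str_length_sum = [20, 22, 24, 22]  # summed repeat lengths per animal (toy data)

tagged = r_squared(snp_dosage, str_length_sum) > 0.8
print(tagged)  # True
```

Under this reading, an STR whose best nearby SNP stays below 0.8 would be one of the variants that SNP-based association testing misses.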
Affiliation(s)
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | - Xena Marie Mapel
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | | | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| |
|
42
|
Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun 2023; 14:6711. [PMID: 37872149 PMCID: PMC10593948 DOI: 10.1038/s41467-023-42278-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 03/10/2023] [Accepted: 10/05/2023] [Indexed: 10/25/2023] Open
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
| | - Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
| | - Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
| | | | - Yunjiang Qiu
- Illumina Incorporated, San Diego, CA, 92122, USA
| | - Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
| | - Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
| | - Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
| | - Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
43
Hirano M, Kuwahara M, Yamagishi Y, Samukawa M, Fujii K, Yamashita S, Ando M, Oka N, Nagano M, Matsui T, Takeuchi T, Saigoh K, Kusunoki S, Takashima H, Nagai Y. CANVAS-related RFC1 mutations in patients with immune-mediated neuropathy. Sci Rep 2023; 13:17801. [PMID: 37853169 PMCID: PMC10584897 DOI: 10.1038/s41598-023-45011-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/01/2023] [Accepted: 10/14/2023] [Indexed: 10/20/2023] [Open access]
Abstract
Cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS) has recently been attributed to biallelic repeat expansions in RFC1. More recently, the disease entity has expanded to atypical phenotypes, including chronic neuropathy without cerebellar ataxia or vestibular areflexia. Very recently, RFC1 expansions were found in patients with Sjögren syndrome who had neuropathy that did not respond to immunotherapy. In this study RFC1 was examined in 240 patients with acute or chronic neuropathies, including 105 with Guillain-Barré syndrome or Miller Fisher syndrome, 76 with chronic inflammatory demyelinating polyneuropathy, and 59 with other types of chronic neuropathy. Biallelic RFC1 mutations were found in three patients with immune-mediated neuropathies, including Guillain-Barré syndrome, idiopathic sensory ataxic neuropathy, or anti-myelin-associated glycoprotein (MAG) neuropathy, who responded to immunotherapies. In addition, a patient with chronic sensory autonomic neuropathy had biallelic mutations, and subclinical changes in Schwann cells on nerve biopsy. In summary, we found CANVAS-related RFC1 mutations in patients with treatable immune-mediated neuropathy or demyelinating neuropathy.
Affiliation(s)
- Makito Hirano
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan.
- Motoi Kuwahara
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Yuko Yamagishi
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Makoto Samukawa
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Kanako Fujii
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Shoko Yamashita
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Masahiro Ando
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Nobuyuki Oka
- Department of Neurology, NHO Minami-Kyoto Hospital, Joyo, Japan
- Mamoru Nagano
- Department of Anatomy, Kindai University, Faculty of Medicine, Osakasayama, Japan
- Taro Matsui
- Division of Neurology, Anti-Aging, and Vascular Medicine, Department of Internal Medicine, National Defense Medical College, Tokorozawa, Japan
- Toshihide Takeuchi
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Kazumasa Saigoh
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Susumu Kusunoki
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
- Hiroshi Takashima
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yoshitaka Nagai
- Department of Neurology, Kindai University, Faculty of Medicine, Ohno-Higashi, Osakasayama, Osaka, 589-8511, Japan
44
Kuhlman TE. Repetitive DNA regulates gene expression. Science 2023; 381:1289-1290. [PMID: 37733865 DOI: 10.1126/science.adk2055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/23/2023]
Abstract
Short tandem repeats affect gene expression by binding regulatory proteins.
Affiliation(s)
- Thomas E Kuhlman
- Department of Physics and Astronomy, University of California, Riverside, Riverside, CA, USA
45
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
Affiliation(s)
- Connor A Horton
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
46
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] [Open access]
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. We also introduced diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
47
Suzuki MM, Iijima K, Ogami K, Shinjo K, Murofushi Y, Xie J, Wang X, Kitano Y, Mamiya A, Kibe Y, Nishimura T, Ohka F, Saito R, Sato S, Kobayashi J, Yao R, Miyata K, Kataoka K, Suzuki HI, Kondo Y. TUG1-mediated R-loop resolution at microsatellite loci as a prerequisite for cancer cell proliferation. Nat Commun 2023; 14:4521. [PMID: 37607907 PMCID: PMC10444773 DOI: 10.1038/s41467-023-40243-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/30/2022] [Accepted: 07/19/2023] [Indexed: 08/24/2023] [Open access]
Abstract
Oncogene-induced DNA replication stress (RS) and consequent pathogenic R-loop formation are known to impede S phase progression. Nonetheless, cancer cells continuously proliferate under such high-stress conditions through incompletely understood mechanisms. Here, we report taurine upregulated gene 1 (TUG1) long noncoding RNA (lncRNA), which is highly expressed in many types of cancers, as an important regulator of intrinsic R-loops in cancer cells. Under RS conditions, TUG1 is rapidly upregulated via activation of the ATR-CHK1 signaling pathway, interacts with RPA and DHX9, and engages in resolving R-loops at certain loci, particularly at the CA repeat microsatellite loci. Depletion of TUG1 leads to overabundant R-loops and enhanced RS, leading to substantial inhibition of tumor growth. Our data reveal a role of TUG1 as a molecule important for resolving R-loop accumulation in cancer cells and suggest targeting TUG1 as a potent therapeutic approach for cancer treatment.
Affiliation(s)
- Miho M Suzuki
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Kenta Iijima
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Laboratory Animal Facilities and Services, Preeminent Medical Photonics Education and Research Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka, 431-3192, Japan
- Koichi Ogami
- Division of Molecular Oncology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Keiko Shinjo
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Yoshiteru Murofushi
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Jingqi Xie
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Xuebing Wang
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Yotaro Kitano
- Department of Neurosurgery, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Akira Mamiya
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Yuji Kibe
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Department of Neurosurgery, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Tatsunori Nishimura
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Fumiharu Ohka
- Department of Neurosurgery, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Ryuta Saito
- Department of Neurosurgery, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Shinya Sato
- Molecular Pathology and Genetics Division, Kanagawa Cancer Center Research Institute, 2-3-2 Nakao, Asahi-ku, Yokohama, Kanagawa, 241-8515, Japan
- Junya Kobayashi
- School of Health Sciences at Narita, International University of Health and Welfare, 4-3 Kozunomori, Narita, Chiba, 286-8686, Japan
- Ryoji Yao
- Department of Cell Biology, Japanese Foundation for Cancer Research, 3-8-31 Ariake, Koto-ku, Tokyo, 135-8550, Japan
- Kanjiro Miyata
- Department of Materials Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Kazunori Kataoka
- Innovation Center of NanoMedicine, Kawasaki Institute of Industrial Promotion, 3-25-14 Tono-machi, Kawasaki-ku, Kanagawa, 210-0821, Japan
- Institute for Future Initiatives, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Hiroshi I Suzuki
- Division of Molecular Oncology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan
- Institute for Glyco-core Research (iGCORE), Tokai National Higher Education and Research System, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
- Yutaka Kondo
- Division of Cancer Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, Aichi, 466-8550, Japan.
- Institute for Glyco-core Research (iGCORE), Tokai National Higher Education and Research System, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan.
48
Cuomo ASE, Nathan A, Raychaudhuri S, MacArthur DG, Powell JE. Single-cell genomics meets human genetics. Nat Rev Genet 2023; 24:535-549. [PMID: 37085594 PMCID: PMC10784789 DOI: 10.1038/s41576-023-00599-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Accepted: 03/29/2023] [Indexed: 04/23/2023]
Abstract
Single-cell genomic technologies are revealing the cellular composition, identities and states in tissues at unprecedented resolution. They have now scaled to the point that it is possible to query samples at the population level, across thousands of individuals. Combining single-cell information with genotype data at this scale provides opportunities to link genetic variation to the cellular processes underpinning key aspects of human biology and disease. This strategy has potential implications for disease diagnosis, risk prediction and development of therapeutic solutions. However, effectively integrating large-scale single-cell genomic data, genetic variation and additional phenotypic data will require advances in data generation and analysis methods. As single-cell genetics begins to emerge as a field in its own right, we review its current state and the challenges and opportunities ahead.
Affiliation(s)
- Anna S E Cuomo
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Aparna Nathan
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Joseph E Powell
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, New South Wales, Australia.
49
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. bioRxiv 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold standard diagnostic methods like RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short read or long read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely-used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we saw that widely-used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci as well as further tool development that builds on the strengths of current algorithms. 
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3x faster than the original while producing the same output.
Affiliation(s)
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
50
Maksimov MO, Wu C, Ashbrook DG, Villani F, Colonna V, Mousavi N, Ma N, Lu L, Pritchard JK, Goren A, Williams RW, Palmer AA, Gymrek M. A novel quantitative trait locus implicates Msh3 in the propensity for genome-wide short tandem repeat expansions in mice. Genome Res 2023; 33:689-702. [PMID: 37127331 PMCID: PMC10317118 DOI: 10.1101/gr.277576.122] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/08/2022] [Accepted: 04/26/2023] [Indexed: 05/03/2023]
Abstract
Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1-6 bp. We leveraged whole-genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and performed quantitative trait locus (QTL) analyses for each of these phenotypes. We identified a locus on Chromosome 13 at which strains inheriting the C57BL/6J (B) haplotype have a higher rate of STR expansions than those inheriting the DBA/2J (D) haplotype. The strongest candidate gene in this locus is Msh3, a known modifier of STR stability in cancer and at pathogenic repeat expansions in mice and humans, as well as a current drug target against Huntington's disease. The D haplotype at this locus harbors a cluster of variants near the 5' end of Msh3, including multiple missense variants near the DNA mismatch recognition domain. In contrast, the B haplotype contains a unique retrotransposon insertion. The rate of expansion covaries positively with Msh3 expression, with higher expression from the B haplotype. Finally, detailed analysis of mutation patterns showed that strains carrying the B allele have higher expansion rates, but slightly lower overall total mutation rates, compared with those with the D allele, particularly at tetranucleotide repeats. Our results suggest an important role for inherited variants in Msh3 in modulating genome-wide patterns of germline mutations at STRs.
Affiliation(s)
- Mikhail O Maksimov
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Cynthia Wu
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California 92093, USA
- David G Ashbrook
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Vincenza Colonna
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Institute of Genetics and Biophysics, National Research Council, Naples 80111, Italy
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, California 92093, USA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Lu Lu
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Department of Biology, Stanford University, Stanford, California 94305, USA
- Alon Goren
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California 92093, USA
- Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Abraham A Palmer
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Psychiatry, Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Melissa Gymrek
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Biomedical Informatics