1
Cadden GM, Wilken SJ, Magennis SW. A single CAA interrupt in a DNA three-way junction containing a CAG repeat hairpin results in parity-dependent trapping. Nucleic Acids Res 2024:gkae644. [PMID: 39041420 DOI: 10.1093/nar/gkae644] [Received: 04/30/2024] [Revised: 07/04/2024] [Accepted: 07/14/2024] [Indexed: 07/24/2024]
Abstract
An increasing number of human disorders are attributed to genomic expansions of short tandem repeats (STRs). Secondary DNA structures formed by STRs are believed to play an important role in expansion, while the presence of nucleotide interruptions within the pure repeat sequence is known to delay the onset and progression of disease. We have used two single-molecule fluorescence techniques to analyse the structure and dynamics of DNA three-way junctions (3WJs) containing CAG repeat hairpin slipouts, with and without a single CAA interrupt. For a 3WJ with a (CAG)10 slipout, the CAA interrupt is preferentially located in the hairpin loop, and the branch migration dynamics are 4-fold slower than for the 3WJ with a pure (CAG)10, and 3-fold slower than a 3WJ with a pure (CAG)40 repeat. The (CAG)11 3WJ with CAA interrupt adopts a conformation that places the interrupt in or near the hairpin loop, with similar dynamics to the pure (CAG)10 and (CAG)11 3WJs. We have shown that changing a single nucleotide (G to A) in a pure repeat can have a large impact on 3WJ structure and dynamics, which may be important for the protective role of interrupts in repeat expansion diseases.
Affiliation(s)
- Gillian M Cadden
- School of Chemistry, University of Glasgow, Joseph Black Building, University Avenue, Glasgow G12 8QQ, UK
- Svea J Wilken
- School of Chemistry, University of Glasgow, Joseph Black Building, University Avenue, Glasgow G12 8QQ, UK
- Steven W Magennis
- School of Chemistry, University of Glasgow, Joseph Black Building, University Avenue, Glasgow G12 8QQ, UK
2
Uguen K, Michaud JL, Génin E. Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases. Eur J Hum Genet 2024:10.1038/s41431-024-01666-z. [PMID: 38982300 DOI: 10.1038/s41431-024-01666-z] [Received: 02/22/2024] [Revised: 06/20/2024] [Accepted: 06/27/2024] [Indexed: 07/11/2024]
Abstract
In this study, we explore the landscape of short tandem repeats (STRs) within the human genome through the lens of evolving technologies to detect genomic variations. STRs, which encompass approximately 3% of our genomic DNA, are crucial for understanding human genetic diversity, disease mechanisms, and evolutionary biology. The advent of high-throughput sequencing methods has revolutionized our ability to accurately map and analyze STRs, highlighting their significance in genetic disorders, forensic science, and population genetics. We review the current available methodologies for STR analysis, the challenges in interpreting STR variations across different populations, and the implications of STRs in medical genetics. Our findings underscore the urgent need for comprehensive STR databases that reflect the genetic diversity of global populations, facilitating the interpretation of STR data in clinical diagnostics, genetic research, and forensic applications. This work sets the stage for future studies aimed at harnessing STR variations to elucidate complex genetic traits and diseases, reinforcing the importance of integrating STRs into genetic research and clinical practice.
Affiliation(s)
- Kevin Uguen
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- Service de Génétique Médicale et Biologie de la Reproduction, CHU de Brest, Brest, France
- CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
- Jacques L Michaud
- CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
- Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
3
Plavskin Y, de Biase MS, Ziv N, Janská L, Zhu YO, Hall DW, Schwarz RF, Tranchina D, Siegal ML. Spontaneous single-nucleotide substitutions and microsatellite mutations have distinct distributions of fitness effects. PLoS Biol 2024; 22:e3002698. [PMID: 38950062 PMCID: PMC11244821 DOI: 10.1371/journal.pbio.3002698] [Received: 07/04/2023] [Revised: 07/12/2024] [Accepted: 06/04/2024] [Indexed: 07/03/2024]
Abstract
The fitness effects of new mutations determine key properties of evolutionary processes. Beneficial mutations drive evolution, yet selection is also shaped by the frequency of small-effect deleterious mutations, whose combined effect can burden otherwise adaptive lineages and alter evolutionary trajectories and outcomes in clonally evolving organisms such as viruses, microbes, and tumors. The small effect sizes of these important mutations have made accurate measurements of their rates difficult. In microbes, assessing the effect of mutations on growth can be especially instructive, as this complex phenotype is closely linked to fitness in clonally evolving organisms. Here, we perform high-throughput time-lapse microscopy on cells from mutation-accumulation strains to precisely infer the distribution of mutational effects on growth rate in the budding yeast, Saccharomyces cerevisiae. We show that mutational effects on growth rate are overwhelmingly negative, highly skewed towards very small effect sizes, and frequent enough to suggest that deleterious hitchhikers may impose a significant burden on evolving lineages. By using lines that accumulated mutations in either wild-type or slippage repair-defective backgrounds, we further disentangle the effects of 2 common types of mutations, single-nucleotide substitutions and simple sequence repeat indels, and show that they have distinct effects on yeast growth rate. Although the average effect of a simple sequence repeat mutation is very small (approximately 0.3%), many do alter growth rate, implying that this class of frequent mutations has an important evolutionary impact.
Affiliation(s)
- Yevgeniy Plavskin
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Department of Biology, New York University, New York, New York, United States of America
- Maria Stella de Biase
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Humboldt-Universität zu Berlin, Department of Biology, Berlin, Germany
- Naomi Ziv
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Department of Biology, New York University, New York, New York, United States of America
- Libuše Janská
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Department of Biology, New York University, New York, New York, United States of America
- Yuan O. Zhu
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Department of Biology, Stanford University, Stanford, California, United States of America
- David W. Hall
- Department of Genetics, University of Georgia, Athens, Georgia, United States of America
- Roland F. Schwarz
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Institute for Computational Cancer Biology, Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Daniel Tranchina
- Department of Biology, New York University, New York, New York, United States of America
- Courant Math Institute, New York University, New York, New York, United States of America
- Mark L. Siegal
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Department of Biology, New York University, New York, New York, United States of America
4
Kim JH, Koh IG, Lee H, Lee GH, Song DY, Kim SW, Kim Y, Han JH, Bong G, Lee J, Byun H, Son JH, Kim YR, Lee Y, Kim JJ, Park JW, Kim IB, Choi JK, Jang JH, Trost B, Lee J, Kim E, Yoo HJ, An JY. Short tandem repeat expansions in cortical layer-specific genes implicate in phenotypic severity and adaptability of autism spectrum disorder. Psychiatry Clin Neurosci 2024; 78:405-415. [PMID: 38751214 DOI: 10.1111/pcn.13676] [Received: 10/05/2023] [Revised: 02/14/2024] [Accepted: 04/15/2024] [Indexed: 07/06/2024]
Abstract
AIM: Short tandem repeats (STRs) are repetitive DNA sequences and highly mutable in various human disorders. While the involvement of STRs in various genetic disorders has been extensively studied, their role in autism spectrum disorder (ASD) remains largely unexplored. In this study, we aimed to investigate genetic association of STR expansions with ASD using whole genome sequencing (WGS) and identify risk loci associated with ASD phenotypes. METHODS: We analyzed WGS data of 634 ASD families and performed genome-wide evaluation for 12,929 STR loci. We found rare STR expansions that exceeded normal repeat lengths in autism cases compared to unaffected controls. By integrating single cell RNA and ATAC sequencing datasets of human postmortem brains, we prioritized STR loci in genes specifically expressed in cortical development stages. A deep learning method was used to predict functionality of ASD-associated STR loci. RESULTS: In ASD cases, rare STR expansions predominantly occurred in early cortical layer-specific genes involved in neurodevelopment, highlighting the cellular specificity of STR-associated genes in ASD risk. Leveraging deep learning prediction models, we demonstrated that these STR expansions disrupted the regulatory activity of enhancers and promoters, suggesting a potential mechanism through which they contribute to ASD pathogenesis. We found that individuals with ASD-associated STR expansions exhibited more severe ASD phenotypes and diminished adaptability compared to non-carriers. CONCLUSION: Short tandem repeat expansions in cortical layer-specific genes are associated with ASD and could potentially be a risk genetic factor for ASD. Our study is the first to show evidence of STR expansion associated with ASD in an under-investigated population.
Affiliation(s)
- Jae Hyun Kim
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- In Gyeong Koh
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Hyeji Lee
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Gang-Hee Lee
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Da-Yea Song
- Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
- Soo-Whee Kim
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Yujin Kim
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Jae Hyun Han
- Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Department of Psychiatry, College of Medicine, Soonchunhyang University Cheonan Hospital, Cheonan, Republic of Korea
- Guiyoung Bong
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jeewon Lee
- Department of Psychiatry, Soonchunhyang University College of Medicine, Asan, Republic of Korea
- Heejung Byun
- Department of Neuropsychiatry, Seoul Metropolitan Children's Hospital, Seoul, Republic of Korea
- Ji Hyun Son
- Department of Neuropsychiatry, Seoul Metropolitan Children's Hospital, Seoul, Republic of Korea
- Ye Rim Kim
- Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
- Yoojeong Lee
- Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Justine Jaewon Kim
- Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Jung Woo Park
- Center for Biomedical Computing, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Il Bin Kim
- Department of Psychiatry, Hanyang University Guri Hospital, Guri, Republic of Korea
- Jung Kyoon Choi
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Ja-Hyun Jang
- Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Brett Trost
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Junehawk Lee
- Center for Biomedical Computing, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Eunjoon Kim
- Center for Synaptic Brain Dysfunctions, Institute for Basic Science, Daejeon, Republic of Korea
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Hee Jeong Yoo
- Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
- Joon-Yong An
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, Republic of Korea
5
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
6
Plavskin Y, de Biase MS, Ziv N, Janská L, Zhu YO, Hall DW, Schwarz RF, Tranchina D, Siegal ML. Spontaneous single-nucleotide substitutions and microsatellite mutations have distinct distributions of fitness effects. bioRxiv 2024:2023.07.04.547687. [PMID: 37461506 PMCID: PMC10349969 DOI: 10.1101/2023.07.04.547687] [Indexed: 07/28/2023]
Abstract
The fitness effects of new mutations determine key properties of evolutionary processes. Beneficial mutations drive evolution, yet selection is also shaped by the frequency of small-effect deleterious mutations, whose combined effect can burden otherwise adaptive lineages and alter evolutionary trajectories and outcomes in clonally evolving organisms such as viruses, microbes, and tumors. The small effect sizes of these important mutations have made accurate measurements of their rates difficult. In microbes, assessing the effect of mutations on growth can be especially instructive, as this complex phenotype is closely linked to fitness in clonally evolving organisms. Here, we perform high-throughput time-lapse microscopy on cells from mutation-accumulation strains to precisely infer the distribution of mutational effects on growth rate in the budding yeast, Saccharomyces cerevisiae. We show that mutational effects on growth rate are overwhelmingly negative, highly skewed towards very small effect sizes, and frequent enough to suggest that deleterious hitchhikers may impose a significant burden on evolving lineages. By using lines that accumulated mutations in either wild-type or slippage repair-defective backgrounds, we further disentangle the effects of two common types of mutations, single-nucleotide substitutions and simple sequence repeat indels, and show that they have distinct effects on yeast growth rate. Although the average effect of a simple sequence repeat mutation is very small (~0.3%), many do alter growth rate, implying that this class of frequent mutations has an important evolutionary impact.
7
King DG. Mutation protocols share with sexual reproduction the physiological role of producing genetic variation within 'constraints that deconstrain'. J Physiol 2024; 602:2615-2626. [PMID: 38178567 DOI: 10.1113/jp285478] [Received: 09/18/2023] [Accepted: 12/14/2023] [Indexed: 01/06/2024]
Abstract
Because the universe of possible DNA sequences is inconceivably vast, organisms have evolved mechanisms for exploring DNA sequence space while substantially reducing the hazard that would otherwise accrue to any process of random, accidental mutation. One such mechanism is meiotic recombination. Although sexual reproduction imposes a seemingly paradoxical 50% cost to fitness, sex evidently prevails because this cost is outweighed by the advantage of equipping offspring with genetic variation to accommodate environmental vicissitudes. The potential adaptive utility of additional mechanisms for producing genetic variation has long been obscured by a presumption that the vast majority of mutations are deleterious. Perhaps surprisingly, the probability for adaptive variation can be increased by several mechanisms that generate mutations abundantly. Such mechanisms, here called 'mutation protocols', implement implicit 'constraints that deconstrain'. Like meiotic recombination, they produce genetic variation in forms that minimize potential for harm while providing a reasonably high probability for benefit. One example is replication slippage of simple sequence repeats (SSRs); this process yields abundant, reversible mutations, typically with small quantitative effect on phenotype. This enables SSRs to function as adjustable 'tuning knobs'. There exists a clear pathway for SSRs to be shaped through indirect selection favouring their implicit tuning-knob protocol. Several other molecular mechanisms comprise probable components of additional mutation protocols. Biologists might plausibly regard such mechanisms of mutation not primarily as sources of deleterious genetic mistakes but also as potentially adaptive processes for 'exploring' DNA sequence space.
Affiliation(s)
- David G King
- Department of Anatomy, School of Medicine, Southern Illinois University Carbondale, Carbondale, Illinois, USA
- Department of Zoology, College of Agricultural, Life, and Physical Sciences, Southern Illinois University Carbondale, Carbondale, Illinois, USA
8
Liang Y, Hao J, Wang J, Zhang G, Su Y, Liu Z, Wang T. Statistical Genomics Analysis of Simple Sequence Repeats from the Paphiopedilum Malipoense Transcriptome Reveals Control Knob Motifs Modulating Gene Expression. Adv Sci (Weinh) 2024; 11:e2304848. [PMID: 38647414 PMCID: PMC11200097 DOI: 10.1002/advs.202304848] [Received: 07/17/2023] [Revised: 02/26/2024] [Indexed: 04/25/2024]
Abstract
Simple sequence repeats (SSRs) are found in nonrandom distributions in genomes and are thought to impact gene expression. The distribution patterns of 48 295 SSRs of Paphiopedilum malipoense are mined and characterized based on the first full-length transcriptome and comprehensive transcriptome dataset from 12 organs. Statistical genomics analyses are used to investigate how SSRs in transcripts affect gene expression. The results demonstrate the correlations between SSR distributions, characteristics, and expression level. Nine expression-modulating motifs (expMotifs) are identified and a model is proposed to explain the effect of their key features, potency, and gene function on an intra-transcribed region scale. The expMotif-transcribed region combination is the most predominant contributor to the expression-modulating effect of SSRs, and some intra-transcribed regions are critical for this effect. Genes containing the same type of expMotif-SSR elements in the same transcribed region are likely linked in function, regulation, or evolution aspects. This study offers novel evidence to understand how SSRs regulate gene expression and provides potential regulatory elements for plant genetic engineering.
Affiliation(s)
- Yingyi Liang
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jing Hao
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jieyu Wang
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guoqiang Zhang
- Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yingjuan Su
- School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Research Institute of Sun Yat-sen University in Shenzhen, Shenzhen 518107, China
- Zhong-Jian Liu
- Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ting Wang
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
9
Hamilton F, Mitchell R, Ghazal P, Timpson N. Phenotypic Associations With the HMOX1 GT(n) Repeat in European Populations. Am J Epidemiol 2024; 193:718-726. [PMID: 37414746 PMCID: PMC11074708 DOI: 10.1093/aje/kwad154] [Received: 04/29/2022] [Revised: 12/21/2023] [Accepted: 07/03/2023] [Indexed: 07/08/2023]
Abstract
Heme oxygenase 1 is a key enzyme in the management of heme in humans. A GT(n) repeat length in the heme oxygenase 1 gene (HMOX1) has been widely associated with a variety of phenotypes, including susceptibility to and outcomes in diabetes, cancer, infections, and neonatal jaundice. However, studies have generally been small and results inconsistent. In this study, we imputed the GT(n) repeat length in participants from 2 UK cohort studies (the UK Biobank study (n = 463,005; recruited in 2006-2010) and the Avon Longitudinal Study of Parents and Children (ALSPAC; n = 937; recruited in 1990-1991)), with the reliability of imputation tested in other cohorts (1000 Genomes Project, Human Genome Diversity Project, and Personal Genome Project UK). Subsequently, we measured the relationship between repeat length and previously identified associations (diabetes, chronic obstructive pulmonary disease, pneumonia, and infection-related mortality in the UK Biobank; neonatal jaundice in ALSPAC) and performed a phenomewide association study in the UK Biobank. Despite high-quality imputation (correlation between true repeat length and imputed repeat length > 0.9 in test cohorts), clinical associations were not identified in either the phenomewide association study or specific association studies. These findings were robust to definitions of repeat length and sensitivity analyses. Despite multiple smaller studies identifying associations across a variety of clinical settings, we could not replicate or identify any relevant phenotypic associations with the HMOX1 GT(n) repeat.
Affiliation(s)
- Fergus Hamilton
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, United Kingdom (corresponding author)
10
Lee KH, Kim J, Kim JH. 3D epigenomics and 3D epigenopathies. BMB Rep 2024; 57:216-231. [PMID: 38627948 PMCID: PMC11139681] [Received: 12/04/2023] [Revised: 01/15/2024] [Accepted: 03/18/2024] [Indexed: 05/25/2024]
Abstract
Mammalian genomes are intricately compacted to form sophisticated 3-dimensional structures within the tiny nucleus, so-called 3D genome folding. Despite their shapes reminiscent of an entangled yarn, the rapid development of molecular and next-generation sequencing technologies (NGS) has revealed that mammalian genomes are highly organized in a hierarchical order that delicately affects transcription activities. An increasing amount of evidence suggests that 3D genome folding is implicated in diseases, giving us a clue on how to identify novel therapeutic approaches. In this review, we will study what 3D genome folding means in epigenetics, what types of 3D genome structures there are, how they are formed, and how the technologies have developed to explore them. We will also discuss the pathological implications of 3D genome folding. Finally, we will discuss how to leverage 3D genome folding and engineering for future studies.
Affiliation(s)
- Kyung-Hwan Lee
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Jungyu Kim
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Ji Hun Kim
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
11
Uppili B, Faruq M. STRIDE-DB: a comprehensive database for exploration of instability and phenotypic relevance of short tandem repeats in the human genome. Database (Oxford) 2024; 2024:baae020. [PMID: 38602506 PMCID: PMC11008502 DOI: 10.1093/database/baae020] [Received: 06/27/2023] [Revised: 11/10/2023] [Accepted: 03/07/2024] [Indexed: 04/12/2024]
Abstract
Short Tandem Repeats (STRs) are genetic markers made up of repeating DNA sequences. The variations of the STRs are widely studied in forensic analysis, population studies and genetic testing for a variety of neuromuscular disorders. Understanding polymorphic STR variation and its cause is crucial for deciphering genetic information and finding links to various disorders. In this paper, we present STRIDE-DB, a novel and unique platform to explore STR Instability and its Phenotypic Relevance, and a comprehensive database of STRs in the human genome. We utilized RepeatMasker to identify all the STRs in the human genome (hg19) and combined it with frequency data from the 1000 Genomes Project. STRIDE-DB, a user-friendly resource, plays a pivotal role in investigating the relationship between STR variation, instability and phenotype. By harnessing data from genome-wide association studies (GWAS), ClinVar database, Alu loci, Haploblocks in genome and Conservation of the STRs, it serves as an important tool for researchers exploring the variability of STRs in the human genome and its direct impact on phenotypes. STRIDE-DB has its broad applicability and significance in various research domains like forensic sciences and other repeat expansion disorders. Database URL: https://stridedb.igib.res.in.
Affiliation(s)
- Bharathram Uppili
- Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110007, India
- CSIR-HRDC Campus, Academy for Scientific and Innovative Research, Ghaziabad 201002, India
- Mohammed Faruq
- Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110007, India

12
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. Genetics 2024; 226:iyae013. [PMID: 38298127 PMCID: PMC10990422 DOI: 10.1093/genetics/iyae013] [Received: 08/11/2023] [Revised: 08/11/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than polymerase slippage in replicating progenitor cells. These results echo the recent finding that DNA damage in oocytes is a significant source of de novo single nucleotide variants and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to known hotspots of oocyte mutagenesis, nor are postzygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at G/C-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and contradict prior attribution of replication slippage as the primary mechanism of STR mutagenesis.
Affiliation(s)
- Michael E Goldberg
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Michelle D Noyes
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA

13
Oketch JW, Wain LV, Hollox EJ. A comparison of software for analysis of rare and common short tandem repeat (STR) variation using human genome sequences from clinical and population-based samples. PLoS One 2024; 19:e0300545. [PMID: 38558075 PMCID: PMC10984476 DOI: 10.1371/journal.pone.0300545] [Received: 01/02/2024] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs comprise about 3% of the human genome and are highly polymorphic. Some cause Mendelian disease, and others affect gene expression. Their contribution to common disease is not well understood, but recent software tools designed to genotype STRs from short-read sequencing data will help address this. Here, we compare software that genotypes common STRs and rarer STR expansions genome-wide, with the aim of applying them to population-scale genomes. Using Genome in a Bottle (GIAB) consortium and 1000 Genomes Project short-read sequencing data, we compare performance in terms of sequence length, depth, computing resources needed, genotyping accuracy and number of STRs genotyped. To ensure broad applicability of our findings, we also measure genotyping performance against a set of genomes from clinical samples with known STR expansions, and a set of STRs commonly used for forensic identification. We find that HipSTR, ExpansionHunter and GangSTR perform well in genotyping common STRs, including the CODIS 13 core STRs used for forensic analysis. GangSTR and ExpansionHunter outperform HipSTR for genotyping call rate and memory usage. ExpansionHunter Denovo (EHdn), STRling and GangSTR outperformed STRetch for detecting expanded STRs, and EHdn and STRling used considerably less processor time than GangSTR. Analysis of shared genomic sequence data provided by the GIAB consortium allows future performance comparisons of new software approaches on a common set of data, facilitating comparisons and allowing researchers to choose the software that best fulfils their needs.
Affiliation(s)
- John W. Oketch
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Louise V. Wain
- Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
- National Institute for Health Research, Leicester Respiratory Biomedical Research Centre, Glenfield Hospital, Leicester, United Kingdom
- Edward J. Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom

14
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to detect and interpret accurately. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected on average 716 rare (MAF < 0.01) long-read SV alleles per genome, a 2.4-fold increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, these included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.

15
Timmaraju VA, Finkelstein SD, Levine JA. Analytical Validation of Loss of Heterozygosity and Mutation Detection in Pancreatic Fine-Needle Aspirates by Capillary Electrophoresis and Sanger Sequencing. Diagnostics (Basel) 2024; 14:514. [PMID: 38472986 DOI: 10.3390/diagnostics14050514] [Received: 01/04/2024] [Revised: 02/15/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024]
Abstract
Pancreatic cystic disease, including duct dilation, represents a precursor state in the development of pancreatic cancer, a malignancy with relatively low incidence but high mortality. While most of these cysts (>85%) are benign, the remainder can progress over time to malignant transformation, invasion, and metastasis. Cytologic diagnosis is challenging, limited by the paucity or complete absence of cells representative of cystic lesions and fibrosis. Molecular analysis of fluids collected by endoscopic-guided fine-needle aspiration of pancreatic cysts and dilated duct lesions can be used to evaluate the risk of progression to malignancy. The enhanced diagnostic utility of molecular approaches rests on the ability to interrogate cell-free nucleic acid in the cyst/duct and/or extracellular fluid. Allelic imbalances at tumor suppressor loci and selective oncogenic drivers are used clinically to help differentiate benign, stable pancreatic cysts from those progressing toward high-grade dysplasia. Methods are discussed and evaluated for their efficacy in diagnostic implementation. Here, we report the analytical validation of methods to detect causally associated molecular changes integral to the pathogenesis of pancreatic cancer from pancreatic cyst fluids.

16
Verbiest MA, Lundström O, Xia F, Baudis M, Bilgin Sonay T, Anisimova M. Short tandem repeat mutations regulate gene expression in colorectal cancer. Sci Rep 2024; 14:3331. [PMID: 38336885 PMCID: PMC10858039 DOI: 10.1038/s41598-024-53739-0] [Received: 12/15/2023] [Accepted: 02/04/2024] [Indexed: 02/12/2024]
Abstract
Short tandem repeat (STR) mutations are prevalent in colorectal cancer (CRC), especially in tumours with the microsatellite instability (MSI) phenotype. While STR length variations are known to regulate gene expression under physiological conditions, the functional impact of STR mutations in CRC remains unclear. Here, we integrate STR mutation data with clinical information and gene expression data to study the gene regulatory effects of STR mutations in CRC. We confirm that STR mutability in CRC highly depends on the MSI status, repeat unit size, and repeat length. Furthermore, we present a set of 1244 putative expression STRs (eSTRs) for which the STR length is associated with gene expression levels in CRC tumours. The length of 73 eSTRs is associated with expression levels of cancer-related genes, nine of which are CRC-specific genes. We show that linear models describing eSTR-gene expression relationships allow for predictions of gene expression changes in response to eSTR mutations. Moreover, we found an increased mutability of eSTRs in MSI tumours. Our evidence of gene regulatory roles for eSTRs in CRC highlights a mostly overlooked way through which tumours may modulate their phenotypes. Future extensions of these findings could uncover new STR-based targets in the treatment of cancer.
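The abstract notes that linear models of eSTR length versus expression can predict how expression shifts after an eSTR mutation. A minimal sketch of that idea, assuming an ordinary least-squares fit for a single locus; the function names and toy values are hypothetical and are not the authors' pipeline or data:

```python
# Assumption: a plain ordinary least-squares fit for one eSTR locus,
# used only to illustrate the idea; not the published model.


def fit_linear(lengths, expression):
    """Fit expression = intercept + slope * length by least squares."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predicted_change(slope, old_length, new_length):
    """Predicted expression change when the STR mutates from old_length to new_length."""
    return slope * (new_length - old_length)


if __name__ == "__main__":
    # Hypothetical toy data: STR length (repeat units) vs. normalized expression.
    lengths = [10, 11, 12, 13, 14]
    expression = [2.0, 2.5, 3.0, 3.5, 4.0]
    intercept, slope = fit_linear(lengths, expression)
    print(intercept, slope)  # prints: -3.0 0.5
    print(predicted_change(slope, 12, 10))  # two-unit contraction, prints: -1.0
```

Under this assumption the predicted change depends only on the fitted slope and the number of repeat units gained or lost; the published models may differ in covariates and normalization.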
Affiliation(s)
- Max A Verbiest
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Oxana Lundström
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Feifei Xia
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michael Baudis
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tugce Bilgin Sonay
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, USA
- Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland

17
Wang Q, Chen X, Meng Y, Niu M, Jia Y, Huang L, Ma W, Liang C, Li Z, Zhao L, Dang Z. The Potential Role of Genic-SSRs in Driving Ecological Adaptation Diversity in Caragana Plants. Int J Mol Sci 2024; 25:2084. [PMID: 38396759 PMCID: PMC10888960 DOI: 10.3390/ijms25042084] [Received: 01/02/2024] [Revised: 01/26/2024] [Accepted: 01/30/2024] [Indexed: 02/25/2024]
Abstract
Caragana, a xerophytic shrub genus widely distributed in northern China, exhibits distinctive geographical substitution patterns and ecological adaptation diversity. This study employed transcriptome sequencing technology to investigate 12 Caragana species, aiming to explore genic-SSR variations in the Caragana transcriptome and identify their role as a driving force for environmental adaptation within the genus. A total of 3666 polymorphic genic-SSRs were identified across different species. The impact of these variations on the expression of related genes was analyzed, revealing a significant linear correlation (p < 0.05) between the length variation of 264 polymorphic genic-SSRs and the expression of associated genes. Additionally, 2424 polymorphic genic-SSRs were located in differentially expressed genes among Caragana species. Through weighted gene co-expression network analysis, the expressions of these genes were correlated with 19 climatic factors and 16 plant functional traits in various habitats. This approach facilitated the identification of biological processes associated with habitat adaptations in the studied Caragana species. Fifty-five core genes related to functional traits and climatic factors were identified, including various transcription factors such as MYB, TCP, ARF, and structural proteins like HSP90, elongation factor TS, and HECT. The roles of these genes in the ecological adaptation diversity of Caragana were discussed. Our study identified specific genomic components and genes in Caragana plants responsive to heterogeneous habitats. The results contribute to advancements in the molecular understanding of their ecological adaptation, lay a foundation for the conservation and development of Caragana germplasm resources, and provide a scientific basis for plant adaptation to global climate change.
Affiliation(s)
- Qinglang Wang
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Xing’er Chen
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Yue Meng
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Miaomiao Niu
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Yuanyuan Jia
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Lei Huang
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Wenhong Ma
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Cunzhu Liang
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Zhiyong Li
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Liqing Zhao
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China
- Zhenhua Dang
- Ministry of Education Key Laboratory of Ecology and Resource Use of the Mongolian Plateau & Inner Mongolia Key Laboratory of Grassland Ecology, School of Ecology and Environment, Inner Mongolia University, Hohhot 010021, China
- Collaborative Innovation Center for Grassland Ecological Security, Ministry of Education of China, Inner Mongolia Autonomous Region, Hohhot 010021, China

18
Lu J, Toro C, Adams DR, Moreno CAM, Lee WP, Leung YY, Harms MB, Vardarajan B, Heinzen EL. LUSTR: a new customizable tool for calling genome-wide germline and somatic short tandem repeat variants. BMC Genomics 2024; 25:115. [PMID: 38279154 PMCID: PMC10811831 DOI: 10.1186/s12864-023-09935-9] [Received: 05/18/2023] [Accepted: 12/21/2023] [Indexed: 01/28/2024]
Abstract
BACKGROUND Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent to which STRs contribute to disease is likely underestimated because of the challenges of calling these variants in short-read next-generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully addresses all of the complexities associated with this variant class. RESULTS Here we introduce LUSTR, which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases. CONCLUSIONS LUSTR offers a powerful and user-friendly approach to identifying STR variants and can facilitate more comprehensive studies of the role of pathogenic STR variants across human diseases.
Affiliation(s)
- Jinfeng Lu
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA
- Camilo Toro
- NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- David R Adams
- NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mathew B Harms
- Department of Neurology, Division of Neuromuscular Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Badri Vardarajan
- The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA
- Erin L Heinzen
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA

19
Manigbas CA, Jadhav B, Garg P, Shadrina M, Lee W, Martin-Trujillo A, Sharp AJ. A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank. medRxiv 2024:2024.01.22.24301630. [PMID: 38343850 PMCID: PMC10854328 DOI: 10.1101/2024.01.22.24301630] [Indexed: 06/15/2024]
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we performed direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing causal associations with 73 traits. We replicated 23 of 31 (74%) of these causal associations in the All of Us cohort. While this set included several known repeat expansion disorders, novel associations we found were attributable to common polymorphic variation in TR length rather than rare expansions and include e.g. a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5'UTR of GNB2 influencing heart rate. Causal TRs were strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the "missing heritability" of the human genome.

20
Zhang J, Zhu B. Short, but matters: short tandem repeats confer variation in transcription factor-DNA binding. Sci Bull (Beijing) 2024; 69:9-10. [PMID: 38042705 DOI: 10.1016/j.scib.2023.11.050] [Indexed: 12/04/2023]
Affiliation(s)
- Jing Zhang
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Key Laboratory of Epigenetic Regulation and Intervention, Chinese Academy of Sciences, Beijing 100101, China; New Cornerstone Science Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Bing Zhu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Key Laboratory of Epigenetic Regulation and Intervention, Chinese Academy of Sciences, Beijing 100101, China; New Cornerstone Science Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

21
Parikh K, Quintero Reis A, Wendt FR. Association between suicidal ideation and tandem repeats in contactins. Front Psychiatry 2024; 14:1236540. [PMID: 38239902 PMCID: PMC10794671 DOI: 10.3389/fpsyt.2023.1236540] [Received: 06/07/2023] [Accepted: 12/13/2023] [Indexed: 01/22/2024]
Abstract
Background Death by suicide is one of the leading causes of death among adolescents. Genome-wide association studies (GWAS) have identified loci that associate with suicidal ideation and related behaviours. One such group of loci is the six contactin genes (CNTN1-6), which are critical to neurodevelopment through regulating neurite structure. Because single nucleotide polymorphisms (SNPs) detected by GWAS often map to non-coding intergenic regions, we investigated whether repetitive variants in CNTNs were associated with suicidality in a young cohort aged 8 to 21. Understanding the genetic liability of suicidal thought and behavior in this age group will promote early intervention and treatment. Methods Genotypic and phenotypic data were obtained from the Philadelphia Neurodevelopmental Cohort (PNC). Across six CNTNs, 232 short tandem repeats (STRs) were analyzed in up to 4,595 individuals of European ancestry who expressed current, previous, or no suicidal ideation. STRs were imputed into SNP arrays using a phased SNP-STR haplotype reference panel from the 1000 Genomes Project. We tested several additive and interactive models of locus-level burden (i.e., sum of STR alleles) with respect to suicidal ideation. Additive models included sex, birth year, developmental stage ("DevStage"), and the first 10 principal components of ancestry as covariates; interactive models assessed the effect of STR-by-DevStage considering all other covariates. Results CNTN1-[T]N interacted with DevStage to increase risk for current suicidal ideation (CNTN1-[T]N-by-DevStage; p = 0.00035). Compared to the youngest age group, the middle (OR = 1.80, p = 0.0514) and oldest (OR = 3.82, p = 0.0002) participant groups had significantly higher odds of suicidal ideation as their STR length expanded; this result was independent of polygenic scores for suicidal ideation.
Discussion These findings highlight diversity in the genetic effects (i.e., SNP and STR) acting on suicidal thoughts and behavior and advance our understanding of suicidal ideation across childhood and adolescence.
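In the additive and interactive models above, locus-level burden is defined as the sum of an individual's STR alleles. The sketch below shows how burden and STR-by-DevStage interaction terms could be assembled for one participant. It assumes alleles are integer repeat counts and that DevStage is dummy-coded with the youngest group as the reference; the class, function names and stage labels are hypothetical, and the regression fit and the other covariates are omitted.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    allele1: int    # repeat count of the first STR allele
    allele2: int    # repeat count of the second STR allele
    dev_stage: str  # hypothetical labels: "youngest", "middle", "oldest"


def locus_burden(p: Participant) -> int:
    """Locus-level burden: the sum of the participant's two STR alleles."""
    return p.allele1 + p.allele2


def design_row(p: Participant) -> list[float]:
    """Predictors for one participant: burden, DevStage dummies and
    burden-by-DevStage interaction terms, with "youngest" as reference.
    Assumption: sex, birth year and ancestry PCs are left out here."""
    burden = locus_burden(p)
    middle = 1.0 if p.dev_stage == "middle" else 0.0
    oldest = 1.0 if p.dev_stage == "oldest" else 0.0
    return [float(burden), middle, oldest, burden * middle, burden * oldest]


if __name__ == "__main__":
    example = Participant(allele1=12, allele2=15, dev_stage="oldest")
    print(design_row(example))  # prints: [27.0, 0.0, 1.0, 0.0, 27.0]
```

With this coding, a positive coefficient on an interaction column would mean that longer alleles raise the odds more in that stage than in the youngest group, which is the shape of the reported CNTN1-[T]N-by-DevStage effect.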
Affiliation(s)
- Kairavi Parikh
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Andrea Quintero Reis
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Frank R. Wendt
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada

22
TRGT-ing the dark genome to accurately characterize tandem repeats at scale. Nat Biotechnol 2024:10.1038/s41587-023-02073-3. [PMID: 38168998 DOI: 10.1038/s41587-023-02073-3] [Indexed: 01/05/2024]

23
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024:10.1038/s41587-023-02057-3. [PMID: 38168995 DOI: 10.1038/s41587-023-02057-3] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, computational methods are still needed to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports the reads that support each repeat allele, which can subsequently be visualized with a companion TR visualization tool. Across 937,122 TRs, TRGT showed a Mendelian concordance of 98.38% when a difference of a single repeat unit was allowed. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
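The 98.38% figure treats a trio as consistent when the child's alleles match the parents' within one repeat unit. A simplified Python sketch of such a check on allele lengths follows. It assumes autosomal loci and integer repeat counts. It is not TRGT's own evaluation code, which the abstract does not describe, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

# Two allele lengths for one individual at one autosomal TR, in repeat units.
Alleles = Tuple[int, int]


def _within(a: int, b: int, tol: int) -> bool:
    """True if two allele lengths differ by at most `tol` repeat units."""
    return abs(a - b) <= tol


def is_concordant(child: Alleles, mother: Alleles, father: Alleles, tol: int = 1) -> bool:
    """Can one child allele come from each parent, allowing `tol` units of error?"""
    def fits(from_mother: int, from_father: int) -> bool:
        return (any(_within(from_mother, m, tol) for m in mother)
                and any(_within(from_father, f, tol) for f in father))

    c1, c2 = child
    # Try both assignments of the child's alleles to the two parents.
    return fits(c1, c2) or fits(c2, c1)


def concordance_rate(trios: Iterable[Tuple[Alleles, Alleles, Alleles]], tol: int = 1) -> float:
    """Fraction of (child, mother, father) genotype triples that are concordant."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trio genotypes supplied")
    return sum(is_concordant(c, m, f, tol) for c, m, f in trios) / len(trios)


example = [((10, 20), (10, 11), (19, 25)),   # 20 vs paternal 19: within 1 unit
           ((10, 20), (10, 11), (30, 31))]   # no paternal allele near 10 or 20
print(concordance_rate(example))  # 0.5
```

Setting `tol=0` gives strict concordance. With `tol=0`, the first trio above would also fail.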
Affiliation(s)
- Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Harriet Dashnow
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Tom Mokveld
- Pacific Biosciences of California, Menlo Park, CA, USA
- Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Chengpeng Bi
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron Wenger
- Pacific Biosciences of California, Menlo Park, CA, USA
- Khi Pin Chua
- Pacific Biosciences of California, Menlo Park, CA, USA
- Verónica Martínez-Cerdeño
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
- Trevor D Bartley
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
24
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. bioRxiv 2023:2023.12.22.573131. [PMID: 38187618 PMCID: PMC10769404 DOI: 10.1101/2023.12.22.573131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model predicts that STR mutation rates should scale linearly with a father's age, because spermatogonial progenitor cells divide continually after puberty. Conversely, it predicts that STR mutation rates should not scale with a mother's age at her child's conception, because oocytes remain arrested in prophase of meiosis I throughout a mother's reproductive years and undergo a fixed number of cell divisions that is independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with both paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than by the classical mechanism of polymerase slippage in replicating progenitor cells. These results also echo the recent finding that DNA damage in quiescent oocytes is a significant source of de novo SNVs, and they corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to previously discovered hotspots of oocyte mutagenesis, and that post-zygotic mutations are unlikely to contribute significantly. STR nucleotide composition has divergent effects on de novo mutation (DNM) rates in the two sexes. Unlike in the paternal lineage, maternally derived DNMs at A/T STRs show a significantly stronger association with maternal age than DNMs at GC-containing STRs. These observations may shed light on the mechanism and developmental timing of certain STR mutations and are especially surprising given the prior view of replication slippage as the dominant mechanism of STR mutagenesis.
Affiliation(s)
- Michael E. Goldberg
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- Michelle D. Noyes
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Evan E. Eichler
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Howard Hughes Medical Institute, 3720 15 Ave NE, University of Washington, Seattle, WA, 98195
- Aaron R. Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- These authors contributed equally to this work
- Kelley Harris
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Computational Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109
- These authors contributed equally to this work
25
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023]
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constitute at least 3% of it, and are often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than that of single-nucleotide polymorphisms and indels, indicates that they are likely to contribute significantly to phenotypic variation. Yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known to cause over 50 disorders, while common tandem repeat variation is increasingly found to be significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as it pertains to disease risk, highlighting its potential association with schizophrenia. An overview of next-generation sequencing-based methods that may be applied for genome-wide TR identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
26
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA, from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome and their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize these complex patterns of human variation and to associate them with disease, expression, and epigenetic differences. We predict that accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
27
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. 
Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
28
Guo MH, Lee WP, Vardarajan B, Schellenberg GD, Phillips-Cremins J. Polygenic burden of short tandem repeat expansions promote risk for Alzheimer's disease. medRxiv 2023:2023.11.16.23298623. [PMID: 38014121 PMCID: PMC10680900 DOI: 10.1101/2023.11.16.23298623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Studies of the genetics of Alzheimer's disease (AD) have largely focused on single nucleotide variants and short insertions/deletions. However, most of the disease heritability has yet to be uncovered, suggesting that other forms of genetic variation confer substantial genetic risk. There are over one million short tandem repeats (STRs) in the genome, and their link to AD risk has not been assessed. Because pathogenic STR expansions cause over 30 neurologic diseases, it is important to ascertain whether STRs are also implicated in AD risk. Here, we genotyped 321,742 polymorphic STR tracts genome-wide using PCR-free whole genome sequencing data from 2,981 individuals (1,489 AD cases and 1,492 controls). We implemented an approach that identifies STR expansions as STRs whose tract lengths are outliers relative to the population. We then tested for differences in the aggregate burden of expansions between cases and controls. AD patients had a 1.19-fold increase in STR expansions compared with healthy elderly controls (p = 8.27 × 10⁻³, two-sided Mann-Whitney test). Individuals carrying more than 30 STR expansions had 3.62-fold higher odds of having AD and had more severe AD neuropathology. AD STR expansions were highly enriched within active promoters in post-mortem hippocampal brain tissue, particularly within SINE-VNTR-Alu (SVA) retrotransposons. Together, these results demonstrate that expanded STRs within active promoter regions of the genome promote risk of AD.
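The abstract does not specify the outlier rule used to call an expansion. As an illustration only, the Python sketch below flags a tract length as expanded when its modified z-score, which is based on the median and MAD, exceeds 3.5. That cutoff is a commonly used convention. The sketch then counts expansions per person, which is the quantity compared between cases and controls. The threshold, the skipping of loci with zero MAD, the one-length-per-person input, and the function name are all assumptions.

```python
from statistics import median
from typing import Dict, List


def expansion_counts(lengths: Dict[str, List[float]], z_cutoff: float = 3.5) -> List[int]:
    """Count outlier-long STRs per individual.

    `lengths` maps an STR ID to one tract length per individual, with the
    same individual order in every list (hypothetical input layout)."""
    if not lengths:
        return []
    n = len(next(iter(lengths.values())))
    counts = [0] * n
    for locus, values in lengths.items():
        if len(values) != n:
            raise ValueError(f"{locus}: expected {n} values, got {len(values)}")
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread, so the robust z-score is undefined; skip (assumption)
        for i, v in enumerate(values):
            # 1.4826 * MAD approximates the standard deviation under normality.
            if (v - med) / (1.4826 * mad) > z_cutoff:
                counts[i] += 1
    return counts


toy = {"str1": [10, 11, 10, 12, 40], "str2": [5, 5, 6, 5, 5]}
print(expansion_counts(toy))  # [0, 0, 0, 0, 1]
```

The resulting per-person counts could then be compared between cases and controls, for example with a Mann-Whitney test as in the study. They could also be thresholded, for example at more than 30 expansions.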
Affiliation(s)
- Michael H Guo
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Badri Vardarajan
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Jennifer Phillips-Cremins
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
29
Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun 2023; 14:6711. [PMID: 37872149 PMCID: PMC10593948 DOI: 10.1038/s41467-023-42278-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 03/10/2023] [Accepted: 10/05/2023] [Indexed: 10/25/2023]
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high-coverage whole-genome sequencing of 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods, resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel that can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
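The abstract does not describe how EnsembleTR merges the four methods' genotypes. The Python sketch below shows only the general idea of reconciling several genotypers' calls at one locus. It uses a simple majority vote as a stand-in, which is not EnsembleTR's algorithm. The caller names, the two-vote minimum, and the rule that a tie yields no call are hypothetical choices.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Unordered genotype: two allele lengths at one TR locus.
Genotype = Tuple[int, int]


def consensus_genotype(calls: Dict[str, Optional[Genotype]],
                       min_votes: int = 2) -> Optional[Genotype]:
    """Return the genotype most callers agree on, or None if no clear winner.

    `calls` maps a caller name to its genotype, or None for a no-call."""
    # Sort alleles so (12, 10) and (10, 12) count as the same genotype.
    votes = Counter(tuple(sorted(g)) for g in calls.values() if g is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    best, n_best = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == n_best
    if n_best < min_votes or tied:
        return None
    return best


calls = {"caller_a": (12, 10), "caller_b": (10, 12), "caller_c": (10, 13), "caller_d": None}
print(consensus_genotype(calls))  # (10, 12)
```

A production merger would typically also weigh per-caller quality scores and each tool's known strengths. This sketch deliberately ignores both.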
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yunjiang Qiu
- Illumina Incorporated, San Diego, CA, 92122, USA
- Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
30
Alhawatema M. GenoSSRFinder: a tool for rapid, precise, and targeted simple sequence repeat detection in genomic studies. Braz J Biol 2023; 83:e276380. [PMID: 37878962 DOI: 10.1590/1519-6984.276380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 08/19/2023] [Indexed: 10/27/2023]
Abstract
GenoSSRFinder is a new tool that makes searching for simple sequence repeats (SSRs) in DNA sequences and genomes simpler, more precise, and faster. The analysis targets a specific SSR in genome and gene sequences. The utility is quick and accurate: it rapidly scans the sequence and reveals every location at which the selected SSR occurs. For each occurrence, it reports where the SSR begins and ends, its total length, how many times the motif repeats, and the length of each repeat unit. Results are returned quickly and are simple to interpret, so researchers studying SSRs can devote the time saved to more thorough work. Because the tool is highly precise, the information it provides is also valuable. GenoSSRFinder is simple to use, produces high-quality results, and accelerates SSR gene research as a direct result of the new analytical approach it uses. Three case studies demonstrated the usefulness of the program by directly examining particular SSRs associated with genetic disease, biodiversity, and forensic science in living organisms. These demonstrations indicate that GenoSSRFinder could be used in a wide variety of fields, such as research on genetic diseases, biodiversity and genetic studies, or even criminal investigations.
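The per-occurrence fields listed above are the start, end, total length, repeat count, and unit length. A short regular-expression scanner can illustrate how they are derived. This Python sketch is not GenoSSRFinder's code. The function and class names, the 1-based inclusive coordinates, and the three-copy minimum are assumptions. The scanner searches only the forward strand, in the motif's given phase.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SSRHit:
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    length: int       # total length of the run in bp
    repeats: int      # number of consecutive motif copies
    unit_length: int  # length of one repeat unit in bp


def find_ssr(sequence: str, motif: str, min_repeats: int = 3) -> List[SSRHit]:
    """Report every run of >= min_repeats tandem copies of `motif` (forward strand)."""
    if not motif or min_repeats < 1:
        raise ValueError("motif must be non-empty and min_repeats >= 1")
    seq, unit = sequence.upper(), motif.upper()
    # e.g. motif CAG with min_repeats 3 gives the regex (?:CAG){3,};
    # finditer returns greedy, non-overlapping runs from left to right.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    hits = []
    for m in pattern.finditer(seq):
        run = m.end() - m.start()
        hits.append(SSRHit(start=m.start() + 1, end=m.end(), length=run,
                           repeats=run // len(unit), unit_length=len(unit)))
    return hits


for hit in find_ssr("AACAGCAGCAGTTCAGCAG", "cag"):
    print(hit)
```

In this example, only the first run (three CAG copies at positions 3 to 11) is reported. The trailing CAGCAG has two copies and falls below the minimum.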
Affiliation(s)
- M Alhawatema
- Tafila Technical University, Faculty of Science, Department of Applied Biological Science, Tafila, Jordan
31
Yeung SS, Ma SL, Wang X, Chen Y, Tsui SKW, Tang NLS, Woo J. Telomere Length among Chinese Aged 75+ Years. Gerontology 2023; 69:1414-1423. [PMID: 37857262 PMCID: PMC10652652 DOI: 10.1159/000534644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 07/19/2023] [Indexed: 10/21/2023]
Abstract
INTRODUCTION Telomere length (TL) is generally regarded as a biomarker of aging. TL, which is influenced by sociodemographic factors, has been shown to be inversely associated with morbidity. However, most studies have examined younger populations, and whether the findings can be extended to older individuals is less clear. Further, few studies have examined these questions in Chinese older adults. This cross-sectional study examined TL and its associated factors in Chinese adults aged 75+ years in Hong Kong. METHODS Participants were from the Mr. and Ms. Osteoporosis cohort. Structured interviews on sociodemographic factors and physical measurements were conducted. Frailty and sarcopenia status were determined by Fried's criteria and the Asian Working Group for Sarcopenia definition, respectively. TL was measured by a molecular inversion probe-quantitative PCR assay and expressed as a novel telomere/single-copy reference gene (T/S) ratio. Adjusted binary logistic regressions were used to examine the associations between TL and the presence of multimorbidity, age-related diseases, frailty, and sarcopenia. RESULTS Among 555 participants (mean age 83.6 ± 3.8 years, 41.3% females), the mean T/S ratio was 1.01 ± 0.20. Males had a lower T/S ratio (0.97 ± 0.20) than females (1.07 ± 0.18) (p < 0.001). A lower education level was related to a longer TL (p = 0.016). Being a current smoker was related to a shorter TL (p = 0.007). TL was not significantly different across categories of age, subjective socioeconomic status, drinking status, physical activity level, and body mass index (p > 0.05). There were no associations between TL and the presence of multimorbidity, diabetes, stroke, cardiovascular diseases, cognitive impairment, frailty, and sarcopenia. CONCLUSION Among Chinese adults aged 75+ years, males had shorter TL than females. TL was not associated with age-related diseases, frailty, and sarcopenia in this age group.
TL may not be a biological marker of aging among older individuals.
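The abstract does not state how the molecular inversion probe assay converts its signal into a T/S ratio. For orientation only, the Python sketch below shows the conventional comparative-Ct arithmetic used in qPCR telomere assays. It assumes roughly 100% amplification efficiency, so that each cycle doubles the product. This is not necessarily the study's calculation, and the function names are hypothetical.

```python
from typing import Tuple

# (Ct of telomere reaction, Ct of single-copy gene reaction) for one DNA sample.
CtPair = Tuple[float, float]


def ts_ratio(ct_telomere: float, ct_single_copy: float) -> float:
    """T/S = 2 ** -(Ct_tel - Ct_scg), assuming perfect doubling per cycle."""
    return 2.0 ** -(ct_telomere - ct_single_copy)


def relative_ts_ratio(sample: CtPair, reference: CtPair) -> float:
    """Sample T/S normalized to a reference DNA: 2 ** -(dCt_sample - dCt_reference)."""
    return ts_ratio(*sample) / ts_ratio(*reference)


print(relative_ts_ratio((15.0, 20.0), (16.0, 20.0)))  # 2.0
```

A relative T/S of 1 means the sample has the same telomere signal per single-copy gene as the reference DNA. A lower value means the sample's telomere signal is relatively shorter.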
Affiliation(s)
- Suey S.Y. Yeung
- Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Suk Ling Ma
- Department of Psychiatry, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Xingyan Wang
- Department of Chemical Pathology and Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Hong Kong Branch of CAS Center for Excellence in Animal Evolution and Genetics, Hong Kong, Hong Kong SAR
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Hong Kong, Hong Kong SAR
- Functional Genomics and Biostatistical Computing Laboratory, CUHK Shenzhen Research Institute, Shenzhen, China
- Yangchao Chen
- School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
- Stephen Kwok Wing Tsui
- School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Centre for Microbial Genomics and Proteomics, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Nelson Leung Sang Tang
- Department of Chemical Pathology and Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Hong Kong Branch of CAS Center for Excellence in Animal Evolution and Genetics, Hong Kong, Hong Kong SAR
- KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Hong Kong, Hong Kong SAR
- Functional Genomics and Biostatistical Computing Laboratory, CUHK Shenzhen Research Institute, Shenzhen, China
- Cytomics Limited, Hong Kong Science Park, Hong Kong, Hong Kong SAR
- Jean Woo
- Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Centre for Nutritional Studies, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
32
Lundström OS, Adriaan Verbiest M, Xia F, Jam HZ, Zlobec I, Anisimova M, Gymrek M. WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans. J Mol Biol 2023; 435:168260. [PMID: 37678708 DOI: 10.1016/j.jmb.2023.168260] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/27/2023] [Revised: 08/29/2023] [Accepted: 08/29/2023] [Indexed: 09/09/2023]
Abstract
Short tandem repeats (STRs) are consecutive repetitions of motifs one to six nucleotides long. They are hypervariable due to the high prevalence of repeat unit insertions or deletions, primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently, STRs were poorly studied because they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state-of-the-art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project, and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
Affiliation(s)
- Oxana Sachenkova Lundström
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Vildly AB, Kalmar, Sweden; Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland. https://twitter.com/merenlin
| | - Max Adriaan Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
| | - Feifei Xia
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland. https://twitter.com/Feifeix97
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Inti Zlobec
- Institute of Tissue Medicine and Pathology, University of Bern, Switzerland
- Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Department of Medicine, University of California San Diego, La Jolla, CA, USA.
33
Poulet A, Kratkiewicz AJ, Li D, van Wolfswinkel JC. Chromatin analysis of adult pluripotent stem cells reveals a unique stemness maintenance strategy. Sci Adv 2023; 9:eadh4887. [PMID: 37801496 PMCID: PMC10558129 DOI: 10.1126/sciadv.adh4887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 09/05/2023] [Indexed: 10/08/2023]
Abstract
Many highly regenerative organisms maintain adult pluripotent stem cells throughout their lives, but how the long-term maintenance of pluripotency is accomplished is unclear. To decipher the regulatory logic of adult pluripotent stem cells, we analyzed the chromatin organization of stem cell genes in the planarian Schmidtea mediterranea. We identify a special chromatin state of stem cell genes, which is distinct from that of tissue-specific genes and resembles that of constitutive genes. Whereas tissue-specific promoters have detectable transcription factor binding sites, the promoters of stem cell-specific genes instead have sequence features that broadly decrease nucleosome binding affinity. This genic organization makes pluripotency-related gene expression the default state in these cells, which is maintained by the activity of the chromatin remodelers ISWI and SNF2 in the stem cells.
Affiliation(s)
- Axel Poulet
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT 06511, USA
- Arcadia J. Kratkiewicz
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT 06511, USA
- Danyan Li
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT 06511, USA
- Josien C. van Wolfswinkel
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT 06511, USA
- Yale Stem Cell Center, Yale School of Medicine, New Haven, CT 06511, USA
- Yale Center for RNA Science and Medicine, Yale School of Medicine, New Haven, CT 06511, USA
34
Reinar WB, Tørresen OK, Nederbragt AJ, Matschiner M, Jentoft S, Jakobsen KS. Teleost genomic repeat landscapes in light of diversification rates and ecology. Mob DNA 2023; 14:14. [PMID: 37789366 PMCID: PMC10546739 DOI: 10.1186/s13100-023-00302-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 09/20/2023] [Indexed: 10/05/2023]
Abstract
Repetitive DNA makes up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated whether TE content correlates with family-level net diversification rates and found support for a weak negative correlation. Further, we demonstrated that TE proportion correlates with genome size but not with the proportion of short tandem repeats (STRs), which implies independent evolutionary paths. Marine and freshwater fish differed markedly in STR content, with the most extreme propagation detected in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity, as seen in codfishes and herring.
Affiliation(s)
- Ole K Tørresen
- Department of Biosciences, University of Oslo, Oslo, Norway
- Alexander J Nederbragt
- Department of Biosciences, University of Oslo, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
- Michael Matschiner
- Department of Biosciences, University of Oslo, Oslo, Norway
- University of Oslo, Natural History Museum, Oslo, Norway
- Sissel Jentoft
- Department of Biosciences, University of Oslo, Oslo, Norway
35
Klashami ZN, Mostafavi A, Roudbordeh MG, Abbasi A, Ebrahimi P, Asadi M, Amoli MM. Investigating the relationship between the VNTR variant of the interleukin-1 receptor antagonist gene and coronary in-stent restenosis. Mol Biol Rep 2023; 50:8575-8587. [PMID: 37644369 DOI: 10.1007/s11033-023-08759-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 08/16/2023] [Indexed: 08/31/2023]
Abstract
OBJECTIVE This study aimed to examine the association between the interleukin-1 receptor antagonist gene (IL-1RN) and coronary in-stent restenosis (ISR) by analyzing its VNTR variant, building on previously reported results. MATERIALS AND METHODS The samples were classified into two clearly defined groups: the case group, comprising 45 patients diagnosed with in-stent restenosis (ISR+), and the control group, comprising 60 patients without ISR (ISR-). Polymerase chain reaction (PCR) was performed to examine the 86-bp VNTR variant of the IL-1RN gene. RESULTS Across the six identified allele groups of the 86-bp VNTR of the IL-1RN gene, a statistically significant difference between cases and controls was observed for the presence of the IL1RN*2 allele (p = 0.04, OR = 0.045). CONCLUSION Individuals carrying allele 2 of the IL-1RN gene may be more predisposed to ISR. This could be due to an imbalance between IL-1Ra and IL-1β, whose balance is crucial in preventing the initiation or progression of inflammatory diseases in specific organs. This imbalance may reflect increased production of IL-1β and a potential reduction of IL-1Ra as a result of functional VNTR variation in the IL-1RN gene.
Affiliation(s)
- Zeynab Nickhah Klashami
- Metabolic Disorders Research Centre, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Atoosa Mostafavi
- Department of Cardiology, Faculty of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Ali Abbasi
- Department of Cardiology, Faculty of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Pirooz Ebrahimi
- Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Arcavacata, Italy
- Mojgan Asadi
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Mahsa M Amoli
- Metabolic Disorders Research Centre, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran.
36
Kuhlman TE. Repetitive DNA regulates gene expression. Science 2023; 381:1289-1290. [PMID: 37733865 DOI: 10.1126/science.adk2055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/23/2023]
Abstract
Short tandem repeats affect gene expression by binding regulatory proteins.
Affiliation(s)
- Thomas E Kuhlman
- Department of Physics and Astronomy, University of California, Riverside, Riverside, CA, USA
37
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
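A textbook equilibrium model can illustrate the idea that many weakly preferred microstates raise TF occupancy. This is a minimal sketch under stated assumptions, not the model fitted by the authors. It assumes that a single TF molecule occupies at most one of several independent, non-overlapping sites in a window, and that each site i contributes a statistical weight c/Kd_i. The function name and all dissociation constants below are hypothetical.

```python
from typing import Sequence


def window_occupancy(tf_nM: float, kds_nM: Sequence[float]) -> float:
    """Probability that a window is bound, assuming at most one TF bound at a time.

    Partition function Z = 1 + sum_i (c / Kd_i); P(bound) = (Z - 1) / Z.
    """
    bound_weight = sum(tf_nM / kd for kd in kds_nM)
    return bound_weight / (1.0 + bound_weight)


# Hypothetical numbers: one strong site vs. a repeat of 20 sites, each 10-fold weaker
strong_site = window_occupancy(10.0, [100.0])
weak_repeat = window_occupancy(10.0, [1000.0] * 20)
print(f"single strong site: {strong_site:.3f}")  # 0.091
print(f"20 weak sites:      {weak_repeat:.3f}")  # 0.167
```

In this model, occupancy is set by the summed weights of all sites rather than by the single best site. This is one simple way to see how a repeat of weak sites can increase local TF density.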
Affiliation(s)
- Connor A Horton
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
38
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023]
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. We also introduced diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
39
Ichikawa K, Kawahara R, Asano T, Morishita S. A landscape of complex tandem repeats within individual human genomes. Nat Commun 2023; 14:5530. [PMID: 37709751 PMCID: PMC10502081 DOI: 10.1038/s41467-023-41262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Markedly expanded tandem repeats (TRs) have been correlated with ~60 diseases. TR diversity has been considered a clue toward understanding missing heritability. However, haplotype-resolved long TRs remain mostly hidden or blacked out because their complex structures (TRs composed of various units and minisatellites containing >10-bp units) make them difficult to determine accurately with existing methods. Here, using a high-precision algorithm to determine complex TR structures from long, accurate PacBio HiFi reads, an investigation of 270 Japanese control samples yields several genome-wide findings. Approximately 322,000 TRs are difficult to impute from the surrounding single-nucleotide variants. Greater genetic divergence of TR loci is significantly correlated with more events of younger replication slippage. Complex TRs are more abundant than single-unit TRs, and a tendency for complex TRs to consist of <10-bp units and for single-unit TRs to be minisatellites is statistically significant at loci with ≥500-bp TRs. Of note, 8909 loci with extended TRs (>100 bp longer than the mode) contain several known disease-associated TRs and are considered candidates for association with disorders. Overall, complex TRs and minisatellites are found to be abundant and diverse, even in genetically small Japanese populations, yielding insights into the landscape of long TRs.
Affiliation(s)
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Riki Kawahara
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Takeshi Asano
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan.
40
Herbert A. Flipons and small RNAs accentuate the asymmetries of pervasive transcription by the reset and sequence-specific microcoding of promoter conformation. J Biol Chem 2023; 299:105140. [PMID: 37544644 PMCID: PMC10474125 DOI: 10.1016/j.jbc.2023.105140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 07/25/2023] [Accepted: 07/31/2023] [Indexed: 08/08/2023]
Abstract
The role of alternate DNA conformations such as Z-DNA in the regulation of transcription is currently underappreciated. These structures are encoded by sequences called flipons, many of which are enriched in promoter and enhancer regions. Through a change in their conformation, flipons provide a tunable mechanism to mechanically reset promoters for the next round of transcription. They act as actuators that capture and release energy to ensure that the turnover of the proteins at promoters is optimized to cell state. Likewise, the single-stranded DNA formed as flipons cycle facilitates the docking of RNAs that are able to microcode promoter conformations and canalize the pervasive transcription commonly observed in metazoan genomes. The strand-specific nature of the interaction between RNA and DNA likely accounts for the known asymmetry of epigenetic marks present on the histone tetramers that pair to form nucleosomes. The role of these supercoil-dependent processes in promoter choice and transcriptional interference is reviewed. The evolutionary implications are examined: the resilience and canalization of flipon-dependent gene regulation is contrasted with the rapid adaptation enabled by the spread of flipon repeats throughout the genome. Overall, the current findings underscore the important role of flipons in modulating the readout of genetic information and how little we know about their biology.
Affiliation(s)
- Alan Herbert
- Discovery Division, InsideOutBio, Charlestown, Massachusetts, USA.
41
Lutz MW, Chiba-Falek O. Bioinformatics pipeline to guide post-GWAS studies in Alzheimer's: A new catalogue of disease candidate short structural variants. Alzheimers Dement 2023; 19:4094-4109. [PMID: 37253165 PMCID: PMC10524333 DOI: 10.1002/alz.13168] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/22/2022] [Revised: 04/27/2023] [Accepted: 05/08/2023] [Indexed: 06/01/2023]
Abstract
BACKGROUND Short structural variants (SSVs), including insertions/deletions (indels), are common in the human genome and impact disease risk. The role of SSVs in late-onset Alzheimer's disease (LOAD) has been understudied. In this study, we developed a bioinformatics pipeline of SSVs within LOAD-genome-wide association study (GWAS) regions to prioritize regulatory SSVs based on the strength of their predicted effect on transcription factor (TF) binding sites. METHODS The pipeline utilized publicly available functional genomics data sources including candidate cis-regulatory elements (cCREs) from ENCODE and single-nucleus (sn)RNA-seq data from LOAD patient samples. RESULTS We catalogued 1581 SSVs in candidate cCREs in LOAD GWAS regions that disrupted 737 TF sites. That included SSVs that disrupted the binding of RUNX3, SPI1, and SMAD3, within the APOE-TOMM40, SPI1, and MS4A6A LOAD regions. CONCLUSIONS The pipeline developed here prioritized non-coding SSVs in cCREs and characterized their putative effects on TF binding. The approach integrates multiomics datasets for validation experiments using disease models.
Affiliation(s)
- Michael W. Lutz
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA
- Ornit Chiba-Falek
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC 27710, USA
42
Montanucci L, Lewis-Smith D, Collins RL, Niestroj LM, Parthasarathy S, Xian J, Ganesan S, Macnee M, Brünger T, Thomas RH, Talkowski M, Helbig I, Leu C, Lal D. Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals. Nat Commun 2023; 14:4392. [PMID: 37474567 PMCID: PMC10359300 DOI: 10.1038/s41467-023-39539-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/03/2022] [Accepted: 06/16/2023] [Indexed: 07/22/2023]
Abstract
Copy number variants (CNVs) are established risk factors for neurodevelopmental disorders with seizures or epilepsy. With the hypothesis that seizure disorders share genetic risk factors, we pooled CNV data from 10,590 individuals with seizure disorders, 16,109 individuals with clinically validated epilepsy, and 492,324 population controls, and identified 25 genome-wide significant loci, 22 of which are novel for seizure disorders, such as deletions at 1p36.33, 1q44, 2p21-p16.3, 3q29, 8p23.3-p23.2, 9p24.3, 10q26.3, 15q11.2, 15q12-q13.1, 16p12.2, and 17q21.31; duplications at 2q13, 9q34.3, 16p13.3, 17q12, 19p13.3, and 20q13.33; and reciprocal CNVs at 16p11.2 and 22q11.21. Using genetic data from an additional 248,751 individuals with 23 neuropsychiatric phenotypes, we explored the pleiotropy of these 25 loci. Finally, in a subset of individuals with epilepsy and detailed clinical data available, we performed phenome-wide association analyses between individual CNVs and clinical annotations categorized through the Human Phenotype Ontology (HPO). For six CNVs, we identified 19 significant associations with specific HPO terms and generated, for all CNVs, phenotype signatures across 17 clinical categories relevant for epileptologists. This is the most comprehensive investigation of CNVs in epilepsy and related seizure disorders, with potential implications for clinical practice.
Affiliation(s)
- Ludovica Montanucci
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA
- David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Clinical Neurosciences, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.) and Harvard, Cambridge, USA
- Shridhar Parthasarathy
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Julie Xian
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Shiva Ganesan
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Marie Macnee
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Tobias Brünger
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Rhys H Thomas
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Clinical Neurosciences, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Michael Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.) and Harvard, Cambridge, USA
- Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Costin Leu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA.
- Department of Clinical and Experimental Epilepsy, Institute of Neurology, University College London, London, UK.
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T, Cambridge, MA, USA.
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, US.
- Dennis Lal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, USA.
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T.) and Harvard, Cambridge, USA.
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T, Cambridge, MA, USA.
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, US.
43
Haerter CAG, Blanco DR, Traldi JB, Feldberg E, Margarido VP, Lui RL. Are scattered microsatellites weak chromosomal markers? Guided mapping reveals new insights into Trachelyopterus (Siluriformes: Auchenipteridae) diversity. PLoS One 2023; 18:e0285388. [PMID: 37310952 DOI: 10.1371/journal.pone.0285388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 04/22/2023] [Indexed: 06/15/2023]
Abstract
The scattered distribution pattern of microsatellites is a challenging problem in fish cytogenetics. This type of array hinders the identification of useful patterns and the comparison between species, often resulting in overly limited interpretations that merely label the pattern as "scattered" or "widely distributed". However, several studies have shown that the distribution pattern of microsatellites is non-random. Thus, here we tested whether a scattered microsatellite could have distinct distribution patterns on homeologous chromosomes of closely related species. The clustered sites of 18S and 5S rDNA, U2 snRNA and H3/H4 histone genes were used as a guide to compare the (GATA)n microsatellite distribution pattern on the homeologous chromosomes of six Trachelyopterus species: T. coriaceus and Trachelyopterus aff. galeatus from the Araguaia River basin; T. striatulus, T. galeatus and T. porosus from the Amazonas River basin; and Trachelyopterus aff. coriaceus from the Paraguay River basin. Most species had similar patterns of the (GATA)n microsatellite in the histone genes and 5S rDNA carriers. However, we found a chromosomal polymorphism of the (GATA)n sequence in the 18S rDNA carriers of Trachelyopterus galeatus, which is in Hardy-Weinberg equilibrium and possibly originated through amplification events; and a chromosomal polymorphism in Trachelyopterus aff. galeatus, which, combined with an inversion polymorphism of the U2 snRNA in the same chromosome pair, resulted in six possible cytotypes that are in Hardy-Weinberg disequilibrium. Therefore, comparing the distribution pattern on homeologous chromosomes across species, using gene clusters as a guide, seems to be an effective way to further the analysis of scattered microsatellites in fish cytogenetics.
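The Hardy-Weinberg statements above rest on comparing observed genotype (here, cytotype) counts with the counts expected from allele frequencies. Below is a minimal Python sketch of the standard chi-square goodness-of-fit test for a single biallelic locus. The function name and genotype counts are hypothetical. The six-cytotype case in Trachelyopterus aff. galeatus, which combines two polymorphisms on the same chromosome pair, would need a multi-allele or exact test instead.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg proportions for one biallelic locus (1 df)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts showing a heterozygote deficit relative to HWE expectations
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")  # chi2 = 4.00, p = 0.0455
```

The single degree of freedom comes from three genotype classes, minus one for the fixed total, minus one for the estimated allele frequency.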
Affiliation(s)
- Josiane Baccarin Traldi
- Departamento de Genética, Instituto de Ciências Biológicas, Universidade Federal do Amazonas, Manaus, Brasil
- Vladimir Pavan Margarido
- Universidade Estadual do Oeste do Paraná, Centro de Ciências Biológicas e da Saúde, Cascavel, Paraná, Brasil
- Roberto Laridondo Lui
- Universidade Estadual do Oeste do Paraná, Centro de Ciências Biológicas e da Saúde, Cascavel, Paraná, Brasil
44
Hussain S, Sadouni N, van Essen D, Dao LTM, Ferré Q, Charbonnier G, Torres M, Gallardo F, Lecellier CH, Sexton T, Saccani S, Spicuglia S. Short tandem repeats are important contributors to silencer elements in T cells. Nucleic Acids Res 2023; 51:4845-4866. [PMID: 36929452 PMCID: PMC10250210 DOI: 10.1093/nar/gkad187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 02/26/2023] [Accepted: 03/15/2023] [Indexed: 03/18/2023]
Abstract
The action of cis-regulatory elements with either activation or repression functions underpins the precise regulation of gene expression during normal development and cell differentiation. Gene activation by the combined activities of promoters and distal enhancers has been extensively studied in normal and pathological contexts. In sharp contrast, gene repression by cis-acting silencers, defined as genetic elements that negatively regulate gene transcription in a position-independent fashion, is less well understood. Here, we repurpose the STARR-seq approach as a novel high-throughput reporter strategy to quantitatively assess silencer activity in mammals. We assessed silencer activity from DNase I hypersensitive sites in a mouse T cell line. Identified silencers were associated with either repressive or active chromatin marks and enriched for binding motifs of known transcriptional repressors. CRISPR-mediated genomic deletions validated the repressive function of distinct silencers involved in the repression of non-T cell genes and genes regulated during T cell differentiation. Finally, we unravel an association of silencer activity with short tandem repeats, highlighting the role of repetitive elements in silencer activity. Our results provide a general strategy for genome-wide identification and characterization of silencer elements.
Affiliation(s)
- Saadat Hussain
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Nori Sadouni
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Dominic van Essen
- Institute for Research on Cancer and Ageing, IRCAN, 06107 Nice, France
| | - Lan T M Dao
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Quentin Ferré
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Guillaume Charbonnier
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Magali Torres
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Frederic Gallardo
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Charles-Henri Lecellier
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
- LIRMM, University of Montpellier, CNRS, Montpellier, France
| | - Tom Sexton
- Institut de Génétique et de Biologie Moléculaire et Cellulaire – IGBMC (CNRS UMR 7104, INSERM U1258, Université de Strasbourg), 67404 Illkirch, France
| | - Simona Saccani
- Institute for Research on Cancer and Ageing, IRCAN, 06107 Nice, France
| | - Salvatore Spicuglia
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| |
Collapse
|
45
|
Alotaibi NM, Saeed M, Alshammari N, Alabdallah NM, Mahfooz S. Comparative genomics reveals the presence of simple sequence repeats in genes related to virulence in plant pathogenic Pythium ultimum and Pythium vexans. Arch Microbiol 2023; 205:256. [PMID: 37270724 DOI: 10.1007/s00203-023-03595-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Revised: 05/08/2023] [Accepted: 05/21/2023] [Indexed: 06/05/2023]
Abstract
In this study, we evaluated the occurrence, relative abundance (RA), and density (RD) of simple sequence repeats (SSRs) in the complete genome and transcriptomic sequences of the plant pathogenic species of Pythium to acquire a better knowledge of their genome structure and evolution. Among the species, P. ultimum had the highest RA and RD of SSRs in the genomic sequences, whereas P. vexans had the highest RA and RD in the transcriptomic sequences. The genomic and transcriptomic sequences of P. aphanidermatum showed the lowest RA and RD of SSRs. Trinucleotide SSRs were the most prevalent class in both genomic and transcriptomic sequences, while dinucleotide SSRs were the least prevalent. The G + C content of the transcriptomic sequences was found to be positively correlated with the number (r = 0.601) and RA (r = 0.710) of SSRs. A motif conservation study revealed the highest number of unique motifs in P. vexans (9.9%). Overall, a low conservation of motifs was observed among the species (25.9%). A gene enrichment study revealed P. vexans and P. ultimum carry SSRs in their genes that are directly connected to virulence, whereas the remaining two species, P. aphanidermatum and P. arrhenomanes, harbour SSRs in genes involved in transcription, translation, and ATP binding. In an effort to enhance the genomic resources, a total of 11,002 primers from the transcribed regions were designed for the pathogenic Pythium species. Furthermore, the unique motifs identified in this work could be employed as molecular probes for species identification.
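The abstract reports relative abundance (RA) and relative density (RD) without defining them. In the SSR literature these are commonly taken as the number of SSRs per megabase of sequence analysed and the total SSR length (bp) per megabase; the sketch below assumes those definitions. It pairs them with a naive scanner for perfect repeats whose thresholds (`MIN_REPEATS`) and helper names are hypothetical illustrations, not the parameters or software used by Alotaibi et al.

```python
from dataclasses import dataclass

# Hypothetical minimum number of tandem copies per motif length (1-6 bp).
# These thresholds are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass
class SSR:
    start: int   # 0-based start position in the sequence
    motif: str   # repeat unit, e.g. "CAG"
    length: int  # total length of the repeat tract in bp


def find_perfect_ssrs(seq: str) -> list[SSR]:
    """Greedy left-to-right scan for perfect (uninterrupted) SSRs of 1-6 bp motifs."""
    seq = seq.upper()
    hits: list[SSR] = []
    i = 0
    while i < len(seq):
        best = None
        for k, min_rep in MIN_REPEATS.items():
            motif = seq[i:i + k]
            if len(motif) < k:
                continue
            reps = 1
            # Count consecutive copies of the motif starting at position i.
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            # Prefer the longest tract; on ties keep the shorter motif seen first.
            if reps >= min_rep and (best is None or reps * k > best.length):
                best = SSR(i, motif, reps * k)
        if best is not None:
            hits.append(best)
            i += best.length  # skip past the tract so it is counted once
        else:
            i += 1
    return hits


def relative_abundance_and_density(total_bp: int, ssrs: list[SSR]) -> tuple[float, float]:
    """RA = SSRs per Mb; RD = SSR base pairs per Mb (assumed definitions)."""
    mb = total_bp / 1_000_000
    ra = len(ssrs) / mb
    rd = sum(s.length for s in ssrs) / mb
    return ra, rd


seq = "CAG" * 6 + "GATC" + "A" * 12
ssrs = find_perfect_ssrs(seq)  # a (CAG)6 tract and an (A)12 tract
print(relative_abundance_and_density(2_000_000, ssrs))
```

Normalising per megabase is what makes RA and RD comparable across genomes and transcriptomes of different sizes; a production analysis would use a dedicated SSR finder and would also handle compound and imperfect repeats, which this sketch ignores.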
Affiliation(s)
- Nahaa M Alotaibi
- Department of Biology, College of Science, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Mohd Saeed
- Department of Biology, College of Science, University of Hail, Hail, 2440, Saudi Arabia
- Nawaf Alshammari
- Department of Biology, College of Science, University of Hail, Hail, 2440, Saudi Arabia
- Nadiyah M Alabdallah
- Department of Biology, College of Science, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia
- Basic and Applied Scientific Research Centre, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia
- Sahil Mahfooz
- Department of Biotechnology, V.B.S. Purvanchal University, Jaunpur, Uttar Pradesh, 222003, India
- The Academic Editors, Saryu Enclave, Awadh Vikas Yojna, Lucknow, 226002, India
46
Cano AV, Gitschlag BL, Rozhoňová H, Stoltzfus A, McCandlish DM, Payne JL. Mutation bias and the predictability of evolution. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220055. [PMID: 37004719 PMCID: PMC10067271 DOI: 10.1098/rstb.2022.0055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2023] [Open access]
Abstract
Predicting evolutionary outcomes is an important research goal in a diversity of contexts. The focus of evolutionary forecasting is usually on adaptive processes, and efforts to improve prediction typically focus on selection. However, adaptive processes often rely on new mutations, which can be strongly influenced by predictable biases in mutation. Here, we provide an overview of existing theory and evidence for such mutation-biased adaptation and consider the implications of these results for the problem of prediction, in regard to topics such as the evolution of infectious diseases, resistance to biochemical agents, as well as cancer and other kinds of somatic evolution. We argue that empirical knowledge of mutational biases is likely to improve in the near future, and that this knowledge is readily applicable to the challenges of short-term prediction. This article is part of the theme issue 'Interdisciplinary approaches to predicting evolutionary biology'.
Affiliation(s)
- Alejandro V Cano
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Bryan L Gitschlag
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Hana Rozhoňová
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Arlin Stoltzfus
- Office of Data and Informatics, Material Measurement Laboratory, National Institute of Standards and Technology, Rockville, MD 20899, USA
- Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
47
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. bioRxiv 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold standard diagnostic methods like RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short read or long read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely-used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we saw that widely-used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci as well as further tool development that builds on the strengths of current algorithms. 
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3x faster than the original while producing the same output.
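The key filtering step described in this abstract, deciding whether an insertion or deletion represents a tandem repeat expansion or contraction with a 2-50 bp motif, can be illustrated with a simplified check. This is a minimal sketch under stated assumptions, not the authors' pipeline: it only accepts indels made of whole copies of a motif, and it requires one more copy of that motif in the reference immediately beside the event. The function names and this flank rule are hypothetical.

```python
from typing import Optional


def minimal_period(s: str) -> int:
    """Smallest k such that s is an exact concatenation of copies of s[:k]."""
    n = len(s)
    for k in range(1, n + 1):
        if n % k == 0 and s[:k] * (n // k) == s:
            return k
    return n


def tr_motif_for_indel(ref: str, pos: int, indel_seq: str, is_deletion: bool = False,
                       min_motif: int = 2, max_motif: int = 50) -> Optional[str]:
    """Return the repeat motif if an indel looks like a TR expansion or contraction.

    ref: reference sequence; pos: 0-based position where inserted bases are placed
    (insertion) or where deleted bases begin (deletion). Simplified, hypothetical
    rule: the indel must be whole copies of a min_motif..max_motif bp motif, and
    the reference must carry another copy of that motif right next to the event.
    """
    ref, indel_seq = ref.upper(), indel_seq.upper()
    if not indel_seq:
        return None
    if is_deletion and ref[pos:pos + len(indel_seq)] != indel_seq:
        raise ValueError("deleted bases do not match the reference at pos")
    k = minimal_period(indel_seq)
    if not min_motif <= k <= max_motif:
        return None  # e.g. homopolymer indels (k = 1) fall outside the 2-50 bp range
    motif = indel_seq[:k]
    # For a deletion, the right-hand flank starts after the deleted bases.
    right_start = pos + len(indel_seq) if is_deletion else pos
    left_copy = ref[max(0, pos - k):pos]
    right_copy = ref[right_start:right_start + k]
    return motif if motif in (left_copy, right_copy) else None


ref = "TTTT" + "CAG" * 5 + "GGGG"
print(tr_motif_for_indel(ref, 4, "CAGCAG"))                    # insertion of two CAG units
print(tr_motif_for_indel(ref, 4, "CAGCAG", is_deletion=True))  # deletion of two CAG units
```

Because the motif is read directly from the indel at its actual position, rotated motifs (for example an `AGCAGC` insertion one base into a CAG tract) are handled without extra logic; interrupted or partial-copy repeats, which the truth set also includes, would need a more tolerant comparison.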
Affiliation(s)
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
48
Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun 2023; 14:2092. [PMID: 37045857 PMCID: PMC10097659 DOI: 10.1038/s41467-023-37690-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/04/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023] [Open access]
Abstract
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Affiliation(s)
- Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Shuai Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Sijia Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jiajia Wang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Xinyue Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Tingrui Song
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Tao Xu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117, Shandong, China
- Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
49
Lu TY, Smaruj PN, Fudenberg G, Mancuso N, Chaisson MJP. The motif composition of variable number tandem repeats impacts gene expression. Genome Res 2023; 33:511-524. [PMID: 37037626 PMCID: PMC10234305 DOI: 10.1101/gr.276768.122] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/17/2022] [Accepted: 03/29/2023] [Indexed: 04/12/2023]
Abstract
Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat (motif) composition. Here, we use a repeat-pangenome graph (RPGG) constructed on 35 haplotype-resolved assemblies to detect variation in both VNTR length and repeat composition. We align population-scale data from the Genotype-Tissue Expression (GTEx) Consortium to examine how variations in sequence composition may be linked to expression, including cases independent of overall VNTR length. We find that 9422 out of 39,125 VNTRs are associated with nearby gene expression through motif variations, of which only 23.4% are accessible from length. Fine-mapping identifies 174 genes to be likely driven by variation in certain VNTR motifs and not overall length. We highlight two genes, CACNA1C and RNF213, that have expression associated with motif variation, showing the utility of RPGG analysis as a new approach for trait association in multiallelic and highly variable loci.
Affiliation(s)
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Paulina N Smaruj
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Geoffrey Fudenberg
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Nicholas Mancuso
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, California 90033, USA
50
Wright SE, Todd PK. Native functions of short tandem repeats. eLife 2023; 12:e84043. [PMID: 36940239 PMCID: PMC10027321 DOI: 10.7554/elife.84043] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/11/2022] [Accepted: 03/08/2023] [Indexed: 03/21/2023] [Open access]
Abstract
Over a third of the human genome is comprised of repetitive sequences, including more than a million short tandem repeats (STRs). While studies of the pathologic consequences of repeat expansions that cause syndromic human diseases are extensive, the potential native functions of STRs are often ignored. Here, we summarize a growing body of research into the normal biological functions for repetitive elements across the genome, with a particular focus on the roles of STRs in regulating gene expression. We propose reconceptualizing the pathogenic consequences of repeat expansions as aberrancies in normal gene regulation. From this altered viewpoint, we predict that future work will reveal broader roles for STRs in neuronal function and as risk alleles for more common human neurological diseases.
Affiliation(s)
- Shannon E Wright
- Department of Neurology, University of Michigan–Ann Arbor, Ann Arbor, United States
- Neuroscience Graduate Program, University of Michigan–Ann Arbor, Ann Arbor, United States
- Department of Neuroscience, Picower Institute, Cambridge, United States
- Peter K Todd
- Department of Neurology, University of Michigan–Ann Arbor, Ann Arbor, United States
- VA Ann Arbor Healthcare System, Ann Arbor, United States