1. Fedele E, Wetton JH, Jobling MA. Sequencing the orthologs of human autosomal forensic short tandem repeats provides individual- and species-level identification in African great apes. BMC Ecol Evol 2024; 24:134. [PMID: 39482599 PMCID: PMC11526555 DOI: 10.1186/s12862-024-02324-0] [Received: 11/29/2023] [Accepted: 10/17/2024] [Indexed: 11/03/2024]
Abstract
BACKGROUND: Great apes are a global conservation concern, with anthropogenic pressures threatening their survival. Genetic analysis can be used to assess the effects of reduced population sizes and the effectiveness of conservation measures. In humans, autosomal short tandem repeats (aSTRs) are widely used in population genetics and for forensic individual identification and kinship testing. Traditionally, genotyping is length-based via capillary electrophoresis (CE), but there is an increasing move to direct analysis by massively parallel sequencing (MPS). An example is the ForenSeq DNA Signature Prep Kit, which amplifies multiple loci including 27 aSTRs, prior to sequencing via Illumina technology. Here we assess the applicability of this human-based kit in African great apes. We ask whether cross-species genotyping of the orthologs of these loci can provide both individual and (sub)species identification.

RESULTS: The ForenSeq kit was used to amplify and sequence aSTRs in 52 individuals (14 chimpanzees; 4 bonobos; 16 western lowland, 6 eastern lowland, and 12 mountain gorillas). The orthologs of 24/27 human aSTRs amplified across species, and a core set of thirteen loci could be genotyped in all individuals. Genotypes were individually and (sub)species identifying. Both allelic diversity and the power to discriminate (sub)species were greater when considering STR sequences rather than allele lengths. Comparing human and African great-ape STR sequences with an orangutan outgroup showed general conservation of repeat types and allele size ranges. Variation in repeat array structures and a weak relationship with the known phylogeny suggests stochastic origins of mutations giving rise to diverse imperfect repeat arrays. Interruptions within long repeat arrays in African great apes do not appear to reduce allelic diversity.

CONCLUSIONS: Orthologs of most human aSTRs in the ForenSeq DNA Signature Prep Kit can be analysed in African great apes. Primer redesign would reduce observed variability in amplification across some loci. MPS of the orthologs of human loci provides better resolution for both individual and (sub)species identification in great apes than standard CE-based approaches, and has the further advantage that there is no need to limit the number and size ranges of analysed loci.
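The abstract's point that STR sequences resolve more alleles than allele lengths can be illustrated with a minimal Python sketch. The three sequences are invented for illustration and are not data from the study; `length_allele` and `sequence_allele` are hypothetical helper names.

```python
# Illustrative only: these sequences are invented, not taken from the study.

def length_allele(seq: str) -> int:
    """Length-based (CE-style) call: only the fragment length is observed."""
    return len(seq)

def sequence_allele(seq: str) -> str:
    """Sequence-based (MPS-style) call: the full sequence defines the allele."""
    return seq.upper()

alleles = [
    "TCTA" * 10,                       # uninterrupted array, 40 bp
    "TCTA" * 4 + "TCTG" + "TCTA" * 5,  # same 40 bp length, one variant repeat
    "TCTA" * 11,                       # one extra repeat, 44 bp
]

by_length = {length_allele(a) for a in alleles}
by_sequence = {sequence_allele(a) for a in alleles}

print(len(by_length), "distinct alleles by length")
print(len(by_sequence), "distinct alleles by sequence")
```

The first two alleles are indistinguishable by length but differ in sequence, which is the kind of extra resolution the abstract attributes to MPS.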
Affiliation(s)
- Ettore Fedele
  - Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, LE1 7RH, UK
  - Current address: Faculty of Science & Engineering, Swansea University, Swansea, UK
- Jon H Wetton
  - Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, LE1 7RH, UK
- Mark A Jobling
  - Department of Genetics, Genomics & Cancer Sciences, University of Leicester, University Road, Leicester, LE1 7RH, UK

2. Uguen K, Michaud JL, Génin E. Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases. Eur J Hum Genet 2024; 32:1037-1044. [PMID: 38982300 PMCID: PMC11369099 DOI: 10.1038/s41431-024-01666-z] [Received: 02/22/2024] [Revised: 06/20/2024] [Accepted: 06/27/2024] [Indexed: 07/11/2024]
Abstract
In this study, we explore the landscape of short tandem repeats (STRs) within the human genome through the lens of evolving technologies to detect genomic variations. STRs, which encompass approximately 3% of our genomic DNA, are crucial for understanding human genetic diversity, disease mechanisms, and evolutionary biology. The advent of high-throughput sequencing methods has revolutionized our ability to accurately map and analyze STRs, highlighting their significance in genetic disorders, forensic science, and population genetics. We review the current available methodologies for STR analysis, the challenges in interpreting STR variations across different populations, and the implications of STRs in medical genetics. Our findings underscore the urgent need for comprehensive STR databases that reflect the genetic diversity of global populations, facilitating the interpretation of STR data in clinical diagnostics, genetic research, and forensic applications. This work sets the stage for future studies aimed at harnessing STR variations to elucidate complex genetic traits and diseases, reinforcing the importance of integrating STRs into genetic research and clinical practice.
Affiliation(s)
- Kevin Uguen
  - Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
  - Service de Génétique Médicale et Biologie de la Reproduction, CHU de Brest, Brest, France
  - CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
- Jacques L Michaud
  - CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
  - Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
  - Department of Neurosciences, Université de Montréal, Montréal, QC, Canada

3. Loh CA, Shields DA, Schwing A, Evrony GD. High-fidelity, large-scale targeted profiling of microsatellites. Genome Res 2024; 34:1008-1026. [PMID: 39013593 PMCID: PMC11368184 DOI: 10.1101/gr.278785.123] [Received: 11/28/2023] [Accepted: 07/11/2024] [Indexed: 07/18/2024]
Abstract
Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depends on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant "stutter" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling, "stutter" artifacts, with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data that uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analysis of de novo microsatellite mutations in human trios. Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.
Affiliation(s)
- Caitlin A Loh
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Danielle A Shields
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Adam Schwing
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Gilad D Evrony
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA

4. Sehgal A, Ziaei Jam H, Shen A, Gymrek M. Genome-wide detection of somatic mosaicism at short tandem repeats. Bioinformatics 2024; 40:btae485. [PMID: 39078205 PMCID: PMC11319640 DOI: 10.1093/bioinformatics/btae485] [Received: 11/23/2023] [Revised: 06/30/2024] [Accepted: 07/29/2024] [Indexed: 07/31/2024]
Abstract
MOTIVATION: Somatic mosaicism has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise >1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs is lacking.

RESULTS: We introduce prancSTR, a novel method for detecting mosaic STRs from individual high-throughput sequencing datasets. prancSTR is designed to detect loci characterized by a single high-frequency mosaic allele, but can also detect loci with multiple mosaic alleles. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mosaic STRs in simulated data, demonstrate its feasibility by identifying candidate mosaic STRs in Illumina whole genome sequencing data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project, and evaluate the use of prancSTR on Element and PacBio data. In addition to prancSTR, we present simTR, a novel simulation framework which simulates raw sequencing reads with realistic error profiles at STRs.

AVAILABILITY AND IMPLEMENTATION: prancSTR and simTR are freely available at https://github.com/gymrek-lab/trtools. Detailed documentation is available at https://trtools.readthedocs.io/.
Affiliation(s)
- Aarushi Sehgal
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States
- Helyaneh Ziaei Jam
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States
- Andrew Shen
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States
  - Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States

5. Plavskin Y, de Biase MS, Ziv N, Janská L, Zhu YO, Hall DW, Schwarz RF, Tranchina D, Siegal ML. Spontaneous single-nucleotide substitutions and microsatellite mutations have distinct distributions of fitness effects. PLoS Biol 2024; 22:e3002698. [PMID: 38950062 PMCID: PMC11244821 DOI: 10.1371/journal.pbio.3002698] [Received: 07/04/2023] [Revised: 07/12/2024] [Accepted: 06/04/2024] [Indexed: 07/03/2024]
Abstract
The fitness effects of new mutations determine key properties of evolutionary processes. Beneficial mutations drive evolution, yet selection is also shaped by the frequency of small-effect deleterious mutations, whose combined effect can burden otherwise adaptive lineages and alter evolutionary trajectories and outcomes in clonally evolving organisms such as viruses, microbes, and tumors. The small effect sizes of these important mutations have made accurate measurements of their rates difficult. In microbes, assessing the effect of mutations on growth can be especially instructive, as this complex phenotype is closely linked to fitness in clonally evolving organisms. Here, we perform high-throughput time-lapse microscopy on cells from mutation-accumulation strains to precisely infer the distribution of mutational effects on growth rate in the budding yeast, Saccharomyces cerevisiae. We show that mutational effects on growth rate are overwhelmingly negative, highly skewed towards very small effect sizes, and frequent enough to suggest that deleterious hitchhikers may impose a significant burden on evolving lineages. By using lines that accumulated mutations in either wild-type or slippage repair-defective backgrounds, we further disentangle the effects of 2 common types of mutations, single-nucleotide substitutions and simple sequence repeat indels, and show that they have distinct effects on yeast growth rate. Although the average effect of a simple sequence repeat mutation is very small (approximately 0.3%), many do alter growth rate, implying that this class of frequent mutations has an important evolutionary impact.
Affiliation(s)
- Yevgeniy Plavskin
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America
- Maria Stella de Biase
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - Humboldt-Universität zu Berlin, Department of Biology, Berlin, Germany
- Naomi Ziv
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America
- Libuše Janská
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America
- Yuan O. Zhu
  - Department of Genetics, Stanford University, Stanford, California, United States of America
  - Department of Biology, Stanford University, Stanford, California, United States of America
- David W. Hall
  - Department of Genetics, University of Georgia, Athens, Georgia, United States of America
- Roland F. Schwarz
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - Institute for Computational Cancer Biology, Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Daniel Tranchina
  - Department of Biology, New York University, New York, New York, United States of America
  - Courant Math Institute, New York University, New York, New York, United States of America
- Mark L. Siegal
  - Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Biology, New York University, New York, New York, United States of America

6. Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
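The definition above (repeated copies of a 1-6-bp motif) can be made concrete with a small Python sketch. This is a naive regular-expression scan for perfect repeats only, not any of the genotyping tools the review compares; `find_strs`, its `min_copies` threshold, and the example sequence are assumptions made for illustration.

```python
import re

def find_strs(seq: str, min_copies: int = 3):
    """Return (start, motif, copies) for perfect tandem repeats of a 1-6 bp motif.

    Hypothetical helper: the lazy quantifier prefers the shortest motif,
    and the back-reference requires at least `min_copies` copies in total.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits

# Invented example: five copies of CAG embedded in flanking sequence.
example = "GGATCCAGCAGCAGCAGCAGTACGT"
print(find_strs(example))
```

A scan like this ignores interrupted (imperfect) repeats and says nothing about read mapping, which are the harder problems the abstract describes.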
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia

7. Plavskin Y, de Biase MS, Ziv N, Janská L, Zhu YO, Hall DW, Schwarz RF, Tranchina D, Siegal ML. Spontaneous single-nucleotide substitutions and microsatellite mutations have distinct distributions of fitness effects. bioRxiv 2024:2023.07.04.547687. [PMID: 37461506 PMCID: PMC10349969 DOI: 10.1101/2023.07.04.547687] [Indexed: 07/28/2023]
Abstract
The fitness effects of new mutations determine key properties of evolutionary processes. Beneficial mutations drive evolution, yet selection is also shaped by the frequency of small-effect deleterious mutations, whose combined effect can burden otherwise adaptive lineages and alter evolutionary trajectories and outcomes in clonally evolving organisms such as viruses, microbes, and tumors. The small effect sizes of these important mutations have made accurate measurements of their rates difficult. In microbes, assessing the effect of mutations on growth can be especially instructive, as this complex phenotype is closely linked to fitness in clonally evolving organisms. Here, we perform high-throughput time-lapse microscopy on cells from mutation-accumulation strains to precisely infer the distribution of mutational effects on growth rate in the budding yeast, Saccharomyces cerevisiae. We show that mutational effects on growth rate are overwhelmingly negative, highly skewed towards very small effect sizes, and frequent enough to suggest that deleterious hitchhikers may impose a significant burden on evolving lineages. By using lines that accumulated mutations in either wild-type or slippage repair-defective backgrounds, we further disentangle the effects of two common types of mutations, single-nucleotide substitutions and simple sequence repeat indels, and show that they have distinct effects on yeast growth rate. Although the average effect of a simple sequence repeat mutation is very small (~0.3%), many do alter growth rate, implying that this class of frequent mutations has an important evolutionary impact.

8. Cui Y, Ye W, Li JS, Li JJ, Vilain E, Sallam T, Li W. A genome-wide spectrum of tandem repeat expansions in 338,963 humans. Cell 2024; 187:2336-2341.e5. [PMID: 38582080 PMCID: PMC11065452 DOI: 10.1016/j.cell.2024.03.004] [Received: 09/08/2023] [Revised: 01/23/2024] [Accepted: 03/05/2024] [Indexed: 04/08/2024]
Abstract
The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, despite the fact that TRs constitute ∼6% of our genome and are linked to over 50 human diseases. Here, we introduce the TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European samples). TR-gnomAD offers critical insights into ancestry-specific disease prevalence using disparities in TR unit number frequencies among ancestries. Moreover, TR-gnomAD is able to differentiate between common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from those potentially pathogenic TR expansions, which are found more frequently in disease groups than within TR-gnomAD. Together, TR-gnomAD is an invaluable resource for researchers and physicians to interpret TR expansions in individuals with genetic diseases.
Affiliation(s)
- Ya Cui
  - Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA 92697, USA
- Wenbin Ye
  - Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA 92697, USA
- Jason Sheng Li
  - Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA 92697, USA
- Jingyi Jessica Li
  - Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eric Vilain
  - Institute for Clinical and Translational Science, University of California, Irvine, Irvine, CA 92697, USA
  - Department of Pediatrics, University of California, Irvine, Irvine, CA 92697, USA
- Tamer Sallam
  - Division of Cardiology, Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Wei Li
  - Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA 92697, USA

9. Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. Genetics 2024; 226:iyae013. [PMID: 38298127 PMCID: PMC10990422 DOI: 10.1093/genetics/iyae013] [Received: 08/11/2023] [Revised: 08/11/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than polymerase slippage in replicating progenitor cells. These results echo the recent finding that DNA damage in oocytes is a significant source of de novo single nucleotide variants and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to known hotspots of oocyte mutagenesis, nor are postzygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at G/C-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and contradict prior attribution of replication slippage as the primary mechanism of STR mutagenesis.
Affiliation(s)
- Michael E Goldberg
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Michelle D Noyes
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA

10. McComish BJ, Charleston MA, Parks M, Baroni C, Salvatore MC, Li R, Zhang G, Millar CD, Holland BR, Lambert DM. Ancient and Modern Genomes Reveal Microsatellites Maintain a Dynamic Equilibrium Through Deep Time. Genome Biol Evol 2024; 16:evae017. [PMID: 38412309 PMCID: PMC10972684 DOI: 10.1093/gbe/evae017] [Received: 04/05/2023] [Revised: 12/22/2023] [Accepted: 01/23/2024] [Indexed: 02/29/2024]
Abstract
Microsatellites are widely used in population genetics, but their evolutionary dynamics remain poorly understood. It is unclear whether microsatellite loci drift in length over time. This is important because the mutation processes that underlie these important genetic markers are central to the evolutionary models that employ microsatellites. We identify more than 27 million microsatellites using a novel and unique dataset of modern and ancient Adélie penguin genomes along with data from 63 published chordate genomes. We investigate microsatellite evolutionary dynamics over 2 timescales: one based on Adélie penguin samples dating to ∼46.5 ka and the other dating to the diversification of chordates aged more than 500 Ma. We show that the process of microsatellite allele length evolution is at dynamic equilibrium; while there is length polymorphism among individuals, the length distribution for a given locus remains stable. Many microsatellites persist over very long timescales, particularly in exons and regulatory sequences. These often retain length variability, suggesting that they may play a role in maintaining phenotypic variation within populations.
Affiliation(s)
- Bennet J McComish
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS 7001, Australia
- Matthew Parks
  - Australian Research Centre for Human Evolution, Griffith University, Nathan, QLD 4111, Australia
  - Department of Biology, University of Central Oklahoma, Edmond, OK 73034, USA
- Carlo Baroni
  - Dipartimento di Scienze della Terra, University of Pisa, Pisa, Italy
  - CNR-IGG, Institute of Geosciences and Earth Resources, Pisa, Italy
- Maria Cristina Salvatore
  - Dipartimento di Scienze della Terra, University of Pisa, Pisa, Italy
  - CNR-IGG, Institute of Geosciences and Earth Resources, Pisa, Italy
- Ruiqiang Li
  - Novogene Bioinformatics Technology Co. Ltd., Beijing 100083, China
- Guojie Zhang
  - China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
  - Department of Biology, Centre for Social Evolution, University of Copenhagen, Copenhagen DK-2100, Denmark
- Craig D Millar
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Barbara R Holland
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- David M Lambert
  - Australian Research Centre for Human Evolution, Griffith University, Nathan, QLD 4111, Australia

11. Antão-Sousa S, Gusmão L, Modesti NM, Feliziani S, Faustino M, Marcucci V, Sarapura C, Ribeiro J, Carvalho E, Pereira V, Tomas C, de Pancorbo MM, Baeta M, Alghafri R, Almheiri R, Builes JJ, Gouveia N, Burgos G, Pontes MDL, Ibarra A, da Silva CV, Parveen R, Benitez M, Amorim A, Pinto N. Microsatellites' mutation modeling through the analysis of the Y-chromosomal transmission: Results of a GHEP-ISFG collaborative study. Forensic Sci Int Genet 2024; 69:102999. [PMID: 38181588 DOI: 10.1016/j.fsigen.2023.102999] [Received: 05/08/2023] [Revised: 10/25/2023] [Accepted: 12/10/2023] [Indexed: 01/07/2024]
Abstract
The Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) organized a collaborative study on mutations of Y-chromosomal short tandem repeats (Y-STRs). New data from 2225 father-son duos and data from 44 previously published reports, corresponding to 25,729 duos, were collected and analyzed. Marker-specific mutation rates were estimated for 33 Y-STRs. Although highly dependent on the analyzed marker, mutations compatible with the gain or loss of a single repeat were 23.2 times more likely than those involving a greater number of repeats. Longer alleles (relative to the modal one) were nearly twice as mutable as shorter ones. Within the subset of longer alleles, the loss of repeats was nearly twice as likely as the gain. Conversely, shorter alleles showed a symmetrical trend, with repeat gains being twofold more frequent than reductions. A positive correlation between paternal age and the mutation rate was observed, strengthening previous findings. A machine learning approach, via logistic regression analyses, allowed the establishment of algebraic formulas for estimating the probability of mutation depending on paternal age and allele length for DYS389I, DYS393 and DYS627. Algebraic formulas could also be established considering only the allele length as predictor for the DYS19, DYS389I, DYS389II-I, DYS390, DYS391, DYS393, DYS437, DYS439, DYS449, DYS456, DYS458, DYS460, DYS481, DYS518, DYS533, DYS576, DYS626 and DYS627 loci. For the remaining Y-STRs, a lack of statistical significance was observed, probably as a consequence of the small effective size of the subsets available, a common difficulty in the modeling of rare events such as mutations. The amount of data used in the different analyses varied widely, depending on how the data were reported in the publications analyzed. This reveals a regrettable waste of produced data, due to inadequate communication of results, and supports an urgent need for publication guidelines for mutation studies.
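The abstract refers to algebraic formulas, obtained by logistic regression, for the probability of mutation as a function of paternal age and allele length. The fitted coefficients are not given here, so the following is only a minimal sketch of the general form such a formula takes. The function name and the coefficients `b0`, `b_age` and `b_len` are hypothetical and chosen purely for illustration.

```python
import math


def mutation_probability(paternal_age, allele_length, b0, b_age, b_len):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b_age*age + b_len*length)))."""
    z = b0 + b_age * paternal_age + b_len * allele_length
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, NOT taken from the study.
HYPOTHETICAL = dict(b0=-9.0, b_age=0.02, b_len=0.15)

# Same allele length (13 repeats), two different paternal ages.
p_young = mutation_probability(25, 13, **HYPOTHETICAL)
p_old = mutation_probability(45, 13, **HYPOTHETICAL)
```

With positive coefficients for age and length, the modelled probability rises with both predictors, which matches the direction of the trends the abstract reports.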
Affiliation(s)
- Sofia Antão-Sousa
- Instituto de Investigação e Inovação em Saúde (i3S), Porto, Portugal; Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal; Faculty of Sciences of the University of Porto (FCUP), Porto, Portugal; DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil
- Leonor Gusmão
- DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil
- Nidia M Modesti
- Centro de Genética Forense, Poder Judicial de Córdoba, Argentina
- Sofía Feliziani
- Centro de Genética Forense, Poder Judicial de Córdoba, Argentina
- Marisa Faustino
- Instituto de Investigação e Inovação em Saúde (i3S), Porto, Portugal; Faculty of Sciences of the University of Porto (FCUP), Porto, Portugal
- Valeria Marcucci
- Laboratorio Regional de Investigación Forense, Tribunal Superior de Justicia de Santa Cruz, Argentina
- Claudia Sarapura
- Laboratorio Regional de Investigación Forense, Tribunal Superior de Justicia de Santa Cruz, Argentina
- Julyana Ribeiro
- DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil
- Elizeu Carvalho
- DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil
- Vania Pereira
- Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Carmen Tomas
- Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Marian M de Pancorbo
- BIOMICs Research Group, Lascaray Research Center, Department of Zoology and Animal Cell Biology, University of the Basque Country UPV/EHU, Vitoria-Gasteiz, Spain
- Miriam Baeta
- BIOMICs Research Group, Lascaray Research Center, Department of Zoology and Animal Cell Biology, University of the Basque Country UPV/EHU, Vitoria-Gasteiz, Spain
- Rashed Alghafri
- International Center for Forensic Sciences, Dubai Police G.H.Q., Dubai, United Arab Emirates
- Reem Almheiri
- International Center for Forensic Sciences, Dubai Police G.H.Q., Dubai, United Arab Emirates
- Juan José Builes
- GENES SAS Laboratory, Medellín, Colombia; Institute of Biology, University of Antioquia, Medellín, Colombia
- Nair Gouveia
- Instituto Nacional de Medicina Legal e Ciências Forenses, I.P. / Serviço de Genética e Biologia Forenses, Delegação do Centro, Portugal
- German Burgos
- One Health Global Research Group, Facultad de Medicina, Universidad de Las Américas (UDLA), Quito, Ecuador; Grupo de Medicina Xenómica, Universidad de Santiago de Compostela, Santiago de Compostela, Spain
- Maria de Lurdes Pontes
- Instituto Nacional de Medicina Legal e Ciências Forenses, I.P. / Serviço de Genética e Biologia Forenses, Delegação do Norte, Portugal
- Adriana Ibarra
- Laboratorio IDENTIGEN, Universidad de Antioquia, Colombia
- Claudia Vieira da Silva
- Instituto Nacional de Medicina Legal e Ciências Forenses, I.P. / Serviço de Genética e Biologia Forenses, Delegação do Sul, Portugal
- Rukhsana Parveen
- Forensic Services Laboratory, Centre for Applied Molecular Biology, University of the Punjab, Lahore, Pakistan
- Marc Benitez
- Policia de la Generalitat de Catalunya - Mossos d'Esquadra. Unitat Central del Laboratori Biològic, Barcelona, Spain
- António Amorim
- Instituto de Investigação e Inovação em Saúde (i3S), Porto, Portugal; Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal; Faculty of Sciences of the University of Porto (FCUP), Porto, Portugal
- Nadia Pinto
- Instituto de Investigação e Inovação em Saúde (i3S), Porto, Portugal; Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), Porto, Portugal; Centre of Mathematics of the University of Porto, Porto, Portugal.
|
12
|
Verbiest MA, Lundström O, Xia F, Baudis M, Bilgin Sonay T, Anisimova M. Short tandem repeat mutations regulate gene expression in colorectal cancer. Sci Rep 2024; 14:3331. [PMID: 38336885 PMCID: PMC10858039 DOI: 10.1038/s41598-024-53739-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/04/2024] [Indexed: 02/12/2024] Open
Abstract
Short tandem repeat (STR) mutations are prevalent in colorectal cancer (CRC), especially in tumours with the microsatellite instability (MSI) phenotype. While STR length variations are known to regulate gene expression under physiological conditions, the functional impact of STR mutations in CRC remains unclear. Here, we integrate STR mutation data with clinical information and gene expression data to study the gene regulatory effects of STR mutations in CRC. We confirm that STR mutability in CRC highly depends on the MSI status, repeat unit size, and repeat length. Furthermore, we present a set of 1244 putative expression STRs (eSTRs) for which the STR length is associated with gene expression levels in CRC tumours. The length of 73 eSTRs is associated with expression levels of cancer-related genes, nine of which are CRC-specific genes. We show that linear models describing eSTR-gene expression relationships allow for predictions of gene expression changes in response to eSTR mutations. Moreover, we found an increased mutability of eSTRs in MSI tumours. Our evidence of gene regulatory roles for eSTRs in CRC highlights a mostly overlooked way through which tumours may modulate their phenotypes. Future extensions of these findings could uncover new STR-based targets in the treatment of cancer.
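The abstract states that linear models of the eSTR-expression relationship allow prediction of expression changes after an eSTR mutation. A minimal sketch of that idea, using ordinary least squares on made-up data, is given below. The function names and all numbers are hypothetical, and the study's actual models may include covariates that are not shown here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: STR length (repeat units) and normalised expression per tumour.
str_lengths = [10, 11, 12, 13, 14]
expression = [2.0, 2.5, 3.0, 3.5, 4.0]

slope, intercept = fit_line(str_lengths, expression)


def predicted_change(delta_repeats):
    """Predicted expression change for a gain (+) or loss (-) of repeat units."""
    return slope * delta_repeats
```

Under this toy fit, a contraction of two repeat units would be predicted to lower expression by twice the slope.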
Affiliation(s)
- Max A Verbiest
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland.
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Oxana Lundström
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Feifei Xia
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michael Baudis
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tugce Bilgin Sonay
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, USA
- Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
|
13
|
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. bioRxiv 2023:2023.12.22.573131. [PMID: 38187618 PMCID: PMC10769404 DOI: 10.1101/2023.12.22.573131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than the classical mechanism of polymerase slippage in replicating progenitor cells. These results also echo the recent finding that DNA damage in quiescent oocytes is a significant source of de novo SNVs and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to previously discovered hotspots of oocyte mutagenesis, nor are post-zygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on DNM rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at GC-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and are especially surprising considering the prior belief in replication slippage as the dominant mechanism of STR mutagenesis.
Affiliation(s)
- Michael E. Goldberg
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- Michelle D. Noyes
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Evan E. Eichler
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Howard Hughes Medical Institute, 3720 15 Ave NE, University of Washington, Seattle, WA, 98195
- Aaron R. Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- These authors contributed equally to this work
- Kelley Harris
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Computational Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109
- These authors contributed equally to this work
|
14
|
Margoliash J, Fuchs S, Li Y, Zhang X, Massarat A, Goren A, Gymrek M. Polymorphic short tandem repeats make widespread contributions to blood and serum traits. Cell Genomics 2023; 3:100458. [PMID: 38116119 PMCID: PMC10726533 DOI: 10.1016/j.xgen.2023.100458] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/21/2022] [Revised: 09/09/2023] [Accepted: 11/07/2023] [Indexed: 12/21/2023]
Abstract
Short tandem repeats (STRs) are genomic regions consisting of repeated sequences of 1-6 bp in succession. Single-nucleotide polymorphism (SNP)-based genome-wide association studies (GWASs) do not fully capture STR effects. To study these effects, we imputed 445,720 STRs into genotype arrays from 408,153 White British UK Biobank participants and tested for association with 44 blood phenotypes. Using two fine-mapping methods, we identify 119 candidate causal STR-trait associations and estimate that STRs account for 5.2%-7.6% of causal variants identifiable from GWASs for these traits. These are among the strongest associations for multiple phenotypes, including a coding CTG repeat associated with apolipoprotein B levels, a promoter CGG repeat with platelet traits, and an intronic poly(A) repeat with mean platelet volume. Our study suggests that STRs make widespread contributions to complex traits, provides stringently selected candidate causal STRs, and demonstrates the need to consider a more complete view of genetic variation in GWASs.
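The definition above (a 1-6 bp unit repeated in succession) can be made concrete with a short sketch that scans a DNA string for perfect tandem repeats. This is an illustration only: the function name and the `min_copies` threshold are assumptions, it handles only uppercase A/C/G/T and perfect (uninterrupted) repeats, and it is not the genotyping or imputation method used in the study.

```python
import re


def find_strs(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    and the backreference requires at least `min_copies` consecutive copies.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# A CAG trinucleotide repeated four times, flanked by non-repetitive bases.
example_hits = find_strs("TTCAGCAGCAGCAGTT")
```

Real STR callers also have to cope with interrupted repeats, sequencing errors and alignment ambiguity, which this sketch deliberately ignores.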
Affiliation(s)
- Jonathan Margoliash
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Shai Fuchs
- Pediatric Endocrine and Diabetes Unit, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Ramat Gan, Israel
- Yang Li
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Xuan Zhang
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Arya Massarat
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA
- Alon Goren
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
|
15
|
Sehgal A, Ziaei-Jam H, Shen A, Gymrek M. Genome-wide detection of somatic mosaicism at short tandem repeats. bioRxiv 2023:2023.11.22.568371. [PMID: 38045311 PMCID: PMC10690266 DOI: 10.1101/2023.11.22.568371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
MOTIVATION Somatic mosaicism, in which a mutation occurs post-zygotically, has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise more than 1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs (mSTRs) is lacking. RESULTS We introduce prancSTR, a novel method for detecting mSTRs from individual high-throughput sequencing datasets. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mSTRs in simulated data and demonstrate its feasibility by identifying candidate mSTRs in whole genome sequencing (WGS) data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project. Our analysis identified an average of 76 and 577 non-homopolymer and homopolymer mSTRs respectively per cell line as well as multiple cell lines with outlier mSTR counts more than 6 times the population average, suggesting a subset of cell lines have particularly high STR instability rates. AVAILABILITY prancSTR is freely available at https://github.com/gymrek-lab/trtools. DOCUMENTATION Detailed documentation is available at https://trtools.readthedocs.io/.
Affiliation(s)
- Aarushi Sehgal
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, USA
- Helyaneh Ziaei-Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, USA
- Andrew Shen
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, USA
- Department of Medicine, University of California San Diego, La Jolla, USA
|
16
|
Bhati M, Mapel XM, Lloret-Villas A, Pausch H. Structural variants and short tandem repeats impact gene expression and splicing in bovine testis tissue. Genetics 2023; 225:iyad161. [PMID: 37655920 PMCID: PMC10627265 DOI: 10.1093/genetics/iyad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 06/05/2023] [Accepted: 08/24/2023] [Indexed: 09/02/2023] Open
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are significant sources of genetic variation. However, the impacts of these variants on gene regulation have not been investigated in cattle. Here, we genotyped and characterized 19,408 SVs and 374,821 STRs in 183 bovine genomes and investigated their impact on molecular phenotypes derived from testis transcriptomes. We found that 71% of STRs were multiallelic. The vast majority (95%) of STRs and SVs were in intergenic and intronic regions. Only 37% of SVs and 40% of STRs were in high linkage disequilibrium (LD) (R² > 0.8) with surrounding SNPs/insertions and deletions (Indels), indicating that SNP-based association testing and genomic prediction are blind to a nonnegligible portion of genetic variation. We showed that both SVs and STRs were more than 2-fold enriched among expression and splicing QTL (e/sQTL) relative to SNPs/Indels and were often associated with differential expression and splicing of multiple genes. Deletions and duplications had larger impacts on splicing and expression than any other type of SV. Exonic duplications predominantly increased gene expression either through alternative splicing or other mechanisms, whereas expression- and splicing-associated STRs primarily resided in intronic regions and exhibited bimodal effects on the molecular phenotypes investigated. Most e/sQTL resided within 100 kb of the affected genes or splicing junctions. We pinpoint candidate causal STRs and SVs associated with the expression of SLC13A4 and TTC7B and alternative splicing of a lncRNA and CAPP1. We provide a catalog of STRs and SVs for taurine cattle and show that these variants contribute substantially to gene expression and splicing variation.
Affiliation(s)
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Xena Marie Mapel
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
|
17
|
Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun 2023; 14:6711. [PMID: 37872149 PMCID: PMC10593948 DOI: 10.1038/s41467-023-42278-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 03/10/2023] [Accepted: 10/05/2023] [Indexed: 10/25/2023] Open
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yunjiang Qiu
- Illumina Incorporated, San Diego, CA, 92122, USA
- Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
|
18
|
Ichikawa K, Kawahara R, Asano T, Morishita S. A landscape of complex tandem repeats within individual human genomes. Nat Commun 2023; 14:5530. [PMID: 37709751 PMCID: PMC10502081 DOI: 10.1038/s41467-023-41262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 08/28/2023] [Indexed: 09/16/2023] Open
Abstract
Markedly expanded tandem repeats (TRs) have been correlated with ~60 diseases. TR diversity has been considered a clue toward understanding missing heritability. However, haplotype-resolved long TRs remain mostly hidden or blacked out because their complex structures (TRs composed of various units and minisatellites containing >10-bp units) make them difficult to determine accurately with existing methods. Here, using a high-precision algorithm to determine complex TR structures from long, accurate reads of PacBio HiFi, an investigation of 270 Japanese control samples yields several genome-wide findings. Approximately 322,000 TRs are difficult to impute from the surrounding single-nucleotide variants. Greater genetic divergence of TR loci is significantly correlated with more events of younger replication slippage. Complex TRs are more abundant than single-unit TRs, and a tendency for complex TRs to consist of <10-bp units and single-unit TRs to be minisatellites is statistically significant at loci with ≥500-bp TRs. Of note, 8909 loci with extended TRs (>100b longer than the mode) contain several known disease-associated TRs and are considered candidates for association with disorders. Overall, complex TRs and minisatellites are found to be abundant and diverse, even in genetically small Japanese populations, yielding insights into the landscape of long TRs.
Affiliation(s)
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Riki Kawahara
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Takeshi Asano
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan.
|
19
|
Zürcher JF, Kleefeldt AA, Funke LFH, Birnbaum J, Fredens J, Grazioli S, Liu KC, Spinck M, Petris G, Murat P, Rehm FBH, Sale JE, Chin JW. Continuous synthesis of E. coli genome sections and Mb-scale human DNA assembly. Nature 2023; 619:555-562. [PMID: 37380776 PMCID: PMC7614783 DOI: 10.1038/s41586-023-06268-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/20/2022] [Accepted: 05/26/2023] [Indexed: 06/30/2023]
Abstract
Whole-genome synthesis provides a powerful approach for understanding and expanding organism function [1-3]. To build large genomes rapidly, scalably and in parallel, we need (1) methods for assembling megabases of DNA from shorter precursors and (2) strategies for rapidly and scalably replacing the genomic DNA of organisms with synthetic DNA. Here we develop bacterial artificial chromosome (BAC) stepwise insertion synthesis (BASIS), a method for megabase-scale assembly of DNA in Escherichia coli episomes. We used BASIS to assemble 1.1 Mb of human DNA containing numerous exons, introns, repetitive sequences, G-quadruplexes, and long and short interspersed nuclear elements (LINEs and SINEs). BASIS provides a powerful platform for building synthetic genomes for diverse organisms. We also developed continuous genome synthesis (CGS), a method for continuously replacing sequential 100 kb stretches of the E. coli genome with synthetic DNA; CGS minimizes crossovers [1,4] between the synthetic DNA and the genome such that the output for each 100 kb replacement provides, without sequencing, the input for the next 100 kb replacement. Using CGS, we synthesized a 0.5 Mb section of the E. coli genome, a key intermediate in its total synthesis [1], from five episomes in 10 days. By parallelizing CGS and combining it with rapid oligonucleotide synthesis and episome assembly [5,6], along with rapid methods for compiling a single genome from strains bearing distinct synthetic genome sections [1,7,8], we anticipate that it will be possible to synthesize entire E. coli genomes from functional designs in less than 2 months.
Affiliation(s)
- Jérôme F Zürcher
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Askar A Kleefeldt
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Louise F H Funke
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- Jakob Birnbaum
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Julius Fredens
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Synthetic Biology for Clinical and Technological Innovation, Department of Biochemistry, National University of Singapore, Singapore, Singapore
- Simona Grazioli
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Kim C Liu
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Martin Spinck
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Gianluca Petris
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Pierre Murat
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Fabian B H Rehm
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Julian E Sale
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Jason W Chin
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK.
|
20
|
Kristmundsdottir S, Jonsson H, Hardarson MT, Palsson G, Beyter D, Eggertsson HP, Gylfason A, Sveinbjornsson G, Holley G, Stefansson OA, Halldorsson GH, Olafsson S, Arnadottir GA, Olason PI, Eiriksson O, Masson G, Thorsteinsdottir U, Rafnar T, Sulem P, Helgason A, Gudbjartsson DF, Halldorsson BV, Stefansson K. Sequence variants affecting the genome-wide rate of germline microsatellite mutations. Nat Commun 2023; 14:3855. [PMID: 37386006 PMCID: PMC10310707 DOI: 10.1038/s41467-023-39547-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/10/2022] [Accepted: 06/16/2023] [Indexed: 07/01/2023] Open
Abstract
Microsatellites are polymorphic tracts of short tandem repeats with one to six base-pair (bp) motifs and are some of the most polymorphic variants in the genome. Using 6084 Icelandic parent-offspring trios we estimate 63.7 (95% CI: 61.9-65.4) microsatellite de novo mutations (mDNMs) per offspring per generation; excluding one-bp repeat motifs (homopolymers), the estimate is 48.2 mDNMs (95% CI: 46.7-49.6). Paternal mDNMs occur at longer repeats than maternal ones, which are in turn larger, with a mean size of 3.4 bp vs 3.1 bp for paternal ones. mDNMs increase by 0.97 (95% CI: 0.90-1.04) and 0.31 (95% CI: 0.25-0.37) per year of father's and mother's age at conception, respectively. Here, we find two independent coding variants that associate with the number of mDNMs transmitted to offspring. The minor allele of a missense variant (allele frequency (AF) = 1.9%) in MSH2, a mismatch repair gene, increases transmitted mDNMs from both parents (effect: 13.1 paternal and 7.8 maternal mDNMs). A synonymous variant (AF = 20.3%) in NEIL2, a DNA damage repair gene, increases paternally transmitted mDNMs (effect: 4.4 mDNMs). Thus, the microsatellite mutation rate in humans is in part under genetic control.
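The per-year age effects quoted in the abstract (0.97 mDNMs per year of father's age and 0.31 per year of mother's age) lend themselves to a small worked calculation. The sketch below assumes, for illustration only, that the two effects are linear and simply additive. The function name is hypothetical, and the confidence intervals are ignored.

```python
# Point estimates quoted in the abstract (mDNMs per year of parental age).
PATERNAL_SLOPE = 0.97
MATERNAL_SLOPE = 0.31


def extra_mdnms(father_years_older, mother_years_older):
    """Expected additional mDNMs when each parent is older by the given years.

    Assumes linear, additive age effects (an assumption of this sketch).
    """
    return (PATERNAL_SLOPE * father_years_older
            + MATERNAL_SLOPE * mother_years_older)


# Both parents ten years older at conception: 9.7 + 3.1 additional mDNMs.
extra = extra_mdnms(10, 10)
```

Under these assumptions the paternal contribution to the age-related increase is roughly three times the maternal one.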
Affiliation(s)
- Snaedis Kristmundsdottir
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavik, Iceland
- Marteinn T Hardarson
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavik, Iceland
- Doruk Beyter
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- Gisli H Halldorsson
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Gudny A Arnadottir
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Gisli Masson
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- Unnur Thorsteinsdottir
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Agnar Helgason
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
- Daniel F Gudbjartsson
- deCODE genetics / Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Bjarni V Halldorsson
- deCODE genetics / Amgen Inc., Reykjavik, Iceland.
- School of Technology, Reykjavik University, Reykjavik, Iceland.
|
21
|
Antão-Sousa S, Pinto N, Rende P, Amorim A, Gusmão L. The sequence of the repetitive motif influences the frequency of multistep mutations in Short Tandem Repeats. Sci Rep 2023; 13:10251. [PMID: 37355683 PMCID: PMC10290632 DOI: 10.1038/s41598-023-32137-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/28/2022] [Accepted: 03/23/2023] [Indexed: 06/26/2023] Open
Abstract
Microsatellites, or Short Tandem Repeats (STRs), are subject to frequent length mutations that involve the loss or gain of an integer number of repeats. This work aimed to investigate the correlation between STRs' specific repetitive motif composition and mutational dynamics, specifically the occurrence of single- or multistep mutations. Allelic transmission data, comprising 323,818 allele transfers and 1,297 mutations, were gathered for 35 Y-chromosomal STRs with simple structure. Six structure groups were established: ATT, CTT, TCTA/GATA, GAAA/CTTT, CTTTT, and AGAGAT, according to the repetitive motif present in the DNA leading strand of the markers. Results show that the occurrence of multistep mutations varies significantly among groups of markers defined by the repetitive motif. The group with the highest frequency of multistep mutations was the one with repetitive motif CTTTT (25% of the detected mutations), and the group with the lowest frequency was the one with repetitive motifs TCTA/GATA (0.93%). Statistically significant differences (α = 0.05) were found between groups with repetitive motifs of different lengths, as is the case of TCTA/GATA versus ATT (p = 0.0168), CTT (p < 0.0001) and CTTTT (p < 0.0001), as well as between GAAA/CTTT and CTTTT (p = 0.0102). The same occurred between the two tetrameric groups GAAA/CTTT and TCTA/GATA (p < 0.0001), the first showing 5.7 times more multistep mutations than the second. When considering the number of repeats of the mutated paternal alleles, statistically significant differences were found for alleles with 10 or 12 repeats, between GATA and ATT structure groups. These results, which demonstrate the heterogeneity of mutational dynamics across repeat motifs, have implications in the fields of population genetics, epidemiology, or phylogeography, and whenever STR mutation models are used in evolutionary studies in general.
Affiliation(s)
- Sofia Antão-Sousa
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal.
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
- Department of Biology, Faculty of Sciences of University of Porto (FCUP), Porto, Portugal.
- DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil.
- Nádia Pinto
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Center of Mathematics of University of Porto (CMUP), Porto, Portugal
- Pablo Rende
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Department of Biology, Faculty of Sciences of University of Porto (FCUP), Porto, Portugal
- António Amorim
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Department of Biology, Faculty of Sciences of University of Porto (FCUP), Porto, Portugal
- Leonor Gusmão
- DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil
|
22
|
Han YJ, Liu LY, Rong Z, Zhang QZ, Cheng P, Xu GJ, Wang DF, Zhou Z, Wang SQ. Rapid genotyping of 32 insertion/deletion panel for human identification using fluorogenic probes-based multiplex real-time PCR. Anal Biochem 2023; 674:115208. [PMID: 37315679 DOI: 10.1016/j.ab.2023.115208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 05/17/2023] [Accepted: 06/04/2023] [Indexed: 06/16/2023]
Abstract
BACKGROUND Insertion and deletion polymorphisms (InDels) have considerable potential in forensic genetics because of their low mutation rate and small amplicons. At present, capillary electrophoresis is the main technique used to detect InDel polymorphisms in forensic DNA laboratories. However, this method is complicated and time-consuming, and is not suitable for rapid on-site paternity testing and personal identification. Next-generation sequencing analysis of InDel polymorphisms requires expensive instruments, large upfront reagent and supply costs, substantial computational resources and complex bioinformatics, all of which increase the time to obtain results. Thus, there is an urgent need for a reliable, rapid, sensitive and economical method for genotyping InDels. METHOD A rapid 32-InDel panel was established using fluorogenic probe-based multiplex real-time PCR with a microfluidic test cartridge and a portable real-time PCR instrument. We then performed several validation studies, including concordance, accuracy, sensitivity, stability and species specificity. RESULTS Complete genotypes could be obtained from ≥100 pg of input DNA and from a series of challenging samples with high accuracy and specificity within 90 min. CONCLUSION This method provides a rapid and cost-effective solution for InDel genotyping and personal identification in a portable format.
Affiliation(s)
- Yong-Jun Han
- Bioinformatics Center of AMMS, Beijing 100850, China
- Li-Yan Liu
- Bioinformatics Center of AMMS, Beijing 100850, China
- Zhen Rong
- Bioinformatics Center of AMMS, Beijing 100850, China
- Peng Cheng
- Bioinformatics Center of AMMS, Beijing 100850, China
- Guo-Juan Xu
- Bioinformatics Center of AMMS, Beijing 100850, China
- Zhe Zhou
- Bioinformatics Center of AMMS, Beijing 100850, China
- Sheng-Qi Wang
- Bioinformatics Center of AMMS, Beijing 100850, China.
|
23
|
Maksimov MO, Wu C, Ashbrook DG, Villani F, Colonna V, Mousavi N, Ma N, Lu L, Pritchard JK, Goren A, Williams RW, Palmer AA, Gymrek M. A novel quantitative trait locus implicates Msh3 in the propensity for genome-wide short tandem repeat expansions in mice. Genome Res 2023; 33:689-702. [PMID: 37127331 PMCID: PMC10317118 DOI: 10.1101/gr.277576.122] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/08/2022] [Accepted: 04/26/2023] [Indexed: 05/03/2023]
Abstract
Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1-6 bp. We leveraged whole-genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and performed quantitative trait locus (QTL) analyses for each of these phenotypes. We identified a locus on Chromosome 13 at which strains inheriting the C57BL/6J (B) haplotype have a higher rate of STR expansions than those inheriting the DBA/2J (D) haplotype. The strongest candidate gene in this locus is Msh3, a known modifier of STR stability in cancer and at pathogenic repeat expansions in mice and humans, as well as a current drug target against Huntington's disease. The D haplotype at this locus harbors a cluster of variants near the 5' end of Msh3, including multiple missense variants near the DNA mismatch recognition domain. In contrast, the B haplotype contains a unique retrotransposon insertion. The rate of expansion covaries positively with Msh3 expression, with higher expression from the B haplotype. Finally, detailed analysis of mutation patterns showed that strains carrying the B allele have higher expansion rates, but slightly lower overall total mutation rates, compared with those with the D allele, particularly at tetranucleotide repeats. Our results suggest an important role for inherited variants in Msh3 in modulating genome-wide patterns of germline mutations at STRs.
Affiliation(s)
- Mikhail O Maksimov
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Cynthia Wu
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California 92093, USA
- David G Ashbrook
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Vincenza Colonna
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Institute of Genetics and Biophysics, National Research Council, Naples 80111, Italy
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, California 92093, USA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Lu Lu
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Department of Biology, Stanford University, Stanford, California 94305, USA
- Alon Goren
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California 92093, USA
- Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Abraham A Palmer
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Psychiatry, Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Melissa Gymrek
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California 92093, USA
- Department of Biomedical Informatics
|
24
|
Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun 2023; 14:2092. [PMID: 37045857 PMCID: PMC10097659 DOI: 10.1038/s41467-023-37690-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/04/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023] Open
Abstract
Short tandem repeats (STRs) are abundant and highly mutable in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we present a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also performed population analyses, investigated population-differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Affiliation(s)
- Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Shuai Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Sijia Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jiajia Wang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Xinyue Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Tingrui Song
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Tao Xu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117, Shandong, China.
- Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
|
25
|
Zhang G, Andersen EC. Interplay Between Polymorphic Short Tandem Repeats and Gene Expression Variation in Caenorhabditis elegans. Mol Biol Evol 2023; 40:msad067. [PMID: 36999565 PMCID: PMC10075192 DOI: 10.1093/molbev/msad067] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/26/2022] [Revised: 02/20/2023] [Accepted: 03/29/2023] [Indexed: 04/01/2023] Open
Abstract
Short tandem repeats (STRs) have mutation rates orders of magnitude higher than those of single nucleotide variants (SNVs) and have been proposed to accelerate evolution in many organisms. However, only a few studies have addressed the impact of STR variation on phenotypic variation at both the organismal and molecular levels. Potential driving forces underlying the high mutation rates of STRs also remain largely unknown. Here, we leverage the recently generated expression and STR variation data among wild Caenorhabditis elegans strains to conduct a genome-wide analysis of how STRs affect gene expression variation. We identify thousands of expression STRs (eSTRs) showing regulatory effects and demonstrate that they explain missing heritability beyond SNV-based expression quantitative trait loci. We illustrate specific regulatory mechanisms such as how eSTRs affect splicing sites and alternative splicing efficiency. We also show, using both wild strains and mutation accumulation lines, that differential expression of antioxidant genes and oxidative stresses might systematically affect STR mutations. Overall, we reveal the interplay between STRs and gene expression variation by providing novel insights into regulatory mechanisms of STRs and highlighting that oxidative stress could lead to higher STR mutation rates.
Affiliation(s)
- Gaotian Zhang
- Department of Molecular Biosciences, Northwestern University, Evanston, IL
- Erik C Andersen
- Department of Molecular Biosciences, Northwestern University, Evanston, IL
|
26
|
Gharrett AJ, Chernova NV, Smé NA, Lyon S, Barry PD. Demography of a nearshore gadid navaga, Eleginus nawaga, from the Barents Sea coast during the last glacial period. Polar Biol 2023. [DOI: 10.1007/s00300-023-03123-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
|
27
|
Jam HZ, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. bioRxiv 2023:2023.03.09.531600. [PMID: 36945429 PMCID: PMC10028971 DOI: 10.1101/2023.03.09.531600] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/14/2023]
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3,550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA
- Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA
- Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA
- Yunjiang Qiu
- Illumina Incorporated, San Diego, California 92122, USA
- Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
- Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
- Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala-Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Department of Medicine, University of California San Diego, La Jolla, CA
|
28
|
Chen A, Tao R, Li C, Zhang S. Investigation on the genetic-inconsistent paternity cases using the MiSeq FGx system. Forensic Sci Res 2023; 7:702-707. [PMID: 36817243 PMCID: PMC9930766 DOI: 10.1080/20961790.2021.2009631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] Open
Abstract
Mutations can complicate the calculation of the paternity index in forensic identification. While many studies have focused on autosomal short tandem repeats (A-STRs), the mutation status of sex-chromosome markers and single nucleotide polymorphisms (SNPs) remains largely uncharacterized. Next generation sequencing (NGS), which offers high throughput and reveals extensive sequence polymorphism, is a promising tool for forensic genetics. To describe the mutation landscape in paternity cases with genetic inconsistencies, a total of 63 parentage-confirmed paternity cases containing at least one mismatched locus were collected. The mutations were subsequently evaluated using Verogen's MPS ForenSeq™ DNA Signature Kit and a microsatellite instability (MSI) detection kit. The results showed that 98.41% (62/63) of the cases had no additional autosomal mutations even when the number of A-STRs increased to 27. As for the sex chromosomes, about 11.11% (7/63) of the cases exhibited either X-STR or Y-STR mutations. D2S1338, FGA and Penta E were the most frequently altered STRs, which suggests they might be mutation hotspots. In addition, a male with a sex chromosome abnormality was observed incidentally; his genotype might reflect a 47,XXY karyotype rather than MSI. Nearly 56.90% of the STR loci possessed isoalleles, which might result in higher STR polymorphism. No Mendelian incompatibility was detected among the SNP markers, which indicates that SNPs were more reliable genetic markers in the genetic-inconsistent paternity cases. Supplemental data for this article is available online at https://doi.org/10.1080/20961790.2021.2009631 .
Affiliation(s)
- Anqi Chen
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Ministry of Justice, Shanghai, China; Department of Forensic Medicine, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Ruiyang Tao
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Ministry of Justice, Shanghai, China
- Chengtao Li
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Ministry of Justice, Shanghai, China; Department of Forensic Medicine, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Suhua Zhang
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Ministry of Justice, Shanghai, China
|
29
|
Calluori S, Stark R, Pearson BL. Gene-Environment Interactions in Repeat Expansion Diseases: Mechanisms of Environmentally Induced Repeat Instability. Biomedicines 2023; 11:515. [PMID: 36831049 PMCID: PMC9953593 DOI: 10.3390/biomedicines11020515] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/12/2023] [Revised: 02/06/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023] Open
Abstract
Short tandem repeats (STRs) are units of 1-6 base pairs that occur in tandem repetition to form a repeat tract. STRs exhibit repeat instability, which generates expansions or contractions of the repeat tract. Over 50 diseases, primarily affecting the central nervous system and muscles, are characterized by repeat instability. Longer repeat tracts are typically associated with earlier age of onset and increased disease severity. Environmental exposures are suspected to play a role in the pathogenesis of repeat expansion diseases. Here, we review the current knowledge of mechanisms of environmentally induced repeat instability in repeat expansion diseases. The current evidence demonstrates that environmental factors modulate repeat instability via DNA damage and induction of DNA repair pathways, with distinct mechanisms for repeat expansion and contraction. Of particular note, oxidative stress is a key mediator of environmentally induced repeat instability. The preliminary evidence suggests epigenetic modifications as potential mediators of environmentally induced repeat instability. Future research incorporating an array of environmental exposures, new human cohorts, and improved model systems, with a continued focus on cell-types, tissues, and critical windows, will aid in identifying mechanisms of environmentally induced repeat instability. Identifying environmental modulators of repeat instability and their mechanisms of action will inform preventions, therapies, and public health measures.
Affiliation(s)
- Stephanie Calluori
- Department of Environmental Health Sciences, Mailman School of Public Health Columbia University, New York, NY 10032, USA
- Barnard College of Columbia University, 3009 Broadway, New York, NY 10027, USA
- Rebecca Stark
- Department of Environmental Health Sciences, Mailman School of Public Health Columbia University, New York, NY 10032, USA
- Brandon L. Pearson
- Department of Environmental Health Sciences, Mailman School of Public Health Columbia University, New York, NY 10032, USA
|
30
|
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Mikhail Maksimov
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Ye Jin
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
| | - Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Melissa Gymrek
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Tugce Bilgin Sonay
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
|
31
|
Martin-Trujillo A, Garg P, Patel N, Jadhav B, Sharp AJ. Genome-wide evaluation of the effect of short tandem repeat variation on local DNA methylation. Genome Res 2023; 33:184-196. [PMID: 36577521 PMCID: PMC10069470 DOI: 10.1101/gr.277057.122] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/23/2022] [Accepted: 12/19/2022] [Indexed: 12/30/2022]
Abstract
Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni P < 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
Affiliation(s)
- Alejandro Martin-Trujillo
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Paras Garg
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Nihir Patel
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Bharati Jadhav
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
| | - Andrew J Sharp
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
|
32
|
Gochi L, Kawai Y, Fujimoto A. Comprehensive analysis of microsatellite polymorphisms in human populations. Hum Genet 2023; 142:45-57. [PMID: 36048238 DOI: 10.1007/s00439-022-02484-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 08/24/2022] [Indexed: 01/18/2023]
Abstract
Microsatellites (MS) are tandem repeats of short units, and have been used for population genetics, individual identification, and medical genetics. However, studies of MS on a whole-genome level are limited, and genotyping methods for MS have yet to be established. Here, we analyzed approximately 8.5 million MS regions using a previously developed MS caller for short reads (MIVcall method) for three large publicly available human genome sequencing data sets: the Korean Personal Genome Project, Simons Genome Diversity Project, and Human Genome Diversity Project. Our analysis identified 253,114 polymorphic MS. A comparison among different populations suggests that MS in the coding region evolved by random genetic drift and natural selection. In an analysis of genetic structures, MS revealed population structures as clearly as SNPs did and detected clusters that were not found by SNPs in African and Oceanian populations. Based on the MS polymorphisms, we selected MS marker candidates for individual identification. Finally, we applied our method to a deeply sequenced ancient DNA sample. This study provides a comprehensive picture of MS polymorphisms and their application to human population studies.
Affiliation(s)
- Leo Gochi
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0003, Japan
| | - Yosuke Kawai
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Akihiro Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0003, Japan.
|
33
|
Steely CJ, Watkins WS, Baird L, Jorde LB. The mutational dynamics of short tandem repeats in large, multigenerational families. Genome Biol 2022; 23:253. [PMID: 36510265 PMCID: PMC9743774 DOI: 10.1186/s13059-022-02818-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/30/2022] [Accepted: 11/17/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Short tandem repeats (STRs) compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. To estimate the genome-wide pattern of mutations at STR loci, we analyze blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. RESULTS We use HipSTR to identify de novo STR mutations in the 2nd generation of these pedigrees and require transmission to the third generation for validation. Analyzing approximately 1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. Perfect repeats mutate about 2× more often than imperfect repeats. De novo STRs are significantly enriched in Alu elements. CONCLUSIONS Approximately 30% of new STR mutations occur within Alu elements, which compose only 11% of the genome, but only 10% are found in LINE-1 insertions, which compose 17% of the genome. Phasing these mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of de novo genome-wide STR mutations per individual to be approximately 85, which is similar to the average number of observed de novo single nucleotide variants.
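The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, not the authors' calculation: it assumes the rounded values from the abstract and that the per-locus rate can simply be multiplied by the number of loci analysed. The function name `fold_enrichment` is introduced here for illustration.

```python
# Rounded values taken from the abstract above.
loci_analysed = 1.6e6            # approximate number of STR loci
rate_per_locus = 5.24e-5         # de novo mutations per locus per generation

# Expected de novo STR mutations per individual under the simple product.
expected_per_individual = loci_analysed * rate_per_locus
print(round(expected_per_individual, 2))   # 83.84, in line with "approximately 85"

def fold_enrichment(share_of_mutations, share_of_genome):
    """Ratio of the observed share of mutations to the share of genome occupied."""
    return share_of_mutations / share_of_genome

alu = fold_enrichment(0.30, 0.11)    # Alu elements
line1 = fold_enrichment(0.10, 0.17)  # LINE-1 insertions
print(round(alu, 2), round(line1, 2))      # 2.73 0.59
```

On these rounded inputs, new STR mutations fall in Alu elements nearly three times as often as their genomic share would predict, and in LINE-1 insertions less often than predicted.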
Affiliation(s)
- Cody J. Steely
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
| | - W. Scott Watkins
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
| | - Lisa Baird
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
| | - Lynn B. Jorde
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
|
34
|
Herbert A. Nucleosomes and flipons exchange energy to alter chromatin conformation, the readout of genomic information, and cell fate. Bioessays 2022; 44:e2200166. [DOI: 10.1002/bies.202200166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 09/24/2022] [Accepted: 09/28/2022] [Indexed: 11/27/2022]
|
35
|
Abstract
Roughly 3% of the human genome consists of microsatellites or tracts of short tandem repeats (STRs). These STRs are often unstable, undergoing high-frequency expansions (increases) or contractions (decreases) in the number of repeat units. Some microsatellite instability (MSI) is seen at multiple STRs within a single cell and is associated with certain types of cancer. A second form of MSI is characterised by expansion of a single gene-specific STR and such expansions are responsible for a group of 40+ human genetic disorders known as the repeat expansion diseases (REDs). While the mismatch repair (MMR) pathway prevents genome-wide MSI, emerging evidence suggests that some MMR factors are directly involved in generating expansions in the REDs. Thus, MMR suppresses some forms of expansion while some MMR factors promote expansion in other contexts. This review will cover what is known about the paradoxical effect of MMR on microsatellite expansion in mammalian cells.
|
36
|
Antão-Sousa S, Conde-Sousa E, Gusmão L, Amorim A, Pinto N. How often have X- and Autosomal-STRs mutations equivocal parental origin been assigned? Forensic Science International: Genetics Supplement Series 2022. [DOI: 10.1016/j.fsigss.2022.09.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
37
|
Omolaoye TS, El Shahawy O, Skosana BT, Boillat T, Loney T, du Plessis SS. The mutagenic effect of tobacco smoke on male fertility. Environ Sci Pollut Res Int 2022; 29:62055-62066. [PMID: 34536221 PMCID: PMC9464177 DOI: 10.1007/s11356-021-16331-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/29/2021] [Accepted: 08/30/2021] [Indexed: 05/15/2023]
Abstract
Despite the association between tobacco use and the harmful effects on general health as well as male fertility parameters, smoking remains globally prevalent. The main content of tobacco smoke is nicotine and its metabolite cotinine. These compounds can pass the blood-testis barrier and subsequently harm the germ cells to varying degrees. Although controversial, smoking has been shown to cause not only a decrease in sperm motility and sperm concentration and an increase in abnormal sperm morphology, but also genetic and epigenetic aberrations in spermatozoa. Both animal and human studies have highlighted the occurrence of sperm DNA-strand breaks (fragmentation), genome instability, genetic mutations, and the presence of aneuploids in the germline of animals and men exposed to tobacco smoke. The question to be asked at this point is, if smoking has the potential to cause all these genetic aberrations, what is the extent of damage? Hence, this review aimed to provide evidence that smoking has a mutagenic effect on sperm and how this subsequently affects male fertility. Additionally, the role of tobacco smoke as an aneugen will be explored. We furthermore aim to incorporate the epidemiological aspects of the aforementioned and provide a holistic approach to the topic.
Affiliation(s)
- Temidayo S Omolaoye
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
- Division of Medical Physiology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa
| | - Omar El Shahawy
- Department of Population Health, New York University Grossman School of Medicine, New York City, NY, USA
| | - Bongekile T Skosana
- Division of Medical Physiology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa
| | - Thomas Boillat
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
| | - Tom Loney
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
| | - Stefan S du Plessis
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates.
- Division of Medical Physiology, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa.
|
38
|
Antão-Sousa S, Conde-Sousa E, Gusmão L, Amorim A, Pinto N. Estimations of Mutation Rates Depend on Population Allele Frequency Distribution: The Case of Autosomal Microsatellites. Genes (Basel) 2022; 13:genes13071248. [PMID: 35886031 PMCID: PMC9323320 DOI: 10.3390/genes13071248] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2022] [Revised: 06/28/2022] [Accepted: 07/11/2022] [Indexed: 01/27/2023] Open
Abstract
Microsatellites (or short-tandem repeats (STRs)) are widely used in anthropology and evolutionary studies. Their extensive polymorphism and rapid evolution make them the ideal genetic marker for dating events, such as the age of a gene or a population. This usage requires the estimation of mutation rates, which are usually estimated by counting the observed Mendelian incompatibilities in one-generation familial configurations (typically parent(s)–child duos or trios). Underestimations are inevitable when using this approach, due to the occurrence of mutational events that do not lead to incompatibilities with the parental genotypes (‘hidden’ or ‘covert’ mutations). It is known that the likelihood that one mutation event leads to a Mendelian incompatibility depends on the mode of genetic transmission considered, the type of familial configuration (duos or trios) considered, and the genotype(s) of the progenitor(s). In this work, we show how the magnitude of the underestimation of autosomal microsatellite mutation rates varies with the populations’ allele frequency distribution spectrum. The Mendelian incompatibilities approach (MIA) was applied to simulated parent(s)/offspring duos and trios in different population scenarios. The results showed that the magnitude and type of biases depend on the population allele frequency distribution, whatever the type of familial data considered, and are greater when duos, instead of trios, are used to obtain the estimates. The implications for molecular anthropology are discussed and a simple framework is presented to correct the naïve estimates, along with an informatics tool for the correction of incompatibility rates obtained through the MIA.
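The idea of 'hidden' mutations in duos can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' framework or tool: it assumes a single-step (plus or minus one repeat) mutation model, random mating, one typed parent per child, and a deliberately inflated mutation rate so that counts are visible. The function `simulate_duos` and the allele frequencies are hypothetical.

```python
import random

def simulate_duos(freqs, mu, n_duos, seed=0):
    """Simulate parent-child duos; return (true mutations, detected incompatibilities)."""
    rng = random.Random(seed)
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]

    def draw():
        # Sample one allele (repeat count) from the population frequencies.
        return rng.choices(alleles, weights=weights)[0]

    true_mutations = detected = 0
    for _ in range(n_duos):
        parent = (draw(), draw())
        transmitted = rng.choice(parent)
        if rng.random() < mu:
            transmitted += rng.choice((-1, 1))  # single-step gain or loss
            true_mutations += 1
        child = (transmitted, draw())           # second allele from the untyped parent
        # A Mendelian incompatibility is scored only if the child shares no allele
        # with the typed parent.
        if not set(child) & set(parent):
            detected += 1
    return true_mutations, detected

# Hypothetical allele frequency distribution (repeat count -> frequency).
freqs = {8: 0.1, 9: 0.2, 10: 0.4, 11: 0.2, 12: 0.1}
true_mutations, detected = simulate_duos(freqs, mu=0.05, n_duos=20000)
print(true_mutations, detected, detected / true_mutations)
```

Because a mutated allele can coincide with the parent's other allele, or the child's second allele can match the typed parent by chance, the detected count falls below the true count, and the size of the shortfall depends on the allele frequency distribution supplied.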
Affiliation(s)
- Sofia Antão-Sousa
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135 Porto, Portugal
- Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), 4200-465 Porto, Portugal
- Faculty of Sciences, University of Porto (FCUP), 4169-007 Porto, Portugal
- DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro 20550-013, Brazil
| | - Eduardo Conde-Sousa
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135 Porto, Portugal
- Instituto de Engenharia Biomédica (INEB), 4200-135 Porto, Portugal
| | - Leonor Gusmão
- DNA Diagnostic Laboratory (LDD), State University of Rio de Janeiro (UERJ), Rio de Janeiro 20550-013, Brazil
| | - António Amorim
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135 Porto, Portugal
- Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), 4200-465 Porto, Portugal
- Faculty of Sciences, University of Porto (FCUP), 4169-007 Porto, Portugal
| | - Nádia Pinto
- Instituto de Investigação e Inovação em Saúde (i3S), 4200-135 Porto, Portugal
- Institute of Molecular Pathology and Immunology, University of Porto (IPATIMUP), 4200-465 Porto, Portugal
- Center of Mathematics, University of Porto (CMUP), 4169-007 Porto, Portugal
|
39
|
Rashed WM, Marcotte EL, Spector LG. Germline De Novo Mutations as a Cause of Childhood Cancer. JCO Precis Oncol 2022; 6:e2100505. [PMID: 35820085 DOI: 10.1200/po.21.00505] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022] Open
Abstract
Germline de novo mutations (DNMs) represent one of the important topics that need extensive attention from epidemiologists, geneticists, and other relevant stakeholders. Advances in next-generation sequencing technologies allowed examination of parent-offspring trios to ascertain the frequency of germline DNMs. Many epidemiological risk factors for childhood cancer are indicative of DNMs as a mechanism. The aim of this review was to give an overview of germline DNMs, their causes in general, and to discuss their relation to childhood cancer risk. In addition, we highlighted existing gaps in knowledge in many topics of germline DNMs in childhood cancer that need exploration and collaborative efforts.
Affiliation(s)
- Wafaa M Rashed
- Research Department, Children's Cancer Hospital-Egypt 57357 (CCHE-57357), Cairo, Egypt
| | - Erin L Marcotte
- Division of Epidemiology/Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN; Masonic Cancer Center, University of Minnesota, Minneapolis, MN
| | - Logan G Spector
- Division of Epidemiology/Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN; Masonic Cancer Center, University of Minnesota, Minneapolis, MN
|
40
|
Boldyreva LV, Andreyeva EN, Pindyurin AV. Position Effect Variegation: Role of the Local Chromatin Context in Gene Expression Regulation. Mol Biol 2022. [DOI: 10.1134/s0026893322030049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
41
|
Mei H, Zhao T, Dong Z, Han J, Xu B, Chen R, Zhang J, Zhang J, Hu Y, Zhang T, Fang L. Population-Scale Polymorphic Short Tandem Repeat Provides an Alternative Strategy for Allele Mining in Cotton. Front Plant Sci 2022; 13:916830. [PMID: 35599867 PMCID: PMC9120961 DOI: 10.3389/fpls.2022.916830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2022] [Accepted: 04/20/2022] [Indexed: 06/15/2023]
Abstract
Short tandem repeats (STRs), which vary in size due to featuring variable numbers of repeat units, are present throughout most eukaryotic genomes. To date, few population-scale studies identifying STRs have been reported for crops. Here, we constructed a high-density polymorphic STR map by investigating polymorphic STRs from 911 Gossypium hirsutum accessions. In total, we identified 556,426 polymorphic STRs with an average length of 21.1 bp, of which 69.08% were biallelic. Moreover, 7,718 (1.39%) were identified in the exons of 6,021 genes, which were significantly enriched in transcription, ribosome biogenesis, and signal transduction. Only 5.88% of those exonic STRs altered open reading frames, of which 97.16% were trinucleotide. An alternative strategy, STR-GWAS analysis, revealed that 824 STRs were significantly associated with agronomic traits, including 491 novel alleles that were undetectable by previous SNP-GWAS methods. For instance, a novel polymorphic STR consisting of GAACCA repeats was identified in GH_D06G1697, with its (GAACCA)5 allele increasing fiber length by 1.96-4.83% relative to the (GAACCA)4 allele. The database CottonSTRDB was further developed to facilitate use of STR datasets in breeding programs. Our study provides functional roles for STRs in influencing complex traits, an alternative strategy (STR-GWAS) for allele mining, and a database serving the cotton community as a valuable resource.
Affiliation(s)
- Huan Mei
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Ting Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Zeyu Dong
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Jin Han
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Biyu Xu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Rui Chen
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Jun Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Juncheng Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Yan Hu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
| | - Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
| | - Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
|
42
|
Kaplanis J, Ide B, Sanghvi R, Neville M, Danecek P, Coorens T, Prigmore E, Short P, Gallone G, McRae J, Carmichael J, Barnicoat A, Firth H, O'Brien P, Rahbari R, Hurles M. Genetic and chemotherapeutic influences on germline hypermutation. Nature 2022; 605:503-508. [PMID: 35545669 PMCID: PMC9117138 DOI: 10.1038/s41586-022-04712-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 06/01/2021] [Accepted: 03/31/2022] [Indexed: 01/06/2023]
Abstract
Mutations in the germline generate all evolutionary genetic variation and are a cause of genetic disease. Parental age is the primary determinant of the number of new germline mutations in an individual's genome [1,2]. Here we analysed the genome-wide sequences of 21,879 families with rare genetic diseases and identified 12 individuals with a hypermutated genome with between two and seven times more de novo single-nucleotide variants than expected. In most families (9 out of 12), the excess mutations came from the father. Two families had genetic drivers of germline hypermutation, with fathers carrying damaging genetic variation in DNA-repair genes. For five of the families, paternal exposure to chemotherapeutic agents before conception was probably a key driver of hypermutation. Our results suggest that the germline is well protected from mutagenic effects, hypermutation is rare, the number of excess mutations is relatively modest and most individuals with a hypermutated genome will not have a genetic disease.
Affiliation(s)
- Joanna Kaplanis
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Benjamin Ide
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Rashesh Sanghvi
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Matthew Neville
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Petr Danecek
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Tim Coorens
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Elena Prigmore
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Patrick Short
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | | | - Jeremy McRae
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Jenny Carmichael
- East Anglian Medical Genetics Service, Cambridge University Hospitals, Cambridge, UK
| | - Angela Barnicoat
- North East Thames Regional Genetics Service, Great Ormond Street Hospital, London, UK
| | - Helen Firth
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- East Anglian Medical Genetics Service, Cambridge University Hospitals, Cambridge, UK
| | - Patrick O'Brien
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Raheleh Rahbari
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Matthew Hurles
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
|
43
|
Luttman AM, Komine M, Thaiwong T, Carpenter T, Ewart SL, Kiupel M, Langohr IM, Venta PJ. Development of a 17-Plex of Penta- and Tetra-Nucleotide Microsatellites for DNA Profiling and Paternity Testing in Horses. Front Vet Sci 2022; 9:861623. [PMID: 35464354 PMCID: PMC9021955 DOI: 10.3389/fvets.2022.861623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 02/28/2022] [Indexed: 11/13/2022] Open
Abstract
Tetranucleotide and pentanucleotide short tandem repeat (hereafter termed tetraSTR and pentaSTR) polymorphisms have properties that make them desirable for DNA profiling and paternity testing. However, certain species, such as the horse, have far fewer tetraSTRs than other species and for this reason dinucleotide STRs (diSTRs) have become the standard for DNA profiling in horses, despite being less desirable for technical reasons. During our testing of a series of candidate genes as potentially underlying a heritable condition characterized by megaesophagus in the Friesian horse breed, we found that good tetraSTRs do exist in horses but, as expected, at a much lower frequency than in other species, e.g., dogs and humans. Using a series of efficient methods developed in our laboratory for the production of multiplexed tetraSTRs in other species, we identified a set of tetra- and pentaSTRs that we developed into a 17-plex panel for the horse, plus a sex-identifying marker near the amelogenin gene. These markers were tested in 128 horses representing 16 breeds as well as crossbred horses, and we found that these markers have useful genetic variability. Average observed heterozygosities (Ho) ranged from 0.53 to 0.89 for the individual markers (0.66 average Ho for all markers), and 0.62-0.82 for expected heterozygosity (He) within breeds (0.72 average He for all markers). The probability of identity (PI) within breeds for which 10 or more samples were available was at least 1.1 × 10⁻¹¹, and the PI among siblings (PIsib) was 1.5 × 10⁻⁵. Stutter was ≤ 11% (average stutter for all markers combined was 6.9%) compared to the more than 30% typically seen with diSTRs. We predict that it will be possible to develop accurate allelic ladders for this multiplex panel that will make cross-laboratory comparisons easier and will also improve DNA profiling accuracy.
Although we were only able to exclude candidate genes for Friesian horse megaesophagus with no unexcluded genes that are possibly causative at this point in time, the study helped us to refine the methods used to develop better tetraSTR multiplexed panels for species such as the horse that have a low frequency of tetraSTRs.
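The panel statistics named in this abstract (He, PI and PIsib) can be computed from allele frequencies with widely used textbook formulas. The sketch below assumes those standard forms, Hardy-Weinberg proportions, and independence between loci; the abstract does not state which estimators the authors applied, and the allele frequencies and the function `locus_stats` are hypothetical.

```python
from math import prod

def locus_stats(freqs):
    """Return (He, PI, PIsib) for one locus from its allele frequencies."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    he = 1 - s2                                          # expected heterozygosity
    pi = 2 * s2 ** 2 - s4                                # two unrelated individuals match
    pi_sib = 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4  # two full siblings match
    return he, pi, pi_sib

# Hypothetical two-locus panel (allele frequencies per locus sum to 1).
panel = [[0.25, 0.25, 0.25, 0.25], [0.5, 0.3, 0.2]]

he, pi, pi_sib = locus_stats(panel[0])
print(he, pi, pi_sib)   # 0.75 0.109375 0.40234375

# Multi-locus PI is the product of per-locus values if loci are independent.
combined_pi = prod(locus_stats(f)[1] for f in panel)
print(combined_pi)
```

Multiplying per-locus values in this way is how a multi-locus panel can reach very small combined probabilities of identity, while PIsib stays much larger because siblings share alleles by descent.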
Affiliation(s)
- Andrea M. Luttman
- Microbiology and Molecular Genetics, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
- Genetics and Genomic Sciences, Michigan State University, East Lansing, MI, United States
| | - Misa Komine
- Pathobiology and Diagnostic Investigation, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
| | - Tuddow Thaiwong
- Veterinary Diagnostic Laboratory, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
- *Correspondence: Tuddow Thaiwong
| | - Tyler Carpenter
- Microbiology and Molecular Genetics, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
- Department of Obstetrics, Gynecology and Reproductive Biology, Michigan State University College of Human Medicine, Grand Rapids, MI, United States
| | - Susan L. Ewart
- Large Animal Clinical Sciences, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
| | - Matti Kiupel
- Pathobiology and Diagnostic Investigation, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
- Veterinary Diagnostic Laboratory, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
| | - Ingeborg M. Langohr
- Pathobiology and Diagnostic Investigation, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
- Pathobiological Sciences, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA, United States
| | - Patrick J. Venta
- Microbiology and Molecular Genetics, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
- Small Animal Clinical Sciences, College of Veterinary Medicine, Michigan State University, East Lansing, MI, United States
| |
|
44
|
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components comprised of single nucleotide polymorphisms (SNPs) in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWAS also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWAS, may account for an important proportion of the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have been carried out in an attempt to describe the impact of tandem repeats on gene regulation and human phenotypes. In this review, we introduce recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We also summarize the current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders gained from studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) by regulating alternative splicing, promoter activity, enhancer activity and so on. In addition, many tandem repeats have undergone tight natural selection in the human lineage, and likely play crucial roles in human brain evolution. Taken together, the putative roles of tandem repeats in the pathogenesis of psychiatric disorders are strongly implicated, and using examples from the previous literature, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Affiliation(s)
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China; Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China; Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
| | - Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China
| |
|
45
|
Honka J, Baini S, Searle JB, Kvist L, Aspi J. Genetic assessment reveals inbreeding, possible hybridization, and low levels of genetic structure in a declining goose population. Ecol Evol 2022; 12:e8547. [PMID: 35127046 PMCID: PMC8796947 DOI: 10.1002/ece3.8547] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2021] [Revised: 12/18/2021] [Accepted: 12/21/2021] [Indexed: 11/30/2022] Open
Abstract
The population numbers of taiga bean goose (Anser fabalis fabalis) have halved during recent decades. Since this subspecies is hunted throughout most of its range, the decline is of management concern. Knowledge of the genetic population structure and diversity is important for guiding management and conservation efforts. Genetically unique subpopulations might be hunted to extinction if not managed separately, and any inbreeding depression or lack of genetic diversity may affect the ability to adapt to changing environments and increase extinction risk. We used microsatellite and mitochondrial DNA markers to study the genetic population structure and diversity among taiga bean geese breeding within the Central flyway management unit using non-invasively collected feathers. We found some genetic structuring with the maternally inherited mitochondrial DNA between four geographic regions (ɸST = 0.11-0.20) but none with the nuclear microsatellite markers (all pairwise FST values = 0.002-0.005). These results could be explained by female natal philopatry and male-biased dispersal, which completely homogenizes the nuclear genome. Therefore, the population could be managed as a single unit. Genetic diversity was still at a moderate level (average HE = 0.69) and there were no signs of past population size reductions, although significantly positive inbreeding coefficients in all sampling sites (FIS = 0.05-0.10) and high relatedness values (r = 0.60-0.86) between some individuals could indicate inbreeding. In addition, there was evidence of either incomplete lineage sorting or introgression from the pink-footed goose (Anser brachyrhynchus). The current population is not under threat by genetic impoverishment but monitoring in the future is desirable.
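The inbreeding coefficient reported in this abstract is conventionally defined as 1 - Ho/He, the relative deficit of observed heterozygotes against Hardy-Weinberg expectation. The sketch below shows that calculation for a single locus. The genotypes are invented microsatellite allele lengths, the function name is hypothetical, and the plain estimator used here omits the small-sample corrections that dedicated software typically applies; the abstract does not say which estimator the authors used.

```python
from collections import Counter


def inbreeding_coefficient(genotypes):
    """Return (Ho, He, F_IS) for one locus from a list of diploid genotypes.

    Each genotype is a pair of allele labels, e.g. fragment lengths in bp.
    F_IS = 1 - Ho / He; positive values indicate a heterozygote deficit.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # Expected heterozygosity from pooled allele frequencies: 1 - sum(p_i^2).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if he == 0.0:
        raise ValueError("locus is monomorphic; F_IS is undefined")

    return ho, he, 1.0 - ho / he


if __name__ == "__main__":
    # Hypothetical genotypes at one microsatellite locus (allele lengths in bp).
    sample = [(152, 152), (152, 156), (156, 156), (152, 152)]
    ho, he, fis = inbreeding_coefficient(sample)
    print(f"Ho={ho:.3f} He={he:.3f} F_IS={fis:.3f}")
```

A positive value from a single locus is weak evidence on its own, since null alleles or population substructure can also produce a heterozygote deficit, which is why such estimates are usually averaged over loci and tested for significance.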
Affiliation(s)
- Johanna Honka
- Ecology and Genetics Research Unit, University of Oulu, Oulu, Finland
| | - Serena Baini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - Jeremy B. Searle
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, USA
| | - Laura Kvist
- Ecology and Genetics Research Unit, University of Oulu, Oulu, Finland
| | - Jouni Aspi
- Ecology and Genetics Research Unit, University of Oulu, Oulu, Finland
| |
|
46
|
The molecular pathogenesis of repeat expansion diseases. Biochem Soc Trans 2021; 50:119-134. [PMID: 34940797 DOI: 10.1042/bst20200143] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/24/2021] [Revised: 11/30/2021] [Accepted: 12/06/2021] [Indexed: 12/28/2022]
Abstract
Expanded short tandem repeats in the genome cause various monogenic diseases, particularly neurological disorders. Since the discovery of a CGG repeat expansion in the FMR1 gene in 1991, more than 40 repeat expansion diseases have been identified. In the coding repeat expansion diseases, in which the expanded repeat sequence is located in the coding regions of genes, the toxicity of repeat polypeptides, particularly misfolding and aggregation of proteins containing an expanded polyglutamine tract, has been the focus of investigation. On the other hand, in the non-coding repeat expansion diseases, in which the expanded repeat sequence is located in introns or untranslated regions, the toxicity of repeat RNAs has been the focus of investigation. Recently, these repeat RNAs were demonstrated to be translated into repeat polypeptides by the novel mechanism of repeat-associated non-AUG translation, which has extended research on the pathological mechanisms of this disease entity to include polypeptide toxicity. Thus, a common pathogenesis has been suggested for both coding and non-coding repeat expansion diseases. In this review, we briefly outline the major pathogenic mechanisms of repeat expansion diseases, including a loss-of-function mechanism caused by repeat expansion, repeat RNA toxicity caused by RNA foci formation and protein sequestration, and toxicity by repeat polypeptides. We also discuss perturbation of the physiological liquid-liquid phase separation state caused by these repeat RNAs and repeat polypeptides, as well as potential therapeutic approaches against repeat expansion diseases.
|
47
|
Lyne AM, Perie L. Comparing Phylogenetic Approaches to Reconstructing Cell Lineage From Microsatellites With Missing Data. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2291-2301. [PMID: 32386163 DOI: 10.1109/tcbb.2020.2992813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Due to the imperfect fidelity of DNA replication, somatic cells acquire DNA mutations at each division which record their lineage history. Microsatellites, tandem repeats of DNA nucleotide motifs, mutate more frequently than other genomic regions, and by observing microsatellite lengths in single cells and implementing suitable inference procedures, the cell lineage tree of an organism can be reconstructed. Given recent advances in single cell Next Generation Sequencing (NGS) and in the phylogenetic methods used to infer lineage trees, this work investigates which computational approaches best exploit the lineage information found in single cell NGS data. We simulated trees representing cell division with mutating microsatellites, and tested a range of available phylogenetic algorithms to reconstruct cell lineage. We found that distance-based approaches are fast and accurate with fully observed data. However, Maximum Parsimony and the computationally intensive probabilistic methods are more robust to missing data and therefore better suited to reconstructing cell lineage from NGS datasets. We also investigated how robust reconstruction algorithms are to different tree topologies and mutation generation models. Our results show that the flexibility of Maximum Parsimony and the probabilistic approaches means they can be adapted to allow good reconstruction across a range of biologically relevant scenarios.
|
48
|
Ledoux J, Ghanem R, Horaud M, López‐Sendino P, Romero‐Soriano V, Antunes A, Bensoussan N, Gómez‐Gras D, Linares C, Machordom A, Ocaña O, Templado J, Leblois R, Ben Souissi J, Garrabou J. Gradients of genetic diversity and differentiation across the distribution range of a Mediterranean coral: Patterns, processes and conservation implications. Divers Distrib 2021. [DOI: 10.1111/ddi.13382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Affiliation(s)
- Jean‐Baptiste Ledoux
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Institut de Ciències del Mar, CSIC, Barcelona, Spain
| | - Raouia Ghanem
- Institut National Agronomique de Tunisie, Université de Carthage, Tunis, Tunisia
- Laboratoire de Biodiversité, Biotechnologies et Changements Climatiques (LR11ES09), Université Tunis El Manar, Tunis, Tunisia
| | - Agostinho Antunes
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| | - Cristina Linares
- Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals, Institut de Recerca de la Biodiversitat (IRBIO), Universitat de Barcelona, Barcelona, Spain
| | - Annie Machordom
- Museo Nacional de Ciencias Naturales (MNCN‐CSIC), Madrid, Spain
| | - Oscar Ocaña
- Departamento de Oceanografía Biológica y Biodiversidad, Fundación Museo del Mar de Ceuta, Ceuta, Spain
| | - José Templado
- Museo Nacional de Ciencias Naturales (MNCN‐CSIC), Madrid, Spain
| | - Raphaël Leblois
- CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, University of Montpellier, Montpellier, France
- Institut de Biologie Computationnelle, University of Montpellier, Montpellier, France
| | - Jamila Ben Souissi
- Institut National Agronomique de Tunisie, Université de Carthage, Tunis, Tunisia
- Laboratoire de Biodiversité, Biotechnologies et Changements Climatiques (LR11ES09), Université Tunis El Manar, Tunis, Tunisia
| |
|
49
|
Lupski JR. Clan genomics: From OMIM phenotypic traits to genes and biology. Am J Med Genet A 2021; 185:3294-3313. [PMID: 34405553 PMCID: PMC8530976 DOI: 10.1002/ajmg.a.62434] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/02/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 12/20/2022]
Abstract
Clinical characterization of a patient phenotype has been the quintessential approach for elucidating a differential diagnosis and a hypothesis to explore a potential clinical diagnosis. This has resulted in a language of medicine and a semantic ontology, with both specialty- and subspecialty-specific lexicons, that can be challenging to translate and interpret. There is no 'Rosetta Stone' of clinical medicine, such as the genetic code, that can assist translation and interpretation of the language of genetics. Nevertheless, the information content embodied within a clinical diagnosis can guide management, therapeutic intervention, and potentially the prognostic outlook of disease, enabling anticipatory guidance for patients and families. Clinical genomics is now established firmly in medical practice. The granularity and informative content of a personal genome are immense. Yet, we are limited in our use of much of that personal genome information by the lack of functional characterization of the overwhelming majority of computationally annotated genes in the haploid human reference genome sequence. Whereas DNA and the genetic code have provided a 'Rosetta Stone' to translate genetic variant information, clinical medicine and clinical genomics provide the context to understand human biology and disease. A path forward will integrate deep phenotyping, such as that available in a clinical synopsis in the Online Mendelian Inheritance in Man (OMIM) entries, with personal genome analyses.
Affiliation(s)
- James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Pediatrics, Baylor College of Medicine, Houston, Texas, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Texas Children's Hospital, Houston, Texas, USA
| |
|
50
|
Gauffre B, Boissinot A, Quiquempois V, Leblois R, Grillet P, Morin S, Picard D, Ribout C, Lourdais O. Agricultural intensification alters marbled newt genetic diversity and gene flow through density and dispersal reduction. Mol Ecol 2021; 31:119-133. [PMID: 34674328 DOI: 10.1111/mec.16236] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/08/2021] [Revised: 10/08/2021] [Accepted: 10/18/2021] [Indexed: 11/30/2022]
Abstract
Recent agricultural intensification threatens global biodiversity, with amphibians being one of the most impacted groups. Because of their biphasic life cycle, amphibians are particularly vulnerable to habitat loss and fragmentation that often result in small, isolated populations and loss of genetic diversity. Here, we studied how landscape heterogeneity affects genetic diversity, gene flow and demographic parameters in the marbled newt, Triturus marmoratus, over a hedgerow network landscape in Western France. While the northern part of the study area consists of preserved hedged farmland, the southern part was more profoundly converted to intensive arable crop production after WWII. Based on 67 sampled ponds and 10 microsatellite loci, we characterized regional population genetic structure and evaluated the correlation between landscape variables and (i) local genetic diversity using mixed models and (ii) genetic distance using multiple regression methods and commonality analysis. We identified a single genetic population characterized by a spatially heterogeneous isolation-by-distance pattern. Pond density in the surrounding landscape positively affected local genetic diversity while arable crop land cover negatively affected gene flow and connectivity. We used demographic inferences to quantitatively assess differences in effective population density and dispersal between the contrasting landscapes characterizing the northern and southern parts of the study area. Altogether, results suggest recent land conversion affected T. marmoratus through reduction in both effective population density and dispersal due to habitat loss and reduced connectivity.
Affiliation(s)
- Bertrand Gauffre
- INRAE, UR 1115 PSH, Plantes et Systèmes de culture Horticoles, Avignon, France; School of Biological Sciences, Monash University, Clayton, Vic., Australia
| | - Alexandre Boissinot
- CNRS, UMR 7372 CEBC - Université de La Rochelle, Villiers-en-Bois, France; Réserve Naturelle Régionale du Bocage des Antonins - Deux-Sèvres Nature Environnement, Niort, France
| | - Raphael Leblois
- CBGP UMR 1062, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ. Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Univ. Montpellier, Montpellier, France
| | - Pierre Grillet
- CNRS, UMR 7372 CEBC - Université de La Rochelle, Villiers-en-Bois, France
| | - Sophie Morin
- Office Français de la Biodiversité, Villiers-en-Bois, France
| | - Damien Picard
- Département de Biologie, UFR Sciences, Angers, France
| | - Cécile Ribout
- CNRS, UMR 7372 CEBC - Université de La Rochelle, Villiers-en-Bois, France
| | - Olivier Lourdais
- CNRS, UMR 7372 CEBC - Université de La Rochelle, Villiers-en-Bois, France; School of Life Sciences, Arizona State University, Tempe, AZ, USA
| |
|