1
Caporale LH. Evolutionary feedback from the environment shapes mechanisms that generate genome variation. J Physiol 2024; 602:2601-2614. [PMID: 38194279 DOI: 10.1113/jp284411] [Received: 07/17/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024]
Abstract
Darwin recognized that 'a grand and almost untrodden field of inquiry will be opened, on the causes and laws of variation.' However, because the Modern Synthesis assumes that the intrinsic probability of any individual mutation is unrelated to that mutation's potential adaptive value, attention has focused on selection rather than on the intrinsic generation of variation. Yet many examples illustrate that the term 'random' mutation, as widely understood, is inaccurate. The probabilities of distinct classes of variation are neither evenly distributed across a genome, nor invariant over time, nor unrelated to their potential adaptive value. Because selection acts upon variation, multiple biochemical mechanisms can evolve, and have evolved, that increase the relative probability of adaptive mutations. In effect, the generation of heritable variation is in a feedback loop with selection: mechanisms that tend to generate variants that survive recurring challenges in the environment would be captured by this survival and thus inherited and accumulated within lineages of genomes. Moreover, because genome variation is affected by a wide range of biochemical processes, it can be regulated. Biochemical mechanisms that sense stress, from lack of nutrients to DNA damage, can increase the probability of specific classes of variation. A deeper understanding of evolution involves attention to the evolution of, and environmental influences upon, the intrinsic variation generated in gametes, in other words, upon the biochemical mechanisms that generate variation across generations. These concepts have profound implications for the types of questions that can and should be asked as omics databases become more comprehensive, detection methods more sensitive, and computational and experimental analyses more high-throughput, and thus capable of revealing the intrinsic generation of variation in individual gametes.
These concepts also have profound implications for evolutionary theory, which, it will be argued, upon reflection predicts that selection would increase the probability of generating adaptive mutations; in other words, it predicts that the ability to evolve itself evolves.
2
Chen Z, Gustavsson EK, Macpherson H, Anderson C, Clarkson C, Rocca C, Self E, Alvarez Jerez P, Scardamaglia A, Pellerin D, Montgomery K, Lee J, Gagliardi D, Luo H, Hardy J, Polke J, Singleton AB, Blauwendraat C, Mathews KD, Tucci A, Fu YH, Houlden H, Ryten M, Ptáček LJ. Adaptive Long-Read Sequencing Reveals GGC Repeat Expansion in ZFHX3 Associated with Spinocerebellar Ataxia Type 4. Mov Disord 2024; 39:486-497. [PMID: 38197134 DOI: 10.1002/mds.29704] [Received: 11/03/2023] [Revised: 11/29/2023] [Accepted: 12/15/2023] [Indexed: 01/11/2024]
Abstract
BACKGROUND: Spinocerebellar ataxia type 4 (SCA4) is an autosomal dominant ataxia with invariable sensory neuropathy, originally described more than 25 years ago in a family of Swedish ancestry residing in Utah. Despite tight linkage to the 16q22 region, the molecular diagnosis has since remained elusive.
OBJECTIVES: Inspired by pathogenic structural variation implicated in other 16q-ataxias with linkage to the same locus, we revisited the index SCA4 cases from the Utah family using novel technologies to investigate structural variation within the candidate region.
METHODS: We adopted a targeted long-read sequencing approach with adaptive sampling on the Oxford Nanopore Technologies (ONT) platform, which enables the detection of segregating structural variants within a genomic region without a priori assumptions about any variant features.
RESULTS: Using this approach, we found a heterozygous (GGC)n repeat expansion in the last coding exon of the zinc finger homeobox 3 (ZFHX3) gene that segregates with disease, ranging between 48 and 57 GGC repeats in affected probands. This finding was replicated in a separate family with SCA4. Furthermore, estimating this GGC repeat size in short-read whole genome sequencing (WGS) data from 21,836 individuals recruited to the 100,000 Genomes Project in the UK and in our in-house dataset of 11,258 exomes revealed no pathogenic repeats, indicating that the variant is ultrarare.
CONCLUSIONS: These findings support the utility of adaptive long-read sequencing as a powerful tool to decipher causative structural variation in unsolved cases of inherited neurological disease.
© 2024 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
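Adaptive sampling enriches a target region by deciding, while each read is still passing through the pore, whether to keep sequencing it. The sketch below illustrates only that decision rule, under several assumptions: the read's first chunk has already been base-called and aligned (not shown), targets are 0-based half-open intervals, a buffer is added around each target so reads starting just outside it are still kept, and reads whose first chunk does not align are ejected by default. `Interval`, `adaptive_decision`, the 10-kb buffer and the "continue"/"unblock" labels are hypothetical names for illustration; this is not the ONT or MinKNOW API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# (chrom, start, end) of the alignment of a read's first chunk; 0-based, half-open.
Mapping = Tuple[str, int, int]


@dataclass(frozen=True)
class Interval:
    """A target region, 0-based and half-open (hypothetical representation)."""
    chrom: str
    start: int
    end: int


def near_target(mapping: Mapping, target: Interval, buffer: int) -> bool:
    """True if the alignment overlaps the target widened by `buffer` bases on each side."""
    chrom, start, end = mapping
    if chrom != target.chrom:
        return False
    lo = max(0, target.start - buffer)
    hi = target.end + buffer
    return start < hi and end > lo


def adaptive_decision(
    mapping: Optional[Mapping],
    targets: Sequence[Interval],
    buffer: int = 10_000,
    unmapped_action: str = "unblock",
) -> str:
    """Keep sequencing a read ("continue") or eject it ("unblock").

    Assumption: reads whose first chunk does not align are handled by
    `unmapped_action`; real software may wait for more sequence before deciding.
    """
    if mapping is None:
        return unmapped_action
    if any(near_target(mapping, t, buffer) for t in targets):
        return "continue"
    return "unblock"


# Placeholder target, not the actual SCA4 candidate interval.
targets = [Interval("chr1", 1_000_000, 2_000_000)]
print(adaptive_decision(("chr1", 995_000, 995_400), targets))    # continue
print(adaptive_decision(("chr1", 5_000_000, 5_000_400), targets))  # unblock
```

Ejecting off-target reads early frees pores to capture more on-target molecules, which is how the candidate region gains coverage without needing assumptions about the variant itself.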
Affiliation(s)
- Zhongbo Chen
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, United Kingdom
- Emil K Gustavsson
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, United Kingdom
- Hannah Macpherson
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Claire Anderson
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, United Kingdom
- Chris Clarkson
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Clarissa Rocca
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Eleanor Self
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Pilar Alvarez Jerez
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Annarita Scardamaglia
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
- David Pellerin
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Kylie Montgomery
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, United Kingdom
- Jasmaine Lee
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Delia Gagliardi
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Huihui Luo
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
- John Hardy
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
  - Reta Lila Weston Institute, Queen Square Institute of Neurology, University College London, London, United Kingdom
  - UK Dementia Research Institute, University College London, London, United Kingdom
  - NIHR University College London Hospitals Biomedical Research Centre, London, United Kingdom
  - Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong, China
- James Polke
  - The Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Andrew B Singleton
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
- Cornelis Blauwendraat
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
- Katherine D Mathews
  - Department of Pediatrics, University of Iowa Carver College of Medicine, Iowa City, Iowa, USA
  - Department of Neurology, University of Iowa Carver College of Medicine, Iowa City, Iowa, USA
- Arianna Tucci
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Ying-Hui Fu
  - Department of Neurology, University of California San Francisco, San Francisco, California, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
  - Weill Institute for Neuroscience, University of California San Francisco, San Francisco, California, USA
  - Kavli Institute for Fundamental Neuroscience, University of California San Francisco, San Francisco, California, USA
- Henry Houlden
  - Department of Neuromuscular Disease, Queen Square Institute of Neurology, University College London, London, United Kingdom
  - The Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Mina Ryten
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, United Kingdom
- Louis J Ptáček
  - Department of Neurology, University of California San Francisco, San Francisco, California, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
  - Weill Institute for Neuroscience, University of California San Francisco, San Francisco, California, USA
  - Kavli Institute for Fundamental Neuroscience, University of California San Francisco, San Francisco, California, USA
3
Rafehi H, Bennett MF, Bahlo M. Detection and discovery of repeat expansions in ataxia enabled by next-generation sequencing: present and future. Emerg Top Life Sci 2023; 7:349-359. [PMID: 37733280 PMCID: PMC10754322 DOI: 10.1042/etls20230018] [Received: 06/20/2023] [Revised: 08/29/2023] [Accepted: 09/12/2023] [Indexed: 09/22/2023]
Abstract
Hereditary cerebellar ataxias are a heterogeneous group of progressive neurological disorders that are disproportionately caused by repeat expansions (REs) of short tandem repeats (STRs). Genetic diagnosis of RE disorders such as ataxias is difficult because the current gold standards for diagnosis are repeat-primed PCR assays and Southern blots, neither of which is scalable or readily available for all STR loci. In the last five years, significant advances have been made in our ability to detect STRs and REs in short-read sequencing data, especially whole-genome sequencing (WGS). Given the increasing reliance on genomics in the diagnosis of rare diseases, using established RE detection pipelines is now a highly feasible and practical first-step alternative to molecular testing methods for RE disorders. In addition, many new pathogenic REs have been discovered in recent years using WGS data. Collectively, genomes are an important resource and platform for further advances in both the discovery and diagnosis of REs that cause ataxia, and will lead to much-needed improvements in diagnostic rates for patients with hereditary ataxia.
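One building block of RE detection in short-read data is recognising reads that lie almost entirely inside a repeat (the "in-repeat reads" used, for example, by ExpansionHunter Denovo). The sketch below is a simplified scorer written for illustration, not the method of any specific pipeline: it compares a read against a pure tandem array of the motif in every phase and on both strands, counts matching bases, and flags the read if the best match fraction reaches a threshold. The 0.9 threshold and all function names are assumptions; only substitutions are tolerated, and indels inside the read are not modelled.

```python
from typing import Set

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (uppercase A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(motif: str) -> Set[str]:
    """All cyclic rotations of a motif, e.g. GGC -> {GGC, GCG, CGG}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def repeat_fraction(read: str, motif: str) -> float:
    """Best fraction of read positions matching a pure tandem array of `motif`.

    Every rotation (phase) of the motif and of its reverse complement is tried,
    so reads from either strand and starting at any offset are scored.
    """
    read = read.upper()
    n = len(read)
    if n == 0:
        return 0.0
    motif = motif.upper()
    units = rotations(motif) | rotations(reverse_complement(motif))
    best = 0
    for unit in units:
        template = (unit * (n // len(unit) + 1))[:n]  # pure repeat, same length as read
        best = max(best, sum(a == b for a, b in zip(read, template)))
    return best / n


def is_in_repeat_read(read: str, motif: str, min_fraction: float = 0.9) -> bool:
    """Flag reads that lie (almost) entirely inside a tandem repeat of `motif`."""
    return repeat_fraction(read, motif) >= min_fraction
```

For example, a 150-bp read (a common short-read length) lying wholly inside a 57-unit GGC tract (57 × 3 = 171 bp) would be flagged. Such a read cannot span the repeat, which is why short-read tools infer the size of long expansions indirectly, for instance from how many in-repeat reads they observe.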
Affiliation(s)
- Haloom Rafehi
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
- Mark F Bennett
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
  - Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
5
Lundström OS, Adriaan Verbiest M, Xia F, Jam HZ, Zlobec I, Anisimova M, Gymrek M. WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans. J Mol Biol 2023; 435:168260. [PMID: 37678708 DOI: 10.1016/j.jmb.2023.168260] [Received: 03/27/2023] [Revised: 08/29/2023] [Accepted: 08/29/2023] [Indexed: 09/09/2023]
Abstract
Short tandem repeats (STRs) are consecutive repetitions of one- to six-nucleotide motifs. They are hypervariable because of the high prevalence of repeat-unit insertions or deletions, caused primarily by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of human traits, including gene expression, cancer risk, and autism. Until recently, STRs were poorly studied because they pose significant challenges to bioinformatic analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has produced multiple large datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state-of-the-art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, and the Genotype-Tissue Expression (GTEx) Project, and for colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database, with programmatic access through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
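The STR definition above (consecutive copies of a 1–6 nt motif) can be made concrete with a naive scan for perfect, uninterrupted repeats: for each period p, a stretch where every base equals the base p positions earlier is a tandem repeat of period p. This is a toy sketch, not how WebSTR's reference panels were built (those used dedicated repeat annotation tools); the thresholds `min_copies=3` and `min_length=10` and all names are arbitrary assumptions, and interrupted or imperfect repeats are ignored.

```python
from typing import List, NamedTuple


class STR(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # exclusive; may include a partial trailing copy
    motif: str
    copies: int  # number of complete motif copies


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:p] * (k // p) for p in range(1, k) if k % p == 0)


def find_perfect_strs(seq: str, max_period: int = 6, min_copies: int = 3,
                      min_length: int = 10) -> List[STR]:
    """Report uninterrupted tandem repeats with 1..max_period bp motifs.

    Only the primitive period is kept, so an 'AC' repeat is not also
    reported as 'ACAC'.
    """
    seq = seq.upper()
    n = len(seq)
    hits: List[STR] = []
    for p in range(1, max_period + 1):
        i = p
        while i < n:
            if seq[i] != seq[i - p]:
                i += 1
                continue
            run_start = i
            while i < n and seq[i] == seq[i - p]:  # extend the periodic stretch
                i += 1
            start, end = run_start - p, i
            motif = seq[start:start + p]
            copies = (end - start) // p
            if (copies >= min_copies and end - start >= min_length
                    and is_primitive(motif) and "N" not in motif):
                hits.append(STR(start, end, motif, copies))
    return sorted(hits)


print(find_perfect_strs("GATC" + "CAG" * 5 + "TTGA"))
# [STR(start=4, end=19, motif='CAG', copies=5)]
```

Polymerase slippage, as described above, changes `copies` by whole units, which is why STR alleles are usually reported as repeat counts rather than as arbitrary sequence changes.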
Affiliation(s)
- Oxana Sachenkova Lundström
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Vildly AB, Kalmar, Sweden
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland. https://twitter.com/merenlin
- Max Adriaan Verbiest
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Feifei Xia
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland. https://twitter.com/Feifeix97
- Helyaneh Ziaei Jam
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Inti Zlobec
  - Institute of Tissue Medicine and Pathology, University of Bern, Switzerland
- Maria Anisimova
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
6
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. bioRxiv: The Preprint Server for Biology 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short-read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold-standard diagnostic methods such as RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data has limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short-read or long-read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we found that widely used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci, as well as from further tool development that builds on the strengths of current algorithms.
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3 times faster than the original while producing the same output.
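The truth-set construction above hinges on deciding which SynDip insertions and deletions are TR expansions or contractions with 2–50 bp motifs. A minimal sketch of one such test follows; it is a simplified criterion assumed for illustration, not the authors' published procedure. It assumes indels are left-normalised, so the repeat tract continues in the reference immediately to the right of the event, and it handles only pure repeats (the interrupted repeats in the truth set would need a mismatch-tolerant comparison). The function name and parameters are hypothetical.

```python
from typing import Optional


def tr_motif_for_indel(indel_seq: str, right_flank: str,
                       min_motif: int = 2, max_motif: int = 50) -> Optional[str]:
    """Return the repeat motif if an indel looks like a TR expansion/contraction.

    indel_seq:   inserted bases (expansion) or deleted bases (contraction).
    right_flank: reference bases immediately 3' of the left-normalised indel.
    Simplified, hypothetical criterion: the indel is a whole number of copies of
    its primitive motif, the motif is min_motif..max_motif bp long, and the
    flanking reference begins with at least one more copy of that motif.
    """
    s, flank = indel_seq.upper(), right_flank.upper()
    n = len(s)
    motif = None
    # The smallest period that tiles the indel exactly gives its primitive motif.
    for p in range(1, min(n, max_motif) + 1):
        if n % p == 0 and s == s[:p] * (n // p):
            motif = s[:p]
            break
    if motif is None or len(motif) < min_motif:
        return None  # not periodic within max_motif, or a homopolymer (1-bp motif)
    return motif if flank.startswith(motif) else None


print(tr_motif_for_indel("CAGCAG", "CAGCAGTTA"))  # CAG
print(tr_motif_for_indel("AAAA", "AAAAAC"))       # None
```

Requiring an adjacent reference copy distinguishes a change in the number of repeat units from a novel insertion that merely happens to be periodic; the homopolymer case returns `None` because its 1-bp motif falls outside the 2–50 bp range.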
Affiliation(s)
- Ben Weisburd
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Grace Tiao
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Heidi L. Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA