1
Hobara T, Ando M, Higuchi Y, Yuan JH, Yoshimura A, Kojima F, Noguchi Y, Takei J, Hiramatsu Y, Nozuma S, Nakamura T, Adachi T, Toyooka K, Yamashita T, Sakiyama Y, Hashiguchi A, Matsuura E, Okamoto Y, Takashima H. Linking LRP12 CGG repeat expansion to inherited peripheral neuropathy. J Neurol Neurosurg Psychiatry 2024:jnnp-2024-333403. [PMID: 39013564 DOI: 10.1136/jnnp-2024-333403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 06/12/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND The causative genes for over 60% of inherited peripheral neuropathy (IPN) remain unidentified. This study endeavours to enhance the genetic diagnostic rate in IPN cases by conducting screenings focused on non-coding repeat expansions. METHODS We gathered data from 2424 unrelated Japanese patients diagnosed with IPN, among whom 1555 cases with unidentified genetic causes, as determined through comprehensive prescreening analyses, were selected for the study. Screening for CGG non-coding repeat expansions in LRP12, GIPC1 and RILPL1 genes was conducted using PCR and long-read sequencing technologies. RESULTS We identified CGG repeat expansions in LRP12 from 44 cases, establishing it as the fourth most common aetiology in Japanese IPN. Most cases (29/37) exhibited distal limb weakness, without ptosis, ophthalmoplegia, facial muscle weakness or bulbar palsy. Neurogenic changes were frequently observed in both needle electromyography (97%) and skeletal muscle tissue (100%). In nerve conduction studies, 28 cases primarily showed impairment in motor nerves without concurrent involvement of sensory nerves, consistent with the phenotype of hereditary motor neuropathy. In seven cases, both motor and sensory nerves were affected, resembling the Charcot-Marie-Tooth (CMT) phenotype. Importantly, the mean CGG repeat number detected in the present patients was significantly shorter than that of patients with LRP12-oculopharyngodistal myopathy (p<0.0001). Additionally, GIPC1 and RILPL1 repeat expansions were absent in our IPN cases. CONCLUSION We initially elucidate LRP12 repeat expansions as a prevalent cause of CMT, highlighting the necessity for an adapted screening strategy in clinical practice, particularly when addressing patients with IPN.
Affiliation(s)
- Takahiro Hobara
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Masahiro Ando
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yujiro Higuchi
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Jun-Hui Yuan
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Akiko Yoshimura
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Fumikazu Kojima
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yutaka Noguchi
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Jun Takei
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yu Hiramatsu
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Satoshi Nozuma
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Tomonori Nakamura
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Tadashi Adachi
  - Division of Neuropathology, Department of Brain and Neurosciences, Tottori University Faculty of Medicine, Tottori, Japan
- Keiko Toyooka
  - Department of Neurology, National Hospital Organization Osaka Toneyama Medical Center, Osaka, Japan
- Toru Yamashita
  - Department of Neurology, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Yusuke Sakiyama
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Akihiro Hashiguchi
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Eiji Matsuura
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yuji Okamoto
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
  - Department of Physical Therapy, Kagoshima University Faculty of Medicine School of Health Sciences, Kagoshima, Japan
- Hiroshi Takashima
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
2
Ziaei Jam H, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads. Genome Biol 2024; 25:176. [PMID: 38965568 PMCID: PMC11229021 DOI: 10.1186/s13059-024-03319-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 06/21/2024] [Indexed: 07/06/2024] Open
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .
Affiliation(s)
- Helyaneh Ziaei Jam
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD, USA
- Sara Javadzadeh
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Aarushi Sehgal
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
3
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
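As an illustration of the 1-6-bp motif definition in the abstract above, the following minimal Python sketch scans a sequence for perfect tandem repeats with a regular expression. The function name `find_strs` and the `min_copies` threshold are assumptions made for this example and do not come from any tool discussed in the review. The sketch ignores interrupted or imperfect repeats, which dedicated STR genotypers are designed to handle.

```python
import re


def find_strs(seq: str, min_copies: int = 4):
    """Return (start, motif, copies) for perfect tandem repeats of a 1-6 bp motif.

    Hypothetical helper for illustration; min_copies must be at least 2.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be >= 2")
    # Lazy {1,6}? tries the shortest motif first; \1{n,} requires n more copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits


# A toy sequence with five CGG copies flanked by non-repetitive bases.
print(find_strs("TTA" + "CGG" * 5 + "AT"))
```

Because the motif group is lazy, a run such as `CGGCGGCGGCGG` is reported with the 3-bp motif `CGG` rather than the 6-bp motif `CGGCGG`.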
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
4
Rajan-Babu IS, Dolzhenko E, Eberle MA, Friedman JM. Sequence composition changes in short tandem repeats: heterogeneity, detection, mechanisms and clinical implications. Nat Rev Genet 2024; 25:476-499. [PMID: 38467784 DOI: 10.1038/s41576-024-00696-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/13/2024]
Abstract
Short tandem repeats (STRs) are a class of repetitive elements, composed of tandem arrays of 1-6 base pair sequence motifs, that comprise a substantial fraction of the human genome. STR expansions can cause a wide range of neurological and neuromuscular conditions, known as repeat expansion disorders, whose age of onset, severity, penetrance and/or clinical phenotype are influenced by the length of the repeats and their sequence composition. The presence of non-canonical motifs, depending on the type, frequency and position within the repeat tract, can alter clinical outcomes by modifying somatic and intergenerational repeat stability, gene expression and mutant transcript-mediated and/or protein-mediated toxicities. Here, we review the diverse structural conformations of repeat expansions, technological advances for the characterization of changes in sequence composition, their clinical correlations and the impact on disease mechanisms.
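To make the idea of non-canonical motifs within a repeat tract concrete, here is a minimal Python sketch that splits a tract into motif-sized units and reports the units that differ from the canonical motif. The function name `motif_composition` and the toy CAG tract with one CAA unit are assumptions chosen for illustration. The sketch assumes the tract starts in register with the motif and contains no insertions or deletions, which real reads often violate.

```python
from collections import Counter


def motif_composition(tract: str, canonical: str):
    """Count motif-sized units in a repeat tract and list non-canonical ones.

    Hypothetical helper: returns (Counter of units, [(unit_index, unit), ...]).
    """
    k = len(canonical)
    # Walk the tract in steps of one motif length, dropping any partial tail.
    units = [tract[i:i + k] for i in range(0, len(tract) - k + 1, k)]
    interruptions = [(i, u) for i, u in enumerate(units) if u != canonical]
    return Counter(units), interruptions


# Toy tract: three CAG units, one CAA unit, then two more CAG units.
counts, interruptions = motif_composition("CAG" * 3 + "CAA" + "CAG" * 2, "CAG")
print(counts, interruptions)
```

Reporting the position of each interruption, not only its count, matters here because the abstract notes that the type, frequency and position of non-canonical motifs can all influence clinical outcomes.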
Affiliation(s)
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
- Jan M Friedman
  - Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
  - BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada
5
Zhang M. STRAS: a snakemake pipeline for genome-wide short tandem repeats annotation and score. Hum Genet 2024; 143:735-738. [PMID: 38507015 DOI: 10.1007/s00439-024-02662-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 02/13/2024] [Indexed: 03/22/2024]
Abstract
High-throughput whole genome sequencing (WGS) is used clinically to find single nucleotide variants and small indels. Several bioinformatics tools have been developed to call short tandem repeat (STR) copy numbers from WGS data, such as ExpansionHunter Denovo, GangSTR and HipSTR. However, expansion disorders are rare, and it is hard to find candidate expansions in single-patient sequencing data with ~800,000 STR calls. In this paper I describe a snakemake pipeline for genome-wide STR Annotation and Score (STRAS) that uses a Random Forest (RF) model to predict pathogenicity. The predictor was validated on benchmark data from ClinVar and PubMed. The true positive rate was 93.8% and the true negative rate was 98.0%. Precision was 98.6% and recall was 93.8%. The F1-score was 0.961. Sensitivity was 93.8% and specificity was 99.6%. These results show that STRAS could be a useful tool for clinical researchers to find STR loci of interest and filter out neutral STRs. STRAS is freely available at https://github.com/fancheyu5/STRAS .
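The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short Python sketch below does this arithmetic; the function name `f1_score` is an assumption for this example and is not taken from the STRAS code base.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract (98.6% and 93.8%).
f1 = f1_score(0.986, 0.938)
print(round(f1, 3))  # prints 0.961
```

Recall, sensitivity and true positive rate are three names for the same quantity, which is why the abstract gives 93.8% for each of them.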
Affiliation(s)
- Mengna Zhang
  - Molecular Diagnosis Center, The Affiliated Hospital of Chengde Medical University, Chengde, 067000, China
6
Alvarez Jerez P, Daida K, Miano-Burkhardt A, Iwaki H, Malik L, Cogan G, Makarious MB, Sullivan R, Vandrovcova J, Ding J, Gibbs JR, Markham A, Nalls MA, Kesharwani RK, Sedlazeck FJ, Casey B, Hardy J, Houlden H, Blauwendraat C, Singleton AB, Billingsley KJ. Profiling complex repeat expansions in RFC1 in Parkinson's disease. NPJ Parkinsons Dis 2024; 10:108. [PMID: 38789445 PMCID: PMC11126591 DOI: 10.1038/s41531-024-00723-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 05/10/2024] [Indexed: 05/26/2024] Open
Abstract
A biallelic (AAGGG) expansion in the poly(A) tail of an AluSx3 transposable element within the gene RFC1 is a frequent cause of cerebellar ataxia, neuropathy, vestibular areflexia syndrome (CANVAS), and more recently, has been reported as a rare cause of Parkinson's disease (PD) in the Finnish population. Here, we investigate the prevalence of RFC1 (AAGGG) expansions in PD patients of non-Finnish European ancestry in 1609 individuals from the Parkinson's Progression Markers Initiative study. We identified four PD patients carrying the biallelic RFC1 (AAGGG) expansion and did not identify any carriers in controls.
Affiliation(s)
- Pilar Alvarez Jerez
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Kensuke Daida
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Abigail Miano-Burkhardt
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Hirotaka Iwaki
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
  - DataTecnica LLC, Washington, DC, USA
- Laksh Malik
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Guillaume Cogan
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Sorbonne Université, Institut du Cerveau-Paris Brain Institute-ICM, Institut National de la Recherche Médicale-U1127, Centre National de la Recherche Scientifique, Paris, France
- Mary B Makarious
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - UCL Movement Disorders Centre, University College London, London, UK
- Roisin Sullivan
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Jana Vandrovcova
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Jinhui Ding
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- J Raphael Gibbs
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Mike A Nalls
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
  - DataTecnica LLC, Washington, DC, USA
- Rupesh K Kesharwani
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bradford Casey
  - The Michael J. Fox Foundation for Parkinson's Research, New York, NY, USA
- John Hardy
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Henry Houlden
  - UCL Movement Disorders Centre, University College London, London, UK
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Cornelis Blauwendraat
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Andrew B Singleton
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Kimberley J Billingsley
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
7
Zhang Y, Liu X, Li Z, Li H, Miao Z, Wan B, Xu X. Advances on the Mechanisms and Therapeutic Strategies in Non-coding CGG Repeat Expansion Diseases. Mol Neurobiol 2024:10.1007/s12035-024-04239-9. [PMID: 38780719 DOI: 10.1007/s12035-024-04239-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
Non-coding CGG repeat expansions within the 5' untranslated region are implicated in a range of neurological disorders, including fragile X-associated tremor/ataxia syndrome, oculopharyngeal myopathy with leukodystrophy, and oculopharyngodistal myopathy. This review outlined the general characteristics of diseases associated with non-coding CGG repeat expansions, detailing their clinical manifestations and neuroimaging patterns, which often overlap and indicate shared pathophysiological traits. We summarized the underlying molecular mechanisms of these disorders, providing new insights into the roles that DNA, RNA, and toxic proteins play. Understanding these mechanisms is crucial for the development of targeted therapeutic strategies. These strategies include a range of approaches, such as antisense oligonucleotides, RNA interference, genomic DNA editing, small molecule interventions, and other treatments aimed at correcting the dysregulated processes inherent in these disorders. A deeper understanding of the shared mechanisms among non-coding CGG repeat expansion disorders may hold the potential to catalyze the development of innovative therapies, ultimately offering relief to individuals grappling with these debilitating neurological conditions.
Affiliation(s)
- Yutong Zhang
  - Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
- Xuan Liu
  - Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
- Zeheng Li
  - Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
- Hao Li
  - Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
  - Department of Neurology, The Fourth Affiliated Hospital of Soochow University, Suzhou, 215124, China
- Zhigang Miao
  - The Institute of Neuroscience, Soochow University, Suzhou City, China
- Bo Wan
  - The Institute of Neuroscience, Soochow University, Suzhou City, China
- Xingshun Xu
  - Departments of Neurology, The First Affiliated Hospital of Soochow University, Suzhou City, China
  - The Institute of Neuroscience, Soochow University, Suzhou City, China
  - Department of Neurology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
8
Van Deynze K, Mumm C, Maltby CJ, Switzenberg JA, Todd PK, Boyle AP. Enhanced Detection and Genotyping of Disease-Associated Tandem Repeats Using HMMSTR and Targeted Long-Read Sequencing. medRxiv 2024:2024.05.01.24306681. [PMID: 38746091 PMCID: PMC11092683 DOI: 10.1101/2024.05.01.24306681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Tandem repeat sequences comprise approximately 8% of the human genome and are linked to more than 50 neurodegenerative disorders. Accurate characterization of disease-associated repeat loci remains resource intensive and often lacks high resolution genotype calls. We introduce a multiplexed, targeted nanopore sequencing panel and HMMSTR, a sequence-based tandem repeat copy number caller. HMMSTR outperforms current signal- and sequence-based callers relative to two assemblies and we show it performs with high accuracy in heterozygous regions and at low read coverage. The flexible panel allows us to capture disease associated regions at an average coverage of >150x. Using these tools, we successfully characterize known or suspected repeat expansions in patient derived samples. In these samples we also identify unexpected expanded alleles at tandem repeat loci not previously associated with the underlying diagnosis. This genotyping approach for tandem repeat expansions is scalable, simple, flexible, and accurate, offering significant potential for diagnostic applications and investigation of expansion co-occurrence in neurodegenerative disorders.
9
Su C, Chandradoss KR, Malachowski T, Boya R, Ryu HS, Brennand KJ, Phillips-Cremins JE. MASTR-seq: Multiplexed Analysis of Short Tandem Repeats with sequencing. bioRxiv 2024:2024.04.29.591790. [PMID: 38746155 PMCID: PMC11092654 DOI: 10.1101/2024.04.29.591790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
More than 60 human disorders have been linked to unstable expansion of short tandem repeat (STR) tracts. STR length and the extent of DNA methylation are linked to disease pathology and can be mosaic in a cell type-specific manner in several repeat expansion disorders. Mosaic phenomena have been difficult to study to date due to technical bias intrinsic to repeat sequences and the need for multi-modal measurements at single-allele resolution. Nanopore long-read sequencing accurately measures STR length and DNA methylation in the same single molecule but is cost prohibitive for studies assessing a target locus across multiple experimental conditions or patient samples. Here, we describe MASTR-seq, Multiplexed Analysis of Short Tandem Repeats, for cost-effective, high-throughput, accurate, multi-modal measurements of DNA methylation and STR genotype at single-allele resolution. MASTR-seq couples long-read sequencing, Cas9-mediated target enrichment, and PCR-free multiplexed barcoding to achieve a >10-fold increase in on-target read mapping for 8-12 pooled samples in a single MinION flow cell. We provide a detailed experimental protocol and computational tools and present evidence that MASTR-seq quantifies tract length and DNA methylation status for CGG and CAG STR loci in normal-length and mutation-length human cell lines. The MASTR-seq protocol takes approximately eight days for experiments and one additional day for data processing and analyses.
Key points
- We provide a protocol for MASTR-seq: Multiplexed Analysis of Short Tandem Repeats using Cas9-mediated target enrichment and PCR-free, multiplexed nanopore sequencing.
- MASTR-seq achieves a >10-fold increase in on-target read proportion for highly repetitive, technically inaccessible regions of the genome relevant for human health and disease.
- MASTR-seq allows for high-throughput, efficient, accurate, and cost-effective measurement of STR length and DNA methylation in the same single allele for up to 8-12 samples in parallel in one Nanopore MinION flow cell.
10
Tachikawa K, Shimizu T, Imai T, Ko R, Kawai Y, Omae Y, Tokunaga K, Frith MC, Yamano Y, Mitsuhashi S. Cost-Effective Cas9-Mediated Targeted Sequencing of Spinocerebellar Ataxia Repeat Expansions. J Mol Diagn 2024; 26:85-95. [PMID: 38008286 DOI: 10.1016/j.jmoldx.2023.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/28/2023] Open
Abstract
Hereditary repeat diseases are caused by an abnormal expansion of short tandem repeats in the genome. Among them, spinocerebellar ataxia (SCA) is a heterogeneous disease, and currently, 16 responsible repeats are known. Genetic diagnosis is obtained by analyzing the number of repeats through separate testing of each repeat. Although simultaneous detection of candidate repeats using current massively parallel sequencing technologies has been developed to avoid complicated multiple experiments, these methods are generally expensive. This study developed a cost-effective SCA repeat panel [Flongle SCA repeat panel sequencing (FLO-SCAp)] using Cas9-mediated targeted long-read sequencing and the smallest long-read sequencing apparatus, Flongle. This panel enabled the detection of repeat copy number changes, internal repeat sequences, and DNA methylation in seven patients with different repeat expansion diseases. The median (interquartile range) values of coverage and on-target rate were 39.5 (12 to 72) and 11.6% (7.5% to 16.5%), respectively. This approach was validated by comparing repeat copy number changes measured by FLO-SCAp and short-read whole-genome sequencing. A high correlation was observed between FLO-SCAp and short-read whole-genome sequencing when the repeat length was ≤250 bp (r = 0.98; P < 0.001). Thus, FLO-SCAp represents the most cost-effective method for conducting multiplex testing of repeats and can serve as the first-line diagnostic tool for SCA.
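The abstract above summarises run quality as a median with an interquartile range for coverage and on-target rate. The minimal Python sketch below shows how such summaries are typically computed. The helper names and the coverage values are hypothetical and are not data from the study, and the quartile convention is an assumption: `statistics.quantiles` uses its default "exclusive" method here, which may differ from the method the authors used.

```python
import statistics


def on_target_rate(on_target_reads: int, total_reads: int) -> float:
    """Percentage of reads that map to the targeted regions."""
    return 100.0 * on_target_reads / total_reads


def median_iqr(values):
    """Return (median, first quartile, third quartile) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q1, q3


# Hypothetical per-sample coverage values, for illustration only.
coverage = [10, 20, 30, 40, 50]
med, q1, q3 = median_iqr(coverage)
print(f"coverage median {med} (IQR {q1} to {q3})")
```

A median with quartiles is a reasonable choice for small sample sets like this one because a single low-yield flow cell does not distort the summary the way it would distort a mean.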
Affiliation(s)
- Keiji Tachikawa
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Takahiro Shimizu
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Takeshi Imai
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Riyoko Ko
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
- Yosuke Kawai
  - Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
- Yosuke Omae
  - Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
- Katsushi Tokunaga
  - Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
- Martin C Frith
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yoshihisa Yamano
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
  - Department of Rare Diseases Research, Institute of Medical Science, St. Marianna University School of Medicine, Kawasaki, Japan
- Satomi Mitsuhashi
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
11
Jam HZ, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. Genome-wide profiling of genetic variation at tandem repeat from long reads. bioRxiv 2024:2024.01.20.576266. [PMID: 38328152 PMCID: PMC10849534 DOI: 10.1101/2024.01.20.576266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve TR analysis, especially for long or complex repeats. Here we introduce LongTR, which accurately genotypes tandem repeats from high fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr.
Affiliation(s)
- Helyaneh Ziaei Jam
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Justin M. Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
- Sara Javadzadeh
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Aarushi Sehgal
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
12
Audet S, Triassi V, Gelinas M, Legault-Cadieux N, Ferraro V, Duquette A, Tetreault M. Integration of multi-omics technologies for molecular diagnosis in ataxia patients. Front Genet 2024; 14:1304711. [PMID: 38239855 PMCID: PMC10794629 DOI: 10.3389/fgene.2023.1304711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/27/2023] [Indexed: 01/22/2024] Open
Abstract
Background: Episodic ataxias are rare neurological disorders characterized by recurring episodes of imbalance and coordination difficulties. Obtaining definitive molecular diagnoses poses challenges, as clinical presentation is highly heterogeneous, and literature on the underlying genetics is limited. While the advent of high-throughput sequencing technologies has significantly contributed to the genetics of Mendelian disorders, interpretation of variants of uncertain significance and other limitations inherent to individual methods still leave many patients undiagnosed. This study aimed to investigate the utility of multi-omics for the identification and validation of molecular candidates in a cohort of complex cases of ataxia with episodic presentation. Methods: Eight patients lacking molecular diagnosis despite extensive clinical examination were recruited following standard genetic testing. Whole genome and RNA sequencing were performed on samples isolated from peripheral blood mononuclear cells. Integration of expression and splicing data facilitated prioritization of genomic variants. Subsequently, long-read sequencing played a crucial role in the validation of those candidate variants. Results: Whole genome sequencing uncovered pathogenic variants in four genes (SPG7, ATXN2, ELOVL4, PMPCB). A missense and a nonsense variant, both previously reported as likely pathogenic, were configured in trans in individual #1 (SPG7: c.2228T>C/p.I743T, c.1861C>T/p.Q621*). An ATXN2 microsatellite expansion (CAG32) was found in another late-onset case. In two separate individuals, intronic variants near splice sites (ELOVL4: c.541+5G>A; PMPCB: c.1154+5G>C) were predicted to induce loss-of-function splicing, but had never been reported as disease-causing. Long-read sequencing confirmed the compound heterozygous variant configuration, the repeat expansion length, and the splicing landscape for those pathogenic variants. A potential genetic modifier of the ATXN2 expansion was discovered in ZFYVE26 (c.3022C>T/p.R1008*). Conclusion: Despite failure to identify pathogenic variants through clinical genetic testing, the multi-omics approach enabled a molecular diagnosis in 50% of patients, while also giving valuable insights for variant prioritization in the remaining cases. The findings demonstrate the value of long-read sequencing for the validation of candidate variants in various scenarios. Our study demonstrates the effectiveness of leveraging complementary omics technologies to unravel the underlying genetics in patients with unresolved rare diseases such as ataxia. Molecular diagnoses not only hold significant promise for improving patient care management, but also alleviate the burden of diagnostic odysseys, more broadly enhancing quality of life.
Affiliation(s)
- Sebastien Audet
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Valerie Triassi
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Myriam Gelinas
  - Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Nab Legault-Cadieux
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Vincent Ferraro
  - Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Antoine Duquette
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
  - Neurology Service, Department of Medicine, André-Barbeau Movement Disorders Unit, University of Montreal Hospital (CHUM), Montreal, QC, Canada
  - Genetic Service, Department of Medicine, University of Montreal Hospital (CHUM), Montreal, QC, Canada
- Martine Tetreault
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
13
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024:10.1038/s41587-023-02057-3. [PMID: 38168995 DOI: 10.1038/s41587-023-02057-3] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
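The Mendelian concordance figure quoted above (98.38%, allowing a single repeat unit difference) can be illustrated with a minimal sketch. This is not TRGT's implementation, which the abstract does not describe. It assumes diploid genotypes given as pairs of repeat-unit counts, and the function names `is_mendelian_consistent` and `mendelian_concordance` are hypothetical.

```python
from itertools import permutations


def is_mendelian_consistent(child, father, mother, tolerance=1):
    """Check one trio genotype at a single tandem-repeat locus.

    Each genotype is a pair of allele lengths in repeat units (assumption:
    diploid locus). The child is consistent if one of its alleles matches a
    paternal allele and the other matches a maternal allele, each within
    `tolerance` repeat units.
    """
    def close(a, b):
        return abs(a - b) <= tolerance

    # Try both assignments of the child's alleles to the two parents.
    for from_father, from_mother in permutations(child, 2):
        if any(close(from_father, f) for f in father) and any(
            close(from_mother, m) for m in mother
        ):
            return True
    return False


def mendelian_concordance(trios, tolerance=1):
    """Fraction of (child, father, mother) genotypes that are consistent."""
    consistent = sum(
        is_mendelian_consistent(c, f, m, tolerance) for c, f, m in trios
    )
    return consistent / len(trios)


if __name__ == "__main__":
    # Made-up genotypes for illustration only.
    trios = [
        ((10, 12), (10, 15), (12, 12)),  # exact inheritance
        ((10, 13), (10, 15), (12, 12)),  # off by one unit from the mother
        ((20, 30), (10, 11), (12, 13)),  # not explained by either parent
    ]
    print(mendelian_concordance(trios, tolerance=1))
    print(mendelian_concordance(trios, tolerance=0))
```

Setting `tolerance=0` demands exact allele-length matches, so comparing the two settings shows how much of the apparent discordance comes from single-unit length differences.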
Affiliation(s)
- Adam English
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Harriet Dashnow
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Tom Mokveld
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Warren A Cheung
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Chengpeng Bi
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily Farrow
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron Wenger
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Khi Pin Chua
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Verónica Martínez-Cerdeño
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
  - MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
- Trevor D Bartley
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- Peng Jin
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David L Nelson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Stephan Zuchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
14
Yeetong P, Dembélé ME, Pongpanich M, Cissé L, Srichomthong C, Maiga AB, Dembélé K, Assawapitaksakul A, Bamba S, Yalcouyé A, Diarra S, Mefoung SE, Rakwongkhachon S, Traoré O, Tongkobpetch S, Fischbeck KH, Gahl WA, Guinto CO, Shotelersuk V, Landouré G. Pentanucleotide Repeat Insertions in RAI1 Cause Benign Adult Familial Myoclonic Epilepsy Type 8. Mov Disord 2024; 39:164-172. [PMID: 37994247 PMCID: PMC10872918 DOI: 10.1002/mds.29654] [Received: 07/11/2023] [Revised: 10/04/2023] [Accepted: 10/24/2023] [Indexed: 11/24/2023]
Abstract
BACKGROUND Benign adult familial myoclonic epilepsy (BAFME) is an autosomal dominant disorder characterized by cortical tremors and seizures. Six types of BAFME, all caused by pentanucleotide repeat expansions in different genes, have been reported. However, several other BAFME cases remain with no molecular diagnosis. OBJECTIVES We aim to characterize clinical features and identify the mutation causing BAFME in a large Malian family with 10 affected members. METHODS Long-read whole genome sequencing, repeat-primed polymerase chain reaction and RNA studies were performed. RESULTS We identified TTTTA repeat expansions and TTTCA repeat insertions in intron 4 of the RAI1 gene that co-segregated with disease status in this family. TTTCA repeats were absent in 200 Malian controls. In the affected individuals, we found a read with only nine TTTCA repeat units and somatic instability. The RAI1 repeat expansions cause the only BAFME type in which the disease-causing repeats are in a gene associated with a monogenic disorder in the haploinsufficiency state (ie, Smith-Magenis syndrome [SMS]). Nevertheless, none of the Malian patients exhibited symptoms related to SMS. Moreover, leukocyte RNA levels of RAI1 in six Malian BAFME patients were no different from controls. CONCLUSIONS These findings establish a new type of BAFME, BAFME8, in an African family and suggest that haploinsufficiency is unlikely to be the main pathomechanism of BAFME. © 2023 International Parkinson and Movement Disorder Society. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
Affiliation(s)
- Patra Yeetong
  - Division of Human Genetics, Department of Botany, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Monnat Pongpanich
  - Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
  - Omics Sciences and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Lassana Cissé
  - Service de Neurologie, Centre Hospitalier Universitaire du Point G, Bamako, Mali
- Chalurmpon Srichomthong
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
- Adjima Assawapitaksakul
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
- Salia Bamba
  - Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
- Salimata Diarra
  - Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
  - Yale University, Pediatric Genomics Discovery Program, Department of Pediatrics, New Haven, CT, United States
  - Neurogenetics Branch, NINDS, NIH, Bethesda, MD, United States
- Supphakorn Rakwongkhachon
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
- Oumou Traoré
  - Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
- Siraprapa Tongkobpetch
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
- William A Gahl
  - Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- Cheick O Guinto
  - Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
  - Service de Neurologie, Centre Hospitalier Universitaire du Point G, Bamako, Mali
- Vorasuk Shotelersuk
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, the Thai Red Cross Society, Bangkok 10330, Thailand
- Guida Landouré
  - Faculté de Médecine et d’Odontostomatologie, USTTB, Bamako, Mali
  - Service de Neurologie, Centre Hospitalier Universitaire du Point G, Bamako, Mali
15
Ando M, Higuchi Y, Yuan J, Yoshimura A, Kojima F, Yamanishi Y, Aso Y, Izumi K, Imada M, Maki Y, Nakagawa H, Hobara T, Noguchi Y, Takei J, Hiramatsu Y, Nozuma S, Sakiyama Y, Hashiguchi A, Matsuura E, Okamoto Y, Takashima H. Clinical variability associated with intronic FGF14 GAA repeat expansion in Japan. Ann Clin Transl Neurol 2024; 11:96-104. [PMID: 37916889 PMCID: PMC10791012 DOI: 10.1002/acn3.51936] [Received: 08/09/2023] [Revised: 10/18/2023] [Accepted: 10/19/2023] [Indexed: 11/03/2023]
Abstract
BACKGROUND AND OBJECTIVES The GAA repeat expansion within the fibroblast growth factor 14 (FGF14) gene has been found to be associated with late-onset cerebellar ataxia. This study aimed to investigate the genetic causes of cerebellar ataxia in patients in Japan. METHODS We collected a case series of 940 index patients who presented with chronic cerebellar ataxia and remained genetically undiagnosed after our preliminary genetic screening. To investigate the FGF14 repeat locus, we employed an integrated diagnostic strategy that involved fluorescence amplicon length analysis polymerase chain reaction (PCR), repeat-primed PCR, and long-read sequencing. RESULTS Pathogenic FGF14 GAA repeat expansions were detected in 12 patients from 11 unrelated families. The median size of the pathogenic GAA repeat was 309 repeats (range: 270-316 repeats). In these patients, the mean age of onset was 66.9 ± 9.6 years, with episodic symptoms observed in 56% of patients and parkinsonism in 30% of patients. We also detected FGF14 repeat expansions in a patient with a phenotype of multiple system atrophy, including cerebellar ataxia, parkinsonism, autonomic ataxia, and bilateral vocal cord paralysis. Brain magnetic resonance imaging (MRI) showed normal to mild cerebellar atrophy, and a follow-up study conducted after a mean period of 6 years did not reveal any significant progression. DISCUSSION This study highlights the importance of FGF14 GAA repeat analysis in patients with late-onset cerebellar ataxia, particularly when they exhibit episodic symptoms, or their brain MRI shows no apparent cerebellar atrophy. Our findings contribute to a better understanding of the clinical variability of GAA-FGF14-related diseases.
Affiliation(s)
- Masahiro Ando
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yujiro Higuchi
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Junhui Yuan
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Akiko Yoshimura
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Fumikazu Kojima
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yuki Yamanishi
  - Department of Neurology and Clinical Pharmacology, Ehime University Hospital, Toon, Ehime, Japan
- Yasuhiro Aso
  - Department of Neurology, Oita Prefecture Hospital, Oita, Japan
- Kotaro Izumi
  - Department of Neurology, Ohashi Go Neurosurgical Neurology Clinic, Fukuoka, Japan
- Minako Imada
  - Department of Neurology, National Hospital Organization Minamikyushu Hospital, Kagoshima, Japan
- Yoshimitsu Maki
  - Department of Neurology, Kagoshima City Hospital, Kagoshima, Japan
- Hiroto Nakagawa
  - Department of Neurology, Kagoshima Medical Association Hospital, Kagoshima, Japan
- Takahiro Hobara
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yutaka Noguchi
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Jun Takei
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yu Hiramatsu
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Satoshi Nozuma
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yusuke Sakiyama
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Akihiro Hashiguchi
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Eiji Matsuura
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Yuji Okamoto
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
  - Department of Physical Therapy, Faculty of Medicine, School of Health Sciences, Kagoshima University, Kagoshima, Japan
- Hiroshi Takashima
  - Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
16
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023]
Abstract
Background: The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results: For this study, we leveraged publicly-available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state-of-the-art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. Each tool widely disagreed on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385. Conclusions: These results suggest a combined approach is needed for LRS alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
Affiliation(s)
- Jonathan LoTempio
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
  - International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
- Emmanuele Delot
  - Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
  - Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
- Eric Vilain
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
  - International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
17
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
  - Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Frank R Wendt
  - Department of Anthropology, University of Toronto, Mississauga, ON, Canada
  - Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Forensic Science Program, University of Toronto, Mississauga, ON, Canada
18
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
  - Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
  - Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
19
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023]
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads using next-generation sequencing (NGS) reads mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, the first "hybrid-hybrid" approach, which makes use of both de Bruijn graphs and overlap graphs to cater optimally to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27 to 95%) and 20% (4 to 61%), respectively. Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Jialu Xu
  - College of Biology, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
20
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Affiliation(s)
- Peter E Warburton
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert P Sebra
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
21
Hård J, Mold JE, Eisfeldt J, Tellgren-Roth C, Häggqvist S, Bunikis I, Contreras-Lopez O, Chin CS, Nordlund J, Rubin CJ, Feuk L, Michaëlsson J, Ameur A. Long-read whole-genome analysis of human single cells. Nat Commun 2023; 14:5164. [PMID: 37620373 PMCID: PMC10449900 DOI: 10.1038/s41467-023-40898-3] [Received: 02/06/2023] [Accepted: 08/07/2023] [Indexed: 08/26/2023]
Abstract
Long-read sequencing has dramatically increased our understanding of human genome variation. Here, we demonstrate that long-read technology can give new insights into the genomic architecture of individual cells. Clonally expanded CD8+ T-cells from a human donor were subjected to droplet-based multiple displacement amplification (dMDA) to generate long molecules with reduced bias. PacBio sequencing generated up to 40% genome coverage per single-cell, enabling detection of single nucleotide variants (SNVs), structural variants (SVs), and tandem repeats, also in regions inaccessible by short reads. 28 somatic SNVs were detected, including one case of mitochondrial heteroplasmy. 5473 high-confidence SVs/cell were discovered, a sixteen-fold increase compared to Illumina-based results from clonally related cells. Single-cell de novo assembly generated a genome size of up to 598 Mb and 1762 (12.8%) complete gene models. In summary, our work shows the promise of long-read sequencing toward characterization of the full spectrum of genetic variation in single cells.
Collapse
Affiliation(s)
- Joanna Hård
  - Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - ETH AI Center, ETH Zurich, Zurich, Switzerland
- Jeff E Mold
  - Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Jesper Eisfeldt
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Christian Tellgren-Roth
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Susana Häggqvist
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Ignas Bunikis
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Jessica Nordlund
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Carl-Johan Rubin
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Lars Feuk
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Jakob Michaëlsson
  - Center for Infectious Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
- Adam Ameur
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
|
22
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jonathan Elliot Perdomo
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
23
|
Kume K, Kurashige T, Muguruma K, Morino H, Tada Y, Kikumoto M, Miyamoto T, Akutsu SN, Matsuda Y, Matsuura S, Nakamori M, Nishiyama A, Izumi R, Niihori T, Ogasawara M, Eura N, Kato T, Yokomura M, Nakayama Y, Ito H, Nakamura M, Saito K, Riku Y, Iwasaki Y, Maruyama H, Aoki Y, Nishino I, Izumi Y, Aoki M, Kawakami H. CGG repeat expansion in LRP12 in amyotrophic lateral sclerosis. Am J Hum Genet 2023; 110:1086-1097. [PMID: 37339631 PMCID: PMC10357476 DOI: 10.1016/j.ajhg.2023.05.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/25/2023] [Revised: 05/25/2023] [Accepted: 05/25/2023] [Indexed: 06/22/2023] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder characterized by the degeneration of motor neurons. Although repeat expansion in C9orf72 is its most common cause, the pathogenesis of ALS is not fully clear. In this study, we show that repeat expansion in LRP12, a causative variant of oculopharyngodistal myopathy type 1 (OPDM1), is a cause of ALS. We identify CGG repeat expansion in LRP12 in five families and two simplex individuals. These ALS individuals (LRP12-ALS) have 61-100 repeats, which contrasts with most OPDM individuals with repeat expansion in LRP12 (LRP12-OPDM), who have 100-200 repeats. Phosphorylated TDP-43 is present in the cytoplasm of iPS cell-derived motor neurons (iPSMNs) in LRP12-ALS, a finding that reproduces the pathological hallmark of ALS. RNA foci are more prominent in muscle and iPSMNs in LRP12-ALS than in LRP12-OPDM. Muscleblind-like 1 aggregates are observed only in OPDM muscle. In conclusion, CGG repeat expansions in LRP12 cause ALS and OPDM, depending on the length of the repeat. Our findings provide insight into the repeat length-dependent switching of phenotypes.
Affiliation(s)
- Kodai Kume
  - Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
- Takashi Kurashige
  - Department of Neurology, National Hospital Organization Kure Medical Center and Chugoku Cancer Center, Hiroshima, Japan
- Keiko Muguruma
  - Department of iPS Cell Applied Medicine, Graduate School of Medicine, Kansai Medical University, Osaka, Japan
- Hiroyuki Morino
  - Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
- Yui Tada
  - Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
- Mai Kikumoto
  - Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
  - Department of Clinical Neuroscience and Therapeutics, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
- Tatsuo Miyamoto
  - Department of Genetics and Cell Biology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
- Silvia Natsuko Akutsu
  - Department of Genetics and Cell Biology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
- Yukiko Matsuda
  - Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
- Shinya Matsuura
  - Department of Genetics and Cell Biology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
- Masahiro Nakamori
  - Department of Clinical Neuroscience and Therapeutics, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
- Ayumi Nishiyama
  - Department of Neurology, Tohoku University Graduate School of Medicine, Miyagi, Japan
- Rumiko Izumi
  - Department of Neurology, Tohoku University Graduate School of Medicine, Miyagi, Japan
- Tetsuya Niihori
  - Department of Medical Genetics, Tohoku University Graduate School of Medicine, Miyagi, Japan
- Masashi Ogasawara
  - Department of Neuromuscular Research, National Institute of Neuroscience, National Centre of Neurology and Psychiatry, National Centre Hospital, Tokyo, Japan
- Nobuyuki Eura
  - Department of Neuromuscular Research, National Institute of Neuroscience, National Centre of Neurology and Psychiatry, National Centre Hospital, Tokyo, Japan
- Tamaki Kato
  - Institute of Medical Genetics, Tokyo Women's Medical University, Tokyo, Japan
- Mamoru Yokomura
  - Institute of Medical Genetics, Tokyo Women's Medical University, Tokyo, Japan
- Yoshiaki Nakayama
  - Department of Neurology, Wakayama Medical University, Wakayama, Japan
- Hidefumi Ito
  - Department of Neurology, Wakayama Medical University, Wakayama, Japan
- Kayoko Saito
  - Institute of Medical Genetics, Tokyo Women's Medical University, Tokyo, Japan
- Yuichi Riku
  - Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
- Yasushi Iwasaki
  - Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
- Hirofumi Maruyama
  - Department of Clinical Neuroscience and Therapeutics, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
- Yoko Aoki
  - Department of Medical Genetics, Tohoku University Graduate School of Medicine, Miyagi, Japan
- Ichizo Nishino
  - Department of Neuromuscular Research, National Institute of Neuroscience, National Centre of Neurology and Psychiatry, National Centre Hospital, Tokyo, Japan
- Yuishin Izumi
  - Department of Neurology, Tokushima University Graduate School of Biomedical Sciences, Tokushima, Japan
- Masashi Aoki
  - Department of Neurology, Tohoku University Graduate School of Medicine, Miyagi, Japan
- Hideshi Kawakami
  - Department of Molecular Epidemiology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan
|
24
|
Ikemoto K, Fujimoto H, Fujimoto A. Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes. Hum Genomics 2023; 17:21. [PMID: 36895025 PMCID: PMC9996862 DOI: 10.1186/s40246-023-00467-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/01/2023] [Indexed: 03/11/2023] Open
Abstract
BACKGROUND Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, the characterization of repetitive sequences by reconstructing genomic structures at high resolution solely from long reads remains difficult. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads. METHODS We developed LoMA by combining minimap2, MAFFT, and our algorithm, which classifies diploid haplotypes based on structural variants and CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of the human insertion solely from the long-read data. RESULTS The assessment of LoMA showed a high accuracy of CSs (error rate < 0.3%) compared with raw data (error rate > 8%) and superiority to a previous study. The genome-wide analysis of NA18943 and NA19240 identified 5516 and 6542 insertions (≥ 100 bp), respectively. Most insertions (~ 80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Finally, our analysis suggested that short tandem duplications are associated with gene expression and transposons. CONCLUSIONS Our analysis showed that LoMA constructs high-quality sequences from long reads with substantial errors. This study revealed the true structures of the insertions with high accuracy and inferred the mechanisms for the insertions, thus contributing to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma .
Affiliation(s)
- Ko Ikemoto
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan
- Hinano Fujimoto
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan
- Akihiro Fujimoto
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan
|
25
|
MSINGB: A Novel Computational Method Based on NGBoost for Identifying Microsatellite Instability Status from Tumor Mutation Annotation Data. Interdiscip Sci 2023; 15:100-110. [PMID: 36350503 DOI: 10.1007/s12539-022-00544-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 11/11/2022]
Abstract
Microsatellite instability (MSI), a vital mutator phenotype caused by DNA mismatch repair deficiency, is frequently observed in several tumors. MSI is recognized as a critical molecular biomarker for diagnosis, prognosis, and therapeutic selection in several cancers. Identifying MSI status with the current gold-standard experimental methods is laborious, time-consuming, and costly. Although several computational methods based on machine learning have been proposed to identify MSI status, we need to further understand which machine learning model best identifies MSI and which feature subset is strongly related to MSI. On this basis, more effective machine learning-based methods can be developed to improve the performance of MSI status identification. In this work, we present MSINGB, an NGBoost-based method for identifying MSI status from tumor somatic mutation annotation data. MSINGB first evaluates the prediction performance of 11 popular machine learning algorithms and 9 deep learning models to identify MSI. Among the 20 models, NGBoost, a novel natural gradient boosting method, achieves the overall best performance. MSINGB then introduces two feature selection strategies to find the compact feature subset that is strongly related to MSI, and employs the SHAP approach to interpret how the selected features impact the model prediction. MSINGB achieves a better prediction performance on both the tenfold cross-validation test and the independent test compared with state-of-the-art methods.
|
26
|
Wang P, Wang F. A proposed metric set for evaluation of genome assembly quality. Trends Genet 2023; 39:175-186. [PMID: 36402623 DOI: 10.1016/j.tig.2022.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/05/2022] [Revised: 10/24/2022] [Accepted: 10/26/2022] [Indexed: 11/18/2022]
Abstract
Quality control is essential for genome assemblies; however, a consensus has yet to be reached on what metrics should be adopted for the evaluation of assembly quality. N50 is widely used for contiguity measurement, but its effectiveness is constantly in question. Prevailing metrics for the completeness evaluation focus on gene space, yet challenging areas such as tandem repeats are commonly overlooked. Achieving correctness has become an indispensable dimension for quality control, while prevailing assembly releases lack scores reflecting this aspect. We propose a metric set with a set of statistic indexes for effective, comprehensive evaluation of assemblies and provide a score of a finished assembly for each metric, which can be utilized as a benchmark for achieving high-quality genome assemblies.
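For readers unfamiliar with the contiguity metric mentioned in this abstract, the following is a minimal sketch of the conventional N50 definition: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions; this is not the metric set proposed by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50 contig.
        if running >= half_total:
            return length


# Hypothetical contig lengths, chosen only for illustration.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Because a single very long contig can dominate the sum, N50 alone says little about completeness or correctness, which is the limitation the abstract points to.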
Affiliation(s)
- Peng Wang
  - Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Ministry of Agriculture and Rural Affairs, Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, No. 4 Xueyuan Rd, Haikou City, Hainan 571101, China
- Fei Wang
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, No. 100 Haiquan Rd, Shanghai 201416, China
|
27
|
Mitsuhashi S, Frith MC. Analysis of Tandem Repeat Expansions Using Long DNA Reads. Methods Mol Biol 2023; 2632:147-159. [PMID: 36781727 DOI: 10.1007/978-1-0716-2996-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Abnormal expansion or shortening of tandem repeats can cause a variety of genetic diseases. The use of long DNA reads has facilitated the analysis of disease-causing repeats in the human genome. Long read sequencers enable us to directly analyze repeat length and sequence content by covering whole repeats; they are therefore considered suitable for the analysis of long tandem repeats. Here, we describe an expanded repeat analysis using target sequencing data produced by the Oxford Nanopore Technologies (hereafter referred to as ONT) nanopore sequencer.
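As a rough illustration of what "directly analyzing repeat length" from a read that spans a whole repeat can mean, the sketch below counts the longest uninterrupted run of a motif in a sequence. It is a simplified assumption-laden example, not the protocol described in the chapter: the function name `longest_repeat_run` and the toy read are hypothetical, and exact string matching ignores the sequencing errors and repeat interruptions that real long-read tools must tolerate.

```python
import re


def longest_repeat_run(sequence, motif):
    """Return the largest number of back-to-back exact copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    motif = motif.upper()
    # One or more consecutive copies of the motif, matched exactly.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Toy read: flanking bases, five CGG copies, an interruption, then two more copies.
toy_read = "AT" + "CGG" * 5 + "TT" + "CGG" * 2
toy_count = longest_repeat_run(toy_read, "CGG")
```

In practice a single miscalled base splits a run in two, so this kind of exact count would tend to underestimate repeat length on raw reads.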
Affiliation(s)
- Satomi Mitsuhashi
  - Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo, Japan
  - Division of Neurology, Department of Internal Medicine, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
|
28
|
Chen P, Sun Z, Wang J, Liu X, Bai Y, Chen J, Liu A, Qiao F, Chen Y, Yuan C, Sha J, Zhang J, Xu LQ, Li J. Portable nanopore-sequencing technology: Trends in development and applications. Front Microbiol 2023; 14:1043967. [PMID: 36819021 PMCID: PMC9929578 DOI: 10.3389/fmicb.2023.1043967] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/14/2022] [Accepted: 01/03/2023] [Indexed: 02/04/2023] Open
Abstract
Sequencing technology is the most commonly used technology in molecular biology research and an essential pillar for the development and applications of molecular biology. Since 1977, when the first generation of sequencing technology opened the door to interpreting the genetic code, sequencing technology has been developing for three generations. It has applications in all aspects of life and scientific research, such as disease diagnosis, drug target discovery, pathological research, species protection, and SARS-CoV-2 detection. However, the first- and second-generation sequencing technology relied on fluorescence detection systems and DNA polymerization enzyme systems, which increased the cost of sequencing technology and limited its scope of applications. The third-generation sequencing technology performs PCR-free and single-molecule sequencing, but it still depends on the fluorescence detection device. To break through these limitations, researchers have made arduous efforts to develop a new advanced portable sequencing technology represented by nanopore sequencing. Nanopore technology has the advantages of small size and convenient portability, independent of biochemical reagents, and direct reading using physical methods. This paper reviews the research and development process of nanopore sequencing technology (NST) from the laboratory to commercially viable tools; discusses the main types of nanopore sequencing technologies and their various applications in solving a wide range of real-world problems. In addition, the paper collates the analysis tools necessary for performing different processing tasks in nanopore sequencing. Finally, we highlight the challenges of NST and its future research and application directions.
Affiliation(s)
- Pin Chen
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Zepeng Sun
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Jiawei Wang
  - School of Computer Science and Technology, Southeast University, Nanjing, China
- Xinlong Liu
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yun Bai
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Jiang Chen
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Anna Liu
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Feng Qiao
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yang Chen
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Chenyan Yuan
  - Clinical Laboratory, Southeast University Zhongda Hospital, Nanjing, China
- Jingjie Sha
  - School of Mechanical Engineering, Southeast University, Nanjing, China
- Jinghui Zhang
  - School of Computer Science and Technology, Southeast University, Nanjing, China
- Li-Qun Xu (corresponding author)
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Jian Li (corresponding author)
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
|
29
|
Fan C, Chen K, Wang Y, Ball EV, Stenson PD, Mort M, Bacolla A, Kehrer-Sawatzki H, Tainer JA, Cooper DN, Zhao H. Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections. Hum Genet 2023; 142:245-274. [PMID: 36344696 PMCID: PMC10290229 DOI: 10.1007/s00439-022-02500-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 10/24/2022] [Indexed: 11/09/2022]
Abstract
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and genomic regions lacking repeats, allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5'genes but were not significantly different from controls in introns, 3'UTRs and 3'genes. Additionally, pathogenic repeat expansions were also found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5'genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx ( http://biomed.nscc-gz.cn/zhaolab/geneprediction/# ) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.
Affiliation(s)
- Cong Fan
  - Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China
- Ken Chen
  - School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 500001, China
- Yukai Wang
  - School of Life Science, Sun Yat-Sen University, Guangzhou, 500001, China
- Edward V Ball
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Peter D Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Albino Bacolla
  - Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
- John A Tainer
  - Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Huiying Zhao
  - Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China
|
30
|
Lang J, Xu Z, Wang Y, Sun J, Yang Z. NanoSTR: A method for detection of target short tandem repeats based on nanopore sequencing data. Front Mol Biosci 2023; 10:1093519. [PMID: 36743210 PMCID: PMC9889824 DOI: 10.3389/fmolb.2023.1093519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023] Open
Abstract
Short tandem repeats (STRs) are widely present in the human genome. Studies have confirmed that STRs are associated with more than 30 diseases, and they have also been used in forensic identification and paternity testing. However, there are few methods for STR detection based on nanopore sequencing due to the challenges posed by the sequencing principles and the data characteristics of nanopore sequencing. We developed NanoSTR for detection of target STR loci based on the length-number-rank (LNR) information of reads. NanoSTR can be used for STR detection and genotyping based on long-read data from nanopore sequencing with improved accuracy and efficiency compared with other existing methods, such as Tandem-Genotypes and TRiCoLOR. NanoSTR showed 100% concordance with the expected genotypes using error-free simulated data, and also achieved >85% concordance on standard samples (containing autosomal and Y-chromosomal loci) sequenced on the MinION platform. NanoSTR showed high performance for detection of target STR markers. Although NanoSTR needs further optimization and development, it is useful as an analytical method for the detection of STR loci by nanopore sequencing. This method adds to the toolbox for nanopore-based STR analysis and expands the applications of nanopore sequencing in scientific research and clinical scenarios. The main code and the data are available at https://github.com/langjidong/NanoSTR.
|
31
|
Rafehi H, Read J, Szmulewicz DJ, Davies KC, Snell P, Fearnley LG, Scott L, Thomsen M, Gillies G, Pope K, Bennett MF, Munro JE, Ngo KJ, Chen L, Wallis MJ, Butler EG, Kumar KR, Wu KHC, Tomlinson SE, Tisch S, Malhotra A, Lee-Archer M, Dolzhenko E, Eberle MA, Roberts LJ, Fogel BL, Brüggemann N, Lohmann K, Delatycki MB, Bahlo M, Lockhart PJ. An intronic GAA repeat expansion in FGF14 causes the autosomal-dominant adult-onset ataxia SCA50/ATX-FGF14. Am J Hum Genet 2023; 110:105-119. [PMID: 36493768 PMCID: PMC9892775 DOI: 10.1016/j.ajhg.2022.11.015] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Received: 10/12/2022] [Accepted: 11/19/2022] [Indexed: 12/13/2022] Open
Abstract
Adult-onset cerebellar ataxias are a group of neurodegenerative conditions that challenge both genetic discovery and molecular diagnosis. In this study, we identified an intronic (GAA) repeat expansion in fibroblast growth factor 14 (FGF14). Genetic analysis of 95 Australian individuals with adult-onset ataxia identified four (4.2%) with (GAA)>300 and a further nine individuals with (GAA)>250. PCR and long-read sequence analysis revealed these were pure (GAA) repeats. In comparison, no control subjects had (GAA)>300 and only 2/311 control individuals (0.6%) had a pure (GAA)>250. In a German validation cohort, 9/104 (8.7%) of affected individuals had (GAA)>335 and a further six had (GAA)>250, whereas 10/190 (5.3%) control subjects had (GAA)>250 but none were (GAA)>335. The combined data suggest (GAA)>335 are disease causing and fully penetrant (p = 6.0 × 10⁻⁸, OR = 72 [95% CI = 4.3-1,227]), while (GAA)>250 is likely pathogenic with reduced penetrance. Affected individuals had an adult-onset, slowly progressive cerebellar ataxia with variable features including vestibular impairment, hyper-reflexia, and autonomic dysfunction. A negative correlation between age at onset and repeat length was observed (R² = 0.44, p = 0.00045, slope = -0.12) and identification of a shared haplotype in a minority of individuals suggests that the expansion can be inherited or generated de novo during meiotic division. This study demonstrates the power of genome sequencing and advanced bioinformatic tools to identify novel repeat expansions via model-free, genome-wide analysis and identifies SCA50/ATX-FGF14 as a frequent cause of adult-onset ataxia.
Affiliation(s)
- Haloom Rafehi
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia,Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | - Justin Read
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia,Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia
| | - David J. Szmulewicz
- Cerebellar Ataxia Clinic, Eye and Ear Hospital, Melbourne, VIC, Australia,The Florey Institute of Neuroscience and Mental Health, University of Melbourne, Melbourne, VIC, Australia
| | - Kayli C. Davies
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia,Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia
| | - Penny Snell
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Liam G. Fearnley
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia,Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia,Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Liam Scott
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
| | - Mirja Thomsen
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | - Greta Gillies
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Kate Pope
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
| | - Mark F. Bennett
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia,Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia,Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, Australia
| | - Jacob E. Munro
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia,Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | - Kathie J. Ngo
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
| | - Luke Chen
- Alfred Hospital, Department of Neurology, Melbourne, VIC, Australia
| | - Mathew J. Wallis
- Clinical Genetics Service, Austin Health, Melbourne, VIC, Australia; Department of Medicine, University of Melbourne, Austin Health, Melbourne, VIC, Australia; School of Medicine and Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
| | | | - Kishore R. Kumar
- Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia; Molecular Medicine Laboratory and Department of Neurology, Concord Repatriation General Hospital, Concord, NSW, Australia; Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Kathy HC. Wu
- School of Medicine, University of New South Wales, Sydney, NSW, Australia; Clinical Genomics, St Vincent’s Hospital, Darlinghurst, NSW, Australia; Discipline of Genomic Medicine, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia; School of Medicine, University of Notre Dame, Sydney, NSW, Australia
| | - Susan E. Tomlinson
- School of Medicine, University of Notre Dame, Sydney, NSW, Australia; Department of Neurology, St Vincent’s Hospital, Darlinghurst, NSW, Australia
| | - Stephen Tisch
- School of Medicine, University of New South Wales, Sydney, NSW, Australia; Department of Neurology, St Vincent’s Hospital, Darlinghurst, NSW, Australia
| | - Abhishek Malhotra
- Department of Neuroscience, University Hospital Geelong, Geelong, VIC, Australia
| | - Matthew Lee-Archer
- Launceston General Hospital, Tasmanian Health Service, Launceston, TAS, Australia
| | | | | | - Leslie J. Roberts
- Department of Neurology and Neurological Research, St. Vincent’s Hospital, Melbourne, VIC, Australia
| | - Brent L. Fogel
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA, USA; Departments of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
| | - Norbert Brüggemann
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany; Department of Neurology, University Medical Center Schleswig-Holstein, Campus Lübeck, Germany
| | - Katja Lohmann
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | - Martin B. Delatycki
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia; Victorian Clinical Genetics Services, Melbourne, VIC, Australia
| | - Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | - Paul J. Lockhart
- Bruce Lefroy Centre, Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Parkville, VIC, Australia; Corresponding author
32
Frith MC, Mitsuhashi S. Finding Rearrangements in Nanopore DNA Reads with LAST and dnarrange. Methods Mol Biol 2023; 2632:161-175. [PMID: 36781728 DOI: 10.1007/978-1-0716-2996-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
Long-read DNA sequencing techniques such as nanopore are especially useful for characterizing complex sequence rearrangements, which occur in some genetic diseases and also during evolution. Analyzing the sequence data to understand such rearrangements is not trivial, due to sequencing error, rearrangement intricacy, and abundance of repeated similar sequences in genomes. The LAST and dnarrange software packages can resolve complex relationships between DNA sequences and characterize changes such as gene conversion, processed pseudogene insertion, and chromosome shattering. They can filter out numerous rearrangements shared by controls, e.g., healthy humans versus a patient, to focus on rearrangements unique to the patient. One useful ingredient is last-train, which learns the rates (probabilities) of deletions, insertions, and each kind of base match and mismatch. These probabilities are then used to find the most likely sequence relationships/alignments, which is especially useful for DNA with unusual rates, such as DNA from Plasmodium falciparum (malaria) with ∼80% a+t. This is also useful for less-studied species that lack reference genomes, so the DNA reads are compared to a different species' genome. We also point out that a reference genome with ancestral alleles would be ideal.
Affiliation(s)
- Martin C Frith
- Artificial Intelligence Research Center, AIST, Tokyo, Japan.
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan.
- Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan.
| | - Satomi Mitsuhashi
- Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Neurology, Department of Internal Medicine, St. Marianna University School of Medicine, Kawasaki, Japan
33
Taylor A, Barros D, Gobet N, Schuepbach T, McAllister B, Aeschbach L, Randall E, Trofimenko E, Heuchan E, Barszcz P, Ciosi M, Morgan J, Hafford-Tear N, Davidson A, Massey T, Monckton D, Jones L, network REGISTRYH, Xenarios I, Dion V. Repeat Detector: versatile sizing of expanded tandem repeats and identification of interrupted alleles from targeted DNA sequencing. NAR Genom Bioinform 2022; 4:lqac089. [PMID: 36478959 PMCID: PMC9719798 DOI: 10.1093/nargab/lqac089] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/02/2022] [Revised: 10/25/2022] [Accepted: 11/08/2022] [Indexed: 12/07/2022] Open
Abstract
Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and preclinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. Interrupted alleles are sometimes present within repeats and can alter disease manifestation. Determining repeat size mosaicism and identifying interruptions in targeted sequencing datasets remains a major challenge. This is in part because standard alignment tools are ill-suited for repetitive and unstable sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington's disease and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 blood-derived samples from Huntington's disease individuals and did not require prior knowledge of the flanking sequences. Furthermore, RD can be used to identify alleles with interruptions and provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and in the development of novel therapies.
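As a rough illustration of what sizing a repeat and spotting an interruption involve, the sketch below counts back-to-back copies of a motif in a single read. It is not the profile weighting algorithm used by Repeat Detector; the function names `repeat_runs` and `longest_repeat_run` and the example read are hypothetical, and sequencing errors are ignored.

```python
import re


def repeat_runs(read: str, motif: str) -> list[tuple[int, int]]:
    """Return (start, copies) for every maximal uninterrupted run of `motif`."""
    # Hypothetical helper for illustration; exact matching only, no error model.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    return [(m.start(), len(m.group(0)) // len(motif)) for m in pattern.finditer(read)]


def longest_repeat_run(read: str, motif: str) -> int:
    """Return the largest number of consecutive motif copies found in `read`."""
    return max((copies for _, copies in repeat_runs(read, motif)), default=0)


# A made-up read: five CAG units, one CAA interruption, then three more CAG units.
example_read = "TT" + "CAG" * 5 + "CAA" + "CAG" * 3 + "GG"
runs = repeat_runs(example_read, "CAG")
longest = longest_repeat_run(example_read, "CAG")
```

Two runs separated by a single non-matching unit, as in the example read, are the kind of pattern that would suggest an interrupted allele; a real tool also has to tolerate read errors and report size mosaicism across many reads.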
Affiliation(s)
- Alysha S Taylor
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
| | - Dinis Barros
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Nastassia Gobet
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Thierry Schuepbach
- Vital-IT Group, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Newbiologix, Ch. De la corniche 6-8, 1066 Epalinges, Switzerland
| | - Branduff McAllister
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lorene Aeschbach
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Emma L Randall
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
| | - Evgeniya Trofimenko
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Sorbonne Université, École normale supérieure, PSL University, CNRS, Laboratoire des biomolécules, LBM, 75005 Paris, France
| | - Eleanor R Heuchan
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
| | - Paula Barszcz
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Marc Ciosi
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, Davidson Building, University of Glasgow, Glasgow, G12 8QQ, UK
| | - Joanne Morgan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
| | | | - Alice E Davidson
- UCL Institute of Ophthalmology, 11-43 Bath Street, London, EC1V 9EL UK
| | - Thomas H Massey
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
| | - Darren G Monckton
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, Davidson Building, University of Glasgow, Glasgow, G12 8QQ, UK
| | - Lesley Jones
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff CF24 4HQ, UK
| | | | - Ioannis Xenarios
- Centre for Integrative Genomics, University of Lausanne, Bâtiment Génopode, 1015 Lausanne, Switzerland
- Health2030 Genome Center, Ch des Mines 14, 1202 Genève, Switzerland
| | - Vincent Dion
- UK Dementia Research Institute, Cardiff University, Hadyn Ellis Building, Maindy Road, Cardiff, CF24 4HQ, UK
34
Xylogiannopoulos KF. Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants. J Biotechnol 2022; 359:130-141. [PMID: 36195206 PMCID: PMC9527188 DOI: 10.1016/j.jbiotec.2022.09.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 06/26/2022] [Accepted: 09/26/2022] [Indexed: 11/05/2022]
Abstract
Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is very important, although in most cases the resources and computational power required are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity O(n log n), is enough to acquire knowledge on all repeated patterns that exist in multiple genome sequences, and this information can be used as input by meta-algorithms for further meta-analyses. As a proof of concept of the proposed Framework's scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindrome and tandem repeat detection, different organism genome comparisons, polymerase chain reaction primer detection, etc.
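To make the idea of "common patterns among all variants" concrete, here is a minimal sketch that finds fixed-length substrings (k-mers) present in every input sequence. It uses a plain dictionary rather than the O(n log n) data structures described in the abstract, so it only illustrates the question being asked, not the Framework itself; the function name `shared_kmers` and the toy sequences are assumptions.

```python
from collections import defaultdict


def shared_kmers(sequences: list[str], k: int) -> list[str]:
    """Return, in sorted order, every k-mer that occurs in all sequences."""
    # Map each k-mer to the set of sequence indices in which it appears.
    seen: dict[str, set[int]] = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            seen[seq[i:i + k]].add(idx)
    return sorted(kmer for kmer, ids in seen.items() if len(ids) == len(sequences))


# Toy input standing in for genome variants (hypothetical data).
toy_variants = ["ACGTAC", "TTACGT", "ACGTTT"]
common = shared_kmers(toy_variants, 4)
```

This naive approach stores every k-mer of every sequence, which would not scale to hundreds of thousands of genomes; that scaling problem is what the abstract's specialised data structures are meant to address.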
35
Arslan A. Systematic Inspection of Genomic Tandem Repeats and Rearrangements in Autism Model. Brain Disorders 2022. [DOI: 10.1016/j.dscb.2022.100059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022] Open
36
Rapid and comprehensive diagnostic method for repeat expansion diseases using nanopore sequencing. NPJ Genom Med 2022; 7:62. [PMID: 36289212 PMCID: PMC9606279 DOI: 10.1038/s41525-022-00331-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 03/30/2022] [Accepted: 09/30/2022] [Indexed: 11/23/2022] Open
Abstract
We developed a diagnostic method for repeat expansion diseases using a long-read sequencer to improve currently available, low throughput diagnostic methods. We employed the real-time target enrichment system of the nanopore GridION sequencer using the adaptive sampling option, in which software-based target assignment is available without prior sample enrichment, and built an analysis pipeline that prioritized the disease-causing loci. Twenty-two patients with various neurological and neuromuscular diseases, including 12 with genetically diagnosed repeat expansion diseases and 10 manifesting cerebellar ataxia, but without genetic diagnosis, were analyzed. We first sequenced the 12 molecularly diagnosed patients and accurately confirmed expanded repeats in all with uniform depth of coverage across the loci. Next, we applied our method and a conventional method to 10 molecularly undiagnosed patients. Our method corrected inaccurate diagnoses of two patients by the conventional method. Our method is superior to conventional diagnostic methods in terms of speed, accuracy, and comprehensiveness.
37
van Vliet EA, Hildebrand MS, Mills JD, Brennan GP, Eid T, Masino SA, Whittemore V, Bindila L, Wang KK, Patel M, Perucca P, Reid CA. A companion to the preclinical common data elements for genomics, transcriptomics, and epigenomics data in rodent epilepsy models. A report of the TASK3-WG4 omics working group of the ILAE/AES joint translational TASK force. Epilepsia Open 2022. [PMID: 35950645 DOI: 10.1002/epi4.12640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/12/2021] [Accepted: 02/22/2022] [Indexed: 11/06/2022] Open
Abstract
The International League Against Epilepsy/American Epilepsy Society (ILAE/AES) Joint Translational Task Force established the TASK3 working groups to create common data elements (CDEs) for various preclinical epilepsy research disciplines. The aim of the CDEs is to improve the standardization of experimental designs across a range of epilepsy research-related methods. Here, we have generated CDE tables with key parameters and case report forms (CRFs) containing the essential contents of the study protocols for genomics, transcriptomics, and epigenomics in rodent models of epilepsy, with a specific focus on adult rats and mice. We discuss the important elements that need to be considered for genomics, transcriptomics, and epigenomics methodologies, providing a rationale for the parameters that should be collected. This is the first in a two-part series of omics papers with the second installment to cover proteomics, lipidomics, and metabolomics in adult rodents.
Affiliation(s)
- Erwin A van Vliet
- Center for Neuroscience, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam UMC location University of Amsterdam, Department of (Neuro)Pathology, Amsterdam Neuroscience, Amsterdam, The Netherlands
| | - Michael S Hildebrand
- Epilepsy Research Centre, Department of Medicine (Austin Health), The University of Melbourne, Heidelberg, Victoria, Australia
- Murdoch Children's Research Institute, The Royal Children's Hospital, Parkville, Victoria, Australia
| | - James D Mills
- Amsterdam UMC location University of Amsterdam, Department of (Neuro)Pathology, Amsterdam Neuroscience, Amsterdam, The Netherlands
| | - Gary P Brennan
- UCD School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
- FutureNeuro Research Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Tore Eid
- Department of Laboratory Medicine, Yale School of Medicine, New Haven, Connecticut, USA
| | - Susan A Masino
- Neuroscience Program and Psychology Department, Life Sciences Center, Trinity College, Hartford, Connecticut, USA
| | - Vicky Whittemore
- Division of Neuroscience, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
| | - Laura Bindila
- Clinical Lipidomics Unit, Institute of Physiological Chemistry, University Medical Center of the Johannes Gutenberg University of Mainz, Mainz, Germany
| | - Kevin K Wang
- Department of Emergency Medicine, Psychiatry and Neuroscience, University of Florida, Gainesville, Florida, USA
- Brain Rehabilitation Research Center, Malcom Randall VA Medical Center, North Florida/South Georgia Veterans Health System, Gainesville, Florida, USA
| | - Manisha Patel
- Department of Pharmaceutical Sciences, University of Colorado, Aurora, Colorado, USA
| | - Piero Perucca
- Epilepsy Research Centre, Department of Medicine (Austin Health), The University of Melbourne, Heidelberg, Victoria, Australia
- Bladin-Berkovic Comprehensive Epilepsy Program, Austin Health, Heidelberg, Victoria, Australia
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia
- Department of Neurology, The Royal Melbourne Hospital, Melbourne, Victoria, Australia
- Department of Neurology, Alfred Health, Melbourne, Victoria, Australia
| | - Christopher A Reid
- Epilepsy Research Centre, Department of Medicine (Austin Health), The University of Melbourne, Heidelberg, Victoria, Australia
- Murdoch Children's Research Institute, The Royal Children's Hospital, Parkville, Victoria, Australia
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria, Australia
38
Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L, Baybayan P, Belden B, Berrios CD, Biswell RL, Buczkowicz P, Buske O, Chakraborty S, Cheung WA, Coffman KA, Cooper AM, Cross LA, Curran T, Dang TTT, Elfrink MM, Engleman KL, Fecske ED, Fieser C, Fitzgerald K, Fleming EA, Gadea RN, Gannon JL, Gelineau-Morel RN, Gibson M, Goldstein J, Grundberg E, Halpin K, Harvey BS, Heese BA, Hein W, Herd SM, Hughes SS, Ilyas M, Jacobson J, Jenkins JL, Jiang S, Johnston JJ, Keeler K, Korlach J, Kussmann J, Lambert C, Lawson C, Le Pichon JB, Leeder JS, Little VC, Louiselle DA, Lypka M, McDonald BD, Miller N, Modrcin A, Nair A, Neal SH, Oermann CM, Pacicca DM, Pawar K, Posey NL, Price N, Puckett LMB, Quezada JF, Raje N, Rowell WJ, Rush ET, Sampath V, Saunders CJ, Schwager C, Schwend RM, Shaffer E, Smail C, Soden S, Strenk ME, Sullivan BR, Sweeney BR, Tam-Williams JB, Walter AM, Welsh H, Wenger AM, Willig LK, Yan Y, Younger ST, Zhou D, Zion TN, Thiffault I, Pastinen T. Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes. Genet Med 2022; 24:1336-1348. [PMID: 35305867 DOI: 10.1016/j.gim.2022.02.007] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 10/15/2021] [Revised: 02/05/2022] [Accepted: 02/07/2022] [Indexed: 12/17/2022] Open
Abstract
PURPOSE This study aimed to provide comprehensive diagnostic and candidate analyses in a pediatric rare disease cohort through the Genomic Answers for Kids program. METHODS Extensive analyses of 960 families with suspected genetic disorders included short-read exome sequencing and short-read genome sequencing (srGS); PacBio HiFi long-read genome sequencing (HiFi-GS); variant calling for single nucleotide variants (SNV), structural variants (SV), and repeat variants; and machine-learning variant prioritization. Structured phenotypes, prioritized variants, and pedigrees were stored in the PhenoTips database, with data sharing through the controlled-access database of Genotypes and Phenotypes. RESULTS Diagnostic rates ranged from 11% in patients with prior negative genetic testing to 34.5% in naive patients. Incorporating SVs from genome sequencing added up to 13% of new diagnoses in previously unsolved cases. HiFi-GS yielded an increased discovery rate with >4-fold more rare coding SVs compared with srGS. Variants and genes of unknown significance remain the most common finding (58% of nondiagnostic cases). CONCLUSION Computational prioritization is efficient for diagnostic SNVs. Thorough identification of non-SNVs remains challenging and is partly mitigated using HiFi-GS sequencing. Importantly, community research is supported by sharing real-time data to accelerate gene validation and by providing HiFi variant (SNV/SV) resources from >1000 human alleles to facilitate implementation of new sequencing platforms for rare disease diagnoses.
Affiliation(s)
- Ana S A Cohen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO; Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO
| | - Emily G Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | | | - Joseph T Alaimo
- Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO
| | - Shivarajan M Amudhavalli
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - John T Anderson
- Department of Orthopaedic Surgery, Children's Mercy Kansas City, Kansas City, MO
| | - Lalit Bansal
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Lauren Bartik
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | | | - Bradley Belden
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | | | - Rebecca L Biswell
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | | | | | | | - Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Keith A Coffman
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Ashley M Cooper
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Laura A Cross
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - Tom Curran
- Children's Mercy Research Institute, Kansas City, MO
| | - Thuy Tien T Dang
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Mary M Elfrink
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | | | - Erin D Fecske
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Cynthia Fieser
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Keely Fitzgerald
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Emily A Fleming
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - Randi N Gadea
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | | | - Rose N Gelineau-Morel
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Margaret Gibson
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Jeffrey Goldstein
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Elin Grundberg
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Kelsee Halpin
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Brian S Harvey
- Department of Orthopaedic Surgery, Children's Mercy Kansas City, Kansas City, MO
| | - Bryce A Heese
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - Wendy Hein
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Suzanne M Herd
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Susan S Hughes
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - Mohammed Ilyas
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Jill Jacobson
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Janda L Jenkins
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | | | | | - Kathryn Keeler
- Department of Orthopaedic Surgery, Children's Mercy Kansas City, Kansas City, MO
| | - Jonas Korlach
- Pacific Biosciences of California, Inc, Menlo Park, CA
| | | | | | - Caitlin Lawson
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | | | | | - Vicki C Little
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | | | | | | | - Neil Miller
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Division of Allergy Immunology Pulmonary and Sleep Medicine, Children's Mercy Kansas City, Kansas City, MO
| | - Ann Modrcin
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Annapoorna Nair
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Shelby H Neal
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | | | - Donna M Pacicca
- Department of Orthopaedic Surgery, Children's Mercy Kansas City, Kansas City, MO
| | - Kailash Pawar
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Nyshele L Posey
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Nigel Price
- Department of Orthopaedic Surgery, Children's Mercy Kansas City, Kansas City, MO
| | - Laura M B Puckett
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Julio F Quezada
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Nikita Raje
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Division of Neonatology, Children's Mercy Kansas City, Kansas City, MO
| | | | - Eric T Rush
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Division of Genetics, Children's Mercy Kansas City, Kansas City, MO; Department of Internal Medicine, University of Kansas School of Medicine, Kansas City, MO
| | - Venkatesh Sampath
- Division of Neonatology, Children's Mercy Hospital Kansas City, Kansas City, MO
| | - Carol J Saunders
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO; Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO
| | - Caitlin Schwager
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - Richard M Schwend
- Department of Orthopaedic Surgery, Children's Mercy Kansas City, Kansas City, MO
| | - Elizabeth Shaffer
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Craig Smail
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Sarah Soden
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Meghan E Strenk
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | | | - Brooke R Sweeney
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | | | - Adam M Walter
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Holly Welsh
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | | | - Laurel K Willig
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Yun Yan
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO
| | - Scott T Younger
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO
| | - Dihong Zhou
- Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - Tricia N Zion
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO; Division of Genetics, Children's Mercy Kansas City, Kansas City, MO
| | - Isabelle Thiffault
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO; Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO.
| | - Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO; UKMC School of Medicine, University of Missouri Kansas City, Kansas City, MO; Children's Mercy Research Institute, Kansas City, MO.
39
Fang L, Liu Q, Monteys AM, Gonzalez-Alegre P, Davidson BL, Wang K. DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing. Genome Biol 2022; 23:108. [PMID: 35484600 PMCID: PMC9052667 DOI: 10.1186/s13059-022-02670-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/04/2021] [Accepted: 04/08/2022] [Indexed: 12/12/2022] Open
Abstract
Despite recent improvements in basecalling accuracy, nanopore sequencing still has higher error rates on short-tandem repeats (STRs). Instead of using basecalled reads, we developed DeepRepeat which converts ionic current signals into red-green-blue channels, thus transforming the repeat detection problem into an image recognition problem. DeepRepeat identifies and accurately quantifies telomeric repeats in the CHM13 cell line and achieves higher accuracy in quantifying repeats in long STRs than competing methods. We also evaluate DeepRepeat on genome-wide or candidate region datasets from seven different sources. In summary, DeepRepeat enables accurate quantification of long STRs and complements existing methods relying on basecalled reads.
Affiliation(s)
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA. .,School of Life Sciences, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA. .,Nevada Institute of Personalized Medicine, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA.
| | - Alex Mas Monteys
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Pedro Gonzalez-Alegre
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Beverly L Davidson
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.,Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.,Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| |
|
40
|
Liu YH, Chou YT, Chang FP, Lee WJ, Guo YC, Chou CT, Huang HC, Mizuguchi T, Chou CC, Yu HY, Yu KW, Wu HM, Tsai PC, Matsumoto N, Lee YC, Liao YC. Neuronal intranuclear inclusion disease in patients with adult-onset non-vascular leukoencephalopathy. Brain 2022; 145:3010-3021. [PMID: 35411397 DOI: 10.1093/brain/awac135] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 02/18/2022] [Revised: 03/24/2022] [Accepted: 03/27/2022] [Indexed: 11/12/2022]
Abstract
Neuronal intranuclear inclusion disease (NIID), caused by an expansion of GGC repeats in the 5'-untranslated region of NOTCH2NLC, is an important but underdiagnosed cause of adult-onset leukoencephalopathies. The present study aimed to investigate the prevalence, clinical spectrum, and brain MRI characteristics of NIID in adult-onset nonvascular leukoencephalopathies and assess the diagnostic performance of neuroimaging features. One hundred and sixty-one unrelated Taiwanese patients with genetically undetermined nonvascular leukoencephalopathies were screened for the NOTCH2NLC GGC repeat expansions using fragment analysis, repeat-primed PCR, Southern blot analysis and/or nanopore sequencing with Cas9-mediated enrichment. Among them, 32 (19.9%) patients had an expanded NOTCH2NLC allele and were diagnosed with NIID. We enrolled another two affected family members from one patient for further analysis. The size of the expanded NOTCH2NLC GGC repeats in the 34 patients ranged from 73 to 323 repeats. Skin biopsy from five patients all showed eosinophilic, p62-positive intranuclear inclusions in the sweat gland cells and dermal adipocytes. Among the 34 NIID patients presenting with nonvascular leukoencephalopathies, the median age at symptom onset was 61 years (range, 41-78 years) and the initial presentations included cognitive decline (44.1%; 15/34), acute encephalitis-like episodes (32.4%; 11/34), limb weakness (11.8%; 4/34), and parkinsonism (11.8%; 4/34). Cognitive decline (64.7%; 22/34) and acute encephalitis-like episodes (55.9%; 19/34) were also the most common overall manifestations. Two-thirds of the patients had either bladder dysfunction or visual disturbance. Comparing the brain MRI features between the NIID patients and individuals with other undetermined leukoencephalopathies, corticomedullary junction curvilinear lesion on diffusion weighted imaging (DWI) was the best biomarker to diagnose NIID with high specificity (98.4%) and sensitivity (88.2%). However, such DWI abnormality was absent in 11.8% of the NIID patients. When only fluid-attenuated inversion recovery images were available, presence of white matter hyperintensity lesions (WMH) either in paravermis or middle cerebellar peduncles also favored the diagnosis of NIID with a specificity of 85.3% and a sensitivity of 76.5%. Among the ten patients whose MRI was performed within 5 days of the onset of acute encephalitis-like episodes, five showed cortical DWI hyperintense lesions and two revealed focal brain edema. In conclusion, NIID accounts for 19.9% (32/161) of patients with adult-onset genetically undiagnosed nonvascular leukoencephalopathies in Taiwan. Half of the NIID patients developed encephalitis-like episodes at some point, with restricted diffusion in the cortical regions on DWI at the acute stage. Corticomedullary junction hyperintense lesions, WMH in paravermis or middle cerebellar peduncles, bladder dysfunction and visual disturbance are useful hints to diagnose NIID.
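The percentages in this abstract are quoted alongside their underlying fractions, so they can be checked directly. The short Python sketch below recomputes each one from the counts stated above; the dictionary labels and the helper name `percent` are illustrative choices, not taken from the paper. Sensitivity and specificity are left out because the abstract gives only the percentages for those, not the counts.

```python
# Fractions quoted in the abstract, stored as (numerator, denominator, reported %).
# The labels are illustrative; the numbers are copied from the abstract.
reported = {
    "NIID among screened patients": (32, 161, 19.9),
    "initial cognitive decline": (15, 34, 44.1),
    "initial encephalitis-like episodes": (11, 34, 32.4),
    "initial limb weakness": (4, 34, 11.8),
    "initial parkinsonism": (4, 34, 11.8),
    "overall cognitive decline": (22, 34, 64.7),
    "overall encephalitis-like episodes": (19, 34, 55.9),
}


def percent(numerator, denominator):
    """Return the proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Collect any entry whose recomputed value differs from the reported one.
mismatches = {
    label: percent(n, d)
    for label, (n, d, stated) in reported.items()
    if percent(n, d) != stated
}

print(mismatches)  # → {}
```

An empty result means every reported percentage matches its fraction at one decimal place.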
Affiliation(s)
- Yi-Hong Liu
- Department of Neurology, Taipei Veterans General Hospital, Taipei 11217, Taiwan
| | - Ying-Tsen Chou
- Department of Neurology, Taipei Veterans General Hospital, Taipei 11217, Taiwan
| | - Fu-Pang Chang
- Department of Pathology and Laboratory Medicine, Taipei Veterans General Hospital, Taipei 11217, Taiwan.,Institute of Clinical Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
| | - Wei-Ju Lee
- Neurological Institute, Taichung Veterans General Hospital, Taichung 40705, Taiwan.,Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,College of Medicine, National Chung Hsing University, Taichung 40227, Taiwan
| | - Yuh-Cherng Guo
- Department of Neurology, China Medical University Hospital, Taichung 404332, Taiwan.,School of Medicine, College of Medicine, China Medical University, Taichung 404333, Taiwan
| | - Cheng-Ta Chou
- Neurological Institute, Taichung Veterans General Hospital, Taichung 40705, Taiwan.,Rong Hsing Research Center for Translational Medicine, National Chung Hsing University, Taichung 40227, Taiwan
| | - Hui-Chun Huang
- Department of Neurology, China Medical University Hospital, Taichung 404332, Taiwan.,School of Medicine, College of Medicine, China Medical University, Taichung 404333, Taiwan
| | - Takeshi Mizuguchi
- Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
| | - Chien-Chen Chou
- Department of Neurology, Taipei Veterans General Hospital, Taipei 11217, Taiwan.,Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Brain Research Center, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
| | - Hsiang-Yu Yu
- Department of Neurology, Taipei Veterans General Hospital, Taipei 11217, Taiwan.,Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Brain Research Center, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
| | - Kai-Wei Yu
- Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Brain Research Center, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Department of Radiology, Taipei Veterans General Hospital, Taipei 11217, Taiwan
| | - Hsiu-Mei Wu
- Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Brain Research Center, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Department of Radiology, Taipei Veterans General Hospital, Taipei 11217, Taiwan
| | - Pei-Chien Tsai
- Department of Life Sciences, National Chung Hsing University, Taichung 40227, Taiwan
| | - Naomichi Matsumoto
- Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
| | - Yi-Chung Lee
- Department of Neurology, Taipei Veterans General Hospital, Taipei 11217, Taiwan.,Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Brain Research Center, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
| | - Yi-Chu Liao
- Department of Neurology, Taipei Veterans General Hospital, Taipei 11217, Taiwan.,Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan.,Brain Research Center, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
| |
|
41
|
Park H, Yamanaka T, Toyama Y, Fujita A, Doi H, Nirasawa T, Murayama S, Matsumoto N, Shimogori T, Ikegawa M, Haltia MJ, Nukina N. Hornerin deposits in neuronal intranuclear inclusion disease: direct identification of proteins with compositionally biased regions in inclusions. Acta Neuropathol Commun 2022; 10:28. [PMID: 35246273 PMCID: PMC8895595 DOI: 10.1186/s40478-022-01333-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/21/2021] [Accepted: 02/16/2022] [Indexed: 11/10/2022]
Abstract
Neuronal intranuclear inclusion disease (NIID) is a neurodegenerative disorder, characterized by the presence of eosinophilic inclusions (NIIs) within nuclei of central and peripheral nervous system cells. This study aims to identify the components of NIIs, which have been difficult to analyze directly due to their insolubility. In order to establish a method to directly identify the components of NIIs, we first analyzed the huntingtin inclusion-rich fraction obtained from the brains of Huntington disease model mice. Although the sequence with expanded polyglutamine could not be identified by liquid-chromatography mass spectrometry, amino acid analysis revealed that glutamine of the huntingtin inclusion-rich fraction increased significantly. This is compatible with the calculated amino acid content of the transgene product. Therefore, we applied this method to analyze the NIIs of diseased human brains, which may have proteins with compositionally biased regions, and identified a serine-rich protein called hornerin. Since the analyzed NII-rich fraction was also serine-rich, we suggested hornerin as a major component of the NIIs. A specific distribution of hornerin in NIID was also investigated by matrix-assisted laser desorption/ionization imaging mass spectrometry and immunofluorescence. Finally, we confirmed a variant of hornerin by whole-exome sequencing and DNA sequencing. This study suggests that hornerin may be related to the pathological process of NIID, and the direct analysis of NIIs, especially by amino acid analysis using the NII-rich fractions, would contribute to a deeper understanding of the disease pathogenesis.
Affiliation(s)
- Hongsun Park
- Laboratory of Structural Neuropathology, Doshisha University Graduate School of Brain Science, 1-3 Miyakodanitatara, Kyotanabe-shi, Kyoto, 610-0394, Japan
| | - Tomoyuki Yamanaka
- Laboratory of Structural Neuropathology, Doshisha University Graduate School of Brain Science, 1-3 Miyakodanitatara, Kyotanabe-shi, Kyoto, 610-0394, Japan
- Department of Neuroscience of Disease, Brain Research Institute, Niigata University, Niigata, Japan
| | - Yumiko Toyama
- Department of Life and Medical Systems, Doshisha University, Kyoto, Japan
| | - Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Hiroshi Doi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | | | - Shigeo Murayama
- The Brain Bank for Aging Research, Tokyo Metropolitan Geriatric Hospital and Institute of Gerontology, Tokyo, Japan
| | - Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Tomomi Shimogori
- Molecular Mechanisms of Brain Development, RIKEN Center for Brain Science, Saitama, Japan
| | - Masaya Ikegawa
- Department of Life and Medical Systems, Doshisha University, Kyoto, Japan
| | - Matti J Haltia
- Department of Pathology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Nobuyuki Nukina
- Laboratory of Structural Neuropathology, Doshisha University Graduate School of Brain Science, 1-3 Miyakodanitatara, Kyotanabe-shi, Kyoto, 610-0394, Japan.
- Laboratory for Structural Neuropathology, RIKEN Brain Science Institute, Saitama, Japan.
| |
|
42
|
Nanodevices for Biological and Medical Applications: Development of Single-Molecule Electrical Measurement Method. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12031539] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
Abstract
A comprehensive detection of a wide variety of diagnostic markers is required for the realization of personalized medicine. As a sensor to realize such personalized medicine, a single-molecule electrical measurement method using nanodevices is currently attracting interest for its comprehensive simultaneous detection of various target markers for use in biological and medical applications. Single-molecule electrical measurement using nanodevices, such as nanopore, nanogap, or nanopipette devices, has the following features: high sensitivity, low cost, high-throughput detection, easy portability, low-cost availability by mass production technologies, and the possibility of integration of various functions and multiple sensors. In this review, I focus on the medical applications of single-molecule electrical measurement using nanodevices. This review provides information on the current status and future prospects of nanodevice-based single-molecule electrical measurement technology, which is making a full-scale contribution to realizing personalized medicine in the future. Future prospects include some discussion of the current issues on the expansion of the application requirements for single-molecule measurement.
|
43
|
Chen J, Li F, Wang M, Li J, Marquez-Lago TT, Leier A, Revote J, Li S, Liu Q, Song J. BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data. Front Big Data 2022; 4:727216. [PMID: 35118375 PMCID: PMC8805145 DOI: 10.3389/fdata.2021.727216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 12/13/2021] [Indexed: 11/22/2022]
Abstract
Background Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identify SSRs. However, the sequenced genome often misses several highly repetitive regions. Moreover, many non-model species have no complete genomes. With the recent advances of next-generation sequencing (NGS) techniques, large-scale sequence reads for any species can be rapidly generated using NGS. In this context, a number of methods have been proposed to identify thousands of SSR loci within large amounts of reads for non-model species. Because the most commonly used NGS platforms (e.g., Illumina platform) on the market generally provide short paired-end reads, merging overlapping paired-end reads has become a common step prior to the identification of SSR loci. This has posed a big data analysis challenge for traditional stand-alone tools to merge short read pairs and identify SSRs from large-scale data. Results In this study, we present a new Hadoop-based software program, termed BigFiRSt, to address this problem using cutting-edge big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, implemented based on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH and BigPERF address the problems of merging short read pairs and mining SSRs, respectively, in a big data manner. Comprehensive benchmarking experiments show that BigFiRSt can dramatically reduce the execution times of fast read pairs merging and SSRs mining from very large-scale DNA sequence data. Conclusions The excellent performance of BigFiRSt mainly stems from the use of Big Data Hadoop technology to merge read pairs and mine SSRs through parallel and distributed computing on clusters. We anticipate BigFiRSt will be a valuable tool in the coming biological Big Data era.
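To make concrete what mining SSRs from a single read involves, here is a minimal Python sketch that finds perfect tandem repeats with a regular expression. It is an illustration only, under stated assumptions: it is not the algorithm used by PERF, BigPERF or BigFiRSt, and the function name `find_ssrs`, its parameters and its default thresholds are hypothetical.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return (start, end, unit, copies) for each perfect tandem repeat.

    Hypothetical helper: a repeat unit of min_unit..max_unit bases must occur
    at least min_repeats times in a row. Coordinates are 0-based, end-exclusive.
    """
    # Group 1 is the shortest candidate unit; \1{k,} requires k further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


# Toy read: three flanking bases, five CGG units, two flanking bases.
read = "TTA" + "CGG" * 5 + "AT"
print(find_ssrs(read))  # → [(3, 18, 'CGG', 5)]
```

A scan like this handles one read at a time, which is why it parallelizes naturally across reads; imperfect or interrupted repeats would need a more tolerant matcher than this sketch provides.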
Affiliation(s)
- Jinxiang Chen
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Fuyi Li
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia
- Department of Microbiology and Immunity, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia
| | - Miao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Junlong Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Tatiana T. Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
| | - André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Jerico Revote
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
| | - Shuqin Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Quanzhong Liu
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
| | - Jiangning Song
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia
- *Correspondence: Jiangning Song
| |
|
44
|
Gall-Duncan T, Sato N, Yuen RKC, Pearson CE. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res 2022; 32:1-27. [PMID: 34965938 PMCID: PMC8744678 DOI: 10.1101/gr.269530.120] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/29/2020] [Accepted: 11/29/2021] [Indexed: 11/25/2022]
Abstract
Expansions of gene-specific DNA tandem repeats (TRs), first described in 1991 as a disease-causing mutation in humans, are now known to cause >60 phenotypes, not just disease, and not only in humans. TRs are a common form of genetic variation with biological consequences, observed, so far, in humans, dogs, plants, oysters, and yeast. Repeat diseases show atypical clinical features, genetic anticipation, and multiple and partially penetrant phenotypes among family members. Discovery of disease-causing repeat expansion loci accelerated through technological advances in DNA sequencing and computational analyses. Between 2019 and 2021, 17 new disease-causing TR expansions were reported, totaling 63 TR loci (>69 diseases), with a likelihood of more discoveries, and in more organisms. Recent and historical lessons reveal that properly assessed clinical presentations, coupled with genetic and biological awareness, can guide discovery of disease-causing unstable TRs. We highlight critical but underrecognized aspects of TR mutations. Repeat motifs may not be present in current reference genomes but will be in forthcoming gapless long-read references. Repeat motif size can be a single nucleotide to kilobases/unit. At a given locus, repeat motif sequence purity can vary with consequence. Pathogenic repeats can be "insertions" within nonpathogenic TRs. Expansions, contractions, and somatic length variations of TRs can have clinical/biological consequences. TR instabilities occur in humans and other organisms. TRs can be epigenetically modified and/or chromosomal fragile sites. We discuss the expanding field of disease-associated TR instabilities, highlighting prospects, clinical and genetic clues, tools, and challenges for further discoveries of disease-causing TR instabilities and understanding their biological and pathological impacts, a vista that is about to expand.
Affiliation(s)
- Terence Gall-Duncan
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Nozomu Sato
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
| | - Ryan K C Yuen
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Christopher E Pearson
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| |
|
45
|
Melas M, Kautto EA, Franklin SJ, Mori M, McBride KL, Mosher TM, Pfau RB, Hernandez-Gonzalez ME, McGrath SD, Magrini VJ, White P, Samora JB, Koboldt DC, Wilson RK. Long-read whole genome sequencing reveals HOXD13 alterations in synpolydactyly. Hum Mutat 2021; 43:189-199. [PMID: 34859533 DOI: 10.1002/humu.24304] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/26/2021] [Revised: 09/24/2021] [Accepted: 11/20/2021] [Indexed: 12/11/2022]
Abstract
Synpolydactyly 1, also called syndactyly type II (SDTY2), is a genetic limb malformation characterized by polydactyly with syndactyly involving the webbing of the third and fourth fingers, and the fourth and fifth toes. It is caused by heterozygous alterations in HOXD13 with incomplete penetrance and phenotypic variability. In our study, a five-generation family with an SPD phenotype was enrolled in our Rare Disease Genomics Protocol. A comprehensive examination of three generations using Illumina short-read whole-genome sequencing (WGS) did not identify any causative variants. Subsequent WGS using Pacific Biosciences (PacBio) long-read HiFi Circular Consensus Sequencing (CCS) revealed a heterozygous 27-bp duplication in the polyalanine tract of HOXD13. Sanger sequencing of all available family members confirmed that the variant segregates with affected individuals. Reanalysis of an unrelated family with a similar SPD phenotype uncovered a 21-bp (7-alanine) duplication in the same region of HOXD13. Although ExpansionHunter identified these events in most individuals in a retrospective analysis, low sequence coverage due to high GC content in the HOXD13 polyalanine tract makes detection of these events challenging. Our findings highlight the value of long-read WGS in elucidating the molecular etiology of congenital limb malformation disorders.
Affiliation(s)
- Marilena Melas
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
| | - Esko A Kautto
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
| | - Samuel J Franklin
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
| | - Mari Mori
- Division of Genetic and Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA
| | - Kim L McBride
- Division of Genetic and Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA.,Center for Cardiovascular Research, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, USA
| | - Theresa Mihalic Mosher
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Division of Genetic and Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA
| | - Ruthann B Pfau
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA.,Department of Pathology, The Ohio State University, Columbus, Ohio, USA
| | | | - Sean D McGrath
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
| | - Vincent J Magrini
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA
| | - Peter White
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA
| | - Julie Balch Samora
- Department of Orthopedic Surgery, Nationwide Children's Hospital, Columbus, Ohio, USA
| | - Daniel C Koboldt
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA
| | - Richard K Wilson
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA.,Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA
| |
|
46
|
Schröder C, Horsthemke B, Depienne C. GC-rich repeat expansions: associated disorders and mechanisms. MED GENET-BERLIN 2021; 33:325-335. [PMID: 38835438 PMCID: PMC11006399 DOI: 10.1515/medgen-2021-2099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Accepted: 11/12/2021] [Indexed: 06/06/2024]
Abstract
Noncoding repeat expansions are a well-known cause of genetic disorders mainly affecting the central nervous system. Missed by most standard technologies used in routine diagnosis, pathogenic noncoding repeat expansions have to be searched for using specific techniques such as repeat-primed PCR or specific bioinformatics tools applied to genome data, such as ExpansionHunter. In this review, we focus on GC-rich repeat expansions, which represent at least one third of all noncoding repeat expansions described so far. GC-rich expansions are mainly located in regulatory regions (promoter, 5' untranslated region, first intron) of genes and can lead to either a toxic gain-of-function mediated by RNA toxicity and/or repeat-associated non-AUG (RAN) translation, or a loss-of-function of the associated gene, depending on their size and their methylation status. We herein review the clinical and molecular characteristics of disorders associated with these difficult-to-detect expansions.
Affiliation(s)
- Christopher Schröder
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Bernhard Horsthemke
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Christel Depienne
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| |
|
47
|
Abramzon Y, Dewan R, Cortese A, Resnick S, Ferrucci L, Houlden H, Traynor BJ. Investigating RFC1 expansions in sporadic amyotrophic lateral sclerosis. J Neurol Sci 2021; 430:118061. [PMID: 34537679 PMCID: PMC9014296 DOI: 10.1016/j.jns.2021.118061] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/30/2021] [Revised: 08/23/2021] [Accepted: 08/25/2021] [Indexed: 10/20/2022]
Abstract
A homozygous AAGGG repeat expansion within the RFC1 gene was recently described as a common cause of CANVAS syndrome. We examined 1069 sporadic ALS patients for the presence of this repeat expansion. We did not discover any carriers of the homozygous AAGGG expansion in our ALS cohort, indicating that this form of RFC1 repeat expansions is not a common cause of sporadic ALS. However, our study did identify a novel repeat conformation and further expanded on the highly polymorphic nature of the RFC1 locus.
Affiliation(s)
- Yevgenya Abramzon
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, NIH, Bethesda, MD 20892, USA; Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, University College London, London WC1N 1PJ, UK.
| | - Ramita Dewan
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, NIH, Bethesda, MD 20892, USA
| | - Andrea Cortese
- Department of Neuromuscular Disease, UCL Queen Square Institute of Neurology, London, UK; Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
| | - Susan Resnick
- Laboratory of Behavioral Neuroscience, National Institute on Aging, NIH, Baltimore, MD 21224, USA
| | - Luigi Ferrucci
- Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD 21224, USA
| | - Henry Houlden
- Department of Neuromuscular Disease, UCL Queen Square Institute of Neurology, London, UK
| | - Bryan J Traynor
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, NIH, Bethesda, MD 20892, USA; Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, University College London, London WC1N 1PJ, UK; Neurology Department, Johns Hopkins University, Baltimore, MD 21205, USA
| |
|
48
|
Fukuda H, Yamaguchi D, Nyquist K, Yabuki Y, Miyatake S, Uchiyama Y, Hamanaka K, Saida K, Koshimizu E, Tsuchida N, Fujita A, Mitsuhashi S, Ohbo K, Satake Y, Sone J, Doi H, Morihara K, Okamoto T, Takahashi Y, Wenger AM, Shioda N, Tanaka F, Matsumoto N, Mizuguchi T. Father-to-offspring transmission of extremely long NOTCH2NLC repeat expansions with contractions: genetic and epigenetic profiling with long-read sequencing. Clin Epigenetics 2021; 13:204. [PMID: 34774111 PMCID: PMC8590777 DOI: 10.1186/s13148-021-01192-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/01/2021] [Accepted: 10/27/2021] [Indexed: 12/11/2022]
Abstract
Background GGC repeat expansions in NOTCH2NLC are associated with neuronal intranuclear inclusion disease. Very recently, asymptomatic carriers with NOTCH2NLC repeat expansions were reported. In these asymptomatic individuals, the CpG island in NOTCH2NLC is hypermethylated, suggesting that two factors, repeat length and DNA methylation status, should be considered to evaluate pathogenicity. Long-read sequencing can be used to simultaneously profile genomic and epigenomic alterations. We analyzed four sporadic cases with NOTCH2NLC repeat expansion and their phenotypically normal parents. The native genomic DNA that retains base modification was sequenced on a per-trio basis using both PacBio and Oxford Nanopore long-read sequencing technologies. A custom workflow was developed to evaluate DNA modifications. With these two technologies combined, long-range DNA methylation information was integrated with complete repeat DNA sequences to investigate the genetic origins of expanded GGC repeats in these sporadic cases. Results In all four families, asymptomatic fathers had longer expansions (median: 522, 390, 528 and 650 repeats) compared with their affected offspring (median: 93, 117, 162 and 140 repeats, respectively). These expansions are much longer than the disease-causing range previously reported (in general, 41–300 repeats). Repeat lengths were extremely variable in the father, suggesting somatic mosaicism. Instability is more frequent in alleles with uninterrupted pure GGCs. Single molecule epigenetic analysis revealed complex DNA methylation patterns and epigenetic heterogeneity. We identified an aberrant gain-of-methylation region (2.2 kb in size beyond the CpG island and GGC repeats) in asymptomatic fathers. This methylated region was unmethylated in the normal allele with bilateral transitional zones with both methylated and unmethylated CpG dinucleotides, which may be protected from methylation to ensure NOTCH2NLC expression. Conclusions We clearly demonstrate that the four sporadic NOTCH2NLC-related cases are derived from the paternal GGC repeat contraction associated with demethylation. The entire genetic and epigenetic landscape of the NOTCH2NLC region was uncovered using the custom workflow of long-read sequence data, demonstrating the utility of this method for revealing epigenetic/mutational changes in repetitive elements, which are difficult to characterize by conventional short-read/bisulfite sequencing methods. Our approach should be useful for biomedical research, aiding the discovery of DNA methylation abnormalities through the entire genome. Supplementary Information The online version contains supplementary material available at 10.1186/s13148-021-01192-5.
Affiliation(s)
- Hiromi Fukuda
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan; Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Yasushi Yabuki
  - Department of Genomic Neurology, Institute of Molecular Embryology and Genetics (IMEG), Kumamoto University, Kumamoto, Japan; Graduate School of Pharmaceutical Sciences, Kumamoto University, Kumamoto, Japan
- Satoko Miyatake
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan; Clinical Genetics Department, Yokohama City University Hospital, Yokohama, Japan
- Yuri Uchiyama
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan; Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
- Kohei Hamanaka
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Ken Saida
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Eriko Koshimizu
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Naomi Tsuchida
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan; Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
- Atsushi Fujita
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Satomi Mitsuhashi
  - Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Kazuyuki Ohbo
  - Department of Histology and Cell Biology, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Yuki Satake
  - Department of Neurology, Yokkaichi Municipal Hospital, Yokkaichi, Japan
- Jun Sone
  - Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan; Department of Neurology, National Hospital Organization Suzuka National Hospital, Suzuka, Japan
- Hiroshi Doi
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Keisuke Morihara
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomoko Okamoto
  - Department of Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Yuji Takahashi
  - Department of Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Norifumi Shioda
  - Department of Genomic Neurology, Institute of Molecular Embryology and Genetics (IMEG), Kumamoto University, Kumamoto, Japan; Graduate School of Pharmaceutical Sciences, Kumamoto University, Kumamoto, Japan
- Fumiaki Tanaka
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomichi Matsumoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Takeshi Mizuguchi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
49
Sacristán-Horcajada E, González-de la Fuente S, Peiró-Pastor R, Carrasco-Ramiro F, Amils R, Requena JM, Berenguer J, Aguado B. ARAMIS: From systematic errors of NGS long reads to accurate assemblies. Brief Bioinform 2021; 22:bbab170. [PMID: 34013348 PMCID: PMC8574707 DOI: 10.1093/bib/bbab170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/18/2020] [Revised: 03/31/2021] [Accepted: 04/11/2021] [Indexed: 01/23/2023]
Abstract
NGS long-read sequencing technologies (or third-generation technologies) such as Pacific Biosciences (PacBio) have revolutionized the sequencing field over the last decade, improving multiple genomic applications such as de novo genome assembly. However, their error rate, mostly involving insertions and deletions (indels), is currently an important concern that requires special attention. Multiple algorithms are available to fix these sequencing errors using short reads (such as Illumina), although they require long processing times and some errors may persist. Here, we present Accurate long-Reads Assembly correction Method for Indel errorS (ARAMIS), the first NGS long-read indel correction pipeline that combines several correction tools in just one step using accurate short reads. As a proof of concept, six organisms were selected based on their different GC content, size and genome complexity, and their PacBio-assembled genomes were corrected thoroughly by this pipeline. We found systematic sequencing errors in PacBio long-read sequences affecting homopolymeric regions, and found that the type of indel error introduced during PacBio sequencing is related to the GC content of the organism. Because this has not been widely recognized, numerous published studies contain such errors, which should be resolved since they may convey incorrect biological information. ARAMIS yields better results with fewer computational resources than other correction tools and makes it possible to detect the nature of the indel errors found and their distribution along the genome. The source code of ARAMIS is available at https://github.com/genomics-ngsCBMSO/ARAMIS.git.
Affiliation(s)
- R Peiró-Pastor
  - Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- F Carrasco-Ramiro
  - Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- R Amils
  - Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- J M Requena
  - Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- J Berenguer
  - Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
- B Aguado
  - Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain
50
A rapidly reversible mutation generates subclonal genetic diversity and unstable drug resistance. Proc Natl Acad Sci U S A 2021; 118:2019060118. [PMID: 34675074 PMCID: PMC8639346 DOI: 10.1073/pnas.2019060118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/11/2021] [Indexed: 11/18/2022]
Abstract
Most genetic changes have negligible reversion rates. As most mutations that confer resistance to an adverse condition (e.g., drug treatment) also confer a growth defect in its absence, it is challenging for cells to genetically adapt to transient environmental changes. Here, we identify a set of rapidly reversible drug-resistance mutations in Schizosaccharomyces pombe that are caused by microhomology-mediated tandem duplication (MTD) and reversion back to the wild-type sequence. Using 10,000× coverage whole-genome sequencing, we identify nearly 6,000 subclonal MTDs in a single clonal population and determine, using machine learning, how MTD frequency is encoded in the genome. We find that sequences with the highest-predicted MTD rates tend to generate insertions that maintain the correct reading frame, suggesting that MTD formation has shaped the evolution of coding sequences. Our study reveals a common mechanism of reversible genetic variation that is beneficial for adaptation to environmental fluctuations and facilitates evolutionary divergence.