51
Abstract
Long-read sequencing technologies have now reached a level of accuracy and yield that allows their application to variant detection at a scale of tens to thousands of samples. Concomitant with the development of new computational tools, the first population-scale studies involving long-read sequencing have emerged over the past 2 years and, given the continuous advancement of the field, many more are likely to follow. In this Review, we survey recent developments in population-scale long-read sequencing, highlight potential challenges of a scaled-up approach and provide guidance regarding experimental design. We provide an overview of current long-read sequencing platforms, variant calling methodologies, and approaches for de novo assembly and reference-based mapping. Furthermore, we summarize strategies for variant validation, genotyping and predicting functional impact, and emphasize the challenges that remain in achieving long-read sequencing at a population scale.
Affiliation(s)
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
52
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol 2021; 22:224. [PMID: 34389037; PMCID: PMC8361843; DOI: 10.1186/s13059-021-02447-3] [Received: 02/03/2021] [Accepted: 07/26/2021]
Abstract
Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada.
53
Sutton JM, Millwood JD, Case McCormack A, Fierst JL. Optimizing experimental design for genome sequencing and assembly with Oxford Nanopore Technologies. GIGABYTE 2021; 2021:gigabyte27. [PMID: 36824342; PMCID: PMC9650304; DOI: 10.46471/gigabyte.27] [Received: 02/17/2021] [Accepted: 07/05/2021]
Abstract
High-quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) produces inexpensive DNA sequences but has high error rates, which make sequence assembly and analysis difficult as genome size and complexity increase. Robust experimental design is necessary for ONT genome sequencing and assembly, but few studies have addressed eukaryotic organisms. Here, we present novel results using simulated and empirical ONT and DNA libraries to identify best practices for sequencing and assembly for several model species. We find that the unique error structure of ONT libraries causes errors to accumulate and assembly statistics to plateau as sequence depth increases. High-quality assembled eukaryotic sequences require high-molecular-weight DNA extractions that increase sequence read length, and computational protocols that reduce error through pre-assembly correction and read selection. Our quantitative results will be helpful for researchers seeking guidance for de novo assembly projects.
Affiliation(s)
- John M. Sutton
- Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- Joshua D. Millwood
- Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- A. Case McCormack
- Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- Janna L. Fierst
- Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
54
Simultaneous Screening of the FRAXA and FRAXE Loci for Rapid Detection of FMR1 CGG and/or AFF2 CCG Repeat Expansions by Triplet-Primed PCR. J Mol Diagn 2021; 23:941-951. [PMID: 34111553; DOI: 10.1016/j.jmoldx.2021.04.015] [Received: 12/31/2020] [Revised: 03/29/2021] [Accepted: 04/29/2021]
Abstract
Moderate to hyper-expansion of trinucleotide repeats at the FRAXA and FRAXE fragile sites, with or without concurrent hypermethylation, has been associated with intellectual disability and other conditions. Unlike molecular diagnosis of FMR1 CGG repeat expansions in FRAXA, current detection of AFF2 CCG repeat expansions in FRAXE relies on low-throughput and otherwise inefficient techniques combining Southern blot analysis and PCR. A novel triplet-primed PCR assay was developed for simultaneous screening for trinucleotide repeat expansions at the FRAXA and FRAXE fragile sites and was validated using archived clinical samples of known FMR1 and AFF2 genotypes. Population samples and FRAXE-affected samples were sequenced to evaluate variation in the AFF2 CCG repeat structure. The duplex assay accurately identified expansions at the FMR1 and AFF2 trinucleotide repeat loci. On Sanger sequencing of the AFF2 CCG repeat, the single-nucleotide polymorphism rs868914124(C), which effectively adds two CCG repeats at the 5' end, was enriched in the Malay population and among short alleles (<11 CCGs), and was present in all six expanded AFF2 alleles in this study. All expanded AFF2 alleles contained multiple non-CCG interruptions toward the 5' end of the repeat. A sensitive, robust, and rapid assay has been developed for the simultaneous detection of expansion mutations at the FMR1 and AFF2 trinucleotide repeat loci, simplifying screening for FRAXA- and FRAXE-associated disorders.
55
Chintalaphani SR, Pineda SS, Deveson IW, Kumar KR. An update on the neurological short tandem repeat expansion disorders and the emergence of long-read sequencing diagnostics. Acta Neuropathol Commun 2021; 9:98. [PMID: 34034831; PMCID: PMC8145836; DOI: 10.1186/s40478-021-01201-x] [Received: 03/31/2021] [Accepted: 05/17/2021]
Abstract
BACKGROUND Short tandem repeat (STR) expansion disorders are an important cause of human neurological disease. They have an established role in more than 40 different phenotypes including the myotonic dystrophies, Fragile X syndrome, Huntington's disease, the hereditary cerebellar ataxias, amyotrophic lateral sclerosis and frontotemporal dementia. MAIN BODY STR expansions are difficult to detect and may explain unsolved diseases, as highlighted by recent findings including: the discovery of a biallelic intronic 'AAGGG' repeat in RFC1 as the cause of cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS); and the finding of 'CGG' repeat expansions in NOTCH2NLC as the cause of neuronal intranuclear inclusion disease and a range of clinical phenotypes. However, established laboratory techniques for diagnosis of repeat expansions (repeat-primed PCR and Southern blot) are cumbersome, low-throughput and poorly suited to parallel analysis of multiple gene regions. While next generation sequencing (NGS) has been increasingly used, established short-read NGS platforms (e.g., Illumina) are unable to genotype large and/or complex repeat expansions. Long-read sequencing platforms recently developed by Oxford Nanopore Technologies and Pacific Biosciences promise to overcome these limitations to deliver enhanced diagnosis of repeat expansion disorders in a rapid and cost-effective fashion. CONCLUSION We anticipate that long-read sequencing will rapidly transform the detection of short tandem repeat expansion disorders for both clinical diagnosis and gene discovery.
Affiliation(s)
- Sanjog R. Chintalaphani
- School of Medicine, University of New South Wales, Sydney, 2052 Australia
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010 Australia
- Sandy S. Pineda
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010 Australia
- Brain and Mind Centre, University of Sydney, Camperdown, NSW 2050 Australia
- Ira W. Deveson
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010 Australia
- Faculty of Medicine, St Vincent’s Clinical School, University of New South Wales, Sydney, NSW 2010 Australia
- Kishore R. Kumar
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010 Australia
- Molecular Medicine Laboratory and Neurology Department, Central Clinical School, Concord Repatriation General Hospital, University of Sydney, Concord, NSW 2137 Australia
56
Depienne C, Mandel JL. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am J Hum Genet 2021; 108:764-785. [PMID: 33811808; PMCID: PMC8205997; DOI: 10.1016/j.ajhg.2021.03.011] [Received: 12/15/2020] [Accepted: 03/05/2021]
Abstract
Tandem repeats represent one of the most abundant classes of variation in human genomes; they are polymorphic by nature and become highly unstable in a length-dependent manner. The expansion of repeat length across generations is a well-established process that results in human disorders mainly affecting the central nervous system. At least 50 disorders associated with expansion loci have been described to date, with half recognized only in the last ten years, as prior methodological difficulties limited their identification. These limitations still apply to the current widely used molecular diagnostic methods (exome or gene panels) and thus result in missed diagnoses detrimental to affected individuals and their families, especially for disorders that are very rare and/or clinically not recognizable. Most of these disorders have been identified through family-driven approaches and many others likely remain to be identified. The recent development of long-read technologies provides a unique opportunity to systematically investigate the contribution of tandem repeats and repeat expansions to the genetic architecture of human disorders. In this review, we summarize the current and most recent knowledge about the genetics of repeat expansion disorders and the diversity of their pathophysiological mechanisms and outline the perspectives of developing personalized treatments in the future.
Affiliation(s)
- Christel Depienne
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany; Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, 75013 Paris, France.
- Jean-Louis Mandel
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch 67400, France; Centre National de la Recherche Scientifique, UMR 7104, Illkirch 67400, France; Institut National de la Santé et de la Recherche Médicale, U 1258, Illkirch 67400, France; Université de Strasbourg, Illkirch 67400, France; USIAS University of Strasbourg Institute of Advanced study, 67000 Strasbourg, France.
57
Yu J, Luan XH, Yu M, Zhang W, Lv H, Cao L, Meng L, Zhu M, Zhou B, Wu XR, Li P, Gang Q, Liu J, Shi X, Liang W, Jia Z, Yao S, Yuan Y, Deng J, Hong D, Wang Z. GGC repeat expansions in NOTCH2NLC causing a phenotype of distal motor neuropathy and myopathy. Ann Clin Transl Neurol 2021; 8:1330-1342. [PMID: 33943039; PMCID: PMC8164861; DOI: 10.1002/acn3.51371] [Received: 02/02/2021] [Revised: 03/23/2021] [Accepted: 04/09/2021]
Abstract
Background The expansion of the GGC repeat in the 5' untranslated region of NOTCH2NLC has been associated with various neurodegenerative disorders of the central nervous system and, more recently, oculopharyngodistal myopathy. This study aimed to report patients with distal weakness, showing both neuropathic and myopathic features on electrophysiology and pathology, who carry GGC repeat expansions in NOTCH2NLC. Methods Whole-exome sequencing (WES) and long-read sequencing were used to identify the candidate genes. In addition, the available clinical data and the pathological changes associated with peripheral nerve and muscle biopsies were reviewed and studied. Results We identified and validated GGC repeat expansions of NOTCH2NLC in three unrelated patients who presented with progressive weakness predominantly affecting distal lower limb muscles, following negative results in an initial WES. We found intranuclear inclusions with multiple protein deposits in the nuclei of both myofibers and Schwann cells. The clinical features of these patients are compatible with the diagnosis of distal motor neuropathy and rimmed vacuolar myopathy. Interpretation These phenotypes enrich the class of features associated with NOTCH2NLC-related repeat expansion disorders (NRED), and provide further evidence that the neurological symptoms of NRED include not only damage to the brain, spinal cord, and peripheral nerves, but also myopathy, and that overlapping symptoms might exist.
Affiliation(s)
- Jiaxi Yu
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Xing-Hua Luan
- Department of Neurology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, 200030, China
- Meng Yu
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Wei Zhang
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- He Lv
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Li Cao
- Department of Neurology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, 200030, China
- Lingchao Meng
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Min Zhu
- Department of Neurology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Binbin Zhou
- Department of Neurology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Xiao-Rong Wu
- Department of Neurology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Pidong Li
- Grandomics Biosciences, Beijing, 100176, China
- Qiang Gang
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Jing Liu
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Xin Shi
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Wei Liang
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Zhirong Jia
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Sheng Yao
- Department of Neurology, Sixth Medical Center of PLA General Hospital, Beijing, 100853, China
- Yun Yuan
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Jianwen Deng
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
- Daojun Hong
- Department of Neurology, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, China
- Zhaoxia Wang
- Department of Neurology, Peking University First Hospital, Beijing, 100034, China; Beijing Key Laboratory of Neurovascular Disease Discovery, Beijing, 100034, China
58
Lopes M, Louzada S, Gama-Carvalho M, Chaves R. Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time. Int J Mol Sci 2021; 22:4707. [PMID: 33946766; PMCID: PMC8125562; DOI: 10.3390/ijms22094707] [Received: 03/31/2021] [Revised: 04/24/2021] [Accepted: 04/27/2021]
Abstract
(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences constitute a major human genomic component. SatDNA sequences can vary in a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.
Affiliation(s)
- Mariana Lopes
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Sandra Louzada
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Margarida Gama-Carvalho
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Raquel Chaves
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
59
Wallace AD, Sasani TA, Swanier J, Gates BL, Greenland J, Pedersen BS, Varley KE, Quinlan AR. CaBagE: A Cas9-based Background Elimination strategy for targeted, long-read DNA sequencing. PLoS One 2021; 16:e0241253. [PMID: 33830997; PMCID: PMC8031414; DOI: 10.1371/journal.pone.0241253] [Received: 10/05/2020] [Accepted: 01/19/2021]
Abstract
A substantial fraction of the human genome is difficult to interrogate with short-read DNA sequencing technologies due to paralogy, complex haplotype structures, or tandem repeats. Long-read sequencing technologies, such as Oxford Nanopore's MinION, enable direct measurement of complex loci without introducing many of the biases inherent to short-read methods, though they suffer from relatively lower throughput. This limitation has motivated recent efforts to develop amplification-free strategies to target and enrich loci of interest for subsequent sequencing with long reads. Here, we present CaBagE, a method for target enrichment that is efficient and useful for sequencing large, structurally complex targets. The CaBagE method leverages the stable binding of Cas9 to its DNA target to protect desired fragments from digestion with exonuclease. Enriched DNA fragments are then sequenced with Oxford Nanopore's MinION long-read sequencing technology. Enrichment with CaBagE resulted in a median of 116X coverage (range 39-416) of target loci when tested on five genomic targets ranging from 4-20 kb in length using healthy donor DNA. Four cancer gene targets were enriched in a single reaction and multiplexed on a single MinION flow cell. We further demonstrate the utility of CaBagE in two ALS patients with C9orf72 short tandem repeat expansions to produce genotype estimates commensurate with genotypes derived from repeat-primed PCR for each individual. With CaBagE there is a physical enrichment of on-target DNA in a given sample prior to sequencing. This feature allows adaptability across sequencing platforms and potential use as an enrichment strategy for applications beyond sequencing. CaBagE is a rapid enrichment method that can illuminate regions of the 'hidden genome' underlying human disease.
Affiliation(s)
- Amelia D. Wallace
- Department of Human Genetics, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Utah Center for Genetic Discovery, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Thomas A. Sasani
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Jordan Swanier
- Department of Human Genetics, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Brooke L. Gates
- Department of Oncological Sciences, Huntsman Cancer Institute, Salt Lake City, Utah, United States of America
- Jeff Greenland
- Department of Oncological Sciences, Huntsman Cancer Institute, Salt Lake City, Utah, United States of America
- Brent S. Pedersen
- Department of Human Genetics, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Utah Center for Genetic Discovery, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Katherine E. Varley
- Department of Oncological Sciences, Huntsman Cancer Institute, Salt Lake City, Utah, United States of America
- Aaron R. Quinlan
- Department of Human Genetics, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Utah Center for Genetic Discovery, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, Utah, United States of America
60
Bakhtiari M, Park J, Ding YC, Shleizer-Burko S, Neuhausen SL, Halldórsson BV, Stefánsson K, Gymrek M, Bafna V. Variable number tandem repeats mediate the expression of proximal genes. Nat Commun 2021; 12:2075. [PMID: 33824302; PMCID: PMC8024321; DOI: 10.1038/s41467-021-22206-z] [Received: 05/13/2020] [Accepted: 02/17/2021]
Abstract
Variable number tandem repeats (VNTRs) account for significant genetic variation in many organisms. In humans, VNTRs have been implicated in both Mendelian and complex disorders, but are largely ignored by genomic pipelines due to the complexity of genotyping and the computational expense. We describe adVNTR-NN, a method that uses shallow neural networks to genotype a VNTR in 18 seconds on 55X whole genome data, while maintaining high accuracy. We use adVNTR-NN to genotype 10,264 VNTRs in 652 GTEx individuals. Associating VNTR length with gene expression in 46 tissues, we identify 163 "eVNTRs". Of the 22 eVNTRs in blood for which independent data are available, 21 (95%) are replicated in terms of significance and direction of association. 49% of the eVNTR loci show a strong and likely causal impact on the expression of genes and 80% have a maximum effect size of at least 0.3. The impacted genes are involved in diseases including Alzheimer's, obesity and familial cancers, highlighting the importance of VNTRs for understanding the genetic basis of complex diseases.
Affiliation(s)
- Mehrdad Bakhtiari
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Yuan-Chun Ding
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Susan L Neuhausen
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Melissa Gymrek
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Vineet Bafna
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA.
61
Mizuguchi T, Toyota T, Miyatake S, Mitsuhashi S, Doi H, Kudo Y, Kishida H, Hayashi N, Tsuburaya RS, Kinoshita M, Fukuyama T, Fukuda H, Koshimizu E, Tsuchida N, Uchiyama Y, Fujita A, Takata A, Miyake N, Kato M, Tanaka F, Adachi H, Matsumoto N. Complete sequencing of expanded SAMD12 repeats by long-read sequencing and Cas9-mediated enrichment. Brain 2021; 144:1103-1117. [PMID: 33791773; DOI: 10.1093/brain/awab021] [Received: 07/09/2020] [Revised: 11/02/2020] [Accepted: 11/17/2020]
Abstract
A pentanucleotide TTTCA repeat insertion into a polymorphic TTTTA repeat element in SAMD12 causes benign adult familial myoclonic epilepsy. Although precise determination of the entire SAMD12 repeat sequence is important for molecular diagnosis and research, obtaining this sequence remains challenging with conventional genomic/genetic methods, and even short-read and long-read next-generation sequencing technologies have been insufficient. Incomplete information regarding expanded repeat sequences may hamper our understanding of the pathogenic roles played by varying numbers of repeat units, genotype-phenotype correlations, and mutational mechanisms. Here, we report a new approach for the precise determination of the entire expanded repeat sequence and present a workflow designed to improve the diagnostic rates in various repeat expansion diseases. We examined 34 patients with clinically diagnosed benign adult familial myoclonic epilepsy from 29 families, using repeat-primed PCR, Southern blot, and long-read sequencing with Cas9-mediated enrichment. Two cases with questionable results from repeat-primed PCR and/or Southern blot were confirmed as pathogenic using long-read sequencing with Cas9-mediated enrichment, resulting in the identification of pathogenic SAMD12 repeat expansions in 76% of examined families (22/29). Importantly, long-read sequencing with Cas9-mediated enrichment was able to provide detailed information regarding the sizes, configurations, and compositions of the expanded repeats. The inserted TTTCA repeat size and the proportion of TTTCA sequences among the overall repeat sequences were highly variable, and a novel repeat configuration was identified. A genotype-phenotype correlation study suggested that the insertion of even short (TTTCA)14 repeats contributed to the development of benign adult familial myoclonic epilepsy. However, the sizes of the overall TTTTA and TTTCA repeat units are also likely to be involved in the pathology of benign adult familial myoclonic epilepsy. Seven unsolved SAMD12-negative cases were investigated using whole-genome long-read sequencing, and infrequent, disease-associated repeat expansions were identified in two cases. The strategic workflow resolved two questionable SAMD12-positive cases and two previously SAMD12-negative cases, increasing the diagnostic yield from 69% (20/29 families) to 83% (24/29 families). This study indicates the significant utility of long-read sequencing technologies to explore the pathogenic contributions made by various repeat units in complex repeat expansions and to improve the overall diagnostic rate.
Affiliation(s)
- Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Tomoko Toyota
- Department of Neurology, University of Occupational and Environmental Health School of Medicine, Kitakyushu 807-8555, Japan
- Satoko Miyatake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Clinical Genetics Department, Yokohama City University Hospital, Yokohama 236-0004, Japan
- Satomi Mitsuhashi
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan
- Hiroshi Doi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Yosuke Kudo
- Department of Neurology, Yokohama Brain and Spine Center, Yokohama 235-0012, Japan
- Hitaru Kishida
- Department of Neurology, Yokohama City University Medical Center, Yokohama 232-0024, Japan
- Noriko Hayashi
- Department of Neurology, Yamato Municipal Hospital, Yamato 242-8602, Japan
- Rie S Tsuburaya
- Department of Pediatric Neurology, National Hospital Organization Utano National Hospital, Kyoto 616-8255, Japan
- Masako Kinoshita
- Department of Neurology, National Hospital Organization Utano National Hospital, Kyoto 616-8255, Japan
- Tetsuhiro Fukuyama
- Department of Pediatrics, Shinshu University School of Medicine, Matsumoto 390-8621, Japan
- Hiromi Fukuda
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Eriko Koshimizu
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Naomi Tsuchida
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Yuri Uchiyama
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Atsushi Takata
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Noriko Miyake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Mitsuhiro Kato
- Department of Pediatrics, Showa University School of Medicine, Tokyo 142-8666, Japan
- Fumiaki Tanaka
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
- Hiroaki Adachi
- Department of Neurology, University of Occupational and Environmental Health School of Medicine, Kitakyushu 807-8555, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan
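The SAMD12 abstract above turns on the composition of each expanded allele: how many TTTTA and TTTCA units it contains and what proportion of them are TTTCA. A minimal Python sketch of that bookkeeping follows; it is not the authors' pipeline. It assumes a repeat tract has already been extracted from a Cas9-enriched read, is error-free, and starts in phase with the 5-bp units; `unit_composition`, `tttca_fraction` and the example allele are hypothetical.

```python
from collections import Counter

UNIT_LEN = 5  # pentanucleotide repeat units (TTTTA / TTTCA)
KNOWN_UNITS = ("TTTTA", "TTTCA")


def unit_composition(repeat: str) -> Counter:
    """Tally in-frame 5-bp units in an extracted repeat tract.

    Assumes the tract starts in phase with the units; any 5-mer that is
    not TTTTA or TTTCA is tallied as "other". A trailing partial unit
    shorter than 5 bp is ignored.
    """
    counts = Counter()
    for i in range(0, len(repeat) - UNIT_LEN + 1, UNIT_LEN):
        unit = repeat[i:i + UNIT_LEN]
        counts[unit if unit in KNOWN_UNITS else "other"] += 1
    return counts


def tttca_fraction(repeat: str) -> float:
    """Proportion of tallied units that are TTTCA (0.0 for an empty tract)."""
    counts = unit_composition(repeat)
    total = sum(counts.values())
    return counts["TTTCA"] / total if total else 0.0


# Hypothetical allele: (TTTTA)10 followed by (TTTCA)14.
allele = "TTTTA" * 10 + "TTTCA" * 14
print(unit_composition(allele)["TTTCA"])  # 14
print(round(tttca_fraction(allele), 3))   # 0.583
```

Real reads carry sequencing errors and interruptions, so a practical tool would split the tract into units more tolerantly; the fixed-frame scan is used here only for clarity.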
62
Macken WL, Vandrovcova J, Hanna MG, Pitceathly RDS. Applying genomic and transcriptomic advances to mitochondrial medicine. Nat Rev Neurol 2021; 17:215-230. [PMID: 33623159 DOI: 10.1038/s41582-021-00455-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Accepted: 01/06/2021] [Indexed: 02/07/2023]
Abstract
Next-generation sequencing (NGS) has increased our understanding of the molecular basis of many primary mitochondrial diseases (PMDs). Despite this progress, many patients with suspected PMD remain without a genetic diagnosis, which restricts their access to in-depth genetic counselling, reproductive options and clinical trials, in addition to hampering efforts to understand the underlying disease mechanisms. Although they represent a considerable improvement over their predecessors, current methods for sequencing the mitochondrial and nuclear genomes have important limitations, and molecular diagnostic techniques are often manual and time consuming. However, recent advances in genomics and transcriptomics offer realistic solutions to these challenges. In this Review, we discuss the current genetic testing approach for PMDs and the opportunities that exist for increased use of whole-genome NGS of nuclear and mitochondrial DNA (mtDNA) in the clinical environment. We consider the possible role for long-read approaches in sequencing of mtDNA and in the identification of novel nuclear genomic causes of PMDs. We examine the expanding applications of RNA sequencing, including the detection of cryptic variants that affect splicing and gene expression and the interpretation of rare and novel mitochondrial transfer RNA variants.
Affiliation(s)
- William L Macken
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
- Jana Vandrovcova
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
- Michael G Hanna
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
- Robert D S Pitceathly
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK.
63
Sun X, Song L, Yang W, Zhang L, Liu M, Li X, Tian G, Wang W. Nanopore Sequencing and Its Clinical Applications. Methods Mol Biol 2021; 2204:13-32. [PMID: 32710311 DOI: 10.1007/978-1-0716-0904-0_2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/24/2022]
Abstract
Nanopore sequencing is a method for determining the order and modifications of DNA/RNA nucleotides by detecting variations in electric current as DNA/RNA strands pass through a nanometer-sized hole (nanopore). Nanopore-based DNA analysis techniques have been commercialized by Oxford Nanopore Technologies, NabSys, and Sequenom, and have recently been widely used in research areas including human genomics, cancer, metagenomics, and plant sciences. Nanopore sequencing also has potential applications in healthcare because of its fast turnaround time, portability, and real-time data analysis. These features make it a promising technology for point-of-care testing (POCT); its potential clinical applications are briefly discussed in this chapter.
Affiliation(s)
- Xue Sun
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China
- Lei Song
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China
- Wenjuan Yang
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China
- Lili Zhang
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China
- Meng Liu
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China
- Xiaoshuang Li
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China
- Geng Tian
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China
- Weiwei Wang
- Geneis (Beijing) Co., Ltd., Beijing, People's Republic of China.
64
Flynn JM, Long M, Wing RA, Clark AG. Evolutionary Dynamics of Abundant 7-bp Satellites in the Genome of Drosophila virilis. Mol Biol Evol 2021; 37:1362-1375. [PMID: 31960929 DOI: 10.1093/molbev/msaa010] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/16/2022]
Abstract
The factors that drive the rapid changes in abundance of tandem arrays of highly repetitive sequences, known as satellite DNA, are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism that has been studied, with an estimated >40% of its genome composed of a few related 7-bp satellites. Here, we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant AAACTAC satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5-11 Ma before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we saw that the abundances of two centromere-proximal satellites are anticorrelated along a geographical gradient, which we suggest could be caused by ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes including selection, meiotic drive, and constraints on satellite sequence and abundance.
Affiliation(s)
- Jullien M Flynn
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
- Manyuan Long
- Department of Ecology and Evolution, University of Chicago, Chicago, IL
- Rod A Wing
- School of Plant Sciences, Arizona Genomics Institute, University of Arizona, Tucson, AZ
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
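The satellite-abundance comparisons in the Flynn et al. abstract rest on estimating how much of the sequenced material consists of a given 7-bp motif. A hedged Python sketch of one naive estimator follows: it counts exact copies of AAACTAC on both strands and converts them to a fraction of sequenced bases. This is not the study's method, and the function names are hypothetical. Because only exact matches are counted, a higher per-base error rate lowers the estimate, which is one simple way platform choice can bias such measurements.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_exact(seq: str, kmer: str) -> int:
    """Count (possibly overlapping) exact occurrences of kmer in seq."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def satellite_fraction(reads, motif: str = "AAACTAC") -> float:
    """Approximate fraction of sequenced bases in exact copies of motif.

    Both strands are counted; the reverse complement is skipped when the
    motif is its own reverse complement, to avoid double counting.
    Self-overlapping motifs (e.g. homopolymers) would be overcounted.
    """
    rc = revcomp(motif)
    total_bases = sum(len(r) for r in reads)
    hits = 0
    for read in reads:
        hits += count_exact(read, motif)
        if rc != motif:
            hits += count_exact(read, rc)
    return hits * len(motif) / total_bases if total_bases else 0.0


# Toy reads: a forward-strand array and a reverse-strand array.
reads = ["AAACTAC" * 10 + "G" * 30, revcomp("AAACTAC" * 5)]
print(round(satellite_fraction(reads), 3))  # 0.778
```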
65
Holley G, Beyter D, Ingimundardottir H, Møller PL, Kristmundsdottir S, Eggertsson HP, Halldorsson BV. Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly. Genome Biol 2021; 22:28. [PMID: 33419473 PMCID: PMC7792008 DOI: 10.1186/s13059-020-02244-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 07/17/2020] [Accepted: 12/15/2020] [Indexed: 12/20/2022]
Abstract
A major challenge of long-read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi reads assembly.
Affiliation(s)
- Peter L Møller
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Snædis Kristmundsdottir
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
- Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
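The Ratatosk abstract reports assembly contiguity as a contig N50: the length L such that contigs of length at least L together contain at least half of all assembled bases. A minimal Python sketch of that definition follows; the toy contig lengths are invented for illustration and are unrelated to the Ratatosk assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted longest first and their lengths accumulated; the
    N50 is the length of the contig at which the running total first
    reaches half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() requires at least one contig length")


# Toy assembly, lengths in Mbp (made up): total 190, half 95, 80 + 45 >= 95.
print(n50([80, 45, 30, 20, 10, 5]))  # 45
```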
66
Mitsuhashi S, Frith MC, Matsumoto N. Genome-wide survey of tandem repeats by nanopore sequencing shows that disease-associated repeats are more polymorphic in the general population. BMC Med Genomics 2021; 14:17. [PMID: 33413375 PMCID: PMC7791882 DOI: 10.1186/s12920-020-00853-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/17/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND Tandem repeats are highly mutable and contribute to the development of human disease by a variety of mechanisms. It is difficult to predict which tandem repeats may cause a disease. One hypothesis is that changeable tandem repeats are the source of genetic diseases, because disease-causing repeats are polymorphic in healthy individuals. However, it is not clear whether disease-causing repeats are more polymorphic than other repeats. METHODS We performed a genome-wide survey of millions of human tandem repeats using publicly available long-read genome sequencing data from 21 humans. We measured tandem repeat copy number changes using tandem-genotypes. Length variation of known disease-associated repeats was compared to that of other repeat loci. RESULTS We found that known Mendelian disease-causing or disease-associated repeats, especially CAG and 5'UTR GGC repeats, are relatively long and polymorphic in the general population. We also show that the repeat lengths of two disease-causing tandem repeats, in ATXN3 and GLS, are correlated with nearby GWAS SNP genotypes. CONCLUSIONS We provide a catalog of polymorphic tandem repeats across a variety of repeat unit lengths and sequences, derived from long-read sequencing data. This method, especially if used in genome-wide association studies, may point to new candidate pathogenic or biologically important tandem repeats in human genomes.
Affiliation(s)
- Satomi Mitsuhashi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, M&D Tower 24F, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan.
- Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
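The study above measured tandem-repeat copy-number changes with tandem-genotypes and then compared length variation between loci. As a rough illustration of how per-allele changes could be turned into a per-locus variation score, here is a Python sketch. The conversion formula and the range-based score are simplifying assumptions for this example, not tandem-genotypes internals or the study's exact statistic; the function names and the example locus are hypothetical.

```python
def copy_number_change(observed_len: int, reference_len: int, unit_len: int) -> float:
    """Repeat units gained (+) or lost (-) by one allele relative to the reference.

    Assumes both lengths are measured in bp over the same repeat tract.
    """
    return (observed_len - reference_len) / unit_len


def length_range(changes) -> float:
    """Simple polymorphism score: spread between the longest and shortest allele."""
    return max(changes) - min(changes)


# Hypothetical CAG locus: 60-bp reference tract, four observed alleles.
observed = [60, 66, 54, 90]
changes = [copy_number_change(n, 60, 3) for n in observed]
print(changes)                # [0.0, 2.0, -2.0, 10.0]
print(length_range(changes))  # 12.0
```

A range is sensitive to a single outlier allele; a variance or interquartile range would be a more robust choice when many samples are available.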
67
Abstract
Long DNA and RNA reads from nanopore and PacBio technologies have many applications, but the raw reads have a substantial error rate. More accurate sequences can be obtained by merging multiple reads from overlapping parts of the same sequence. lamassemble aligns up to ∼1000 reads to each other, and makes a consensus sequence, which is often much more accurate than the raw reads. It is useful for studying a region of interest such as an expanded tandem repeat or other disease-causing mutation.
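The lamassemble abstract describes aligning many reads to each other and deriving a consensus that is more accurate than any single read. The sketch below illustrates only the final idea, a column-wise majority vote, and is not lamassemble's method: it assumes the reads have already been placed in a gapped multiple alignment of equal-length rows ('-' marks a gap), and `majority_consensus` and the example rows are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Column-wise majority vote over equal-length, gapped alignment rows.

    A column whose most common symbol is '-' is dropped from the
    consensus. Ties go to the symbol encountered first in the column.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(row) != width for row in aligned_reads):
        raise ValueError("all alignment rows must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


# Three noisy copies of ACGTACGT: one insertion plus a substitution, one deletion.
rows = ["ACGT-ACGT", "ACGTTACGA", "AC-T-ACGT"]
print(majority_consensus(rows))  # ACGTACGT
```

Each read carries at least one error, yet the vote recovers the underlying sequence because the errors fall in different columns.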
68
Cechova M. Probably Correct: Rescuing Repeats with Short and Long Reads. Genes (Basel) 2020; 12:48. [PMID: 33396198 PMCID: PMC7823596 DOI: 10.3390/genes12010048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/09/2020] [Revised: 12/23/2020] [Accepted: 12/24/2020] [Indexed: 02/07/2023]
Abstract
Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads are multi-mapping, i.e., without a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which reduce both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to the repeat resolution and multi-mapping read problems, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
Affiliation(s)
- Monika Cechova
- Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic
69
Bolognini D, Magi A, Benes V, Korbel JO, Rausch T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 2020; 9:giaa101. [PMID: 33034633 PMCID: PMC7539535 DOI: 10.1093/gigascience/giaa101] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/04/2020] [Revised: 08/07/2020] [Accepted: 09/07/2020] [Indexed: 12/30/2022]
Abstract
BACKGROUND Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. RESULTS We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. CONCLUSIONS TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
Affiliation(s)
- Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Viale Pieraccini 6, Florence 50134, Italy
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- Alberto Magi
- Department of Information Engineering, University of Florence, Via di S. Marta 3, Florence 50134, Italy
- Vladimir Benes
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
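The TRiCoLOR abstract stresses finding repeats and their period size without a predefined motif. A naive way to illustrate motif-free period detection is to compare a sequence with itself shifted by each candidate period and keep the smallest shift that matches well. The Python sketch below does exactly that; it is not TRiCoLOR's algorithm, and `best_period`, its thresholds and the example tract are hypothetical.

```python
from typing import Optional


def best_period(seq: str, max_period: int = 10, min_identity: float = 0.9) -> Optional[int]:
    """Smallest period p in 1..max_period at which seq matches itself shifted by p.

    Identity is the fraction of positions i with seq[i] == seq[i + p];
    returns None if no period reaches min_identity. Substitutions only
    lower the identity, but indels shift the phase and can defeat this
    naive test.
    """
    for p in range(1, max_period + 1):
        n = len(seq) - p
        if n <= 0:
            break
        matches = sum(1 for i in range(n) if seq[i] == seq[i + p])
        if matches / n >= min_identity:
            return p
    return None


tract = "CAG" * 10 + "CTG" + "CAG" * 9  # hypothetical tract with one substitution
print(best_period(tract))   # 3
print(best_period("ACGT"))  # None
```

Returning the smallest qualifying shift avoids reporting a multiple of the true period (6 or 9 for a CAG tract), which would also match.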
70
Deng J, Yu J, Li P, Luan X, Cao L, Zhao J, Yu M, Zhang W, Lv H, Xie Z, Meng L, Zheng Y, Zhao Y, Gang Q, Wang Q, Liu J, Zhu M, Guo X, Su Y, Liang Y, Liang F, Hayashi T, Maeda MH, Sato T, Ura S, Oya Y, Ogasawara M, Iida A, Nishino I, Zhou C, Yan C, Yuan Y, Hong D, Wang Z. Expansion of GGC Repeat in GIPC1 Is Associated with Oculopharyngodistal Myopathy. Am J Hum Genet 2020; 106:793-804. [PMID: 32413282 DOI: 10.1016/j.ajhg.2020.04.011] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 12/21/2019] [Accepted: 04/15/2020] [Indexed: 11/27/2022]
Abstract
Oculopharyngodistal myopathy (OPDM) is an adult-onset inherited neuromuscular disorder characterized by progressive ptosis, external ophthalmoplegia, and weakness of the masseter, facial, pharyngeal, and distal limb muscles. The myopathological features are presence of rimmed vacuoles (RVs) in the muscle fibers and myopathic changes of differing severity. Inheritance is variable, with either putative autosomal-dominant or autosomal-recessive pattern. Here, using a comprehensive strategy combining whole-genome sequencing (WGS), long-read whole-genome sequencing (LRS), linkage analysis, repeat-primed polymerase chain reaction (RP-PCR), and fluorescence amplicon length analysis polymerase chain reaction (AL-PCR), we identified an abnormal GGC repeat expansion in the 5' UTR of GIPC1 in one out of four families and three sporadic case subjects from a Chinese OPDM cohort. Expanded GGC repeats were further confirmed as the cause of OPDM in an additional 2 out of 4 families and 6 out of 13 sporadic Chinese individuals with OPDM, as well as 7 out of 194 unrelated Japanese individuals with OPDM. Methylation, qRT-PCR, and western blot analysis indicated that GIPC1 mRNA levels were increased while protein levels were unaltered in OPDM-affected individuals. RNA sequencing indicated p53 signaling, vascular smooth muscle contraction, ubiquitin-mediated proteolysis, and ribosome pathways were involved in the pathogenic mechanisms of OPDM-affected individuals with GGC repeat expansion in GIPC1. This study provides further evidence that OPDM is associated with GGC repeat expansions in distinct genes and strongly suggests that expanded GGC repeat units are essential in the pathogenesis of OPDM, regardless of the genes in which the expanded repeats are located.
71
Dolzhenko E, Bennett MF, Richmond PA, Trost B, Chen S, van Vugt JJFA, Nguyen C, Narzisi G, Gainullin VG, Gross AM, Lajoie BR, Taft RJ, Wasserman WW, Scherer SW, Veldink JH, Bentley DR, Yuen RKC, Bahlo M, Eberle MA. ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data. Genome Biol 2020; 21:102. [PMID: 32345345 PMCID: PMC7187524 DOI: 10.1186/s13059-020-02017-z] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Received: 12/20/2019] [Accepted: 04/14/2020] [Indexed: 12/13/2022]
Abstract
Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.
Affiliation(s)
- Egor Dolzhenko
- Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Mark F Bennett
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC, 3052, Australia
- Department of Medical Biology, The University of Melbourne, 1G Royal Parade, Parkville, VIC, 3052, Australia
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, 245 Burgundy Street, Heidelberg, VIC, 3084, Australia
- Phillip A Richmond
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Brett Trost
- Genetics and Genome Biology, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- Sai Chen
- Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Joke J F A van Vugt
- Department of Neurology, UMC Utrecht Brain Center, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands
- Charlotte Nguyen
- Genetics and Genome Biology, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 2E5, Canada
- Giuseppe Narzisi
- New York Genome Center, 101 Avenue of the Americas, New York, 10013, USA
- Andrew M Gross
- Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Bryan R Lajoie
- Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Ryan J Taft
- Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Stephen W Scherer
- Genetics and Genome Biology, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 2E5, Canada
- The McLaughlin Centre, University of Toronto, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- Jan H Veldink
- Department of Neurology, UMC Utrecht Brain Center, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands
- David R Bentley
- Illumina Cambridge Ltd, Illumina Centre, 19 Granta Park, Great Abington, Cambridge, CB21 6DF, UK
- Ryan K C Yuen
- Genetics and Genome Biology, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 2E5, Canada
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC, 3052, Australia
- Department of Medical Biology, The University of Melbourne, 1G Royal Parade, Parkville, VIC, 3052, Australia
72
Kalendar R, Raskina O, Belyayev A, Schulman AH. Long Tandem Arrays of Cassandra Retroelements and Their Role in Genome Dynamics in Plants. Int J Mol Sci 2020; 21:ijms21082931. [PMID: 32331257 PMCID: PMC7215508 DOI: 10.3390/ijms21082931] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/16/2020] [Revised: 04/15/2020] [Accepted: 04/17/2020] [Indexed: 02/07/2023]
Abstract
Retrotransposable elements are widely distributed and diverse in eukaryotes. Their copy number increases through reverse-transcription-mediated propagation, while they can be lost through recombinational processes, generating genomic rearrangements. We previously identified extensive structurally uniform retrotransposon groups in which no member contains the gag, pol, or env internal domains. Because of the lack of protein-coding capacity, these groups are non-autonomous in replication, even if transcriptionally active. The Cassandra element belongs to the non-autonomous group called terminal-repeat retrotransposons in miniature (TRIM). It carries 5S RNA sequences with conserved RNA polymerase (pol) III promoters and terminators in its long terminal repeats (LTRs). Here, we identified multiple extended tandem arrays of Cassandra retrotransposons within different plant species, including ferns. At least 12 copies of repeated LTRs (as the tandem unit) and internal domain (as a spacer), giving a pattern that resembles the cellular 5S rRNA genes, were identified. A cytogenetic analysis revealed the specific chromosomal pattern of the Cassandra retrotransposon with prominent clustering at and around 5S rDNA loci. The secondary structure of the Cassandra retroelement RNA is predicted to form super-loops, in which the two LTRs are complementary to each other and can initiate local recombination, leading to the tandem arrays of Cassandra elements. The array structures are conserved for Cassandra retroelements of different species. We speculate that recombination events similar to those of 5S rRNA genes may explain the wide variation in Cassandra copy number. Likewise, the organization of 5S rRNA gene sequences is very variable in flowering plants; part of what is taken for 5S gene copy variation may be variation in Cassandra number. The role of the Cassandra 5S sequences remains to be established.
Affiliation(s)
- Ruslan Kalendar
- Department of Agricultural Sciences, University of Helsinki, P.O. Box 27 (Latokartanonkaari 5), FI-00014 Helsinki, Finland
- RSE “National Center for Biotechnology”, Korgalzhyn Highway 13/5, Nur-Sultan 010000, Kazakhstan
- Correspondence: R.K.; A.H.S.
- Olga Raskina
- Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
- Alexander Belyayev
- Laboratory of Molecular Cytogenetics and Karyology, Institute of Botany of the ASCR, Zámek 1, CZ-252 43 Průhonice, Czech Republic
- Alan H. Schulman
- Natural Resources Institute Finland (Luke), Latokartanonkaari 9, FI-00790 Helsinki, Finland
- Institute of Biotechnology and Viikki Plant Science Centre, University of Helsinki, P.O. Box 65, FI-00014 Helsinki, Finland
- Correspondence: R.K.; A.H.S.
73
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
74
Nakamura H, Doi H, Mitsuhashi S, Miyatake S, Katoh K, Frith MC, Asano T, Kudo Y, Ikeda T, Kubota S, Kunii M, Kitazawa Y, Tada M, Okamoto M, Joki H, Takeuchi H, Matsumoto N, Tanaka F. Long-read sequencing identifies the pathogenic nucleotide repeat expansion in RFC1 in a Japanese case of CANVAS. J Hum Genet 2020; 65:475-480. [PMID: 32066831 DOI: 10.1038/s10038-020-0733-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 09/08/2019] [Revised: 01/14/2020] [Accepted: 02/03/2020] [Indexed: 11/09/2022]
Abstract
Recently, a recessively inherited intronic repeat expansion in replication factor C1 (RFC1) was identified in cerebellar ataxia with neuropathy and bilateral vestibular areflexia syndrome (CANVAS). Here, we describe a Japanese case of genetically confirmed CANVAS with autonomic failure and auditory hallucination. The case showed impaired uptake of iodine-123-metaiodobenzylguanidine and 123I-ioflupane in the cardiac sympathetic nerve and dopaminergic neurons, respectively, by single-photon emission computed tomography. Long-read sequencing identified biallelic pathogenic (AAGGG)n nucleotide repeat expansion in RFC1 and heterozygous benign (TAAAA)n and (TAGAA)n expansions in brain expressed, associated with NEDD4 (BEAN1). Enrichment of the repeat regions in RFC1 and BEAN1 using a Cas9-mediated system clearly distinguished between pathogenic and benign repeat expansions. The haplotype around RFC1 indicated that the (AAGGG)n expansion in our case was on the same ancestral allele as that of European cases. Thus, long-read sequencing facilitates precise genetic diagnosis of diseases with complex repeat structures and various expansions.
Affiliation(s)
- Haruko Nakamura
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Hiroshi Doi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Satomi Mitsuhashi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Satoko Miyatake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Kazutaka Katoh
- Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba, 277-8568, Japan
- Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology, 3-4-1 Okubo, Shinjuku-ku, Tokyo, 169-8555, Japan
- Tetsuya Asano
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Yosuke Kudo
- Department of Neurology, Yokohama Brain and Spine Center, Yokohama, 235-0012, Japan
- Takuya Ikeda
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Shun Kubota
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Misako Kunii
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Yu Kitazawa
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Mikiko Tada
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Mitsuo Okamoto
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Hideto Joki
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Hideyuki Takeuchi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Fumiaki Tanaka
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
75
Vondrak T, Ávila Robledillo L, Novák P, Koblížková A, Neumann P, Macas J. Characterization of repeat arrays in ultra-long nanopore reads reveals frequent origin of satellite DNA from retrotransposon-derived tandem repeats. Plant J 2020; 101:484-500. [PMID: 31559657 PMCID: PMC7004042 DOI: 10.1111/tpj.14546] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 07/03/2019] [Revised: 09/09/2019] [Accepted: 09/12/2019] [Indexed: 05/21/2023]
Abstract
Amplification of monomer sequences into long contiguous arrays is the main feature distinguishing satellite DNA from other tandem repeats, yet it is also the main obstacle in its investigation because these arrays are in principle difficult to assemble. Here we explore an alternative, assembly-free approach that utilizes ultra-long Oxford Nanopore reads to infer the length distribution of satellite repeat arrays, their association with other repeats and the prevailing sequence periodicities. Using the satellite DNA-rich legume plant Lathyrus sativus as a model, we demonstrated this approach by analyzing 11 major satellite repeats using a set of nanopore reads ranging from 30 to over 200 kb in length and representing 0.73× genome coverage. We found surprising differences between the analyzed repeats because only two of them were predominantly organized in long arrays typical for satellite DNA. The remaining nine satellites were found to be derived from short tandem arrays located within LTR-retrotransposons that occasionally expanded in length. While the corresponding LTR-retrotransposons were dispersed across the genome, this array expansion occurred mainly in the primary constrictions of the L. sativus chromosomes, which suggests that these genome regions are favourable for satellite DNA accumulation.
Affiliation(s)
- Tihana Vondrak
- Biology Centre, Czech Academy of Sciences, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
- Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
- Laura Ávila Robledillo
- Biology Centre, Czech Academy of Sciences, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
- Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
- Petr Novák
- Biology Centre, Czech Academy of Sciences, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
- Andrea Koblížková
- Biology Centre, Czech Academy of Sciences, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
- Pavel Neumann
- Biology Centre, Czech Academy of Sciences, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
- Jiří Macas
- Biology Centre, Czech Academy of Sciences, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
76
Chiara M, Zambelli F, Picardi E, Horner DS, Pesole G. Critical assessment of bioinformatics methods for the characterization of pathological repeat expansions with single-molecule sequencing data. Brief Bioinform 2019; 21:1971-1986. [DOI: 10.1093/bib/bbz099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/09/2019] [Revised: 06/22/2019] [Accepted: 07/09/2019] [Indexed: 01/19/2023]
Abstract
Over the last 5 years, a number of studies have reported the successful application of single-molecule sequencing technologies to determining the size and sequence of pathologically expanded microsatellite repeats. However, each study employed a different custom bioinformatics pipeline, which prevents meaningful comparisons and somewhat limits the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for determining repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest but statistically significant increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, all methods tested here, whether they analyze the data by read alignment or by read assembly, show high sensitivity both in detecting expanded tandem repeats and in estimating the expansion size. This suggests that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.
Affiliation(s)
- Matteo Chiara
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milan, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Federico Zambelli
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milan, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Ernesto Picardi
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Via Orabona 4, 70126 Bari, Italy
- David S Horner
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milan, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Via Orabona 4, 70126 Bari, Italy
77
De Roeck A, De Coster W, Bossaerts L, Cacace R, De Pooter T, Van Dongen J, D’Hert S, De Rijk P, Strazisar M, Van Broeckhoven C, Sleegers K. NanoSatellite: accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION. Genome Biol 2019; 20:239. [PMID: 31727106 PMCID: PMC6857246 DOI: 10.1186/s13059-019-1856-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/18/2019] [Accepted: 10/10/2019] [Indexed: 12/13/2022]
Abstract
Technological limitations have hindered the large-scale genetic investigation of tandem repeats in disease. We show that long-read sequencing with a single Oxford Nanopore Technologies PromethION flow cell per individual achieves 30× human genome coverage and enables accurate assessment of tandem repeats including the 10,000-bp Alzheimer's disease-associated ABCA7 VNTR. The Guppy "flip-flop" base caller and tandem-genotypes tandem repeat caller are efficient for large-scale tandem repeat assessment, but base calling and alignment challenges persist. We present NanoSatellite, which analyzes tandem repeats directly on electric current data and improves calling of GC-rich tandem repeats, expanded alleles, and motif interruptions.
Affiliation(s)
- Arne De Roeck
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Wouter De Coster
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Liene Bossaerts
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Rita Cacace
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Tim De Pooter
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
- Jasper Van Dongen
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Svenn D’Hert
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
- Peter De Rijk
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
- Mojca Strazisar
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
- Christine Van Broeckhoven
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kristel Sleegers
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
78
Long-read sequencing for rare human genetic diseases. J Hum Genet 2019; 65:11-19. [PMID: 31558760 DOI: 10.1038/s10038-019-0671-8] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 05/31/2019] [Revised: 08/28/2019] [Accepted: 09/03/2019] [Indexed: 12/19/2022]
Abstract
During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the current diagnostic rate with these approaches is estimated to be below 50%, and the causes of many rare genetic diseases remain unknown. There may be many reasons for this, but one plausible explanation is that the responsible mutations lie in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansions or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a lack of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. Their results provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and of disease. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.
79
Makałowski W, Shabardina V. Bioinformatics of nanopore sequencing. J Hum Genet 2019; 65:61-67. [PMID: 31451715 DOI: 10.1038/s10038-019-0659-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/07/2019] [Revised: 07/26/2019] [Accepted: 08/05/2019] [Indexed: 12/12/2022]
Abstract
Nanopore sequencing is one of the most exciting new technologies and is under active development. As it develops, a growing number of analytical tools are becoming available to researchers. To help them navigate this ever-changing field, we discuss a range of software available for analyzing sequences obtained with nanopore technology.
Affiliation(s)
- Wojciech Makałowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, 48149, Münster, Germany
- Victoria Shabardina
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, 48149, Münster, Germany
80
Ishiura H, Shibata S, Yoshimura J, Suzuki Y, Qu W, Doi K, Almansour MA, Kikuchi JK, Taira M, Mitsui J, Takahashi Y, Ichikawa Y, Mano T, Iwata A, Harigaya Y, Matsukawa MK, Matsukawa T, Tanaka M, Shirota Y, Ohtomo R, Kowa H, Date H, Mitsue A, Hatsuta H, Morimoto S, Murayama S, Shiio Y, Saito Y, Mitsutake A, Kawai M, Sasaki T, Sugiyama Y, Hamada M, Ohtomo G, Terao Y, Nakazato Y, Takeda A, Sakiyama Y, Umeda-Kameyama Y, Shinmi J, Ogata K, Kohno Y, Lim SY, Tan AH, Shimizu J, Goto J, Nishino I, Toda T, Morishita S, Tsuji S. Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease. Nat Genet 2019; 51:1222-1232. [DOI: 10.1038/s41588-019-0458-z] [Citation(s) in RCA: 178] [Impact Index Per Article: 35.6] [Received: 08/30/2018] [Accepted: 05/29/2019] [Indexed: 11/09/2022]
81
Sone J, Mitsuhashi S, Fujita A, Mizuguchi T, Hamanaka K, Mori K, Koike H, Hashiguchi A, Takashima H, Sugiyama H, Kohno Y, Takiyama Y, Maeda K, Doi H, Koyano S, Takeuchi H, Kawamoto M, Kohara N, Ando T, Ieda T, Kita Y, Kokubun N, Tsuboi Y, Katoh K, Kino Y, Katsuno M, Iwasaki Y, Yoshida M, Tanaka F, Suzuki IK, Frith MC, Matsumoto N, Sobue G. Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease. Nat Genet 2019; 51:1215-1221. [PMID: 31332381 DOI: 10.1038/s41588-019-0459-y] [Citation(s) in RCA: 295] [Impact Index Per Article: 59.0] [Received: 01/10/2019] [Accepted: 05/29/2019] [Indexed: 12/20/2022]
Abstract
… The average onset age is 59.7 years among approximately 140 NIID cases, most of them sporadic and several familial. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5' region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and in 40 sporadic NIID cases. We observed abnormal antisense transcripts in fibroblasts from patients but not from unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.
Affiliation(s)
- Jun Sone
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Department of Neurology, National Hospital Organization Suzuka National Hospital, Suzuka, Japan
- Satomi Mitsuhashi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Kohei Hamanaka
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Keiko Mori
- Department of Neurology, Oyamada Memorial Spa Hospital, Yokkaichi, Japan
- Haruki Koike
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Akihiro Hashiguchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Hiroshi Takashima
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Hiroshi Sugiyama
- Department of Neurology, National Hospital Organization Utano National Hospital, Kyoto, Japan
- Yutaka Kohno
- Department of Neurology, Ibaraki Prefectural University of Health Sciences, Ibaraki, Japan
- Yoshihisa Takiyama
- Department of Neurology, University of Yamanashi, Chuo, Yamanashi, Japan
- Kengo Maeda
- Department of Neurology, National Hospital Organization Higashi-Ohmi General Medical Center, Higashi-Ohmi, Japan
- Hiroshi Doi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Shigeru Koyano
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hideyuki Takeuchi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Michi Kawamoto
- Department of Neurology, Kobe City Medical Center General Hospital, Kobe, Japan
- Nobuo Kohara
- Department of Neurology, Kobe City Medical Center General Hospital, Kobe, Japan
- Tetsuo Ando
- Department of Neurology, Anjo Kosei Hospital, Anjo, Japan
- Toshiaki Ieda
- Department of Neurology, Yokkaichi Municipal Hospital, Yokkaichi, Japan
- Yasushi Kita
- Department of Neurology, Hyogo Brain and Heart Center, Himeji, Japan
- Norito Kokubun
- Department of Neurology, Dokkyo Medical University, Tochigi, Japan
- Yoshio Tsuboi
- Department of Neurology, Fukuoka University, Fukuoka, Japan
- Kazutaka Katoh
- Research Institute for Microbial Diseases, Osaka University, Suita, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Yoshihiro Kino
- Department of Bioinformatics and Molecular Neuropathology, Meiji Pharmaceutical University, Tokyo, Japan
- Masahisa Katsuno
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Yasushi Iwasaki
- Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
- Mari Yoshida
- Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
- Fumiaki Tanaka
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ikuo K Suzuki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
- Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Gen Sobue
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Department of Neurology, and Brain and Mind Research Center, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Aichi Medical University, Nagakute, Aichi, Japan