1
Maestri S, Scalzo D, Damaggio G, Zobel M, Besusso D, Cattaneo E. Navigating triplet repeats sequencing: concepts, methodological challenges and perspective for Huntington's disease. Nucleic Acids Res 2025; 53:gkae1155. [PMID: 39676657 PMCID: PMC11724279 DOI: 10.1093/nar/gkae1155] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/24/2024] [Revised: 10/16/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]
Abstract
The accurate characterization of triplet repeats, especially the overrepresented CAG repeats, is increasingly relevant for several reasons. First, germline expansion of CAG repeats above a gene-specific threshold causes multiple neurodegenerative disorders; for instance, Huntington's disease (HD) is triggered by >36 CAG repeats in the huntingtin (HTT) gene. Second, extreme expansions up to 800 CAG repeats have been found in specific cell types affected by the disease. Third, synonymous single nucleotide variants within the CAG repeat stretch influence the age of disease onset. Thus, new sequencing-based protocols that profile both the length and the exact nucleotide sequence of triplet repeats are crucial. Various strategies to enrich the target gene over the background, along with sequencing platforms and bioinformatic pipelines, are under development. This review discusses the concepts, challenges, and methodological opportunities for analyzing triplet repeats, using HD as a case study. Starting with traditional approaches, we will explore how sequencing-based methods have evolved to meet increasing scientific demands. We will also highlight experimental and bioinformatic challenges, aiming to provide a guide for accurate triplet repeat characterization for diagnostic and therapeutic purposes.
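As a rough illustration of the two quantities this abstract stresses (repeat length and exact repeat sequence), the following minimal sketch counts the longest uninterrupted CAG run in a read and compares it with the >36 threshold quoted above. The function names and the regex-based approach are assumptions made for illustration; they are not taken from the review, and real pipelines must also handle sequencing errors, alignment, and the flanking sequence.

```python
import re

# Threshold quoted in the abstract: HD is triggered by >36 CAG repeats in HTT.
HD_THRESHOLD = 36


def longest_cag_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted CAG run.

    Hypothetical helper: any non-CAG triplet (for example a synonymous CAA,
    which also encodes glutamine) ends the current run.
    """
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def exceeds_hd_threshold(seq: str) -> bool:
    """True if the longest pure CAG run is above the quoted threshold."""
    return longest_cag_run(seq) > HD_THRESHOLD


if __name__ == "__main__":
    # Toy read: 5 CAG units, one CAA interruption, then 2 more CAG units.
    read = "TT" + "CAG" * 5 + "CAA" + "CAG" * 2 + "GG"
    print(longest_cag_run(read))
    print(exceeds_hd_threshold("CAG" * 40))
```

Counting only the longest pure run is one possible convention; whether interruptions such as CAA should be counted toward the total length depends on the assay and is left open here.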
Affiliation(s)
- Simone Maestri
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Davide Scalzo
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Gianluca Damaggio
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Martina Zobel
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Dario Besusso
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Elena Cattaneo
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
2
Lei L, Peng L, Wan L, Chen Z, Wang C, Peng H, Qiu R, Tang B, Jiang H. Genetic Analysis of GCA Repeats in the GLS Gene: Implications for Undiagnosed Ataxia and Spinocerebellar Ataxia 3 in Mainland China. Mov Disord 2024. [PMID: 39699045 DOI: 10.1002/mds.30083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2024] [Revised: 11/07/2024] [Accepted: 11/27/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND Recent studies have reported that expanded GCA repeats in the GLS gene can cause glutaminase deficiency with ataxia phenotype. However, to date, no studies have investigated the distribution and role of GCA repeats in the GLS gene of Chinese individuals. OBJECTIVE The aim was to investigate the distribution of GCA repeats in Chinese individuals, including undiagnosed ataxia patients for identifying causal factors, healthy controls for determining the normal range, and ATX-ATXN3 (spinocerebellar ataxia type 3, SCA3) patients for exploring genetic modifiers. METHODS We combined whole-genome sequencing (WGS), repeat-primed polymerase chain reaction, capillary electrophoresis (RP-PCR/CE), and ExpansionHunter to screen the GCA repeats in the GLS gene of 349 undiagnosed ataxia individuals, 1505 healthy controls, and 1236 ATX-ATXN3 (SCA3) patients from mainland China. RESULTS No expanded GCA repeats in the GLS gene were detected across any of the samples. The average number of GCA repeats was 11 (range: 8-31), 12 (range: 6-33), and 11 (range: 6-33) for undiagnosed ataxia patients, healthy controls, and SCA3 patients, respectively. The intermediate repeat size (9 < repeat size ≤ 13) of the nonexpanded GCA allele in the GLS gene was associated with later disease onset in ATX-ATXN3 (SCA3) patients. CONCLUSIONS Abnormal expansions of GLS GCA repeats are rare in the Chinese population. However, intermediate-length normal GCA repeat sizes may influence the age at onset (AAO) in ATX-ATXN3 (SCA3) patients. © 2024 International Parkinson and Movement Disorder Society.
Affiliation(s)
- Lijing Lei
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, P. R. China
- Linliu Peng
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, P. R. China
- Linlin Wan
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, P. R. China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, P. R. China
  - Bioinformatics Center and National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, P. R. China
  - Department of Radiology, Xiangya Hospital, Central South University, Changsha, P. R. China
  - National International Collaborative Research Center for Medical Metabolomics, Central South University, Changsha, P. R. China
- Zhao Chen
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, P. R. China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, P. R. China
  - Bioinformatics Center and National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, P. R. China
  - Hunan International Scientific and Technological Cooperation Base of Neurodegenerative and Neurogenetic Diseases, Changsha, P. R. China
- Chunrong Wang
  - Department of Pathology, Xiangya Hospital, Central South University, Changsha, P. R. China
- Huirong Peng
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, P. R. China
- Rong Qiu
  - School of Computer Science and Engineering, Central South University, Changsha, P. R. China
- Beisha Tang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, P. R. China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, P. R. China
  - Bioinformatics Center and National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, P. R. China
  - Hunan International Scientific and Technological Cooperation Base of Neurodegenerative and Neurogenetic Diseases, Changsha, P. R. China
- Hong Jiang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, P. R. China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, P. R. China
  - Bioinformatics Center and National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, P. R. China
  - National International Collaborative Research Center for Medical Metabolomics, Central South University, Changsha, P. R. China
  - Hunan International Scientific and Technological Cooperation Base of Neurodegenerative and Neurogenetic Diseases, Changsha, P. R. China
  - Department of Neurology, The Third Xiangya Hospital, Central South University, Changsha, P. R. China
  - Furong Laboratory, Central South University, Changsha, P. R. China
  - Brain Research Center, Central South University, Changsha, P. R. China
3
Manigbas CA, Jadhav B, Garg P, Shadrina M, Lee W, Altman G, Martin-Trujillo A, Sharp AJ. A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank. Nat Commun 2024; 15:10521. [PMID: 39627187 PMCID: PMC11614882 DOI: 10.1038/s41467-024-54678-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 11/18/2024] [Indexed: 12/06/2024]
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we perform direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing fine-mapped associations with 73 traits. We replicate 23 of 31 (74%) of these associations in the All of Us cohort. While this set includes several known repeat expansion disorders, novel associations we found are attributable to common polymorphic variation in TR length rather than rare expansions and include e.g. a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5'UTR of GNB2 influencing heart rate. Fine-mapped TRs are strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the "missing heritability" of the human genome.
Affiliation(s)
- Celine A Manigbas
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bharati Jadhav
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Paras Garg
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mariya Shadrina
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- William Lee
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabrielle Altman
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alejandro Martin-Trujillo
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrew J Sharp
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
4
Tan JHJ, Li Z, Porta MG, Rajaby R, Lim WK, Tan YA, Jimenez RT, Teo R, Hebrard M, Ow JL, Ang S, Jeyakani J, Chong YS, Lim TH, Goh LL, Tham YC, Leong KP, Chin CWL, Davila S, Karnani N, Cheng CY, Chambers J, Tai ES, Liu J, Sim X, Sung WK, Prabhakar S, Tan P, Bertin N. A Catalogue of Structural Variation across Ancestrally Diverse Asian Genomes. Nat Commun 2024; 15:9507. [PMID: 39496583 PMCID: PMC11535549 DOI: 10.1038/s41467-024-53620-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 10/14/2024] [Indexed: 11/06/2024]
Abstract
Structural variants (SVs) are significant contributors to inter-individual genetic variation associated with traits and diseases. Current SV studies using whole-genome sequencing (WGS) have a largely Eurocentric composition, with little known about SV diversity in other ancestries, particularly from Asia. Here, we present a WGS catalogue of 73,035 SVs from 8392 Singaporeans of East Asian, Southeast Asian and South Asian ancestries, of which ~65% (47,770 SVs) are novel. We show that Asian populations can be stratified by their global SV patterns and identified 42,239 novel SVs that are specific to Asian populations. 52% of these novel SVs are restricted to one of the three major ancestry groups studied (Indian, Chinese or Malay). We uncovered SVs affecting major clinically actionable loci. Lastly, by identifying SVs in linkage disequilibrium with single-nucleotide variants, we demonstrate the utility of our SV catalogue in the fine-mapping of Asian GWAS variants and identification of potential causative variants. These results augment our knowledge of structural variation across human populations, thereby reducing current ancestry biases in global references of genetic variation afflicting equity, diversity and inclusion in genetic research.
Affiliation(s)
- Joanna Hui Juan Tan
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Zhihui Li
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Mar Gonzalez Porta
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
  - Nalagenetics, Singapore, Singapore
- Ramesh Rajaby
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
  - Human Genome Center, University of Tokyo, Bunkyō, Japan
- Weng Khong Lim
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
  - Duke-NUS Medical School, Singapore, Singapore
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore Health Services, Duke-NUS Medical School, Singapore, Singapore
  - SingHealth Duke-NUS Genomic Medicine Centre, Duke-NUS Medical School, Singapore, Singapore
- Ye An Tan
  - Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Rodrigo Toro Jimenez
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Renyi Teo
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Maxime Hebrard
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Jack Ling Ow
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Shimin Ang
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Justin Jeyakani
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Yap Seng Chong
  - Department of Obstetrics & Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Institute for Human Development and Potential (IHDP), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Tock Han Lim
  - NHG Eye Institute, Tan Tock Seng Hospital, National Healthcare Group, Singapore, Singapore
- Liuh Ling Goh
  - Personalised Medicine Service, Tan Tock Seng Hospital, Singapore, Singapore
- Yih Chung Tham
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
  - Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Khai Pang Leong
  - Personalised Medicine Service, Tan Tock Seng Hospital, Singapore, Singapore
- Calvin Woon Loong Chin
  - Department of Cardiology, National Heart Centre Singapore, Singapore, Singapore
  - Cardiovascular ACP, Duke-NUS Medical School, Singapore, Singapore
- Sonia Davila
  - SingHealth Duke-NUS Genomic Medicine Centre, Duke-NUS Medical School, Singapore, Singapore
  - SingHealth Duke-NUS Institute of Precision medicine, Singapore Health Services, Singapore, Singapore
  - Cardiovascular and Metabolic Disorders Program, Duke-NUS Medical School, Singapore, Singapore
  - Translational Medicine, Sidra Medicine, Ar-Rayyan, Qatar
- Neerja Karnani
  - Human Development, Singapore Institute for Clinical Sciences, Singapore, Singapore
  - Clinical Data Engagement, Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Ching-Yu Cheng
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
  - Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- John Chambers
  - Population and Global Health, Nanyang Technological University, Lee Kong Chian School of Medicine, Singapore, Singapore
  - Department of Epidemiology and Biostatistics, Imperial College London, London, UK
  - Precision Health Research, Singapore, Singapore
- E Shyong Tai
  - Duke-NUS Medical School, Singapore, Singapore
  - Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
  - Precision Health Research, Singapore, Singapore
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Jianjun Liu
  - Laboratory of Human Genomics, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Xueling Sim
  - Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Wing Kin Sung
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
  - Hong Kong Genome Institute, Hong Kong, Hong Kong
  - Department of Chemical Pathology, Chinese University of Hong Kong, Hong Kong, Hong Kong
- Shyam Prabhakar
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore.
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore.
- Patrick Tan
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore.
  - Duke-NUS Medical School, Singapore, Singapore.
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore Health Services, Duke-NUS Medical School, Singapore, Singapore.
  - Precision Health Research, Singapore, Singapore.
- Nicolas Bertin
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore.
5
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
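To make the phrase "Mendelian concordance ... allowing a single repeat unit difference" concrete, here is a minimal sketch of how such a tolerance-based check could be computed for one locus across parent-child trios. It is an illustrative assumption, not TRGT's actual implementation: the function names, the representation of a genotype as a pair of allele lengths in repeat units, and the `tol` parameter are all hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a diploid genotype as two allele lengths,
# each expressed in repeat units.
Genotype = Tuple[int, int]
Trio = Tuple[Genotype, Genotype, Genotype]  # (child, father, mother)


def _from_parent(allele: int, parent: Genotype, tol: int) -> bool:
    """True if the allele is within `tol` repeat units of a parental allele."""
    return any(abs(allele - p) <= tol for p in parent)


def is_mendelian_consistent(child: Genotype, father: Genotype,
                            mother: Genotype, tol: int = 1) -> bool:
    """One child allele must match each parent, within the tolerance."""
    c1, c2 = child
    return (
        (_from_parent(c1, father, tol) and _from_parent(c2, mother, tol))
        or (_from_parent(c1, mother, tol) and _from_parent(c2, father, tol))
    )


def mendelian_concordance(trios: Iterable[Trio], tol: int = 1) -> float:
    """Fraction of trios whose genotypes are consistent with inheritance."""
    trios = list(trios)
    if not trios:
        raise ValueError("at least one trio is required")
    consistent = sum(is_mendelian_consistent(c, f, m, tol) for c, f, m in trios)
    return consistent / len(trios)


if __name__ == "__main__":
    trios = [
        ((10, 12), (10, 15), (11, 20)),  # consistent only with tol >= 1
        ((30, 30), (10, 10), (10, 10)),  # inconsistent
    ]
    print(mendelian_concordance(trios, tol=1))
```

Setting `tol=0` turns this into a strict check, which shows how much of the reported concordance depends on accepting off-by-one repeat unit differences.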
Affiliation(s)
- Adam English
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Harriet Dashnow
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Tom Mokveld
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Warren A Cheung
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Chengpeng Bi
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily Farrow
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron Wenger
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Khi Pin Chua
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Verónica Martínez-Cerdeño
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
  - MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
- Trevor D Bartley
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- Peng Jin
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David L Nelson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Stephan Zuchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
6
Zheng Y, Shang X. FindCSV: a long-read based method for detecting complex structural variations. BMC Bioinformatics 2024; 25:315. [PMID: 39342151 PMCID: PMC11439270 DOI: 10.1186/s12859-024-05937-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 09/18/2024] [Indexed: 10/01/2024]
Abstract
BACKGROUND Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULT In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of SVs using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV .
Affiliation(s)
- Yan Zheng
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
7
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
8
Rajan-Babu IS, Dolzhenko E, Eberle MA, Friedman JM. Sequence composition changes in short tandem repeats: heterogeneity, detection, mechanisms and clinical implications. Nat Rev Genet 2024; 25:476-499. [PMID: 38467784 DOI: 10.1038/s41576-024-00696-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/13/2024]
Abstract
Short tandem repeats (STRs) are a class of repetitive elements, composed of tandem arrays of 1-6 base pair sequence motifs, that comprise a substantial fraction of the human genome. STR expansions can cause a wide range of neurological and neuromuscular conditions, known as repeat expansion disorders, whose age of onset, severity, penetrance and/or clinical phenotype are influenced by the length of the repeats and their sequence composition. The presence of non-canonical motifs, depending on the type, frequency and position within the repeat tract, can alter clinical outcomes by modifying somatic and intergenerational repeat stability, gene expression and mutant transcript-mediated and/or protein-mediated toxicities. Here, we review the diverse structural conformations of repeat expansions, technological advances for the characterization of changes in sequence composition, their clinical correlations and the impact on disease mechanisms.
Affiliation(s)
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada.
- Jan M Friedman
  - Department of Medical Genetics, The University of British Columbia, and Children's & Women's Hospital, Vancouver, British Columbia, Canada
  - BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada
9
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. A comprehensive tandem repeat catalog of the human genome. medRxiv 2024:2024.06.19.24309173. [PMID: 38947075 PMCID: PMC11213036 DOI: 10.1101/2024.06.19.24309173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many of them reside within coding sequences.
Affiliation(s)
- Readman Chiu
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Jan M Friedman
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
  - BC Children's Hospital Research Institute, Vancouver, BC V5Z 4H4, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
10
|
Oketch JW, Wain LV, Hollox EJ. A comparison of software for analysis of rare and common short tandem repeat (STR) variation using human genome sequences from clinical and population-based samples. PLoS One 2024; 19:e0300545. [PMID: 38558075 PMCID: PMC10984476 DOI: 10.1371/journal.pone.0300545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs comprise about 3% of the human genome and are highly polymorphic. Some cause Mendelian disease, and others affect gene expression. Their contribution to common disease is not well-understood, but recent software tools designed to genotype STRs using short read sequencing data will help address this. Here, we compare software that genotypes common STRs and rarer STR expansions genome-wide, with the aim of applying them to population-scale genomes. By using the Genome-In-A-Bottle (GIAB) consortium and 1000 Genomes Project short-read sequencing data, we compare performance in terms of sequence length, depth, computing resources needed, genotyping accuracy and number of STRs genotyped. To ensure broad applicability of our findings, we also measure genotyping performance against a set of genomes from clinical samples with known STR expansions, and a set of STRs commonly used for forensic identification. We find that HipSTR, ExpansionHunter and GangSTR perform well in genotyping common STRs, including the CODIS 13 core STRs used for forensic analysis. GangSTR and ExpansionHunter outperform HipSTR for genotyping call rate and memory usage. ExpansionHunter denovo (EHdn), STRling and GangSTR outperformed STRetch for detecting expanded STRs, and EHdn and STRling used considerably less processor time compared to GangSTR. Analysis on shared genomic sequence data provided by the GIAB consortium allows future performance comparisons of new software approaches on a common set of data, facilitating comparisons and allowing researchers to choose the best software that fulfils their needs.
Affiliation(s)
- John W. Oketch
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Louise V. Wain
  - Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom
  - National Institute for Health Research, Leicester Respiratory Biomedical Research Centre, Glenfield Hospital, Leicester, United Kingdom
- Edward J. Hollox
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
11
Lu J, Toro C, Adams DR, Moreno CAM, Lee WP, Leung YY, Harms MB, Vardarajan B, Heinzen EL. LUSTR: a new customizable tool for calling genome-wide germline and somatic short tandem repeat variants. BMC Genomics 2024; 25:115. [PMID: 38279154 PMCID: PMC10811831 DOI: 10.1186/s12864-023-09935-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 12/21/2023] [Indexed: 01/28/2024]
Abstract
BACKGROUND Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent to which STRs contribute to disease is likely underestimated because of the challenges of calling these variants in short-read next-generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully addresses all of the complexities associated with this variant class. RESULTS Here we introduce LUSTR, which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing for customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases. CONCLUSIONS LUSTR offers a powerful and user-friendly approach that allows for the identification of STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.
Affiliation(s)
- Jinfeng Lu
  - Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - The Taub Institute for Research On Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA
- Camilo Toro
  - NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- David R Adams
  - NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- Wan-Ping Lee
  - Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yuk Yee Leung
  - Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mathew B Harms
  - Department of Neurology, Division of Neuromuscular Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Badri Vardarajan
  - The Taub Institute for Research On Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA
- Erin L Heinzen
  - Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
12
Manigbas CA, Jadhav B, Garg P, Shadrina M, Lee W, Martin-Trujillo A, Sharp AJ. A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank. medRxiv 2024:2024.01.22.24301630. [PMID: 38343850 PMCID: PMC10854328 DOI: 10.1101/2024.01.22.24301630] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2024]
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we performed direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing causal associations with 73 traits. We replicated 23 of 31 (74%) of these causal associations in the All of Us cohort. While this set included several known repeat expansion disorders, novel associations we found were attributable to common polymorphic variation in TR length rather than rare expansions and include e.g. a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5'UTR of GNB2 influencing heart rate. Causal TRs were strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the "missing heritability" of the human genome.
13
Audet S, Triassi V, Gelinas M, Legault-Cadieux N, Ferraro V, Duquette A, Tetreault M. Integration of multi-omics technologies for molecular diagnosis in ataxia patients. Front Genet 2024; 14:1304711. [PMID: 38239855 PMCID: PMC10794629 DOI: 10.3389/fgene.2023.1304711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/27/2023] [Indexed: 01/22/2024]
Abstract
Background: Episodic ataxias are rare neurological disorders characterized by recurring episodes of imbalance and coordination difficulties. Obtaining definitive molecular diagnoses poses challenges, as clinical presentation is highly heterogeneous, and literature on the underlying genetics is limited. While the advent of high-throughput sequencing technologies has significantly contributed to the genetics of Mendelian disorders, interpretation of variants of uncertain significance and other limitations inherent to individual methods still leave many patients undiagnosed. This study aimed to investigate the utility of multi-omics for the identification and validation of molecular candidates in a cohort of complex cases of ataxia with episodic presentation. Methods: Eight patients lacking molecular diagnosis despite extensive clinical examination were recruited following standard genetic testing. Whole genome and RNA sequencing were performed on samples isolated from peripheral blood mononuclear cells. Integration of expression and splicing data facilitated prioritization of genomic variants. Subsequently, long-read sequencing played a crucial role in the validation of those candidate variants. Results: Whole genome sequencing uncovered pathogenic variants in four genes (SPG7, ATXN2, ELOVL4, PMPCB). A missense and a nonsense variant, both previously reported as likely pathogenic, were configured in trans in individual #1 (SPG7: c.2228T>C/p.I743T, c.1861C>T/p.Q621*). An ATXN2 microsatellite expansion (CAG32) was identified in another late-onset case. In two separate individuals, intronic variants near splice sites (ELOVL4: c.541+5G>A; PMPCB: c.1154+5G>C) were predicted to induce loss-of-function splicing, but had never been reported as disease-causing. Long-read sequencing confirmed the compound heterozygous variant configuration, the repeat expansion length, and the splicing landscape for those pathogenic variants. A potential genetic modifier of the ATXN2 expansion was discovered in ZFYVE26 (c.3022C>T/p.R1008*). Conclusion: Despite failure to identify pathogenic variants through clinical genetic testing, the multi-omics approach enabled the molecular diagnosis in 50% of patients, also giving valuable insights for variant prioritization in remaining cases. The findings demonstrate the value of long-read sequencing for the validation of candidate variants in various scenarios. Our study demonstrates the effectiveness of leveraging complementary omics technologies to unravel the underlying genetics in patients with unresolved rare diseases such as ataxia. Molecular diagnoses not only hold significant promise in improving patient care management, but also alleviate the burden of diagnostic odysseys, more broadly enhancing quality of life.
Affiliation(s)
- Sebastien Audet
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Valerie Triassi
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Myriam Gelinas
  - Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Nab Legault-Cadieux
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Vincent Ferraro
  - Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Antoine Duquette
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
  - Neurology Service, Department of Medicine, André-Barbeau Movement Disorders Unit, University of Montreal Hospital (CHUM), Montreal, QC, Canada
  - Genetic Service, Department of Medicine, University of Montreal Hospital (CHUM), Montreal, QC, Canada
- Martine Tetreault
  - University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
  - Department of Neurosciences, University of Montreal, Montreal, QC, Canada
14
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023]
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of the genome, and often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as pertains to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for TR genome-wide identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
15
Rafehi H, Bennett MF, Bahlo M. Detection and discovery of repeat expansions in ataxia enabled by next-generation sequencing: present and future. Emerg Top Life Sci 2023; 7:349-359. [PMID: 37733280 PMCID: PMC10754322 DOI: 10.1042/etls20230018] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/20/2023] [Revised: 08/29/2023] [Accepted: 09/12/2023] [Indexed: 09/22/2023]
Abstract
Hereditary cerebellar ataxias are a heterogeneous group of progressive neurological disorders that are disproportionately caused by repeat expansions (REs) of short tandem repeats (STRs). Genetic diagnosis of RE disorders such as ataxias is difficult, as the current gold standard for diagnosis is repeat-primed PCR assays or Southern blots, neither of which is scalable or readily available for all STR loci. In the last five years, significant advances have been made in our ability to detect STRs and REs in short-read sequencing data, especially whole-genome sequencing (WGS). Given the increasing reliance on genomics in the diagnosis of rare diseases, the use of established RE detection pipelines for RE disorders is now a highly feasible and practical first-step alternative to molecular testing methods. In addition, many new pathogenic REs have been discovered in recent years by utilising WGS data. Collectively, genomes are an important resource/platform for further advancements in both the discovery and diagnosis of REs that cause ataxia and will lead to much needed improvement in diagnostic rates for patients with hereditary ataxia.
Affiliation(s)
- Haloom Rafehi
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
- Mark F Bennett
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
  - Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
16
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
  - Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Frank R Wendt
  - Department of Anthropology, University of Toronto, Mississauga, ON, Canada
  - Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Forensic Science Program, University of Toronto, Mississauga, ON, Canada
17
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
  - The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
  - Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
  - Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
  - Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
18
Read JL, Davies KC, Thompson GC, Delatycki MB, Lockhart PJ. Challenges facing repeat expansion identification, characterisation, and the pathway to discovery. Emerg Top Life Sci 2023; 7:339-348. [PMID: 37888797 PMCID: PMC10754332 DOI: 10.1042/etls20230019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/06/2023] [Accepted: 10/12/2023] [Indexed: 10/28/2023]
Abstract
Tandem repeat DNA sequences constitute a significant proportion of the human genome. While previously considered to be functionally inert, these sequences are now broadly accepted as important contributors to genetic diversity. However, the polymorphic nature of these sequences can lead to expansion beyond a gene-specific threshold, causing disease. More than 50 pathogenic repeat expansions have been identified to date, many of which have been discovered in the last decade as a result of advances in sequencing technologies and associated bioinformatic tools. Commonly utilised diagnostic platforms, including Sanger sequencing, capillary array electrophoresis, and Southern blot, are generally low throughput and are often unable to accurately determine repeat size, composition, and epigenetic signature, which are important when characterising repeat expansions. The rapid advances in bioinformatic tools designed specifically to interrogate short-read sequencing data and the development of long-read single-molecule sequencing are enabling a new generation of high-throughput testing for repeat expansion disorders. In this review, we discuss some of the challenges surrounding the identification and characterisation of disease-causing repeat expansions and the technological advances that are poised to translate the promise of genomic medicine to individuals and families affected by these disorders.
Affiliation(s)
- Justin L Read
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria, Australia
- Kayli C Davies
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria, Australia
- Genevieve C Thompson
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria, Australia
- Martin B Delatycki
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria, Australia
  - Victorian Clinical Genetics Services, Parkville, Victoria, Australia
- Paul J Lockhart
  - Bruce Lefroy Centre, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, Victoria, Australia
19
Margoliash J, Fuchs S, Li Y, Zhang X, Massarat A, Goren A, Gymrek M. Polymorphic short tandem repeats make widespread contributions to blood and serum traits. Cell Genomics 2023; 3:100458. [PMID: 38116119 PMCID: PMC10726533 DOI: 10.1016/j.xgen.2023.100458] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/21/2022] [Revised: 09/09/2023] [Accepted: 11/07/2023] [Indexed: 12/21/2023]
Abstract
Short tandem repeats (STRs) are genomic regions consisting of repeated sequences of 1-6 bp in succession. Single-nucleotide polymorphism (SNP)-based genome-wide association studies (GWASs) do not fully capture STR effects. To study these effects, we imputed 445,720 STRs into genotype arrays from 408,153 White British UK Biobank participants and tested for association with 44 blood phenotypes. Using two fine-mapping methods, we identify 119 candidate causal STR-trait associations and estimate that STRs account for 5.2%-7.6% of causal variants identifiable from GWASs for these traits. These are among the strongest associations for multiple phenotypes, including a coding CTG repeat associated with apolipoprotein B levels, a promoter CGG repeat with platelet traits, and an intronic poly(A) repeat with mean platelet volume. Our study suggests that STRs make widespread contributions to complex traits, provides stringently selected candidate causal STRs, and demonstrates the need to consider a more complete view of genetic variation in GWASs.
Affiliation(s)
- Jonathan Margoliash
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Shai Fuchs
  - Pediatric Endocrine and Diabetes Unit, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Ramat Gan, Israel
- Yang Li
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Xuan Zhang
  - Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Arya Massarat
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA
- Alon Goren
  - Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
20
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
  - Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
  - Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
21
Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun 2023; 14:6711. [PMID: 37872149 PMCID: PMC10593948 DOI: 10.1038/s41467-023-42278-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 03/10/2023] [Accepted: 10/05/2023] [Indexed: 10/25/2023]
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Collapse
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yunjiang Qiu
- Illumina Incorporated, San Diego, CA, 92122, USA
- Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
22
Ichikawa K, Kawahara R, Asano T, Morishita S. A landscape of complex tandem repeats within individual human genomes. Nat Commun 2023; 14:5530. [PMID: 37709751 PMCID: PMC10502081 DOI: 10.1038/s41467-023-41262-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/26/2022] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Markedly expanded tandem repeats (TRs) have been correlated with ~60 diseases. TR diversity has been considered a clue toward understanding missing heritability. However, haplotype-resolved long TRs remain mostly hidden or blacked out because their complex structures (TRs composed of various units and minisatellites containing >10-bp units) make them difficult to determine accurately with existing methods. Here, using a high-precision algorithm to determine complex TR structures from long, accurate reads of PacBio HiFi, an investigation of 270 Japanese control samples yields several genome-wide findings. Approximately 322,000 TRs are difficult to impute from the surrounding single-nucleotide variants. Greater genetic divergence of TR loci is significantly correlated with more events of younger replication slippage. Complex TRs are more abundant than single-unit TRs, and a tendency for complex TRs to consist of <10-bp units and single-unit TRs to be minisatellites is statistically significant at loci with ≥500-bp TRs. Of note, 8909 loci with extended TRs (>100b longer than the mode) contain several known disease-associated TRs and are considered candidates for association with disorders. Overall, complex TRs and minisatellites are found to be abundant and diverse, even in genetically small Japanese populations, yielding insights into the landscape of long TRs.
Affiliation(s)
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Riki Kawahara
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Takeshi Asano
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
23
Schuermans N, Verdin H, Ghijsels J, Hellemans M, Debackere E, Bogaert E, Symoens S, Naesens L, Lecomte E, Crosiers D, Bergmans B, Verhoeven K, Poppe B, Laureys G, Herdewyn S, Van Langenhove T, Santens P, De Bleecker JL, Hemelsoet D, Dermaut B. Exome Sequencing and Multigene Panel Testing in 1,411 Patients With Adult-Onset Neurologic Disorders. Neurol Genet 2023; 9:e200071. [PMID: 37152446 PMCID: PMC10160959 DOI: 10.1212/nxg.0000000000200071] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/19/2022] [Accepted: 02/21/2023] [Indexed: 05/09/2023]
Abstract
Background and Objectives: Owing to their extensive clinical and molecular heterogeneity, hereditary neurologic diseases in adults are difficult to diagnose. The current knowledge about the diagnostic yield and clinical utility of exome sequencing (ES) for neurologic diseases in adults is limited. This observational study assesses the diagnostic value of ES and multigene panel analysis in adult-onset neurologic disorders.
Methods: From January 2019 through April 2022, ES-based multigene panel testing was conducted in 1,411 patients with molecularly unexplained neurologic phenotypes at the Ghent University Hospital. Gene panels were developed for ataxia and spasticity, leukoencephalopathy, movement disorders, paroxysmal episodic disorders, neurodegeneration with brain iron accumulation, progressive myoclonic epilepsy, and amyotrophic lateral sclerosis. Single nucleotide variants, small indels, and copy number variants were analyzed. Across all panels, our analysis covered a total of 725 genes associated with Mendelian inheritance.
Results: A molecular diagnosis was established in 10% of the cases (144 of 1,411) representing 71 different monogenic disorders. The diagnostic yield depended significantly on the presenting phenotype with the highest yield seen in patients with ataxia or spastic paraparesis (19%). Most of the established diagnoses comprised disorders with an autosomal dominant inheritance (62%), and the most frequently mutated genes were NOTCH3 (13 patients), SPG7 (11 patients), and RFC1 (8 patients). 34% of the disease-causing variants were novel, including a unique likely pathogenic variant in APP (Ghent mutation, p.[Asn698Asp]) in a family presenting with stroke and severe cerebral white matter disease. 7% of the pathogenic variants comprised copy number variants detected in the ES data and confirmed by an independent technique.
Discussion: ES and multigene panel testing is a powerful and efficient tool to diagnose patients with unexplained, adult-onset neurologic disorders.
Affiliation(s)
- Nika Schuermans, Hannah Verdin, Jody Ghijsels, Madeleine Hellemans, Elke Debackere, Elke Bogaert, Sofie Symoens, Leslie Naesens, Elien Lecomte, David Crosiers, Bruno Bergmans, Kristof Verhoeven, Bruce Poppe, Guy Laureys, Sarah Herdewyn, Tim Van Langenhove, Patrick Santens, Jan L De Bleecker, Dimitri Hemelsoet, Bart Dermaut
- Center for Medical Genetics (N.S., H.V., J.G., E.D., E.B., S.S., B.P., B.D.), Ghent University Hospital; Department of Biomolecular Medicine (N.S., H.V., J.G., M.H., E.D., E.B., S.S., B.P., B.D.), Faculty of Medicine and Health Sciences, Ghent University; Department of Internal Medicine and Pediatrics (L.N.), Ghent University; Primary Immunodeficiency Research Lab (L.N.), Jeffrey Modell Diagnosis and Research Center, Ghent University Hospital; Department of Neurology (E.L.), O.L.V. Lourdes Hospital, Waregem; Department of Neurology (D.C.), Antwerp University Hospital UZA; Translational Neurosciences (D.C.), Faculty of Medicine and Health Sciences, University of Antwerp; Department of Neurology (B.B., K.V.), AZ Sint-Jan, Bruges; and Department of Neurology (B.B., G.L., S.H., T.V.L., P.S., J.L.D.B., D.H.), Ghent University Hospital, Belgium
24
Mantela M, Lambropoulos K, Simserides C. Charge transport properties of ideal and natural DNA segments, as mutation detectors. Phys Chem Chem Phys 2023; 25:7750-7762. [PMID: 36857625 DOI: 10.1039/d3cp00268c] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/10/2023]
Abstract
DNA sequences of ideal and natural geometries are examined, studying their charge transport properties as mutation detectors. Ideal means textbook geometry. Natural means naturally distorted sequences; geometry taken from available databases. A tight-binding (TB) wire model at the base-pair level is recruited, together with a transfer matrix technique. The relevant TB parameters are obtained using a linear combination of all valence orbitals of all atoms, using geometry, either ideal or natural, as the only input. The investigated DNA sequences contain: (i) point substitution mutations - specifically, the transitions guanine (G) ↔ adenine (A) - and (ii) sequences extracted from human chromosomes, modified by expanding the cytosine-adenine-guanine triplet [(CAG)n repeats] to mimic the following diseases: (a) Huntington's disease, (b) Kennedy's disease, (c) Spinocerebellar ataxia 6, (d) Spinocerebellar ataxia 7. Quantities such as eigenspectra, density of states, transmission coefficients, and the - more experimentally relevant - current-voltage (I-V) curves are studied, intending to find adequate features to recognize mutations. To this end, the normalised deviation of the I-V curve from the origin (NDIV) is also defined. The features of the NDIV seem to provide a clearer picture, being sensitive to the number of point mutations and allowing to characterise the degree of danger of developing the aforementioned diseases.
Affiliation(s)
- Marilena Mantela
- Department of Physics, National and Kapodistrian University of Athens, Panepistimiopolis, Zografos, GR-15784 Athens, Greece
- Konstantinos Lambropoulos
- Department of Physics, National and Kapodistrian University of Athens, Panepistimiopolis, Zografos, GR-15784 Athens, Greece
- Constantinos Simserides
- Department of Physics, National and Kapodistrian University of Athens, Panepistimiopolis, Zografos, GR-15784 Athens, Greece
25
Jam HZ, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. bioRxiv 2023:2023.03.09.531600. [PMID: 36945429 PMCID: PMC10028971 DOI: 10.1101/2023.03.09.531600] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 03/14/2023]
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3,550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA
- Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA
- Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA
- Yunjiang Qiu
- Illumina Incorporated, San Diego, California 92122, USA
- Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Department of Medicine, University of California San Diego, La Jolla, CA
26
Wang H, Wang LS, Schellenberg G, Lee WP. The role of structural variations in Alzheimer's disease and other neurodegenerative diseases. Front Aging Neurosci 2023; 14:1073905. [PMID: 36846102 PMCID: PMC9944073 DOI: 10.3389/fnagi.2022.1073905] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/19/2022] [Accepted: 12/31/2022] [Indexed: 02/10/2023] Open
Abstract
Dozens of single nucleotide polymorphisms (SNPs) related to Alzheimer's disease (AD) have been discovered by large-scale genome-wide association studies (GWASs). However, only a small portion of the genetic component of AD can be explained by SNPs observed from GWASs. Structural variation (SV) can be a major contributor to the missing heritability of AD, yet SV in AD remains largely unexplored because accurate detection of SVs with the widely used array-based and short-read technologies is still far from perfect. Here, we briefly summarized the strengths and weaknesses of available SV detection methods. We reviewed the current landscape of SV analysis in AD and the SVs that have been found to be associated with AD. In particular, the importance of currently less explored SVs in neurodegenerative diseases, including insertions, inversions, short tandem repeats, and transposable elements, was highlighted.
Affiliation(s)
- Hui Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Gerard Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
27
Points to consider in the detection of germline structural variants using next-generation sequencing: A statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2023; 25:100316. [PMID: 36507974 DOI: 10.1016/j.gim.2022.09.017] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/26/2022] [Revised: 09/29/2022] [Accepted: 09/30/2022] [Indexed: 12/14/2022] Open
28
Martin-Trujillo A, Garg P, Patel N, Jadhav B, Sharp AJ. Genome-wide evaluation of the effect of short tandem repeat variation on local DNA methylation. Genome Res 2023; 33:184-196. [PMID: 36577521 PMCID: PMC10069470 DOI: 10.1101/gr.277057.122] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/23/2022] [Accepted: 12/19/2022] [Indexed: 12/30/2022]
Abstract
Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni P < 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
Affiliation(s)
- Alejandro Martin-Trujillo
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Paras Garg
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Nihir Patel
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Bharati Jadhav
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
- Andrew J Sharp
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, New York 10029, USA
29
Fan C, Chen K, Wang Y, Ball EV, Stenson PD, Mort M, Bacolla A, Kehrer-Sawatzki H, Tainer JA, Cooper DN, Zhao H. Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections. Hum Genet 2023; 142:245-274. [PMID: 36344696 PMCID: PMC10290229 DOI: 10.1007/s00439-022-02500-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 10/24/2022] [Indexed: 11/09/2022]
Abstract
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and with genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5'genes but were not significantly different from controls in introns, 3'UTRs and 3'genes. Additionally, pathogenic repeat expansions were found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5'genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.
Affiliation(s)
- Cong Fan
- Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 500001, China
- Yukai Wang
- School of Life Science, Sun Yat-Sen University, Guangzhou, 500001, China
- Edward V Ball
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Albino Bacolla
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
- John A Tainer
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Huiying Zhao
- Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China
30
Recurrent repeat expansions in human cancer genomes. Nature 2023; 613:96-102. [PMID: 36517591 DOI: 10.1038/s41586-022-05515-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/13/2021] [Accepted: 11/02/2022] [Indexed: 12/16/2022]
Abstract
Expansion of a single repetitive DNA sequence, termed a tandem repeat (TR), is known to cause more than 50 diseases [1,2]. However, repeat expansions are often not explored beyond neurological and neurodegenerative disorders. In some cancers, mutations accumulate in short tracts of TRs, a phenomenon termed microsatellite instability; however, larger repeat expansions have not been systematically analysed in cancer [3-8]. Here we identified TR expansions in 2,622 cancer genomes spanning 29 cancer types. In seven cancer types, we found 160 recurrent repeat expansions (rREs), most of which (155/160) were subtype specific. We found that rREs were non-uniformly distributed in the genome with enrichment near candidate cis-regulatory elements, suggesting a potential role in gene regulation. One rRE, a GAAA-repeat expansion, located near a regulatory element in the first intron of UGT2B7 was detected in 34% of renal cell carcinoma samples and was validated by long-read DNA sequencing. Moreover, in preliminary experiments, treating cells that harbour this rRE with a GAAA-targeting molecule led to a dose-dependent decrease in cell proliferation. Overall, our results suggest that rREs may be an important but unexplored source of genetic variation in human cancer, and we provide a comprehensive catalogue for further study.
31
Dashnow H, Pedersen BS, Hiatt L, Brown J, Beecroft SJ, Ravenscroft G, LaCroix AJ, Lamont P, Roxburgh RH, Rodrigues MJ, Davis M, Mefford HC, Laing NG, Quinlan AR. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci. Genome Biol 2022; 23:257. [PMID: 36517892 PMCID: PMC9753380 DOI: 10.1186/s13059-022-02826-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/29/2021] [Accepted: 11/30/2022] [Indexed: 12/23/2022] Open
Abstract
Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for "novel" STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling.
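The k-mer counting idea mentioned in this abstract can be illustrated with a toy example. The sketch below is not STRling's implementation; under simplifying assumptions it only shows how counting rotation-normalised k-mers within a single read can reveal a dominant repeat motif. The function names `canonical_rotation` and `dominant_motif` and the example read are hypothetical.

```python
from collections import Counter


def canonical_rotation(motif: str) -> str:
    """Return the lexicographically smallest rotation of a motif,
    so that CAG, AGC and GCA are all counted as the same repeat unit."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def dominant_motif(read: str, k: int):
    """Count every k-mer in the read (after rotation normalisation) and
    return the most frequent one with the fraction of k-mers it accounts for."""
    n_kmers = len(read) - k + 1
    if k <= 0 or n_kmers <= 0:
        raise ValueError("read must be at least k bases long and k must be positive")
    counts = Counter(canonical_rotation(read[i:i + k]) for i in range(n_kmers))
    motif, count = counts.most_common(1)[0]
    return motif, count / n_kmers


# Hypothetical read: 10 CAG units with short non-repetitive flanks.
read = "TTGACC" + "CAG" * 10 + "GGATCA"
motif, fraction = dominant_motif(read, k=3)
print(motif, round(fraction, 2))
```

A read in which one normalised k-mer accounts for most positions is a plausible candidate for being repeat-derived even if it was mapped to the wrong place, which is the situation the abstract describes for expansion-supporting reads.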
Affiliation(s)
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utrecht University Medical Center, Utrecht, The Netherlands
- Laurel Hiatt
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Joe Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Sarah J Beecroft
- Pawsey Supercomputing Research Centre, Kensington, WA, Australia
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
- Gianina Ravenscroft
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
- Amy J LaCroix
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, 98195, USA
- Phillipa Lamont
- Neurogenetic Unit, Royal Perth Hospital, Perth, WA, Australia
- Miriam J Rodrigues
- Neurology, Auckland City Hospital, Auckland, New Zealand
- Centre for Brain Research, University of Auckland, Auckland, New Zealand
- Mark Davis
- Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
- Heather C Mefford
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, 98195, USA
- Nigel G Laing
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
- Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
32
Steely CJ, Watkins WS, Baird L, Jorde LB. The mutational dynamics of short tandem repeats in large, multigenerational families. Genome Biol 2022; 23:253. [PMID: 36510265 PMCID: PMC9743774 DOI: 10.1186/s13059-022-02818-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/30/2022] [Accepted: 11/17/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Short tandem repeats (STRs) compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. To estimate the genome-wide pattern of mutations at STR loci, we analyze blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. RESULTS: We use HipSTR to identify de novo STR mutations in the 2nd generation of these pedigrees and require transmission to the third generation for validation. Analyzing approximately 1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. Perfect repeats mutate about 2× more often than imperfect repeats. De novo STRs are significantly enriched in Alu elements. CONCLUSIONS: Approximately 30% of new STR mutations occur within Alu elements, which compose only 11% of the genome, but only 10% are found in LINE-1 insertions, which compose 17% of the genome. Phasing these mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of de novo genome-wide STR mutations per individual to be approximately 85, which is similar to the average number of observed de novo single nucleotide variants.
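The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope check rather than the authors' analysis: it multiplies the rounded number of loci by the per-locus rate and compares each element class's share of mutations with its share of the genome. The variable names and the helper `fold_enrichment` are illustrative assumptions.

```python
# Rounded figures as quoted in the abstract.
n_loci = 1.6e6          # approximate number of STR loci analysed
mutation_rate = 5.24e-5  # de novo mutations per locus per generation

# Expected de novo STR mutations per individual (simple product).
expected_mutations = n_loci * mutation_rate


def fold_enrichment(share_of_mutations: float, share_of_genome: float) -> float:
    """Ratio of observed share of mutations to the share expected
    if mutations were spread evenly across the genome."""
    return share_of_mutations / share_of_genome


alu = fold_enrichment(0.30, 0.11)    # 30% of mutations in 11% of the genome
line1 = fold_enrichment(0.10, 0.17)  # 10% of mutations in 17% of the genome

print(f"expected de novo STR mutations: {expected_mutations:.1f}")
print(f"Alu fold enrichment: {alu:.2f}")
print(f"LINE-1 fold enrichment: {line1:.2f}")
```

The product comes out at roughly 84, consistent with the approximately 85 mutations per individual reported; the small difference presumably reflects rounding of the quoted inputs. The ratios suggest about a 2.7-fold enrichment in Alu elements and a depletion (ratio below 1) in LINE-1 insertions.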
Affiliation(s)
- Cody J. Steely
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
- W. Scott Watkins
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
- Lisa Baird
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
- Lynn B. Jorde
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
33
Muacevic A, Adler JR. The Impact of Leukemia on the Detection of Short Tandem Repeat (STR) Markers. Cureus 2022; 14:e30954. [PMID: 36465210 PMCID: PMC9711926 DOI: 10.7759/cureus.30954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/31/2022] [Indexed: 01/25/2023] Open
Abstract
INTRODUCTION: Short tandem repeats (STRs) have been used for various identity typing methods worldwide. They have high discrimination power in human identification in forensics, paternity testing, missing person identification, genetic diseases, and gene regulatory functions. They have also been used to detect and monitor the stability of diseases, including various types of cancer. This study aimed to investigate the impact of leukemia on the detection and stability of STR markers. METHODS: DNA was isolated from 30 participants (15 with chronic myeloid leukemia (CML) and 15 healthy controls) and used to amplify STR markers using specific primers. RESULTS: We found that the blood of those with leukemia had more 9.3 and 9 alleles at the tyrosine hydroxylase 1 (TH01) marker than the blood of the healthy control samples. The results of this study will help researchers understand leukemia's effect on the detection and stability of STR markers in leukemic patients compared to healthy individuals. CONCLUSION: Our results demonstrate that STR markers could become useful in genetic studies of leukemia cases.
34
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy E, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, Zody MC. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022; 185:3426-3440.e19. [PMID: 36055201 PMCID: PMC9439720 DOI: 10.1016/j.cell.2022.08.004] [Citation(s) in RCA: 343] [Impact Index Per Article: 114.3] [Received: 11/12/2021] [Revised: 06/21/2022] [Accepted: 08/03/2022] [Indexed: 01/05/2023]
Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Affiliation(s)
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Haley J Abel
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Wayne E Clarke
- New York Genome Center, New York, NY 10013, USA; Outlier Informatics Inc., Saskatoon, SK S7H 1L4, Canada
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Harrison Brand
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ira M Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA; Center for Genomic Health, Yale University School of Medicine, New Haven, CT 06510, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
35
Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment. Sci Rep 2022; 12:13124. [PMID: 35907931 PMCID: PMC9338934 DOI: 10.1038/s41598-022-17267-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/25/2022] [Accepted: 07/22/2022] [Indexed: 11/10/2022] Open
Abstract
Bioinformatic methods for detecting short tandem repeat expansions in short-read sequencing have identified new repeat expansions in humans, but require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. superSTR is used to process whole-genome and whole-exome sequencing data, and perform the first STR analysis of the UK Biobank, efficiently screening and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and mouse models of ataxia and dystrophy.
36
Halman A, Dolzhenko E, Oshlack A. STRipy: A graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data. Hum Mutat 2022; 43:859-868. [PMID: 35395114 PMCID: PMC9541159 DOI: 10.1002/humu.24382] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/24/2021] [Revised: 12/01/2021] [Accepted: 04/06/2022] [Indexed: 11/22/2022]
Abstract
Expansions of short tandem repeats (STRs) have been implicated as the causal variant in over 50 diseases known to date. There are several tools which can genotype STRs from high-throughput sequencing (HTS) data. However, running these tools out of the box only allows around half of the known disease-causing loci to be genotyped. Furthermore, the genotypes estimated at these loci are often underestimated with maximum lengths limited to either the read or fragment length, which is less than the pathogenic cutoff for some diseases. Although analysis tools can be customized to genotype extra loci, this requires proficiency in bioinformatics to set up, limiting their widespread usage by other researchers and clinicians. To address these issues, we have developed new software called STRipy, which is able to target all known disease-causing STRs from HTS data. We created an intuitive graphical interface for STRipy and significantly simplified the detection of STR expansions. Moreover, we genotyped all disease loci for over two and a half thousand samples to provide population-wide distributions to assist with interpretation of results. We believe the simplicity and breadth of STRipy will increase the genotyping of STRs in sequencing data resulting in further diagnoses of rare STR diseases.
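The read-length limitation described in this abstract follows from simple arithmetic: a span of sequence can contain only so many whole repeat units. The sketch below illustrates that upper bound and is not taken from STRipy. The read length (150 bp), fragment length (450 bp), the 3 bp motif and the 200-unit cutoff are hypothetical values chosen for illustration, and flanking sequence is ignored, so real limits would be lower.

```python
def max_repeat_units(span_bp: int, motif_len: int) -> int:
    """Upper bound on whole repeat units that fit in a span of span_bp bases
    (ignores the flanking sequence a real read would also need)."""
    if span_bp < 0 or motif_len <= 0:
        raise ValueError("span_bp must be >= 0 and motif_len must be > 0")
    return span_bp // motif_len


def exceeds_span(n_units: int, motif_len: int, span_bp: int) -> bool:
    """True if an allele with n_units repeat units is longer than the span."""
    return n_units * motif_len > span_bp


# Hypothetical sequencing parameters and cutoff, for illustration only.
READ_LEN = 150
FRAGMENT_LEN = 450
MOTIF_LEN = 3
CUTOFF_UNITS = 200

print(max_repeat_units(READ_LEN, MOTIF_LEN))       # units that fit in one read
print(max_repeat_units(FRAGMENT_LEN, MOTIF_LEN))   # units that fit in one fragment
print(exceeds_span(CUTOFF_UNITS, MOTIF_LEN, FRAGMENT_LEN))
```

With these assumed values, an allele at the cutoff would be longer than both the read and the fragment, which is the kind of case where a length estimate capped at the read or fragment length falls below the pathogenic range.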
Affiliation(s)
- Andreas Halman
- Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, Victoria, Australia
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, Victoria, Australia
- Florey Department of Neuroscience and Mental Health, The University of Melbourne, Parkville, Victoria, Australia
- School of Natural Sciences and Health, Tallinn University, Tallinn, Estonia
- Alicia Oshlack
- Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, Victoria, Australia
- School of BioSciences, University of Melbourne, Parkville, Victoria, Australia
37
Chiu R, Rajan-Babu IS, Birol I, Friedman JM. Linked-read sequencing for detecting short tandem repeat expansions. Sci Rep 2022; 12:9352. [PMID: 35672336 PMCID: PMC9174224 DOI: 10.1038/s41598-022-13024-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/09/2021] [Accepted: 05/19/2022] [Indexed: 11/09/2022] Open
Abstract
Detection of short tandem repeat (STR) expansions with standard short-read sequencing is challenging due to the difficulty in mapping multicopy repeat sequences. In this study, we explored how the long-range sequence information of barcode linked-read sequencing (BLRS) can be leveraged to improve repeat-read detection. We also devised a novel algorithm using BLRS barcodes for distance estimation and evaluated its application for STR genotyping. Both approaches were designed for genotyping large expansions (> 1 kb) that cannot be sized accurately by existing methods. Using simulated and experimental data of genomes with STR expansions from multiple BLRS platforms, we validated the utility of barcode and phasing information in attaining better STR genotypes compared to standard short-read sequencing. Although the coverage bias of extremely GC-rich STRs is an important limitation of BLRS, BLRS is an effective strategy for genotyping many other STR loci.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
38
Fang L, Liu Q, Monteys AM, Gonzalez-Alegre P, Davidson BL, Wang K. DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing. Genome Biol 2022; 23:108. [PMID: 35484600 PMCID: PMC9052667 DOI: 10.1186/s13059-022-02670-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/04/2021] [Accepted: 04/08/2022] [Indexed: 12/12/2022] Open
Abstract
Despite recent improvements in basecalling accuracy, nanopore sequencing still has higher error rates on short-tandem repeats (STRs). Instead of using basecalled reads, we developed DeepRepeat which converts ionic current signals into red-green-blue channels, thus transforming the repeat detection problem into an image recognition problem. DeepRepeat identifies and accurately quantifies telomeric repeats in the CHM13 cell line and achieves higher accuracy in quantifying repeats in long STRs than competing methods. We also evaluate DeepRepeat on genome-wide or candidate region datasets from seven different sources. In summary, DeepRepeat enables accurate quantification of long STRs and complements existing methods relying on basecalled reads.
Affiliation(s)
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; School of Life Sciences, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA; Nevada Institute of Personalized Medicine, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA
- Alex Mas Monteys
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Pedro Gonzalez-Alegre
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Beverly L Davidson
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
39
Stevanovski I, Chintalaphani SR, Gamaarachchi H, Ferguson JM, Pineda SS, Scriba CK, Tchan M, Fung V, Ng K, Cortese A, Houlden H, Dobson-Stone C, Fitzpatrick L, Halliday G, Ravenscroft G, Davis MR, Laing NG, Fellner A, Kennerson M, Kumar KR, Deveson IW. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. Science Advances 2022; 8:eabm5386. [PMID: 35245110 PMCID: PMC8896783 DOI: 10.1126/sciadv.abm5386] [Citation(s) in RCA: 82] [Impact Index Per Article: 27.3] [Received: 09/23/2021] [Accepted: 01/11/2022] [Indexed: 05/25/2023]
Abstract
More than 50 neurological and neuromuscular diseases are caused by short tandem repeat (STR) expansions, with 37 different genes implicated to date. We describe the use of programmable targeted long-read sequencing with Oxford Nanopore's ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of STR sites, from a list of predetermined candidates. This correctly diagnoses all individuals in a small cohort (n = 37) including patients with various neurogenetic diseases (n = 25). Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing and identifies noncanonical STR motif conformations and internal sequence interruptions. We observe a diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of repeat disorders. Last, we show how the inclusion of pharmacogenomic genes as secondary ReadUntil targets can further inform patient care.
Affiliation(s)
- Igor Stevanovski
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- Sanjog R. Chintalaphani
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- School of Medicine, University of New South Wales, Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
- Hasindu Gamaarachchi
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
- James M. Ferguson
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- Sandy S. Pineda
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- The University of Sydney, Brain and Mind Centre and School of Medical Sciences, Faculty of Medicine and Health, Camperdown, NSW, Australia
- Carolin K. Scriba
- Harry Perkins Institute of Medical Research, University of Western Australia, Nedlands, WA, Australia
- Diagnostic Genomics, PathWest Laboratory Medicine WA, Nedlands, WA, Australia
- Michel Tchan
- Westmead Hospital, Westmead, NSW, Australia and Sydney Medical School, The University of Sydney, NSW, Australia
- Victor Fung
- Westmead Hospital, Westmead, NSW, Australia and Sydney Medical School, The University of Sydney, NSW, Australia
- Karl Ng
- Department of Neurology, Royal North Shore Hospital and The University of Sydney, Sydney, NSW, Australia
- Andrea Cortese
- Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- The National Hospital for Neurology and Neurosurgery, London, UK
- Henry Houlden
- Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- The National Hospital for Neurology and Neurosurgery, London, UK
- Carol Dobson-Stone
- The University of Sydney, Brain and Mind Centre and School of Medical Sciences, Faculty of Medicine and Health, Camperdown, NSW, Australia
- Lauren Fitzpatrick
- The University of Sydney, Brain and Mind Centre and School of Medical Sciences, Faculty of Medicine and Health, Camperdown, NSW, Australia
- Glenda Halliday
- The University of Sydney, Brain and Mind Centre and School of Medical Sciences, Faculty of Medicine and Health, Camperdown, NSW, Australia
- Gianina Ravenscroft
- Harry Perkins Institute of Medical Research, University of Western Australia, Nedlands, WA, Australia
- Mark R. Davis
- Harry Perkins Institute of Medical Research, University of Western Australia, Nedlands, WA, Australia
- Nigel G. Laing
- Harry Perkins Institute of Medical Research, University of Western Australia, Nedlands, WA, Australia
- Diagnostic Genomics, PathWest Laboratory Medicine WA, Nedlands, WA, Australia
- Avi Fellner
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- Raphael Recanati Genetics Institute, Rabin Medical Center, Beilinson Hospital, Petah Tikva, Israel
- The Neurology Department, Rabin Medical Center, Beilinson Hospital, Petah Tikva, Israel
- Marina Kennerson
- Northcott Neuroscience Laboratory, ANZAC Research Institute, Sydney, NSW, Australia
- Faculty of Health and Medicine, University of Sydney, Camperdown, NSW, Australia
- Molecular Medicine Laboratory, Concord Hospital, Concord, NSW, Australia
- Kishore R. Kumar
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- Molecular Medicine Laboratory, Concord Hospital, Concord, NSW, Australia
- Neurology Department, Central Clinical School, Concord Repatriation General Hospital, University of Sydney, Concord, NSW, Australia
- Ira W. Deveson
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
40
Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med 2022; 14:23. [PMID: 35220969 PMCID: PMC8883622 DOI: 10.1186/s13073-022-01026-w] [Citation(s) in RCA: 131] [Impact Index Per Article: 43.7] [Received: 06/09/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023]
Abstract
Rare diseases affect 30 million people in the USA and more than 300-400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25-35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.
Affiliation(s)
- Shruti Marwaha
- Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
- Joshua W Knowles
- Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Medicine, Diabetes Research Center, Cardiovascular Institute and Prevention Research Center, Stanford, CA, USA
- Euan A Ashley
- Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
41
McHale P, Quinlan AR. trfermikit: a tool to discover VNTR-associated deletions. Bioinformatics 2022; 38:1231-1234. [PMID: 34864893 PMCID: PMC8826174 DOI: 10.1093/bioinformatics/btab805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Revised: 10/25/2021] [Accepted: 11/27/2021] [Indexed: 02/04/2023]
Abstract
Summary: We present trfermikit, a software tool designed to detect deletions larger than 50 bp occurring in Variable Number Tandem Repeats using Illumina DNA sequencing reads. In such regions, it achieves a better tradeoff between sensitivity and false discovery than a state-of-the-art structural variation caller, Manta, and complements it by recovering a significant number of deletions that Manta missed. trfermikit is based upon the fermikit pipeline, which performs read assembly, maps the assembly to the reference genome and calls variants from the alignment. Availability and implementation: https://github.com/petermchale/trfermikit. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peter McHale
- Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
- Aaron R Quinlan
- Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
42
Loureiro JR, Castro AF, Figueiredo AS, Silveira I. Molecular Mechanisms in Pentanucleotide Repeat Diseases. Cells 2022; 11:cells11020205. [PMID: 35053321 PMCID: PMC8773600 DOI: 10.3390/cells11020205] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/10/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 02/01/2023]
Abstract
The number of neurodegenerative diseases resulting from repeat expansion has increased extraordinarily in recent years. In several of these pathologies, the repeat can be transcribed into RNA from both DNA strands, producing at least one toxic RNA repeat that causes neurodegeneration by a complex mechanism. Recently, seven diseases have been found to be caused by a novel intronic pentanucleotide repeat in distinct genes encoding proteins highly expressed in the cerebellum. These disorders are clinically heterogeneous, being characterized by impaired motor function resulting from ataxia or epilepsy. The role that apparently normal proteins from these mutant genes play in these pathologies is not known. However, recent advances in previously known spinocerebellar ataxias originating from abnormal non-coding pentanucleotide repeats point to a gain of a toxic function by the pathogenic repeat-containing RNA, which abnormally forms nuclear foci with RNA-binding proteins. In cells, RNA foci have been shown to be formed by phase separation. Moreover, the field of repeat expansions has lately achieved extraordinary progress with the discovery that RNA repeats, polyglutamine, and polyalanine proteins are crucial for the formation of nuclear membraneless organelles by phase separation, which is perturbed when they are expanded. This review will cover the amazing advances in repeat diseases.
Affiliation(s)
- Joana R. Loureiro
- Genetics of Cognitive Dysfunction Laboratory, i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
- Ana F. Castro
- Genetics of Cognitive Dysfunction Laboratory, i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
- Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, 4050-313 Porto, Portugal
- Ana S. Figueiredo
- Genetics of Cognitive Dysfunction Laboratory, i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
- Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, 4050-313 Porto, Portugal
- Isabel Silveira
- Genetics of Cognitive Dysfunction Laboratory, i3S-Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
- Correspondence: Tel.: +351-2240-8800
43
Gall-Duncan T, Sato N, Yuen RKC, Pearson CE. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res 2022; 32:1-27. [PMID: 34965938 PMCID: PMC8744678 DOI: 10.1101/gr.269530.120] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 07/29/2020] [Accepted: 11/29/2021] [Indexed: 11/25/2022]
Abstract
Expansions of gene-specific DNA tandem repeats (TRs), first described in 1991 as a disease-causing mutation in humans, are now known to cause >60 phenotypes, not just disease, and not only in humans. TRs are a common form of genetic variation with biological consequences, observed, so far, in humans, dogs, plants, oysters, and yeast. Repeat diseases show atypical clinical features, genetic anticipation, and multiple and partially penetrant phenotypes among family members. Discovery of disease-causing repeat expansion loci accelerated through technological advances in DNA sequencing and computational analyses. Between 2019 and 2021, 17 new disease-causing TR expansions were reported, totaling 63 TR loci (>69 diseases), with a likelihood of more discoveries, and in more organisms. Recent and historical lessons reveal that properly assessed clinical presentations, coupled with genetic and biological awareness, can guide discovery of disease-causing unstable TRs. We highlight critical but underrecognized aspects of TR mutations. Repeat motifs may not be present in current reference genomes but will be in forthcoming gapless long-read references. Repeat motif size can be a single nucleotide to kilobases/unit. At a given locus, repeat motif sequence purity can vary with consequence. Pathogenic repeats can be "insertions" within nonpathogenic TRs. Expansions, contractions, and somatic length variations of TRs can have clinical/biological consequences. TR instabilities occur in humans and other organisms. TRs can be epigenetically modified and/or chromosomal fragile sites. We discuss the expanding field of disease-associated TR instabilities, highlighting prospects, clinical and genetic clues, tools, and challenges for further discoveries of disease-causing TR instabilities and understanding their biological and pathological impacts, a vista that is about to expand.
Affiliation(s)
- Terence Gall-Duncan
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Nozomu Sato
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Ryan K C Yuen
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Christopher E Pearson
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
44
Rajabi F, Jabalameli N, Rezaei N. The Concept of Immunogenetics. Advances in Experimental Medicine and Biology 2022; 1367:1-17. [DOI: 10.1007/978-3-030-92616-8_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
45
Schröder C, Horsthemke B, Depienne C. GC-rich repeat expansions: associated disorders and mechanisms. Med Genet-Berlin 2021; 33:325-335. [PMID: 38835438 PMCID: PMC11006399 DOI: 10.1515/medgen-2021-2099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Accepted: 11/12/2021] [Indexed: 06/06/2024]
Abstract
Noncoding repeat expansions are a well-known cause of genetic disorders mainly affecting the central nervous system. Missed by most standard technologies used in routine diagnosis, pathogenic noncoding repeat expansions have to be searched for using specific techniques such as repeat-primed PCR or specific bioinformatics tools applied to genome data, such as ExpansionHunter. In this review, we focus on GC-rich repeat expansions, which represent at least one third of all noncoding repeat expansions described so far. GC-rich expansions are mainly located in regulatory regions (promoter, 5' untranslated region, first intron) of genes and can lead to either a toxic gain-of-function mediated by RNA toxicity and/or repeat-associated non-AUG (RAN) translation, or a loss-of-function of the associated gene, depending on their size and their methylation status. We herein review the clinical and molecular characteristics of disorders associated with these difficult-to-detect expansions.
Affiliation(s)
- Christopher Schröder
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Bernhard Horsthemke
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Christel Depienne
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
46
An Introductory Overview of Open-Source and Commercial Software Options for the Analysis of Forensic Sequencing Data. Genes (Basel) 2021; 12:genes12111739. [PMID: 34828345 PMCID: PMC8618049 DOI: 10.3390/genes12111739] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Revised: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/30/2022]
Abstract
The top challenges of adopting new methods for forensic DNA analysis in routine laboratories are often the capital investment and the expertise required to implement and validate such methods locally. In the case of next-generation sequencing, several specifically forensic commercial options have become available in the last decade, offering reliable and validated solutions. Despite this, the readily available expertise to analyze, interpret and understand such data is still perceived to be lagging behind. This review gives an introductory overview for forensic scientists who are at the beginning of their journey with implementing next-generation sequencing locally and who, because most in the field do not have a bioinformatics background, may find it difficult to navigate the new terms and analysis options available. The currently available open-source and commercial software options for forensic sequencing data analysis are summarized here to provide an accessible starting point for those fairly new to the forensic application of massively parallel sequencing.
47
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol 2021; 22:224. [PMID: 34389037 PMCID: PMC8361843 DOI: 10.1186/s13059-021-02447-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/03/2021] [Accepted: 07/26/2021] [Indexed: 12/11/2022]
Abstract
Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
48
Rajan-Babu IS, Peng JJ, Chiu R, Li C, Mohajeri A, Dolzhenko E, Eberle MA, Birol I, Friedman JM. Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions. Genome Med 2021; 13:126. [PMID: 34372915 PMCID: PMC8351082 DOI: 10.1186/s13073-021-00932-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/03/2020] [Accepted: 07/05/2021] [Indexed: 02/01/2023]
Abstract
Background: Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility. Methods: We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders. Results: We identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings. Conclusions: We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders. Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-021-00932-9.
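The reported F1-score can be checked against the reported precision and recall, because F1 is the harmonic mean of the two. The sketch below is only an arithmetic check of the figures in the abstract; the function name `f1_score` is illustrative and is not taken from the cited pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for the ExpansionHunter + STRetch + exSTRa ensemble.
reported_precision = 0.82
reported_recall = 1.00

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")
```

With precision 0.82 and recall 1.00 this gives 1.64 / 1.82, or about 0.90, which agrees with the 90% stated in the abstract.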
Affiliation(s)
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada; Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Junran J Peng
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
- Chenkai Li
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V6T1Z4, Canada
- Arezoo Mohajeri
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Inanc Birol
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada; Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
49
Zeevi DA, Chung WK, Levi C, Scher SY, Bringer R, Kahan Y, Muallem H, Benel R, Hirsch Y, Weiden T, Ekstein A, Ekstein J. Recommendation of premarital genetic screening in the Syrian Jewish community based on mutation carrier frequencies within Syrian Jewish cohorts. Mol Genet Genomic Med 2021; 9:e1756. [PMID: 34288589 PMCID: PMC8404236 DOI: 10.1002/mgg3.1756] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/20/2021] [Revised: 05/06/2021] [Accepted: 07/08/2021] [Indexed: 01/16/2023]
Abstract
Background: There is a paucity of information available regarding the carrier frequency for autosomal recessive pathogenic variants among Syrian Jews. This report provides data to support carrier screening for a group of autosomal recessive conditions among Syrian Jews based on the population frequency of 40 different pathogenic variants in a cohort of over 3800 individuals with Syrian Jewish ancestry. Methods: High throughput PCR amplicon sequencing was used to genotype 40 disease-causing variants in 3840 and 5279 individuals of Syrian and Iranian Jewish ancestry, respectively. These data were compared with Ashkenazi Jewish carrier frequencies for the same variants, based on roughly 370,000 Ashkenazi Jewish individuals in the Dor Yeshorim database. Results: Carrier screening identified pathogenic variants shared among Syrian, Iranian, and Ashkenazi Jewish groups. In addition, alleles unique to each group were identified. Importantly, 8.2% of 3401 individuals of mixed Syrian Jewish ancestry were carriers for at least one pathogenic variant. Conclusion: The findings of this study support the clinical usefulness of premarital genetic screening for individuals with Syrian Jewish ancestry to reduce the incidence of autosomal recessive disease among persons with Syrian Jewish heritage.
Affiliation(s)
- David A Zeevi
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Chaim Levi
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Sholem Y Scher
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Brooklyn, NY, USA
- Rachel Bringer
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Yael Kahan
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Hagit Muallem
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Rinat Benel
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Yoel Hirsch
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Brooklyn, NY, USA
- Tzvi Weiden
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Ahron Ekstein
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Jerusalem, Israel
- Josef Ekstein
- Dor Yeshorim, The Committee for Prevention of Jewish Genetic Diseases, Brooklyn, NY, USA
|
50
|
Nasrollahzadehsabet M, Esmeilzadeh E, Shirmohammady N, Heidari MF. The Effect of EDTA Buffer and Temperature on DNA Extraction from Teeth for Molecular Forensic Assessment. Annals of Military and Health Sciences Research 2021; 19. [DOI: 10.5812/amh.113043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Using DNA to generate genetic profiles of individuals is an efficient and accurate technique. Obtaining DNA of sufficient quantity and purity is one of the challenges in this area. Because tissues degrade after death, it is usually very difficult to recover suitable DNA, so hard tissues such as bones and teeth can serve as important sources in these cases. Accordingly, the use of ion-chelating buffers is one of the most important steps in preparing these tissues for DNA extraction. In this study, a buffer containing ethylenediaminetetraacetic acid (EDTA; 0.5 mM) was tested, with distilled water used as a control. Different temperatures were also examined. The average concentration of DNA extracted from samples processed in triplicate at 55°C, 37°C, 22°C, and 4°C was 19.68 ng/µL, 12.23 ng/µL, 17.19 ng/µL, and 15.06 ng/µL, respectively. For comparison, when sterile distilled water was used instead of buffer, the concentration was 7.9 ng/µL at 55°C. Based on these results, the EDTA-containing buffer was found to be suitable for releasing genomic material from bones and teeth.
|