1
An Z, Jiang A, Chen J. Toward understanding the role of genomic repeat elements in neurodegenerative diseases. Neural Regen Res 2025; 20:646-659. [PMID: 38886931 DOI: 10.4103/nrr.nrr-d-23-01568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 03/02/2024] [Indexed: 06/20/2024] Open
Abstract
Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, the complex molecular mechanisms thereof are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in search of disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Affiliation(s)
- Zhengyu An
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Aidi Jiang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Jingqi Chen
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China
2
Qi F, Chen X, Wang J, Niu X, Li S, Huang S, Ran X. Genome-wide characterization of structure variations in the Xiang pig for genetic resistance to African swine fever. Virulence 2024; 15:2382762. [PMID: 39092797 PMCID: PMC11299630 DOI: 10.1080/21505594.2024.2382762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 05/07/2024] [Accepted: 07/12/2024] [Indexed: 08/04/2024] Open
Abstract
African swine fever (ASF) is a rapidly fatal viral haemorrhagic fever in Chinese domestic pigs. Although very high mortality is observed in pig farms after an ASF outbreak, clinically healthy and antibody-positive pigs are found in those farms, and viral detection is rare from these pigs. The ability of pigs to resist ASF viral infection may be modulated by host genetic variations. However, the genetic basis of the resistance of domestic pigs against ASF remains unclear. We generated a comprehensive set of structural variations (SVs) in a Chinese indigenous Xiang pig with ASF-resistant (Xiang-R) and ASF-susceptible (Xiang-S) phenotypes using a whole-genome resequencing method. A total of 53,589 nonredundant SVs were identified, with an average of 25,656 SVs per individual in the Xiang pig genome, including insertion, deletion, inversion and duplication variations. The Xiang-R group harboured more SVs than the Xiang-S group. The F-statistic (FST) was calculated at each SV locus from the resequencing data to reveal genetic differences between the two populations. We identified 2,414 population-stratified SVs and annotated 1,152 Ensembl genes (including 986 protein-coding genes), in which 1,326 SVs might disturb the structure and expression of the Ensembl genes. Those protein-coding genes were mainly enriched in the Wnt, Hippo, and calcium signalling pathways. Other important pathways associated with the ASF viral infection were also identified, such as the endocytosis, apoptosis, focal adhesion, Fc gamma R-mediated phagocytosis, junction, NOD-like receptor, PI3K-Akt, and C-type lectin receptor signalling pathways. Finally, we identified 135 candidate adaptive genes overlapping 166 SVs that were involved in the virus entry and virus-host cell interactions. That some of the population-stratified SV regions were also detected as selective sweep signals gives further support for genetic variations affecting pig resistance against ASF.
The research indicates that SVs play an important role in the evolutionary processes of Xiang pig adaptation to ASF infection.
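For readers unfamiliar with the FST statistic mentioned in the abstract above, the following is a minimal Python sketch of one common definition (Wright's FST for a biallelic locus in two equally weighted populations). It is an illustration only: the abstract does not say which estimator the authors used, and the function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic locus and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele (for example, the SV allele)
    in each population. This is a textbook definition chosen for illustration,
    not necessarily the estimator used in the study.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    # Mean expected heterozygosity within populations (H_S).
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population (H_T).
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)

    # A locus that is monomorphic overall carries no differentiation signal.
    if h_t == 0:
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical frequencies of an SV allele in two groups.
    print(wright_fst(0.9, 0.1))
    print(wright_fst(0.5, 0.5))
```

Values near 0 indicate similar allele frequencies in the two groups, and values near 1 indicate strong stratification; any cutoff for calling an SV "population-stratified" would be a separate analysis choice.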
Affiliation(s)
- Fenfang Qi
  - Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Xia Chen
  - Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Jiafu Wang
  - Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Xi Niu
  - Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Sheng Li
  - Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Shihui Huang
  - Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
- Xueqin Ran
  - Institute of Agro-Bioengineering, Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China
3
Höps W, Rausch T, Jendrusch M, Korbel JO, Sedlazeck FJ. Impact and characterization of serial structural variations across humans and great apes. Nat Commun 2024; 15:8007. [PMID: 39266513 PMCID: PMC11393467 DOI: 10.1038/s41467-024-52027-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 08/23/2024] [Indexed: 09/14/2024] Open
Abstract
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
Affiliation(s)
- Wolfram Höps
  - European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Tobias Rausch
  - European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
  - Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
- Michael Jendrusch
  - European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany.
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
4
Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, Baier LJ. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences. Genome Biol Evol 2024; 16:evae188. [PMID: 39190003 PMCID: PMC11384899 DOI: 10.1093/gbe/evae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 05/17/2024] [Accepted: 08/22/2024] [Indexed: 08/28/2024] Open
Abstract
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Affiliation(s)
- Çiğdem Köroğlu
  - Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Peng Chen
  - Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Michael Traurig
  - Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Serdar Altok
  - Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Clifton Bogardus
  - Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Leslie J Baier
  - Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
5
Mirus T, Lohmayer R, Döhring C, Halldórsson BV, Kehr B. GGTyper: genotyping complex structural variants using short-read sequencing data. Bioinformatics 2024; 40:ii11-ii19. [PMID: 39230689 PMCID: PMC11373317 DOI: 10.1093/bioinformatics/btae391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION: Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs. RESULTS: Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping.
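The genotyping step described in the abstract above (choosing the most likely genotype given genotype-specific distributions over read-pair properties) can be illustrated with a small maximum-likelihood sketch in Python. This is not the GGTyper implementation: the read-pair categories (`"ref"`, `"alt"`), the probabilities in `EXAMPLE_DISTRIBUTIONS`, the function name `most_likely_genotype`, and the `floor` parameter are all hypothetical simplifications chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical genotype-specific distributions over two read-pair categories.
# A real tool would derive far richer distributions from the variant description.
EXAMPLE_DISTRIBUTIONS = {
    "0/0": {"ref": 0.98, "alt": 0.02},
    "0/1": {"ref": 0.50, "alt": 0.50},
    "1/1": {"ref": 0.02, "alt": 0.98},
}


def most_likely_genotype(observations, distributions, floor=1e-9):
    """Return (best_genotype, log_likelihoods) for a list of observed categories.

    Each observation is treated as an independent draw from the categorical
    distribution of the true genotype, so the log-likelihood is a sum of
    log-probabilities. `floor` avoids log(0) for categories a genotype
    assigns zero probability (an assumption of this sketch).
    """
    counts = Counter(observations)
    log_likelihoods = {}
    for genotype, dist in distributions.items():
        log_likelihoods[genotype] = sum(
            n * math.log(max(dist.get(category, 0.0), floor))
            for category, n in counts.items()
        )
    best = max(log_likelihoods, key=log_likelihoods.get)
    return best, log_likelihoods


if __name__ == "__main__":
    # Five reference-supporting and five variant-supporting read pairs.
    reads = ["ref"] * 5 + ["alt"] * 5
    genotype, scores = most_likely_genotype(reads, EXAMPLE_DISTRIBUTIONS)
    print(genotype, scores)
```

Working in log space keeps the computation stable when many read pairs are observed; the gap between the best and second-best log-likelihood could serve as a rough confidence measure, although the abstract's "genotyping difficulty" is defined by the authors and is not reproduced here.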
Affiliation(s)
- Tim Mirus
  - AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Robert Lohmayer
  - AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Clementine Döhring
  - AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
- Bjarni V Halldórsson
  - deCODE genetics/Amgen Inc, Reykjavik 101, Iceland
  - School of Technology, Reykjavik University, Reykjavik 102, Iceland
- Birte Kehr
  - AG Algorithmic Bioinformatics, Leibniz-Institut für Immuntherapie, Regensburg 93053, Germany
  - Fakultät für Informatik und Data Science, Universität Regensburg, Regensburg 93053, Germany
6
Rojas de Oliveira H, Chud TCS, Oliveira GA, Hermisdorff IC, Narayana SG, Rochus CM, Butty AM, Malchiodi F, Stothard P, Miglior F, Baes CF, Schenkel FS. Genome-wide association analyses reveal copy number variant regions associated with reproduction and disease traits in Canadian Holstein cattle. J Dairy Sci 2024; 107:7052-7063. [PMID: 38788846 DOI: 10.3168/jds.2023-24295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 04/01/2024] [Indexed: 05/26/2024]
Abstract
This study aimed to evaluate the impact of copy number variants (CNV) on 13 reproduction and 12 disease traits in Holstein cattle. Intensity signal files containing log R ratio and B allele frequency information from 13,730 Holstein animals genotyped with a 95K SNP panel, and 8,467 Holstein animals genotyped with a 50K SNP panel were used to identify the CNVs. Subsequently, the identified CNVs were validated using whole-genome sequence data from 126 animals, resulting in 870 high-confidence copy number variant regions (CNVR) on 12,131 animals. Out of these, 54 CNVR had frequencies higher than or equal to 1% in the population and were used in the genome-wide association analysis (one CNVR at a time, including the G matrix). Results revealed that 4 CNVR were significantly associated with at least one of the traits analyzed in this study. Specifically, 2 CNVR were associated with 3 reproduction traits (i.e., calf survival, first service to conception, and nonreturn rate), and 2 CNVR were associated with 2 disease traits (i.e., metritis and retained placenta). These CNVR harbored genes implicated in immune response, cellular signaling, and neuronal development, supporting their potential involvement in these traits. Further investigations to unravel the mechanistic and functional implications of these CNVR on the mentioned traits are warranted.
Affiliation(s)
- Hinayah Rojas de Oliveira
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907; Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1.
- Tatiane C S Chud
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Gerson A Oliveira
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Isis C Hermisdorff
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Saranya G Narayana
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Lactanet, Guelph, ON, Canada N1K 1E5
- Christina M Rochus
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1
- Francesca Malchiodi
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Semex, Guelph, ON, Canada N1H 6J2
- Paul Stothard
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada T6G 2H1
- Filippo Miglior
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Lactanet, Guelph, ON, Canada N1K 1E5
- Christine F Baes
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1; Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland 3012
- Flavio S Schenkel
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada N1G 2W1.
7
Wu XR, Wu BS, Kang JJ, Chen LM, Deng YT, Chen SD, Dong Q, Feng JF, Cheng W, Yu JT. Contribution of copy number variations to education, socioeconomic status and cognition from a genome-wide study of 305,401 subjects. Mol Psychiatry 2024:10.1038/s41380-024-02717-z. [PMID: 39215183 DOI: 10.1038/s41380-024-02717-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 08/19/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
Educational attainment (EA), socioeconomic status (SES) and cognition are phenotypically and genetically linked to health outcomes. However, the role of copy number variations (CNVs) in influencing EA/SES/cognition remains unclear. Using a large-scale (n = 305,401) genome-wide CNV-level association analysis, we discovered 33 CNV loci significantly associated with EA/SES/cognition, 20 of which were novel (deletions at 2p22.2, 2p16.2, 2p12, 3p25.3, 4p15.2, 5p15.33, 5q21.1, 8p21.3, 9p21.1, 11p14.3, 13q12.13, 17q21.31, and 20q13.33, as well as duplications at 3q12.2, 3q23, 7p22.3, 8p23.1, 8p23.2, 17q12 (105 kb), and 19q13.32). The genes identified in gene-level tests were enriched in biological pathways such as neurodegeneration, telomere maintenance and axon guidance. Phenome-wide association studies further identified novel associations of EA/SES/cognition-associated CNVs with mental and physical diseases, such as 6q27 duplication with upper respiratory disease and 17q12 (105 kb) duplication with mood disorders. Our findings provide a genome-wide CNV profile for EA/SES/cognition and bridge their connections to health. The expanded candidate CNVs database and the residing genes would be a valuable resource for future studies aimed at uncovering the biological mechanisms underlying cognitive function and related clinical phenotypes.
Affiliation(s)
- Xin-Rui Wu
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Bang-Sheng Wu
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Ju-Jiao Kang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Li-Min Chen
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Yue-Ting Deng
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Shi-Dong Chen
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Qiang Dong
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Jian-Feng Feng
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK
- Wei Cheng
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
- Jin-Tai Yu
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
8
Lai S, Wang H, Bork P, Chen WH, Zhao XM. Long-read sequencing reveals extensive gut phageome structural variations driven by genetic exchange with bacterial hosts. Sci Adv 2024; 10:eadn3316. [PMID: 39141729 PMCID: PMC11323893 DOI: 10.1126/sciadv.adn3316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 07/10/2024] [Indexed: 08/16/2024]
Abstract
Genetic variations are instrumental for unraveling phage evolution and deciphering their functional implications. Here, we explore the underlying fine-scale genetic variations in the gut phageome, especially structural variations (SVs). By using virome-enriched long-read metagenomic sequencing across 91 individuals, we identified a total of 14,438 nonredundant phage SVs and revealed their prevalence within the human gut phageome. These SVs are mainly enriched in genes involved in recombination, DNA methylation, and antibiotic resistance. Notably, a substantial fraction of phage SV sequences share close homology with bacterial fragments, with most SVs enriched for horizontal gene transfer (HGT) mechanisms. Further investigations showed that these SV sequences were genetically exchanged between specific phage-bacteria pairs, particularly between phages and their respective bacterial hosts. Temperate phages exhibit a higher frequency of genetic exchange with bacterial chromosomes than virulent phages. Collectively, our findings provide insights into the genetic landscape of the human gut phageome.
Affiliation(s)
- Senying Lai
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Huarui Wang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Peer Bork
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
  - Max Delbrück Centre for Molecular Medicine, Berlin, Germany
  - Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Wei-Hua Chen
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
  - College of Life Science, Henan Normal University, Xinxiang, Henan, China
- Xing-Ming Zhao
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
9
Stefansson OA, Sigurpalsdottir BD, Rognvaldsson S, Halldorsson GH, Juliusson K, Sveinbjornsson G, Gunnarsson B, Beyter D, Jonsson H, Gudjonsson SA, Olafsdottir TA, Saevarsdottir S, Magnusson MK, Lund SH, Tragante V, Oddsson A, Hardarson MT, Eggertsson HP, Gudmundsson RL, Sverrisson S, Frigge ML, Zink F, Holm H, Stefansson H, Rafnar T, Jonsdottir I, Sulem P, Helgason A, Gudbjartsson DF, Halldorsson BV, Thorsteinsdottir U, Stefansson K. The correlation between CpG methylation and gene expression is driven by sequence variants. Nat Genet 2024; 56:1624-1631. [PMID: 39048797 PMCID: PMC11319203 DOI: 10.1038/s41588-024-01851-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Accepted: 06/27/2024] [Indexed: 07/27/2024]
Abstract
Gene promoter and enhancer sequences are bound by transcription factors and are depleted of methylated CpG sites (cytosines preceding guanines in DNA). The absence of methylated CpGs in these sequences typically correlates with increased gene expression, indicating a regulatory role for methylation. We used nanopore sequencing to determine haplotype-specific methylation rates of 15.3 million CpG units in 7,179 whole-blood genomes. We identified 189,178 methylation depleted sequences where three or more proximal CpGs were unmethylated on at least one haplotype. A total of 77,789 methylation depleted sequences (~41%) associated with 80,503 cis-acting sequence variants, which we termed allele-specific methylation quantitative trait loci (ASM-QTLs). RNA sequencing of 896 samples from the same blood draws used to perform nanopore sequencing showed that the ASM-QTL, that is, DNA sequence variability, drives most of the correlation found between gene expression and CpG methylation. ASM-QTLs were enriched 40.2-fold (95% confidence interval 32.2, 49.9) among sequence variants associating with hematological traits, demonstrating that ASM-QTLs are important functional units in the noncoding genome.
Affiliation(s)
- Brynja Dogg Sigurpalsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Gisli Hreinn Halldorsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Thorunn Asta Olafsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Saedis Saevarsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Magnus Karl Magnusson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Sigrun Helga Lund
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Marteinn Thor Hardarson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Hilma Holm
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Ingileif Jonsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Agnar Helgason
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Department of Anthropology, University of Iceland, Reykjavik, Iceland
- Daniel F Gudbjartsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Bjarni V Halldorsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Unnur Thorsteinsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Kari Stefansson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland.
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland.
10
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Jordan M Eizenga
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Arun Das
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Katharine M Jenike
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Eimear E Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Karen H Miga
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Jean Monlong
  - Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
- Rajiv C McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Benedict Paten
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
11
Alvarez Jerez P, Daida K, Grenn FP, Malik L, Miano-Burkhardt A, Makarious MB, Ding J, Gibbs JR, Moore A, Reed X, Nalls MA, Shah S, Mahmoud M, Sedlazeck FJ, Dolzhenko E, Park M, Iwaki H, Casey B, Ryten M, Blauwendraat C, Singleton AB, Billingsley KJ. Characterizing a complex CT-rich haplotype in intron 4 of SNCA using large-scale targeted amplicon long-read sequencing. NPJ Parkinsons Dis 2024; 10:136. [PMID: 39060285 PMCID: PMC11282088 DOI: 10.1038/s41531-024-00749-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 07/04/2024] [Indexed: 07/28/2024] Open
Abstract
Parkinson's disease (PD) is a common neurodegenerative disorder with a significant risk proportion driven by genetics. While much progress has been made, most of the heritability remains unknown. This is in part because previous genetic studies have focused on the contribution of single nucleotide variants. More complex forms of variation, such as structural variants and tandem repeats, are already associated with several synucleinopathies. However, because more sophisticated sequencing methods are usually required to detect these regions, little is understood regarding their contribution to PD. One example is a polymorphic CT-rich region in intron 4 of the SNCA gene. This haplotype has been suggested to be associated with risk of Lewy Body (LB) pathology in Alzheimer's Disease and SNCA gene expression, but is yet to be investigated in PD. Here, we attempt to resolve this CT-rich haplotype and investigate its role in PD. We performed targeted PacBio HiFi sequencing of the region in 1375 PD cases and 959 controls. We replicate the previously reported associations and identify a novel association between two PD risk SNVs (rs356182 and rs5019538) and haplotype 4, the largest haplotype. Through quantitative trait locus analyses we identify a significant haplotype 4 association with alternative CAGE transcriptional start site usage, not leading to significant differential SNCA gene expression in post-mortem frontal cortex brain tissue. Therefore, disease association in this locus might not be biologically driven by this CT-rich repeat region. Our data demonstrate the complexity of this SNCA region and highlight that further follow-up functional studies are warranted.
Affiliation(s)
- Pilar Alvarez Jerez
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Kensuke Daida
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Francis P Grenn
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Laksh Malik
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Abigail Miano-Burkhardt
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Mary B Makarious
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Jinhui Ding
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- J Raphael Gibbs
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Anni Moore
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Xylena Reed
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Mike A Nalls
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
  - DataTecnica LLC, Washington, DC, USA
- Syed Shah
  - DataTecnica LLC, Washington, DC, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
- Morgan Park
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Hirotaka Iwaki
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
  - DataTecnica LLC, Washington, DC, USA
- Bradford Casey
  - The Michael J. Fox Foundation for Parkinson's Research, New York, New York, USA
- Mina Ryten
  - Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
  - UK Dementia Research Institute at the University of Cambridge and Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Cornelis Blauwendraat
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Andrew B Singleton
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Kimberley J Billingsley
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA.
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA.
12
Yuan N, Jia P. Comprehensive assessment of long-read sequencing platforms and calling algorithms for detection of copy number variation. Brief Bioinform 2024; 25:bbae441. [PMID: 39256200 PMCID: PMC11387058 DOI: 10.1093/bib/bbae441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 07/09/2024] [Accepted: 08/25/2024] [Indexed: 09/12/2024] Open
Abstract
Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by the input dataset type, sequencing depth, and CNV type, among others. Specifically, the PacBio CCS sequencing platform outperformed PacBio CLR and Nanopore platforms regarding CNV detection recall rates. A sequencing depth of 10x demonstrated the capability to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were more generally detectable than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.
Affiliation(s)
- Na Yuan
  - National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
- Peilin Jia
  - National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
13
Buckley RM, Ostrander EA. Large-scale genomic analysis of the domestic dog informs biological discovery. Genome Res 2024; 34:811-821. [PMID: 38955465 PMCID: PMC11293549 DOI: 10.1101/gr.278569.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2024]
Abstract
Recent advances in genomics, coupled with a unique population structure and remarkable levels of variation, have propelled the domestic dog to new levels as a system for understanding fundamental principles in mammalian biology. Central to this advance are more than 350 recognized breeds, each a closed population that has undergone selection for unique features. Genetic variation in the domestic dog is particularly well characterized compared with other domestic mammals, with almost 3000 high-coverage genomes publicly available. Importantly, as the number of sequenced genomes increases, new avenues for analysis are becoming available. Herein, we discuss recent discoveries in canine genomics regarding behavior, morphology, and disease susceptibility. We explore the limitations of current data sets for variant interpretation, tradeoffs between sequencing strategies, and the burgeoning role of long-read genomes for capturing structural variants. In addition, we consider how large-scale collections of whole-genome sequence data drive rare variant discovery and assess the geographic distribution of canine diversity, which identifies Asia as a major source of missing variation. Finally, we review recent comparative genomic analyses that will facilitate annotation of the noncoding genome in dogs.
Affiliation(s)
- Reuben M Buckley
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Elaine A Ostrander
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
14
Subramanian K, Chopra M, Kahali B. Landscape of genomic structural variations in Indian population-based cohorts: Deeper insights into their prevalence and clinical relevance. HGG ADVANCES 2024; 5:100285. [PMID: 38521976 PMCID: PMC11007539 DOI: 10.1016/j.xhgg.2024.100285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 03/13/2024] [Accepted: 03/20/2024] [Indexed: 03/25/2024] Open
Abstract
Structural variations (SVs) are large (>50 base pairs) genomic rearrangements comprising deletions, duplications, insertions, inversions, and translocations. Studying SVs is important because they play active and critical roles in regulating gene expression, determining disease predispositions, and identifying population-specific differences among individuals of diverse ancestries. However, SV discoveries in the Indian population using whole-genome sequencing (WGS) have been limited. In this study, using short-read WGS with an average 42X depth of coverage, we identify and characterize 36,210 SVs from 529 individuals enrolled in population-based cohorts in India. These SVs include 24,574 deletions, 2,913 duplications, 8,710 insertions, and 13 inversions; 1.26% (456 out of 36,210) of the identified SVs can potentially impact the coding regions of genes. Furthermore, 56 of these SVs affect genes that are highly intolerant to loss-of-function changes, and five SVs impacting ADAMTS17, CCDC40, and RHCE are common in our study individuals. Seven rare SVs significantly impact dosage sensitivity of genes known to be associated with various clinical phenotypes. Most of the SVs in our study are rare and heterozygous. This fine-scale SV discovery in the underrepresented Indian population provides valuable insights that extend beyond Eurocentric human genetic studies.
Affiliation(s)
- Krithika Subramanian
  - Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India; Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
- Mehak Chopra
  - Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India
- Bratati Kahali
  - Centre for Brain Research, Indian Institute of Science, Bangalore 560012, India.
15
Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 DOI: 10.1007/s00438-024-02158-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Affiliation(s)
- Yanfeng Ji
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Junfan Zhao
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Jiao Gong
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
- Shaohua Fan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China.
16
Lamkin M, Gymrek M. The emerging role of tandem repeats in complex traits. Nat Rev Genet 2024; 25:452-453. [PMID: 38714860 DOI: 10.1038/s41576-024-00736-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Affiliation(s)
- Michael Lamkin
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA.
17
Liang H, Sedillo JC, Schrodi SJ, Ikeda A. Structural variants in linkage disequilibrium with GWAS-significant SNPs. Heliyon 2024; 10:e32053. [PMID: 38882374 PMCID: PMC11177133 DOI: 10.1016/j.heliyon.2024.e32053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 05/17/2024] [Accepted: 05/28/2024] [Indexed: 06/18/2024] Open
Abstract
With the recent expansion of structural variant identification in the human genome, understanding the role of these impactful variants in disease architecture is critically important. Currently, a large proportion of genome-wide-significant single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS) are functionally unresolved, raising the possibility that some of these SNPs are associated with disease through linkage disequilibrium with causal structural variants. Hence, understanding the linkage disequilibrium between newly discovered structural variants and statistically significant SNPs may provide a resource for further investigation into disease-associated regions in the genome. Here we present a resource cataloging structural variant-significant SNP pairs in high linkage disequilibrium. The database is composed of (i) SNPs that have exhibited genome-wide significant association with traits, primarily disease phenotypes, (ii) newly released structural variants (SVs), and (iii) linkage disequilibrium values calculated from unphased data. All data files, including those detailing SV and GWAS SNP associations and results of GWAS-SNP-SV pairs, are available at the SV-SNP LD Database and can be accessed at https://github.com/hliang-SchrodiLab/SV_SNPs. Our analysis results represent a useful fine-mapping tool for interrogating SVs in linkage disequilibrium with disease-associated SNPs. We anticipate that this resource may play an important role in subsequent studies that investigate incorporating disease-causing SVs into disease risk prediction models.
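To illustrate what "linkage disequilibrium values calculated from unphased data" can mean in practice, one common estimator is the squared Pearson correlation between the two allele-count (0/1/2) vectors of a SNP and an SV across individuals. The sketch below shows that general estimator only; it is not the authors' pipeline, and the function name `genotype_r2` and the toy genotypes are assumptions made for illustration.

```python
def genotype_r2(snp_counts, sv_counts):
    """Squared Pearson correlation between two unphased allele-count vectors.

    Each vector holds one value per individual: 0, 1, or 2 copies of the
    alternate allele. This is a generic estimator, assumed here for
    illustration; it does not use haplotype phase.
    """
    if len(snp_counts) != len(sv_counts) or not snp_counts:
        raise ValueError("need two non-empty vectors of equal length")

    n = len(snp_counts)
    mean_x = sum(snp_counts) / n
    mean_y = sum(sv_counts) / n

    # Sums of squares and cross-products around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(snp_counts, sv_counts))
    var_x = sum((x - mean_x) ** 2 for x in snp_counts)
    var_y = sum((y - mean_y) ** 2 for y in sv_counts)

    # A monomorphic variant has no variance, so correlation is undefined;
    # this sketch returns 0.0 in that case by convention.
    if var_x == 0 or var_y == 0:
        return 0.0
    return (cov * cov) / (var_x * var_y)


# Toy genotypes (hypothetical): the SV tracks the SNP in every individual.
snp = [0, 1, 2, 1, 0, 2]
sv = [0, 1, 2, 1, 0, 2]
r2 = genotype_r2(snp, sv)
```

A SNP-SV pair would then be reported as "in high LD" when this value exceeds a chosen threshold; the threshold itself is a study design choice and is not specified here.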
Affiliation(s)
- Hao Liang
  - Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Joni C Sedillo
  - Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
  - Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Steven J Schrodi
  - Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
  - Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Akihiro Ikeda
  - Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
  - McPherson Eye Research Institute, University of Wisconsin-Madison, Madison, WI, USA
18
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany.
- Knut Reinert
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
19
Patel-Tupper D, Kelikian A, Leipertz A, Maryn N, Tjahjadi M, Karavolias NG, Cho MJ, Niyogi KK. Multiplexed CRISPR-Cas9 mutagenesis of rice PSBS1 noncoding sequences for transgene-free overexpression. SCIENCE ADVANCES 2024; 10:eadm7452. [PMID: 38848363 PMCID: PMC11160471 DOI: 10.1126/sciadv.adm7452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Accepted: 05/03/2024] [Indexed: 06/09/2024]
Abstract
Understanding CRISPR-Cas9's capacity to produce native overexpression (OX) alleles would accelerate agronomic gains achievable by gene editing. To generate OX alleles with increased RNA and protein abundance, we leveraged multiplexed CRISPR-Cas9 mutagenesis of noncoding sequences upstream of the rice PSBS1 gene. We isolated 120 gene-edited alleles with varying non-photochemical quenching (NPQ) capacity in vivo, ranging from knockout to overexpression, using a high-throughput screening pipeline. Overexpression increased OsPsbS1 protein abundance two- to threefold, matching fold changes obtained by transgenesis. Increased PsbS protein abundance enhanced NPQ capacity and water-use efficiency. Across our resolved genetic variation, we identify the role of 5'UTR indels and inversions in driving knockout/knockdown and overexpression phenotypes, respectively. Complex structural variants, such as the 252-kb duplication/inversion generated here, evidence the potential of CRISPR-Cas9 to facilitate significant genomic changes with negligible off-target transcriptomic perturbations. Our results may inform future gene-editing strategies for hypermorphic alleles and have advanced the pursuit of gene-edited, non-transgenic rice plants with accelerated relaxation of photoprotection.
Affiliation(s)
- Dhruv Patel-Tupper
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
- Armen Kelikian
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Anna Leipertz
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Nina Maryn
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Michelle Tjahjadi
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Nicholas G. Karavolias
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Myeong-Je Cho
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Krishna K. Niyogi
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
20
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024] Open
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Affiliation(s)
- Heng Hu
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Runtian Gao
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Wentao Gao
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Bo Gao
  - Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
- Zhongjun Jiang
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Murong Zhou
  - College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
  - State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
- Tao Jiang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
21
Yu Y, Gao R, Luo J. LcDel: deletion variation detection based on clustering and long reads. Front Genet 2024; 15:1404415. [PMID: 38798694 PMCID: PMC11116628 DOI: 10.3389/fgene.2024.1404415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 04/25/2024] [Indexed: 05/29/2024] Open
Abstract
Motivation: Genomic structural variation refers to chromosomal level variations such as genome rearrangement or insertion/deletion, which typically involve larger DNA fragments compared to single nucleotide variations. Deletion is a common type of structural variant in the genome, which may lead to many diseases, so the detection of deletions can help to gain insights into the pathogenesis of diseases and provide accurate information for disease diagnosis, treatment, and prevention. Many tools exist for deletion variant detection, but they are still inadequate in some aspects, and most of them ignore the presence of chimeric variants in clustering, resulting in less precise clustering results. Results: In this paper, we present LcDel, which can detect deletion variation based on clustering and long reads. LcDel first finds the candidate deletion sites and then performs the first clustering step using two clustering methods (sliding window-based and coverage-based, respectively) based on the length of the deletion. After that, LcDel immediately performs a second clustering step by hierarchical clustering to determine the location and length of the deletion. LcDel is benchmarked against some other structural variation detection tools on multiple datasets, and the results show that LcDel has better detection performance for deletion. The source code is available at https://github.com/cyq1314woaini/LcDel.
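The idea of grouping per-read deletion signatures by position before calling a deletion can be sketched in a few lines. The example below is a simple single-pass, window-based grouping followed by a median summary; it stands in for the first clustering step only, does not perform hierarchical clustering, and is not LcDel's implementation. The function name `cluster_deletions` and the parameters `max_gap` and `min_support` are hypothetical, chosen for illustration.

```python
from statistics import median


def cluster_deletions(signatures, max_gap=100, min_support=2):
    """Group candidate deletions by start position and summarize each group.

    signatures: iterable of (start, length) pairs, one per supporting read.
    max_gap: a signature joins the current group if its start lies within
             this many bases of the previous signature's start (assumed value).
    min_support: groups with fewer signatures are discarded (assumed value).
    Returns a list of (start, length, support) tuples.
    """
    clusters = []
    for start, length in sorted(signatures):
        # Extend the current group while consecutive starts stay close.
        if clusters and start - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((start, length))
        else:
            clusters.append([(start, length)])

    calls = []
    for members in clusters:
        if len(members) < min_support:
            continue  # too few reads agree; treat as noise in this sketch
        call_start = int(median(s for s, _ in members))
        call_length = int(median(l for _, l in members))
        calls.append((call_start, call_length, len(members)))
    return calls


# Toy signatures (hypothetical): three reads agree near position 1000,
# one isolated read at position 5000 is filtered out.
calls = cluster_deletions([(1000, 300), (1010, 310), (995, 295), (5000, 50)])
```

Using the median rather than the mean keeps a single outlying read from shifting the reported breakpoint or length.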
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, China
| |
|
22
|
Fernández-Suárez E, González-Del Pozo M, Méndez-Vidal C, Martín-Sánchez M, Mena M, de la Morena-Barrio B, Corral J, Borrego S, Antiñolo G. Long-read sequencing improves the genetic diagnosis of retinitis pigmentosa by identifying an Alu retrotransposon insertion in the EYS gene. Mob DNA 2024; 15:9. [PMID: 38704576 PMCID: PMC11069205 DOI: 10.1186/s13100-024-00320-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 05/06/2024] Open
Abstract
BACKGROUND Biallelic variants in EYS are the major cause of autosomal recessive retinitis pigmentosa (arRP) in certain populations, a clinically and genetically heterogeneous disease that may lead to legal blindness. EYS is one of the largest genes (~2 Mb) expressed in the retina, in which structural variants (SVs) represent a common cause of disease. However, their identification using short-read sequencing (SRS) is not always feasible. Here, we conducted targeted long-read sequencing (T-LRS) using adaptive sampling of EYS on the MinION sequencing platform (Oxford Nanopore Technologies) to definitively diagnose an arRP family, whose affected individuals (n = 3) carried the heterozygous pathogenic deletion of exons 32-33 in the EYS gene. As this was a recurrent variant identified in three additional families in our cohort, we also aimed to characterize the known deletion at the nucleotide level to assess a possible founder effect. RESULTS T-LRS in family A unveiled a heterozygous AluYa5 insertion in the coding exon 43 of EYS (chr6(GRCh37):g.64430524_64430525ins352), which segregated with the disease in compound heterozygosity with the previously identified deletion. Visual inspection of previous SRS alignments using IGV revealed several reads containing soft-clipped bases, accompanied by a slight drop in coverage at the Alu insertion site. This prompted us to develop a simplified program using the grep command to investigate the recurrence of this variant in our cohort from SRS data. Moreover, LRS also allowed the characterization of the CNV as a ~56.4 kb deletion spanning exons 32-33 of EYS (chr6(GRCh37):g.64764235_64820592del). The results of further characterization by Sanger sequencing and linkage analysis in the four families were consistent with a founder variant. CONCLUSIONS To our knowledge, this is the first report of a mobile element insertion into the coding sequence of EYS, as a likely cause of arRP in a family. Our study highlights the value of LRS technology in characterizing and identifying hidden pathogenic SVs, such as retrotransposon insertions, whose contribution to the etiopathogenesis of rare diseases may be underestimated.
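The abstract does not give the grep-based program itself. As a rough illustration of the idea, the Python sketch below counts reads that contain a chosen junction sequence on either strand. The function names and the example junction string are hypothetical; a real screen would use the sequence spanning the actual insertion breakpoint and would read sequences from the cohort's FASTQ or BAM files.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def count_junction_reads(reads, junction):
    """Count reads containing `junction` or its reverse complement.

    `reads` is any iterable of read sequences (strings). `junction` is a
    short sequence expected to span the reference/insertion boundary.
    """
    junction = junction.upper()
    rc = reverse_complement(junction)
    count = 0
    for read in reads:
        read = read.upper()
        if junction in read or rc in read:
            count += 1
    return count

if __name__ == "__main__":
    # Toy reads and a made-up junction, for illustration only.
    toy_reads = ["TTTACGTAGGG", "CCCTACGTAAA", "GGGGGGGG"]
    print(count_junction_reads(toy_reads, "ACGTAG"))
```

An exact-match search of this kind is fast but strict: a single sequencing error inside the junction hides the read, so a low count in one sample does not by itself rule the variant out.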
Affiliation(s)
- Elena Fernández-Suárez
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
| | - María González-Del Pozo
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
| | - Cristina Méndez-Vidal
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
| | - Marta Martín-Sánchez
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
| | - Marcela Mena
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
| | - Belén de la Morena-Barrio
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-ISCIII, Murcia, Spain
| | - Javier Corral
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-ISCIII, Murcia, Spain
| | - Salud Borrego
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
| | - Guillermo Antiñolo
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville (IBiS), University Hospital Virgen del Rocío/CSIC, University of Seville, Seville, Spain
- Center for Biomedical Network Research On Rare Diseases (CIBERER), Seville, Spain
| |
|
23
|
Steyaert W, Sagath L, Demidov G, Yépez VA, Esteve-Codina A, Gagneur J, Ellwanger K, Derks R, Weiss M, den Ouden A, van den Heuvel S, Swinkels H, Zomer N, Steehouwer M, O'Gorman L, Astuti G, Neveling K, Schüle R, Xu J, Synofzik M, Beijer D, Hengel H, Schöls L, Claeys KG, Baets J, Van de Vondel L, Ferlini A, Selvatici R, Morsy H, Saeed Abd Elmaksoud M, Straub V, Müller J, Pini V, Perry L, Sarkozy A, Zaharieva I, Muntoni F, Bugiardini E, Polavarapu K, Horvath R, Reid E, Lochmüller H, Spinazzi M, Savarese M, Matalonga L, Laurie S, Brunner HG, Graessner H, Beltran S, Ossowski S, Vissers LELM, Gilissen C, Hoischen A. Unravelling undiagnosed rare disease cases by HiFi long-read genome sequencing. medRxiv 2024:2024.05.03.24305331. [PMID: 38746462 PMCID: PMC11092722 DOI: 10.1101/2024.05.03.24305331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilised 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single nucleotide variants (SNVs), insertion-deletions (InDels), and short tandem repeat (STR) expansions in extensively studied RD families without clear molecular diagnoses. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Rare Disease Network (ERN) experts. Of these, 21 families were affected by so-called 'unsolvable' syndromes for which genetic causes remain unknown, and 93 families had at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded thirteen novel genetic diagnoses due to de novo and rare inherited SNVs, InDels, SVs, and STR expansions. In an additional four families, we identified a candidate disease-causing SV affecting several genes including an MCF2 / FGF13 fusion and PSMA3 deletion. However, no common genetic cause was identified in any of the 'unsolvable' syndromes. Taken together, we found (likely) disease-causing genetic variants in 13.0% of previously unsolved families and additional candidate disease-causing SVs in another 4.3% of these families. In conclusion, our results demonstrate the added value of HiFi long-read genome sequencing in undiagnosed rare diseases.
|
24
|
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest that a continuum of homology-mediated rearrangement processes is integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
|
25
|
Perez-Becerril C, Burghel GJ, Hartley C, Rowlands CF, Evans DG, Smith MJ. Improved sensitivity for detection of pathogenic variants in familial NF2-related schwannomatosis. J Med Genet 2024; 61:452-458. [PMID: 38302265 DOI: 10.1136/jmg-2023-109586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 12/07/2023] [Indexed: 02/03/2024]
Abstract
PURPOSE To determine the impact of additional genetic screening techniques on the rate of detection of pathogenic variants leading to familial NF2-related schwannomatosis. METHODS We conducted genetic screening of a cohort of 168 second-generation individuals meeting the clinical criteria for NF2-related schwannomatosis. In addition to the current clinical screening techniques, targeted next-generation sequencing (NGS) and multiplex ligation-dependent probe amplification analysis, we applied additional genetic screening techniques, including karyotype and RNA analysis. For characterisation of a complex structural variant, we also performed long-read sequencing analysis. RESULTS Additional genetic analysis resulted in increased sensitivity of detection of pathogenic variants from 87% to 95% in our second-generation NF2-related schwannomatosis cohort. A number of pathogenic variants identified through extended analysis had been previously observed after NGS analysis but had been overlooked or classified as variants of uncertain significance. CONCLUSION Our study indicates there is added value in performing additional genetic analysis for detection of pathogenic variants that are difficult to identify with current clinical genetic screening methods. In particular, RNA analysis is valuable for accurate classification of non-canonical splicing variants. Karyotype analysis and whole genome sequencing analysis are of particular value for identification of large and/or complex structural variants, with additional advantages in the use of long-read sequencing techniques.
Affiliation(s)
- Cristina Perez-Becerril
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
| | - George J Burghel
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
| | - Claire Hartley
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
| | - Charles F Rowlands
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - D Gareth Evans
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
| | - Miriam J Smith
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
| |
|
26
|
Hujoel MLA, Handsaker RE, Sherman MA, Kamitaki N, Barton AR, Mukamel RE, Terao C, McCarroll SA, Loh PR. Protein-altering variants at copy number-variable regions influence diverse human phenotypes. Nat Genet 2024; 56:569-578. [PMID: 38548989 PMCID: PMC11018521 DOI: 10.1038/s41588-024-01684-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 02/08/2024] [Indexed: 04/09/2024]
Abstract
Copy number variants (CNVs) are among the largest genetic variants, yet CNVs have not been effectively ascertained in most genetic association studies. Here we ascertained protein-altering CNVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting subexonic CNVs and variation within segmental duplications. Incorporating CNVs into analyses of rare variants predicted to cause gene loss of function (LOF) identified 100 associations of predicted LOF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 conferred one of the strongest protective effects of gene LOF on hypertension risk (odds ratio = 0.86 (0.82-0.90)). Protein-coding variation in rapidly evolving gene families within segmental duplications, previously invisible to most analysis methods, generated some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Affiliation(s)
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Serinus Biosciences Inc., New York, NY, USA
| | - Nolan Kamitaki
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
| | - Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| |
|
27
|
Chen Z, Finnell RH, Lei Y, Wang H. Progress and clinical prospect of genomic structural variants investigation. Sci Bull (Beijing) 2024; 69:705-708. [PMID: 38310047 DOI: 10.1016/j.scib.2024.01.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2024]
Affiliation(s)
- Zhongzhong Chen
- Obstetrics and Gynecology Hospital, State Key Laboratory of Genetic Engineering, Institute of Reproduction and Development, Fudan University, Shanghai 200011, China; Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200062, China
| | - Richard H Finnell
- Center for Precision Environmental Health, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston 77030, USA; Departments of Molecular and Human Genetics and Medicine, Baylor College of Medicine, One Baylor Plaza, Houston 77030, USA
| | - Yunping Lei
- Center for Precision Environmental Health, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston 77030, USA
| | - Hongyan Wang
- Obstetrics and Gynecology Hospital, State Key Laboratory of Genetic Engineering, Institute of Reproduction and Development, Fudan University, Shanghai 200011, China; Shanghai Key Laboratory of Metabolic Remodelling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai 200438, China; Children's Hospital of Fudan University, Shanghai 201102, China
| |
|
28
|
Wu Z, Li T, Jiang Z, Zheng J, Gu Y, Liu Y, Liu Y, Xie Z. Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res 2024; 52:2212-2230. [PMID: 38364871 PMCID: PMC10954445 DOI: 10.1093/nar/gkae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 01/18/2024] [Accepted: 01/27/2024] [Indexed: 02/18/2024] Open
Abstract
Nonreference sequences (NRSs) are DNA sequences present in global populations but absent in the current human reference genome. However, the extent and functional significance of NRSs in human genomes and populations remain unclear. Here, we de novo assembled 539 genomes from five genetically divergent human populations using long-read sequencing technology, resulting in the identification of 5.1 million NRSs. These were merged into 45,284 unique NRSs, with 29.7% being novel discoveries. Among these NRSs, 38.7% were common across the five populations, and 35.6% were population specific. The use of a graph-based pangenome approach allowed for the detection of 565 transcript expression quantitative trait loci on NRSs, with 426 of these being novel findings. Moreover, 26 NRS candidates displayed evidence of adaptive selection within human populations. Genes situated in close proximity to or intersecting with these candidates may be associated with metabolism and type 2 diabetes. Genome-wide association studies revealed 14 NRSs to be significantly associated with eight phenotypes. Additionally, 154 NRSs were found to be in strong linkage disequilibrium with 258 phenotype-associated SNPs in the GWAS catalogue. Our work expands the understanding of human NRSs and provides novel insights into their functions, facilitating evolutionary and biomedical research.
Affiliation(s)
- Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Tong Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Zehang Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Jingjing Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yizhou Gu
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- University of Wisconsin-Madison, WI, USA
| | - Yizhi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yun Liu
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Shanghai Xuhui Central Hospital, Fudan University, Shanghai, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
| |
|
29
|
Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res 2024; 34:300-309. [PMID: 38355307 PMCID: PMC10984387 DOI: 10.1101/gr.278267.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so, the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.
Affiliation(s)
- Xena M Mapel
- Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
| | - Hubert Pausch
- Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
| |
|
30
|
Sigurpalsdottir BD, Stefansson OA, Holley G, Beyter D, Zink F, Hardarson MÞ, Sverrisson SÞ, Kristinsdottir N, Magnusdottir DN, Magnusson OÞ, Gudbjartsson DF, Halldorsson BV, Stefansson K. A comparison of methods for detecting DNA methylation from long-read sequencing of human genomes. Genome Biol 2024; 25:69. [PMID: 38468278 PMCID: PMC10929077 DOI: 10.1186/s13059-024-03207-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 02/28/2024] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND Long-read sequencing can enable the detection of base modifications, such as CpG methylation, in single molecules of DNA. The most commonly used methods for long-read sequencing are nanopore sequencing, developed by Oxford Nanopore Technologies (ONT), and single molecule real-time (SMRT) sequencing, developed by Pacific Biosciences (PacBio). In this study, we systematically compare the performance of CpG methylation detection from long-read sequencing. RESULTS We demonstrate that CpG methylation detection from 7179 nanopore-sequenced DNA samples is highly accurate and consistent with 132 oxidative bisulfite-sequenced (oxBS) samples, isolated from the same blood draws. We introduce quality filters for CpGs that further enhance the accuracy of CpG methylation detection from nanopore-sequenced DNA, while removing at most 30% of CpGs. We evaluate the per-site performance of CpG methylation detection across different genomic features and CpG methylation rates and demonstrate how the latest R10.4 flowcell chemistry and base-calling algorithms improve methylation detection from nanopore sequencing. Additionally, we show how the methylation detection of 50 SMRT-sequenced genomes compares to nanopore sequencing and oxBS. CONCLUSIONS This study provides the first systematic comparison of CpG methylation detection tools for long-read sequencing methods. We compare two commonly used computational methods for the detection of CpG methylation in a large number of nanopore genomes, including samples sequenced using the latest R10.4 nanopore flowcell chemistry and 50 SMRT sequenced samples. We provide insights into the strengths and limitations of each sequencing method as well as recommendations for standardization and evaluation of tools designed for genome-scale modified base detection using long-read sequencing.
Affiliation(s)
- Brynja D Sigurpalsdottir
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
| | - Doruk Beyter
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
| | - Florian Zink
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
| | - Marteinn Þ Hardarson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
| | - Daniel F Gudbjartsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavík, Iceland
| | - Bjarni V Halldorsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
| | - Kari Stefansson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Faculty of Medicine, School of Health Science, University of Iceland, Reykjavík, Iceland
| |
|
31
|
Wang Z, Fu G, Ma G, Wang C, Wang Q, Lu C, Fu L, Zhang X, Cong B, Li S. The association between DNA methylation and human height and a prospective model of DNA methylation-based height prediction. Hum Genet 2024; 143:401-421. [PMID: 38507014 DOI: 10.1007/s00439-024-02659-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/13/2024] [Indexed: 03/22/2024]
Abstract
As a vital anthropometric characteristic, human height information not only helps to understand overall developmental status and genetic risk factors, but is also important for forensic DNA phenotyping. We utilized linear regression analysis to test the association between each CpG probe and the height phenotype. Next, we designed a methylation sequencing panel targeting 959 CpGs and subsequent height inference models were constructed for the Chinese population. A total of 11,730 height-associated sites were identified. By employing KPCA and deep neural networks, a prediction model was developed, of which the cross-validation RMSE, MAE and R² were 5.62 cm, 4.45 cm and 0.64, respectively. Genetic factors could explain 39.4% of the methylation level variance of sites used in the height inference models. Collectively, we demonstrated an association between height and DNA methylation status through an EWAS. Targeted methylation sequencing of only 959 CpGs combined with deep learning techniques could provide a model to estimate human height with higher accuracy than SNP-based prediction models.
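For readers unfamiliar with the three error measures quoted in this abstract, the short Python sketch below computes them for a list of measured and predicted heights. It uses the standard textbook definitions (R² as one minus the ratio of residual to total sum of squares); the abstract does not state which R² definition the authors used, and the function name and toy values are illustrative assumptions.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / n)                   # same unit as the input (cm)
    mae = sum(abs(e) for e in errors) / n          # same unit as the input (cm)
    r2 = 1 - ss_res / ss_tot                       # unitless
    return rmse, mae, r2

if __name__ == "__main__":
    # Toy heights in cm, not data from the study.
    measured = [160, 170, 180]
    predicted = [162, 170, 176]
    rmse, mae, r2 = regression_metrics(measured, predicted)
    print(f"RMSE={rmse:.2f} cm, MAE={mae:.2f} cm, R2={r2:.2f}")
```

RMSE penalises large individual errors more heavily than MAE, which is why the reported RMSE (5.62 cm) exceeds the reported MAE (4.45 cm).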
Affiliation(s)
- Zhonghua Wang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Guangping Fu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Guanju Ma
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Chunyan Wang
- Physical Examination Center of Shijiazhuang People's Hospital, Shijiazhuang, 050011, Hebei, China
| | - Qian Wang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Chaolong Lu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Lihong Fu
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Xiaojing Zhang
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Bin Cong
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China
| | - Shujin Li
- College of Forensic Medicine, Hebei Key Laboratory of Forensic Medicine, Collaborative Innovation Center of Forensic Medical Molecular Identification, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Hebei Medical University, Chinese Academy of Medical Sciences, Shijiazhuang, 050017, Hebei, China.
|
32
|
Nakamura W, Hirata M, Oda S, Chiba K, Okada A, Mateos RN, Sugawa M, Iida N, Ushiama M, Tanabe N, Sakamoto H, Sekine S, Hirasawa A, Kawai Y, Tokunaga K, Tsujimoto SI, Shiba N, Ito S, Yoshida T, Shiraishi Y. Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes. NPJ Genom Med 2024; 9:11. [PMID: 38368425 PMCID: PMC10874402 DOI: 10.1038/s41525-024-00394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 01/15/2024] [Indexed: 02/19/2024] Open
Abstract
Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling and computationally driven target enrichment technology. This enables more affordable intensive analysis of target gene regions compared to standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it through application to 33 genomes collected from suspected hereditary cancer patients. Our workflow can identify single nucleotide variants with nearly the same accuracy as the short-read platform and elucidate complex forms of structural variations. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, as well as their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be effectively used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a Lynch syndrome patient. In summary, our workflow with TAS-LRS can simultaneously capture monogenic risk variants including complex structural variations, polygenic background as well as epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.
Affiliation(s)
- Wataru Nakamura
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Makoto Hirata
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
| | - Satoyo Oda
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Division of Laboratory Medicine, National Cancer Center Hospital, Tokyo, Japan
| | - Kenichi Chiba
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ai Okada
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Raúl Nicolás Mateos
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Masahiro Sugawa
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Naoko Iida
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Mineko Ushiama
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
| | - Noriko Tanabe
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
| | - Hiromi Sakamoto
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
| | - Shigeki Sekine
- Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
| | - Akira Hirasawa
- Department of Clinical Genetics and Genomic Medicine, Okayama University Hospital, Okayama, Japan
| | - Yosuke Kawai
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
| | - Katsushi Tokunaga
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Central Biobank, National Center Biobank Network, Tokyo, Japan
| | - Shin-Ichi Tsujimoto
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Norio Shiba
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Shuichi Ito
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Teruhiko Yoshida
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
| | - Yuichi Shiraishi
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan.
|
33
|
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
|
34
|
Gueuning M, Thun GA, Trost N, Schneider L, Sigurdardottir S, Engström C, Larbes N, Merki Y, Frey BM, Gassner C, Meyer S, Mattle-Greminger MP. Resolving Genotype-Phenotype Discrepancies of the Kidd Blood Group System Using Long-Read Nanopore Sequencing. Biomedicines 2024; 12:225. [PMID: 38275395 PMCID: PMC10813000 DOI: 10.3390/biomedicines12010225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/15/2024] [Accepted: 01/16/2024] [Indexed: 01/27/2024] Open
Abstract
Due to substantial improvements in read accuracy, third-generation long-read sequencing holds great potential in blood group diagnostics, particularly in cases where traditional genotyping or sequencing techniques, primarily targeting exons, fail to explain serological phenotypes. In this study, we employed Oxford Nanopore sequencing to resolve all genotype-phenotype discrepancies in the Kidd blood group system (JK, encoded by SLC14A1) observed over seven years of routine high-throughput donor genotyping using a mass spectrometry-based platform at the Blood Transfusion Service, Zurich. Discrepant results from standard serological typing and donor genotyping were confirmed using commercial PCR-SSP kits. To resolve discrepancies, we amplified the entire coding region of SLC14A1 (~24 kb, exons 3 to 10) in two overlapping long-range PCRs in all samples. Amplicons were barcoded and sequenced on a MinION flow cell. Sanger sequencing and bridge-PCRs were used to confirm findings. Among 11,972 donors with both serological and genotype data available for the Kidd system, we identified 10 cases with unexplained conflicting results. Five were linked to known weak and null alleles caused by variants not included in the routine donor genotyping. In two cases, we identified novel null alleles on the JK*01 (Gly40Asp; c.119G>A) and JK*02 (Gly242Glu; c.725G>A) haplotypes, respectively. Remarkably, the remaining three cases were associated with a yet unknown deletion of ~5 kb spanning exons 9-10 of the JK*01 allele, which other molecular methods had failed to detect. Overall, nanopore sequencing demonstrated reliable and accurate performance for detecting both single-nucleotide and structural variants. It possesses the potential to become a robust tool in the molecular diagnostic portfolio, particularly for addressing challenging structural variants such as hybrid genes, deletions and duplications.
Affiliation(s)
- Morgan Gueuning
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
| | - Gian Andri Thun
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
| | - Nadine Trost
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Linda Schneider
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Sonja Sigurdardottir
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Charlotte Engström
- Department of Immunohematology, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Naemi Larbes
- Department of Immunohematology, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Yvonne Merki
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Beat M. Frey
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
- Department of Immunohematology, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Christoph Gassner
- Institute of Translational Medicine, Private University in the Principality of Liechtenstein, 9495 Triesen, Liechtenstein
| | - Stefan Meyer
- Department of Molecular Diagnostics and Cytometry, Blood Transfusion Service Zurich, Swiss Red Cross, 8952 Schlieren, Switzerland
| | - Maja P. Mattle-Greminger
- Department of Research and Development, Blood Transfusion Service Zurich, Swiss Red Cross, Rütistrasse 19, 8952 Schlieren, Switzerland
|
35
|
Auwerx C, Jõeloo M, Sadler MC, Tesio N, Ojavee S, Clark CJ, Mägi R, Reymond A, Kutalik Z. Rare copy-number variants as modulators of common disease susceptibility. Genome Med 2024; 16:5. [PMID: 38185688 PMCID: PMC10773105 DOI: 10.1186/s13073-023-01265-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024] Open
Abstract
BACKGROUND: Copy-number variations (CNVs) have been associated with rare and debilitating genomic disorders (GDs), but their impact on health later in life in the general population remains poorly described. METHODS: Assessing four modes of CNV action, we performed genome-wide association scans (GWASs) between the copy-number of CNV-proxy probes and 60 curated ICD-10 based clinical diagnoses in 331,522 unrelated white British UK Biobank (UKBB) participants with replication in the Estonian Biobank. RESULTS: We identified 73 signals involving 40 diseases, all of which indicated that CNVs increased disease risk and caused earlier onset. We estimated that 16% of these associations are indirect, acting by increasing body mass index (BMI). Signals mapped to 45 unique, non-overlapping regions, nine of which are linked to known GDs. Number and identity of genes affected by CNVs modulated their pathogenicity, with many associations being supported by colocalization with both common and rare single-nucleotide variant association signals. Dissection of association signals provided insights into the epidemiology of known gene-disease pairs (e.g., deletions in BRCA1 and LDLR increased risk for ovarian cancer and ischemic heart disease, respectively), clarified dosage mechanisms of action (e.g., both increased and decreased dosage of 17q12 impacted renal health), and identified putative causal genes (e.g., ABCC6 for kidney stones). Characterization of the pleiotropic pathological consequences of recurrent CNVs at 15q13, 16p13.11, 16p12.2, and 22q11.2 in adulthood indicated variable expressivity of these regions and the involvement of multiple genes. Finally, we show that while the total burden of rare CNVs, and especially deletions, strongly associated with disease risk, it only accounted for ~0.02% of the UKBB disease burden. These associations are mainly driven by CNVs at known GD CNV regions, whose pleiotropic effect on common diseases was broader than anticipated by our CNV-GWAS. CONCLUSIONS: Our results shed light on the prominent role of rare CNVs in determining common disease susceptibility within the general population and provide actionable insights for anticipating later-onset comorbidities in carriers of recurrent CNVs.
Affiliation(s)
- Chiara Auwerx
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland.
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland.
| | - Maarja Jõeloo
- Institute of Molecular and Cell Biology, University of Tartu, 51010, Tartu, Estonia
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
| | - Marie C Sadler
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland
| | - Nicolò Tesio
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
| | - Sven Ojavee
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Charlie J Clark
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland
| | - Reedik Mägi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland.
| | - Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, Genopode building, 1015, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
- University Center for Primary Care and Public Health, 1005, Lausanne, Switzerland.
|
36
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024:10.1038/s41587-023-02024-y. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | | | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
|
37
|
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
| | - Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
| | - Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
| | - Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
|
38
|
Shi J, Jia Z, Sun J, Wang X, Zhao X, Zhao C, Liang F, Song X, Guan J, Jia X, Yang J, Chen Q, Yu K, Jia Q, Wu J, Wang D, Xiao Y, Xu X, Liu Y, Wu S, Zhong Q, Wu J, Cui S, Bo X, Wu Z, Park M, Kellis M, He K. Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing. Nat Commun 2023; 14:8282. [PMID: 38092772 PMCID: PMC10719358 DOI: 10.1038/s41467-023-44034-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/21/2021] [Accepted: 11/27/2023] [Indexed: 12/17/2023] Open
Abstract
Structural variants (SVs), accounting for a larger fraction of the genome than SNPs/InDels, are an important pool of genetic variation, enabling environmental adaptations. Here, we perform long-read sequencing of 320 Tibetan and Han samples and show that SVs are highly involved in high-altitude adaptation. We expand the landscape of global SVs, apply robust models of selection and population differentiation combining SVs, SNPs and InDels, and use epigenomic analyses to predict enhancers, target genes and biological functions. We reveal diverse Tibetan-specific SVs affecting the regulatory circuitry of biological functions, including the hypoxia response, energy metabolism and pulmonary function. Using enhancer reporter, cellular knock-out and DNA pull-down assays, we find that a Tibetan-specific deletion disrupts a super-enhancer and downregulates EPAS1. Our study expands the global SV landscape, reveals the role of gene-regulatory circuitry rewiring in human adaptation, and illustrates the diverse functional roles of SVs in human biology.
Affiliation(s)
- Jinlong Shi
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
| | - Zhilong Jia
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
- Medical Artificial Intelligence Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jinxiu Sun
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Xiaoreng Wang
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- State Key Laboratory of Experimental Hematology, Beijing, 100853, China
| | - Xiaojing Zhao
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
- Translational Medicine Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Chenghui Zhao
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Research Center for Biomedical Engineering, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Fan Liang
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Xinyu Song
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
- Medical Artificial Intelligence Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jiawei Guan
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Xue Jia
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Jing Yang
- Laboratory of Nuclear and Radiation Injury, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
| | - Qi Chen
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Kang Yu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Qian Jia
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Jing Wu
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Depeng Wang
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Yuhui Xiao
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Xiaoman Xu
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Yinzhe Liu
- NextOmics Biosciences Inc, Wuhan, 430000, China
| | - Shijing Wu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Qin Zhong
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China
| | - Jue Wu
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China
| | - Saijia Cui
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China
| | - Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing, 100850, China
| | | | | | - Manolis Kellis
- Massachusetts Institute of Technology; MIT Computer Science and Artificial Intelligence Laboratory, Broad Institute of MIT and Harvard, Cambridge, 02139, MA, USA
| | - Kunlun He
- Medical Big Data Research Center, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing, 100853, China.
- National Engineering Research Center of Medical Big Data, Chinese PLA General Hospital, Beijing, 100853, China.
- Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing, 100853, China.
- Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China.
| |
|
39
|
Reis ALM, Rapadas M, Hammond JM, Gamaarachchi H, Stevanovski I, Ayuputeri Kumaheri M, Chintalaphani SR, Dissanayake DSB, Siggs OM, Hewitt AW, Llamas B, Brown A, Baynam G, Mann GJ, McMorran BJ, Easteal S, Hermes A, Jenkins MR, Patel HR, Deveson IW. The landscape of genomic structural variation in Indigenous Australians. Nature 2023; 624:602-610. [PMID: 38093003 PMCID: PMC10733147 DOI: 10.1038/s41586-023-06842-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/16/2023] [Accepted: 11/07/2023] [Indexed: 12/20/2023]
Abstract
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1-3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion-deletion variants (20-49 bp; n = 136,797), structural variants (50 bp-50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
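The three size classes quoted in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: the function name, the label strings, the "small variant" class for lengths under 20 bp, and the convention that 1 kb equals 1,000 bp are all assumptions made for illustration.

```python
def classify_variant(length_bp: int) -> str:
    """Bin a variant by length using the size ranges quoted in the abstract.

    Hypothetical helper: the labels and the sub-20 bp class are assumptions,
    and 50 kb is taken to mean 50,000 bp.
    """
    if length_bp < 1:
        raise ValueError("variant length must be a positive number of base pairs")
    if length_bp < 20:
        return "small variant"            # below the ranges given in the abstract
    if length_bp < 50:
        return "large indel"              # 20-49 bp
    if length_bp <= 50_000:
        return "structural variant"       # 50 bp-50 kb
    return "copy-number variable region"  # >50 kb


if __name__ == "__main__":
    for size in (5, 35, 300, 50_000, 120_000):
        print(size, classify_variant(size))
```

Treating the 50 kb boundary as inclusive for structural variants follows the abstract's ">50 kb" wording for the copy-number class.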
Affiliation(s)
- Andre L M Reis
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Melissa Rapadas
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Jillian M Hammond
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Hasindu Gamaarachchi
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales, Australia
| | - Igor Stevanovski
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Meutia Ayuputeri Kumaheri
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
| | - Sanjog R Chintalaphani
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Duminda S B Dissanayake
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
| | - Owen M Siggs
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Department of Ophthalmology, Flinders University, Bedford Park, South Australia, Australia
| | - Alex W Hewitt
- Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
| | - Bastien Llamas
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Australian Centre for Ancient DNA, School of Biological Sciences and Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
- ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, South Australia, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
| | - Alex Brown
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
| | - Gareth Baynam
- Telethon Kids Institute and Division of Paediatrics, Faculty of Health and Medical Sciences, University of Western Australia, Perth, Western Australia, Australia
- Genetic Services of Western Australia, Western Australian Department of Health, Perth, Western Australia, Australia
- Western Australian Register of Developmental Anomalies, Western Australian Department of Health, Perth, Western Australia, Australia
| | - Graham J Mann
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Brendan J McMorran
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Simon Easteal
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Azure Hermes
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Misty R Jenkins
- Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
| | - Hardip R Patel
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia.
| | - Ira W Deveson
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia.
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia.
| |
|
40
|
Hinch R, Donnelly P, Hinch AG. Meiotic DNA breaks drive multifaceted mutagenesis in the human germ line. Science 2023; 382:eadh2531. [PMID: 38033082 PMCID: PMC7615360 DOI: 10.1126/science.adh2531] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/20/2023] [Accepted: 09/29/2023] [Indexed: 12/02/2023]
Abstract
Meiotic recombination commences with hundreds of programmed DNA breaks; however, the degree to which they are accurately repaired remains poorly understood. We report that meiotic break repair is eightfold more mutagenic for single-base substitutions than was previously understood, leading to de novo mutation in one in four sperm and one in 12 eggs. Its impact on indels and structural variants is even higher, with 100- to 1300-fold increases in rates per break. We uncovered new mutational signatures and footprints relative to break sites, which implicate unexpected biochemical processes and error-prone DNA repair mechanisms, including translesion synthesis and end joining in meiotic break repair. We provide evidence that these mechanisms drive mutagenesis in human germ lines and lead to disruption of hundreds of genes genome wide.
Affiliation(s)
- Robert Hinch
- Big Data Institute, University of Oxford; Oxford, UK
| | - Peter Donnelly
- Wellcome Centre for Human Genetics, University of Oxford; Oxford, UK
- Genomics plc; Oxford, UK
| | | |
|
41
|
Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023; 14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023] Open
Abstract
Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
Affiliation(s)
- Zhuoran Xu
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Quan Li
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
42
|
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
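One way to picture the "genetic built-in-truth" of a parents-plus-monozygotic-twins design is a per-site consistency check. The sketch below is an assumption about how such a check could look, not the Quartet project's published method: the function names, the dictionary layout, and the use of the consistent fraction as a precision proxy are all hypothetical, and genotypes are simplified to unphased diploid allele pairs.

```python
from itertools import product


def mendelian_consistent(child, father, mother):
    """True if the child's unordered genotype can be built from one allele of each parent."""
    return any(sorted((a, b)) == sorted(child) for a, b in product(father, mother))


def quartet_consistent(site):
    """A site passes if the twins agree and their genotype is compatible with the parents."""
    twin1, twin2 = sorted(site["twin1"]), sorted(site["twin2"])
    return twin1 == twin2 and mendelian_consistent(twin1, site["father"], site["mother"])


def consistency_rate(sites):
    """Fraction of sites passing both checks (a rough stand-in for call precision)."""
    if not sites:
        raise ValueError("at least one site is required")
    return sum(quartet_consistent(s) for s in sites) / len(sites)


# Toy genotypes (0 = reference allele, 1 = alternate allele); illustrative values only.
example_sites = [
    {"father": (0, 1), "mother": (0, 0), "twin1": (0, 1), "twin2": (1, 0)},  # consistent
    {"father": (0, 0), "mother": (0, 0), "twin1": (0, 1), "twin2": (0, 1)},  # Mendelian violation
    {"father": (0, 1), "mother": (0, 1), "twin1": (1, 1), "twin2": (0, 1)},  # twin discordance
]

if __name__ == "__main__":
    print(consistency_rate(example_sites))
```

A real evaluation would also have to handle missing calls, multi-allelic sites, sex chromosomes, and genuine de novo mutations, all of which this sketch ignores.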
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
| | - Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
| |
|
43
|
Zhou X, Wang Y, He R, Liu Z, Xu Q, Guo J, Yan X, Li J, Tang B, Zeng S, Sun Q. Microdeletion in distal PLP1 enhancers causes hereditary spastic paraplegia 2. Ann Clin Transl Neurol 2023; 10:1590-1602. [PMID: 37475517 PMCID: PMC10502680 DOI: 10.1002/acn3.51848] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/29/2023] [Revised: 06/26/2023] [Accepted: 06/27/2023] [Indexed: 07/22/2023] Open
Abstract
OBJECTIVES Hereditary spastic paraplegia (HSP) is a genetically heterogeneous disease caused by variants in over 70 genes, with a significant number of patients still genetically unsolved. In this study, we recruited a family with suspected HSP characterized by spasticity, developmental delay, ataxia and hypomyelination, and aimed to reveal its molecular etiology by whole exome sequencing (WES) and long-read sequencing (LRS) analyses. METHODS WES was performed on 13 individuals of the family to identify the causative mutations, including analyses of SNVs (single-nucleotide variants) and CNVs (copy number variants). Accurate circular consensus (CCS) long-read sequencing (LRS) was used to verify the findings of CNV analysis from WES. RESULTS SNV analysis of the WES data identified a missense variant c.195G>T (p.E65D) of MORF4L2 at Xq22.2 co-segregating in this family. Further CNV analysis revealed a microdeletion adjacent to the MORF4L2 gene, also co-segregating in this family. LRS verified this microdeletion and confirmed the deletion range (chrX: 103,690,507-103,715,018, hg38) at nucleotide-level resolution. INTERPRETATIONS In this study, we identified an Xq22.2 microdeletion (about 24.5 kb), which contains distal enhancers of the PLP1 gene, as a likely cause of SPG2 in this family. The lack of distal enhancers may result in transcriptional repression of PLP1 in oligodendrocytes, potentially affecting its role in the maintenance of myelin, and causing the SPG2 phenotype. This study has highlighted the importance of noncoding genomic alterations in the genetic etiology of SPG2.
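The "about 24.5 kb" size follows directly from the reported hg38 coordinates, as the short calculation below shows. The helper name is hypothetical, and treating both endpoints as inclusive is an assumption; an exclusive end would change the result by a single base.

```python
def region_length_bp(region: str) -> int:
    """Length of a 'chrom:start-end' region, assuming both coordinates are inclusive."""
    _chrom, coords = region.split(":")
    start_text, end_text = coords.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1


# Deletion range quoted in the abstract (hg38).
deletion = "chrX:103,690,507-103,715,018"
length_bp = region_length_bp(deletion)
length_kb = length_bp / 1000  # using 1 kb = 1,000 bp

if __name__ == "__main__":
    print(f"{deletion}: {length_bp} bp (about {length_kb:.1f} kb)")
```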
Affiliation(s)
- Xun Zhou
- Department of Geriatric Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Runcheng He
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Zhenhua Liu
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Qian Xu
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Jifeng Guo
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Xinxiang Yan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
| | - Jinchen Li
- Department of Geriatric Neurology, Xiangya Hospital, Central South University, Changsha, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
| | - Beisha Tang
- Department of Geriatric Neurology, Xiangya Hospital, Central South University, Changsha, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| | - Sheng Zeng
- Department of Geriatrics, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Qiying Sun
- Department of Geriatric Neurology, Xiangya Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
| |
|
44
|
Hook PW, Timp W. Beyond assembly: the increasing flexibility of single-molecule sequencing technology. Nat Rev Genet 2023; 24:627-641. [PMID: 37161088 PMCID: PMC10169143 DOI: 10.1038/s41576-023-00600-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
The maturation of high-throughput short-read sequencing technology over the past two decades has shaped the way genomes are studied. Recently, single-molecule, long-read sequencing has emerged as an essential tool in deciphering genome structure and function, including filling gaps in the human reference genome, measuring the epigenome and characterizing splicing variants in the transcriptome. With recent technological developments, these single-molecule technologies have moved beyond genome assembly and are being used in a variety of ways, including to selectively sequence specific loci with long reads, measure chromatin state and protein-DNA binding in order to investigate the dynamics of gene regulation, and rapidly determine copy number variation. These increasingly flexible uses of single-molecule technologies highlight a young and fast-moving part of the field that is leading to a more accessible era of nucleic acid sequencing.
Affiliation(s)
- Paul W Hook
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
| |
|
45
|
Mukamel RE, Handsaker RE, Sherman MA, Barton AR, Hujoel MLA, McCarroll SA, Loh PR. Repeat polymorphisms underlie top genetic risk loci for glaucoma and colorectal cancer. Cell 2023; 186:3659-3673.e23. [PMID: 37527660 PMCID: PMC10528368 DOI: 10.1016/j.cell.2023.07.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/04/2022] [Revised: 04/07/2023] [Accepted: 07/03/2023] [Indexed: 08/03/2023]
Abstract
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Affiliation(s)
- Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA.
| | - Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Bioinformatics and Integrative Genomics Program, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA.
| | - Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
46
|
Song B, Ning W, Wei D, Jiang M, Zhu K, Wang X, Edwards D, Odeny DA, Cheng S. Plant genome resequencing and population genomics: Current status and future prospects. Mol Plant 2023; 16:1252-1268. [PMID: 37501370 DOI: 10.1016/j.molp.2023.07.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/16/2022] [Revised: 05/30/2023] [Accepted: 07/25/2023] [Indexed: 07/29/2023]
Abstract
Advances in DNA sequencing technology have sparked a genomics revolution, driving breakthroughs in plant genetics and crop breeding. Recently, the focus has shifted from cataloging genetic diversity in plants to exploring their functional significance and delivering beneficial alleles for crop improvement. This transformation has been facilitated by the increasing adoption of whole-genome resequencing. In this review, we summarize the current progress of population-based genome resequencing studies and how these studies affect crop breeding. A total of 187 land plants from 163 countries have been resequenced, comprising 54 413 accessions. As part of resequencing efforts 367 traits have been surveyed and 86 genome-wide association studies have been conducted. Economically important crops, particularly cereals, vegetables, and legumes, have dominated the resequencing efforts, leaving a gap in 49 orders, including Lycopodiales, Liliales, Acorales, Austrobaileyales, and Commelinales. The resequenced germplasm is distributed across diverse geographic locations, providing a global perspective on plant genomics. We highlight genes that have been selected during domestication, or associated with agronomic traits, and form a repository of candidate genes for future research and application. Despite the opportunities for cross-species comparative genomics, many population genomic datasets are not accessible, impeding secondary analyses. We call for a more open and collaborative approach to population genomics that promotes data sharing and encourages contribution-based credit policy. The number of plant genome resequencing studies will continue to rise with the decreasing DNA sequencing costs, coupled with advances in analysis and computational technologies. This expansion, in terms of both scale and quality, holds promise for deeper insights into plant trait genetics and breeding design.
Affiliation(s)
- Bo Song
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Weidong Ning
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; Huazhong Agricultural University, College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Wuhan, Hubei, China
| | - Di Wei
- Biotechnology Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 53007, China
| | - Mengyun Jiang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China; Shenzhen Research Institute of Henan University, Shenzhen 518000, China
| | - Kun Zhu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China; Shenzhen Research Institute of Henan University, Shenzhen 518000, China
| | - Xingwei Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China; Shenzhen Research Institute of Henan University, Shenzhen 518000, China
| | - David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Damaris A Odeny
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) - Eastern and Southern Africa, Nairobi, Kenya
| | - Shifeng Cheng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China.
| |
|
47
|
Wojcik MH, Reuter CM, Marwaha S, Mahmoud M, Duyzend MH, Barseghyan H, Yuan B, Boone PM, Groopman EE, Délot EC, Jain D, Sanchis-Juan A, Starita LM, Talkowski M, Montgomery SB, Bamshad MJ, Chong JX, Wheeler MT, Berger SI, O'Donnell-Luria A, Sedlazeck FJ, Miller DE. Beyond the exome: What's next in diagnostic testing for Mendelian conditions. Am J Hum Genet 2023; 110:1229-1248. [PMID: 37541186 PMCID: PMC10432150 DOI: 10.1016/j.ajhg.2023.06.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 04/02/2023] [Revised: 06/13/2023] [Accepted: 06/14/2023] [Indexed: 08/06/2023] Open
Abstract
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.
Affiliation(s)
- Monica H Wojcik
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Division of Newborn Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Chloe M Reuter
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Shruti Marwaha
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Michael H Duyzend
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Hayk Barseghyan
- Center for Genetics Medicine Research, Children's National Research Institute, Children's National Hospital, Washington, DC 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
| | - Bo Yuan
- Department of Molecular and Human Genetics and Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Philip M Boone
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Emily E Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Emmanuèle C Délot
- Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; Center for Genetics Medicine Research, Children's National Research and Innovation Campus, Washington, DC, USA; Department of Pediatrics, George Washington University, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
| | - Deepti Jain
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, USA
| | - Alba Sanchis-Juan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lea M Starita
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Michael Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Stephen B Montgomery
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Michael J Bamshad
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
| | - Jessica X Chong
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
| | - Matthew T Wheeler
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Seth I Berger
- Center for Genetics Medicine Research and Rare Disease Institute, Children's National Hospital, Washington, DC 20010, USA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA; Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
| | - Danny E Miller
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA.
| |
|
48
|
Sanchis-Juan A, Megy K, Stephens J, Armirola Ricaurte C, Dewhurst E, Low K, French CE, Grozeva D, Stirrups K, Erwood M, McTague A, Penkett CJ, Shamardina O, Tuna S, Daugherty LC, Gleadall N, Duarte ST, Hedrera-Fernández A, Vogt J, Ambegaonkar G, Chitre M, Josifova D, Kurian MA, Parker A, Rankin J, Reid E, Wakeling E, Wassmer E, Woods CG, Raymond FL, Carss KJ. Genome sequencing and comprehensive rare-variant analysis of 465 families with neurodevelopmental disorders. Am J Hum Genet 2023; 110:1343-1355. [PMID: 37541188 PMCID: PMC10432178 DOI: 10.1016/j.ajhg.2023.07.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/13/2023] [Revised: 07/07/2023] [Accepted: 07/07/2023] [Indexed: 08/06/2023] Open
Abstract
Despite significant progress in unraveling the genetic causes of neurodevelopmental disorders (NDDs), a substantial proportion of individuals with NDDs remain without a genetic diagnosis after microarray and/or exome sequencing. Here, we aimed to assess the power of short-read genome sequencing (GS), complemented with long-read GS, to identify causal variants in participants with NDD from the National Institute for Health and Care Research (NIHR) BioResource project. Short-read GS was conducted on 692 individuals (489 affected and 203 unaffected relatives) from 465 families. Additionally, long-read GS was performed on five affected individuals who had structural variants (SVs) in technically challenging regions, had complex SVs, or required distal variant phasing. Causal variants were identified in 36% of affected individuals (177/489), and a further 23% (112/489) had a variant of uncertain significance after multiple rounds of re-analysis. Among all reported variants, 88% (333/380) were coding nuclear SNVs or insertions and deletions (indels), and the remainder were SVs, non-coding variants, and mitochondrial variants. Furthermore, long-read GS facilitated the resolution of challenging SVs and invalidated variants of difficult interpretation from short-read GS. This study demonstrates the value of short-read GS, complemented with long-read GS, in investigating the genetic causes of NDDs. GS provides a comprehensive and unbiased method of identifying all types of variants throughout the nuclear and mitochondrial genomes in individuals with NDD.
Affiliation(s)
- Alba Sanchis-Juan
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Karyn Megy
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Jonathan Stephens
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Camila Armirola Ricaurte
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Eleanor Dewhurst
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Kayyi Low
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | | | - Detelina Grozeva
- Department of Medical Genetics, University of Cambridge, Cambridge, UK; Centre for Trials Research, Cardiff University, Cardiff, UK
| | - Kathleen Stirrups
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Marie Erwood
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Amy McTague
- Molecular Neurosciences, Zayed Centre for Research into Rare Disease in Children, UCL Great Ormond Street Institute of Child Health, London, UK; Department of Neurology, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Christopher J Penkett
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Olga Shamardina
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Salih Tuna
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Louise C Daugherty
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Nicholas Gleadall
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Sofia T Duarte
- Hospital Dona Estefânia, Centro Hospitalar de Lisboa Central, Lisbon, Portugal
| | | | - Julie Vogt
- West Midlands Regional Genetics Service, Birmingham Women's and Children's Hospital, Birmingham, UK
| | - Gautam Ambegaonkar
- Child Development Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Manali Chitre
- Clinical Medical School, University of Cambridge, Cambridge, UK
| | | | - Manju A Kurian
- Molecular Neurosciences, Zayed Centre for Research into Rare Disease in Children, UCL Great Ormond Street Institute of Child Health, London, UK
| | - Alasdair Parker
- Clinical Medical School, University of Cambridge, Cambridge, UK; Child Development Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Julia Rankin
- Department of Clinical Genetics, Royal Devon University Healthcare NHS Foundation Trust, Exeter, UK
| | - Evan Reid
- Cambridge Institute for Medical Research and Department of Medical Genetics, University of Cambridge, Cambridge, UK
| | - Emma Wakeling
- North West Thames Regional Genetics Service, Harrow, UK
| | - Evangeline Wassmer
- Neurology Department, Birmingham Women and Children's Hospital, Birmingham, UK
| | - C Geoffrey Woods
- Clinical Medical School, University of Cambridge, Cambridge, UK; Department of Medical Genetics, University of Cambridge, Cambridge, UK
| | - F Lucy Raymond
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Department of Medical Genetics, University of Cambridge, Cambridge, UK.
| | - Keren J Carss
- Department of Haematology, University of Cambridge, Cambridge, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
| |
|
49
|
O'Donnell S, Yue JX, Saada OA, Agier N, Caradec C, Cokelaer T, De Chiara M, Delmas S, Dutreux F, Fournier T, Friedrich A, Kornobis E, Li J, Miao Z, Tattini L, Schacherer J, Liti G, Fischer G. Telomere-to-telomere assemblies of 142 strains characterize the genome structural landscape in Saccharomyces cerevisiae. Nat Genet 2023; 55:1390-1399. [PMID: 37524789 PMCID: PMC10412453 DOI: 10.1038/s41588-023-01459-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 11/01/2022] [Accepted: 06/26/2023] [Indexed: 08/02/2023]
Abstract
Pangenomes provide access to an accurate representation of the genetic diversity of species, both in terms of sequence polymorphisms and structural variants (SVs). Here we generated the Saccharomyces cerevisiae Reference Assembly Panel (ScRAP) comprising reference-quality genomes for 142 strains representing the species' phylogenetic and ecological diversity. The ScRAP includes phased haplotype assemblies for several heterozygous diploid and polyploid isolates. We identified circa (ca.) 4,800 nonredundant SVs that provide a broad view of the genomic diversity, including the dynamics of telomere length and transposable elements. We uncovered frequent cases of complex aneuploidies where large chromosomes underwent large deletions and translocations. We found that SVs can impact gene expression near the breakpoints and substantially contribute to gene repertoire evolution. We also discovered that horizontally acquired regions insert at chromosome ends and can generate new telomeres. Overall, the ScRAP demonstrates the benefit of a pangenome in understanding genome evolution at population scale.
Affiliation(s)
- Samuel O'Donnell
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Jia-Xing Yue
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
| | - Omar Abou Saada
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Nicolas Agier
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Claudia Caradec
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Thomas Cokelaer
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
| | | | - Stéphane Delmas
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Fabien Dutreux
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Téo Fournier
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Anne Friedrich
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Etienne Kornobis
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
| | - Jing Li
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
| | - Zepu Miao
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
| | | | | | - Gianni Liti
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France.
| | - Gilles Fischer
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France.
| |
|
50
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| |
|