1
Mokveld T, Dolzhenko E, Dashnow H, Nicholas TJ, Sasani T, van der Sanden B, Jadhav B, Pedersen B, Kronenberg Z, Tucci A, Sharp AJ, Quinlan AR, Gilissen C, Hoischen A, Eberle MA. TRGT-denovo: accurate detection of de novo tandem repeat mutations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.16.600745. [PMID: 39071386 PMCID: PMC11275785 DOI: 10.1101/2024.07.16.600745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Motivation: Identifying de novo tandem repeat (TR) mutations on a genome-wide scale is essential for understanding genetic variability and its implications in rare diseases. While PacBio HiFi sequencing data enhances the accessibility of the genome's TR regions for genotyping, simple de novo calling strategies often generate an excess of likely false positives, which can obscure true positive findings, particularly as the number of surveyed genomic regions increases. Results: We developed TRGT-denovo, a computational method designed to accurately identify all types of de novo TR mutations (expansions, contractions, and compositional changes) within family trios. TRGT-denovo directly interrogates read evidence, allowing for the detection of subtle variations often overlooked in variant call format (VCF) files. TRGT-denovo improves the precision and specificity of de novo mutation (DNM) identification, reducing the number of de novo candidates by an order of magnitude compared to genotype-based approaches. In our experiments involving eight previously studied rare disease trios, TRGT-denovo correctly reclassified all false positive DNM candidates as true negatives. Using an expanded repeat catalog, it identified new candidates, of which 95% (19/20) were experimentally validated, demonstrating its effectiveness in minimizing likely false positives while maintaining high sensitivity for true discoveries. Availability and implementation: Built in Rust, TRGT-denovo is available as source code and a pre-compiled Linux binary along with a user guide at: https://github.com/PacificBiosciences/trgt-denovo.
Affiliation(s)
- T Sasani
- Univ. of Utah, Salt Lake City, UT
- B van der Sanden
- Department of Human Genetics, Research Institute for Medical Innovation, Radboud university medical center, Nijmegen, the Netherlands
- B Jadhav
- Icahn School of Medicine at Mount Sinai, New York, NY
- A Tucci
- Genomics England, London, UK
- A J Sharp
- Icahn School of Medicine at Mount Sinai, New York, NY
- C Gilissen
- Department of Human Genetics, Research Institute for Medical Innovation, Radboud university medical center, Nijmegen, the Netherlands
- A Hoischen
- Department of Human Genetics, Research Institute for Medical Innovation, Radboud university medical center, Nijmegen, the Netherlands
- Department of Internal Medicine, Radboud Expertise Center for Immunodeficiency and Autoinflammation and Radboud Center for Infectious Disease (RCI), Radboud university medical center, Nijmegen, the Netherlands
2
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking. CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
Affiliation(s)
- Zhi Liu
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China.
3
Kolesnikov A, Cook D, Nattestad M, Brambrink L, McNulty B, Gorzynski J, Goenka S, Ashley EA, Jain M, Miga KH, Paten B, Chang PC, Carroll A, Shafin K. Local read haplotagging enables accurate long-read small variant calling. Nat Commun 2024; 15:5907. [PMID: 39003259 PMCID: PMC11246426 DOI: 10.1038/s41467-024-50079-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 06/28/2024] [Indexed: 07/15/2024] Open
Abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing providers like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they are introduced. In this work, we have developed a local haplotype approximation method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.
Affiliation(s)
- Daniel Cook
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Brandy McNulty
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Miten Jain
- Northeastern University, Boston, MA, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Pi-Chuan Chang
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA.
- Kishwar Shafin
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA.
4
LeMaster C, Schwendinger-Schreck C, Ge B, Cheung WA, McLennan R, Johnston JJ, Pastinen T, Smail C. Mapping structural variants to rare disease genes using long-read whole genome sequencing and trait-relevant polygenic scores. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.15.24304216. [PMID: 38562793 PMCID: PMC10984062 DOI: 10.1101/2024.03.15.24304216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Recent studies have revealed the pervasive landscape of rare structural variants (rSVs) present in human genomes. rSVs can have extreme effects on the expression of proximal genes and, in a rare disease context, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. Efforts to integrate rSVs to date have focused on targeted analyses of known Mendelian rare disease genes. This approach is intractable for rare diseases with many causal loci or patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would provide a substantial reduction in the number of candidate disease genes in which to assess rSV effects. We further implemented a method for ranking PGS genes to define a set of core/key genes where an rSV has the potential to exert relatively larger effects on disease risk. Among a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in Autism (N=54 cases), we identified 22,019 deletions, 2,041 duplications, 87,826 insertions, and 214 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were frequently in higher constraint regions compared to controls (P = 1×10⁻³). This difference was not observed in the lowest-ranked gene set (P = 0.15). Overall, our study provides a framework for the annotation of long-read rSVs from lrWGS data and prioritization of disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.
Affiliation(s)
- Cas LeMaster
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Carl Schwendinger-Schreck
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Bing Ge
- McGill University, Montreal, Quebec, Canada
- Warren A. Cheung
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Rebecca McLennan
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Jeffrey J. Johnston
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Tomi Pastinen
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Craig Smail
- Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
5
Urbanowicz K, Opielka M, Stegmann KM, Dickmanns A, Dobbelstein M, Peters GJ, Smoleński RT. Evaluation of N4-hydroxycytidine incorporation into nucleic acids of SARS-CoV-2-infected host cells by direct measurement with liquid chromatography-mass spectrometry. NUCLEOSIDES, NUCLEOTIDES & NUCLEIC ACIDS 2024:1-9. [PMID: 38741480 DOI: 10.1080/15257770.2024.2346550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
Molnupiravir, an orally administered prodrug of β-d-N4-hydroxycytidine (NHC), is incorporated into newly synthesized RNA by viral RNA-dependent RNA polymerase (RdRp). It is used for treatment of SARS-CoV-2 infections. Incorporation of NHC triphosphate into viral RNA inhibits replication of the virus, at least in part by introducing deleterious mutations. However, there is limited information on NHC incorporation into host RNA, and reports on the risk of mutagenicity that molnupiravir/NHC pose to the host are conflicting. We used two liquid chromatography-mass spectrometry (LC-MS) methods to evaluate the incorporation of NHC into RNA and DNA of host Vero E6 cells in a SARS-CoV-2 infection model. To test this, host and viral RNA were degraded to their ribonucleosides, while host DNA was degraded to deoxyribonucleosides. Subsequently, nucleic acid constituents were analyzed by LC-MS, which offers specific, direct, and quantitative determination of incorporation. Our findings revealed concentration-dependent NHC incorporation into host cell RNA in both infected and uninfected cell cultures, reaching a maximum of 1 in 7,093 bases. Analysis of host DNA revealed no presence of deoxy-N4-hydroxycytidine down to a detection limit of 1 in 133,000 bases. Our findings therefore suggest minimal to no NHC incorporation into host DNA, indicating a low probability of significant host cell mutagenicity associated with its use.
Affiliation(s)
- Mikolaj Opielka
- Department of Biochemistry, Medical University of Gdansk, Gdansk, Poland
- Kim M Stegmann
- Department of Molecular Oncology, Göttingen Center of Molecular Biosciences, University of Göttingen, Göttingen, Germany
- Antje Dickmanns
- Department of Molecular Oncology, Göttingen Center of Molecular Biosciences, University of Göttingen, Göttingen, Germany
- Matthias Dobbelstein
- Department of Molecular Oncology, Göttingen Center of Molecular Biosciences, University of Göttingen, Göttingen, Germany
- Godefridus J Peters
- Department of Biochemistry, Medical University of Gdansk, Gdansk, Poland
- Laboratory of Medical Oncology, Amsterdam University Medical Centers, Cancer Center Amsterdam, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
6
Steyaert W, Sagath L, Demidov G, Yépez VA, Esteve-Codina A, Gagneur J, Ellwanger K, Derks R, Weiss M, den Ouden A, van den Heuvel S, Swinkels H, Zomer N, Steehouwer M, O'Gorman L, Astuti G, Neveling K, Schüle R, Xu J, Synofzik M, Beijer D, Hengel H, Schöls L, Claeys KG, Baets J, Van de Vondel L, Ferlini A, Selvatici R, Morsy H, Saeed Abd Elmaksoud M, Straub V, Müller J, Pini V, Perry L, Sarkozy A, Zaharieva I, Muntoni F, Bugiardini E, Polavarapu K, Horvath R, Reid E, Lochmüller H, Spinazzi M, Savarese M, Matalonga L, Laurie S, Brunner HG, Graessner H, Beltran S, Ossowski S, Vissers LELM, Gilissen C, Hoischen A. Unravelling undiagnosed rare disease cases by HiFi long-read genome sequencing. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.03.24305331. [PMID: 38746462 PMCID: PMC11092722 DOI: 10.1101/2024.05.03.24305331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilised 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single nucleotide variants (SNVs), insertion-deletions (InDels), and short tandem repeat (STR) expansions in extensively studied RD families without clear molecular diagnoses. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Rare Disease Network (ERN) experts. Of these, 21 families were affected by so-called 'unsolvable' syndromes for which genetic causes remain unknown, and 93 families with at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded thirteen novel genetic diagnoses due to de novo and rare inherited SNVs, InDels, SVs, and STR expansions. In an additional four families, we identified a candidate disease-causing SV affecting several genes including an MCF2 / FGF13 fusion and PSMA3 deletion. However, no common genetic cause was identified in any of the 'unsolvable' syndromes. Taken together, we found (likely) disease-causing genetic variants in 13.0% of previously unsolved families and additional candidate disease-causing SVs in another 4.3% of these families. In conclusion, our results demonstrate the added value of HiFi long-read genome sequencing in undiagnosed rare diseases.
7
Leitão E, Schröder C, Depienne C. Identification and characterization of repeat expansions in neurological disorders: Methodologies, tools, and strategies. Rev Neurol (Paris) 2024; 180:383-392. [PMID: 38594146 DOI: 10.1016/j.neurol.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/11/2024]
Abstract
Tandem repeats are a common, highly polymorphic class of variation in human genomes. Their expansion beyond a pathogenic threshold is a process that contributes to a wide range of neurological and neuromuscular genetic disorders, of which over 60 have been identified to date. The last few years have seen a resurgence in repeat expansion discovery propelled by technological advancements, enabling the identification of over 20 novel repeat expansion disorders. These expansions can occur in coding or non-coding regions of genes, resulting in a range of pathogenic mechanisms. In this article, we review strategies, tools and methods that can be used for efficient detection and characterization of known repeat expansions and identification of new expansion disorders. Features that can be used to prioritize repeat expansions include anticipation, which is characterized by increased severity or earlier onset of symptoms across generations, and founder effects, which contribute to higher prevalence rates in certain populations. Classical technologies such as Southern blotting, repeat-primed polymerase chain reaction (PCR) and long-range PCR can still be used to detect known repeat expansions, although they usually have significant limitations linked to the absence of sequence context. Targeted sequencing of known expansions using either long-range PCR or CRISPR-Cas9 enrichment combined with long-read sequencing or adaptive nanopore sampling are usually better but more expensive alternatives. The development of new bioinformatics tools applied to short-read genome data can now be used to detect repeat expansions either in a targeted manner or at the genome-wide level. In addition, technological advances, particularly long-read technologies such as optical genome mapping (Bionano Genomics), Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi sequencing, offer promising avenues for the detection of repeat expansions. Despite challenges in specific DNA extraction requirements, computation resources needed and data interpretation, these technologies have an immense potential to advance our understanding of repeat expansion disorders and improve diagnostic accuracy.
Affiliation(s)
- E Leitão
- Institute of Human Genetics, University Hospital Essen, University Duisburg-Essen, Essen, Germany
- C Schröder
- Institute of Human Genetics, University Hospital Essen, University Duisburg-Essen, Essen, Germany
- C Depienne
- Institute of Human Genetics, University Hospital Essen, University Duisburg-Essen, Essen, Germany.
8
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] Open
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
- Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan.
- Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan.
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan.
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
9
Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res 2024; 34:300-309. [PMID: 38355307 PMCID: PMC10984387 DOI: 10.1101/gr.278267.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.
Affiliation(s)
- Xena M Mapel
- Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
- Hubert Pausch
- Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
10
Wijngaard R, Demidov G, O'Gorman L, Corominas-Galbany J, Yaldiz B, Steyaert W, de Boer E, Vissers LELM, Kamsteeg EJ, Pfundt R, Swinkels H, den Ouden A, Te Paske IBAW, de Voer RM, Faivre L, Denommé-Pichon AS, Duffourd Y, Vitobello A, Chevarin M, Straub V, Töpf A, van der Kooi AJ, Magrinelli F, Rocca C, Hanna MG, Vandrovcova J, Ossowski S, Laurie S, Gilissen C. Mobile element insertions in rare diseases: a comparative benchmark and reanalysis of 60,000 exome samples. Eur J Hum Genet 2024; 32:200-208. [PMID: 37853102 PMCID: PMC10853235 DOI: 10.1038/s41431-023-01478-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/01/2022] [Revised: 08/29/2023] [Accepted: 10/04/2023] [Indexed: 10/20/2023] Open
Abstract
Mobile element insertions (MEIs) are a known cause of genetic disease but have been underexplored due to technical limitations of genetic testing methods. Various bioinformatic tools have been developed to identify MEIs in Next Generation Sequencing data. However, most tools have been developed specifically for genome sequencing (GS) data rather than exome sequencing (ES) data, which remains more widely used for routine diagnostic testing. In this study, we benchmarked six MEI detection tools (ERVcaller, MELT, Mobster, SCRAMble, TEMP2 and xTea) on ES data and on GS data from publicly available genomic samples (HG002, NA12878). For all the tools, we evaluated the sensitivity and precision of different filtering strategies. Results show that there were substantial differences in tool performance between ES and GS data. MELT performed best with ES data, and its combination with SCRAMble substantially increased the detection rate of MEIs. By applying both tools to 10,890 ES samples from Solve-RD and 52,624 samples from Radboudumc, we were able to diagnose 10 patients who had remained undiagnosed by conventional ES analysis until now. Our study shows that MELT and SCRAMble can be used reliably to identify clinically relevant MEIs in ES data. This may lead to an additional diagnosis for 1 in 3000 to 4000 patients in routine clinical ES.
Affiliation(s)
- Robin Wijngaard
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- German Demidov
- Universitätsklinikum Tübingen - Institut für Medizinische Genetik und angewandte Genomik, Tübingen, Germany
- Luke O'Gorman
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Burcu Yaldiz
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Wouter Steyaert
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Elke de Boer
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Lisenka E L M Vissers
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Erik-Jan Kamsteeg
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Rolph Pfundt
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Hilde Swinkels
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Amber den Ouden
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Iris B A W Te Paske
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Richarda M de Voer
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Laurence Faivre
- Centre de Référence Maladies Rares "Anomalies du développement et syndromes malformatifs", Centre de Génétique, FHU-TRANSLAD et Institut GIMI, CHU Dijon Bourgogne, Dijon, France
- Anne-Sophie Denommé-Pichon
- UMR1231-Inserm, Génétique des Anomalies du développement, Université de Bourgogne Franche-Comté, Dijon, France
- Laboratoire de Génétique chromosomique et moléculaire, UF6254 Innovation en diagnostic génomique des maladies rares, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Yannis Duffourd
- UMR1231-Inserm, Génétique des Anomalies du développement, Université de Bourgogne Franche-Comté, Dijon, France
- Laboratoire de Génétique chromosomique et moléculaire, UF6254 Innovation en diagnostic génomique des maladies rares, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Antonio Vitobello
- UMR1231-Inserm, Génétique des Anomalies du développement, Université de Bourgogne Franche-Comté, Dijon, France
- Laboratoire de Génétique chromosomique et moléculaire, UF6254 Innovation en diagnostic génomique des maladies rares, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Martin Chevarin
- UMR1231-Inserm, Génétique des Anomalies du développement, Université de Bourgogne Franche-Comté, Dijon, France
- Laboratoire de Génétique chromosomique et moléculaire, UF6254 Innovation en diagnostic génomique des maladies rares, Centre Hospitalier Universitaire de Dijon, Dijon, France
- Volker Straub
- John Walton Muscular Dystrophy Research Centre, Translational and Clinical Research Institute, Newcastle University and Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Ana Töpf
- John Walton Muscular Dystrophy Research Centre, Translational and Clinical Research Institute, Newcastle University and Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Anneke J van der Kooi
- Department of Neurology, Amsterdam UMC, University of Amsterdam, Amsterdam Neuroscience, Amsterdam, The Netherlands
| | - Francesca Magrinelli
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
| | - Clarissa Rocca
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Clinical Pharmacology, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Michael G Hanna
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Jana Vandrovcova
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Stephan Ossowski
- Universitätsklinikum Tübingen - Institut für Medizinische Genetik und angewandte Genomik, Tübingen, Germany
| | - Steven Laurie
- Centro Nacional de Análisis Genómico (CNAG), Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Christian Gilissen
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
| |
|
11
|
Owusu R, Savarese M. Long-read sequencing improves diagnostic rate in neuromuscular disorders. ACTA MYOLOGICA : MYOPATHIES AND CARDIOMYOPATHIES : OFFICIAL JOURNAL OF THE MEDITERRANEAN SOCIETY OF MYOLOGY 2023; 42:123-128. [PMID: 38406378 PMCID: PMC10883326 DOI: 10.36185/2532-1900-394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 12/05/2023] [Indexed: 02/27/2024]
Abstract
Massively parallel sequencing methods, such as exome, genome, and targeted DNA sequencing, have aided the molecular diagnosis of genetic diseases over the last 20 years. However, short-read sequencing methods still have several limitations, such as inaccurate genome assembly and the inability to detect large structural variants or variants located in hard-to-sequence regions like highly repetitive areas. The recently emerged PacBio single-molecule real-time (SMRT) and Oxford Nanopore Technologies (ONT) long-read sequencing (LRS) methods have been shown to overcome most of these technical issues, leading to an increase in diagnostic rate. LRS methods are contributing to the detection of repeat expansions in novel disease-causing genes (e.g., ABCD3, NOTCH2NLC and RILPL1 causing oculopharyngodistal myopathy, or PLIN4 causing myopathy with rimmed ubiquitin-positive autophagic vacuolation), of structural variants (e.g., in DMD), and of single nucleotide variants in repetitive regions (TTN and NEB). Moreover, these methods have simplified the characterization of the D4Z4 repeats in DUX4, facilitating the diagnosis of facioscapulohumeral muscular dystrophy (FSHD). We review recent studies that have used either ONT or PacBio SMRT sequencing methods and discuss the different types of variants that have been detected using these approaches in individuals with neuromuscular disorders.
Affiliation(s)
| | - Marco Savarese
- Folkhälsan Research Center, Helsinki, Finland
- University of Helsinki, Faculty of Medicine, Helsinki, Finland
| |
|
12
|
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
| | - Frank R Wendt
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
| |
|
13
|
Balestrini S, Mei D, Sisodiya SM, Guerrini R. Steps to Improve Precision Medicine in Epilepsy. Mol Diagn Ther 2023; 27:661-672. [PMID: 37755653 PMCID: PMC10590329 DOI: 10.1007/s40291-023-00676-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Accepted: 08/29/2023] [Indexed: 09/28/2023]
Abstract
Precision medicine is an old concept, but it is not yet widely applied across human health conditions. Numerous attempts have been made to apply precision medicine in epilepsy, based on a better understanding of aetiological mechanisms and on deconstructing disease into multiple biological subsets. The aim of precision medicine is to provide effective strategies for treating individual patients with the specific agent(s) likely to work best given the causal biological make-up. We provide an overview of the main applications of precision medicine in epilepsy, including the current limitations and pitfalls, and propose potential strategies for implementing it and achieving a higher rate of success in patient care. Such strategies include establishing a definition of precision medicine and its outcomes; learning from past experiences, from failures and from other fields (e.g. oncology); using appropriate precision medicine strategies (e.g. drug repurposing versus the traditional drug discovery process); and using adequate methods to assess efficacy (e.g. randomised controlled trials versus alternative trial designs). Although the progress of diagnostic techniques now allows comprehensive characterisation of each individual epilepsy condition from a molecular, biological, structural and clinical perspective, challenges remain in integrating individual data into clinical practice to achieve effective applications of precision medicine in this domain.
Affiliation(s)
- S Balestrini
- Neuroscience Department, Meyer Children's Hospital IRCCS, Florence, Italy
- University of Florence, Florence, Italy
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London, UK
- Chalfont Centre for Epilepsy, Chalfont St Peter, UK
| | - D Mei
- Neuroscience Department, Meyer Children's Hospital IRCCS, Florence, Italy
| | - S M Sisodiya
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London, UK
- Chalfont Centre for Epilepsy, Chalfont St Peter, UK
| | - Renzo Guerrini
- Neuroscience Department, Meyer Children's Hospital IRCCS, Florence, Italy
- University of Florence, Florence, Italy
| |
|