1. Groza C, Ge B, Cheung WA, Pastinen T, Bourque G. Expanded methylome and quantitative trait loci detection by long-read profiling of personal DNA. Genome Res 2025; 35:644-652. [PMID: 40113263 DOI: 10.1101/gr.279240.124] [Received: 03/17/2024] [Accepted: 02/11/2025] [Indexed: 03/22/2025]
Abstract
Structural variants (SVs) are omnipresent in human DNA, yet their genotype and methylation statuses are rarely characterized due to previous limitations in genome assembly and detection of modified nucleotides. Also, the extent to which SVs act as methylation quantitative trait loci (SV-mQTLs) is largely unknown. Here, we generated a pangenome graph summarizing SVs in 782 de novo assemblies obtained from Genomic Answers for Kids, capturing 14.6 million CpG dinucleotides that are absent from the CHM13v2 reference (SV-CpGs), thus expanding their number by 43.6%. Using 435 methylomes, we genotyped 4.06 million SV-CpGs, of which 3.93 million (96.8%) are methylated at least once. Nonrepeat sequences contribute 1.59 × 10^6 novel SV-CpGs, followed by centromeric satellites (6.57 × 10^5), simple repeats (5.40 × 10^5), Alu elements (5.07 × 10^5), satellites (2.17 × 10^5), LINE-1s (1.83 × 10^5), and SVA (SINE-VNTR-Alu) elements (1.50 × 10^5). Centromeric satellites, simple repeats, and SVAs are overrepresented in SV-CpGs versus reference CpGs. Similarly, methylation levels in SV-CpGs are more variable than in reference CpGs. To explore if SVs are potentially causal for functional variation, we measured SV-mQTLs. This revealed over 230,464 methylation bins where the methylation is associated with common SVs within 100 kbp. Finally, we identified 65,659 methylation bins (28.5%) where the leading QTL variant is an SV. In conclusion, we demonstrate that graph pangenomes provide full SV structures, the associated methylation variation, and reveal tens of thousands of SV-mQTLs, underscoring the importance of assembly based analyses of human traits.
Affiliation(s)
- Cristian Groza
  - Université de Montréal, Montréal Heart Institute, Montréal, Québec H1T 1C8, Canada
- Bing Ge
  - McGill University, McGill University and Genome Quebec Innovation Centre, Montréal, Québec H3A 2T8, Canada
- Warren A Cheung
  - Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA
- Tomi Pastinen
  - Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA
- Guillaume Bourque
  - McGill University, Human Genetics, Montréal, Québec H3A 0C7, Canada
  - Canadian Center for Computational Genomics, McGill University, Montréal, Québec H3A 2R7, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec H3A 0G1, Canada
2. Montano C, Timp W. Evolution of genome-wide methylation profiling technologies. Genome Res 2025; 35:572-582. [PMID: 40228903 DOI: 10.1101/gr.278407.123] [Indexed: 04/16/2025]
Abstract
In this mini-review, we explore the advancements in genome-wide DNA methylation profiling, tracing the evolution from traditional methods such as methylation arrays and whole-genome bisulfite sequencing to the cutting-edge single-molecule profiling enabled by long-read sequencing (LRS) technologies. We highlight how LRS is transforming clinical and translational research, particularly by its ability to simultaneously measure genetic and epigenetic information, providing a more comprehensive understanding of complex disease mechanisms. We discuss current challenges and future directions in the field, emphasizing the need for innovative computational tools and robust, reproducible approaches to fully harness the capabilities of LRS in molecular diagnostics.
Affiliation(s)
- Carolina Montano
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - Division of Human Genetics, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
3. Rausch T, Marschall T, Korbel JO. The impact of long-read sequencing on human population-scale genomics. Genome Res 2025; 35:593-598. [PMID: 40228902 DOI: 10.1101/gr.280120.124] [Indexed: 04/16/2025]
Abstract
Long-read sequencing technologies, particularly those from Pacific Biosciences and Oxford Nanopore Technologies, are revolutionizing genome research by providing high-resolution insights into complex and repetitive regions of the human genome that were previously inaccessible. These advances have been particularly enabling for the comprehensive detection of genomic structural variants (SVs), which is critical for linking genotype to phenotype in population-scale and rare disease studies, as well as in cancer. Recent developments in sequencing throughput and computational methods, such as pangenome graphs and haplotype-resolved assemblies, are paving the way for the future inclusion of long-read sequencing in clinical cohort studies and disease diagnostics. DNA methylation signals directly obtained from long reads enhance the utility of single-molecule long-read sequencing technologies by enabling molecular phenotypes to be interpreted, and by allowing the identification of the parent of origin of de novo mutations. Despite this recent progress, challenges remain in scaling long-read technologies to large populations due to cost, computational complexity, and the lack of tools to facilitate the efficient interpretation of SVs in graphs. This perspective provides a succinct review on the current state of long-read sequencing in genomics by highlighting its transformative potential and key hurdles, and emphasizing future opportunities for advancing the understanding of human genetic diversity and diseases through population-scale long-read analysis.
Affiliation(s)
- Tobias Rausch
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
4. Del Gobbo GF, Boycott KM. The additional diagnostic yield of long-read sequencing in undiagnosed rare diseases. Genome Res 2025; 35:559-571. [PMID: 39900460 DOI: 10.1101/gr.279970.124] [Indexed: 02/05/2025]
Abstract
Long-read sequencing (LRS) is a promising technology positioned to study the significant proportion of rare diseases (RDs) that remain undiagnosed as it addresses many of the limitations of short-read sequencing, detecting and clarifying additional disease-associated variants that may be missed by the current standard diagnostic workflow for RDs. Some key areas where additional diagnostic yields may be realized include: (1) detection and resolution of structural variants (SVs); (2) detection and characterization of tandem repeat expansions; (3) coverage of regions of high sequence similarity; (4) variant phasing; (5) the use of de novo genome assemblies for reference-based or graph genome variant detection; and (6) epigenetic and transcriptomic evaluations. Examples from over 50 studies support that the main areas of added diagnostic yield currently lie in SV detection and characterization, repeat expansion assessment, and phasing (with or without DNA methylation information). Several emerging studies applying LRS in cohorts of undiagnosed RDs also demonstrate that LRS can boost diagnostic yields following negative standard-of-care clinical testing and provide an added yield of 7%-17% following negative short-read genome sequencing. With this evidence of improved diagnostic yield, we discuss the incorporation of LRS into the diagnostic care pathway for undiagnosed RDs, including current challenges and considerations, with the ultimate goal of ending the diagnostic odyssey for countless individuals with RDs.
Affiliation(s)
- Giulia F Del Gobbo
  - Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada K1H 5B2
- Kym M Boycott
  - Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada K1H 5B2
  - Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Ontario, Canada K1H 8L1
5. Genner R, Akeson S, Meredith M, Jerez PA, Malik L, Baker B, Miano-Burkhardt A, Paten B, Billingsley KJ, Blauwendraat C, Jain M. Assessing DNA methylation detection for primary human tissue using Nanopore sequencing. Genome Res 2025; 35:632-643. [PMID: 40054862 DOI: 10.1101/gr.279159.124] [Received: 02/19/2024] [Accepted: 02/11/2025] [Indexed: 03/12/2025]
Abstract
DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies [ONT] and Pacific Biosciences [PacBio]) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from the R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However, the effects on methylation detection have not yet been documented. Here, we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome data sets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived data sets. Subtle differences in methylation data sets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large-scale sequencing efforts, such as the NIH CARD Long Read Initiative.
Affiliation(s)
- Rylee Genner
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Stuart Akeson
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, USA
- Melissa Meredith
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Pilar Alvarez Jerez
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3BG, United Kingdom
- Laksh Malik
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA
- Breeana Baker
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA
- Benedict Paten
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Kimberley J Billingsley
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland 20892, USA
- Cornelis Blauwendraat
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland 20892, USA
- Miten Jain
  - Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, USA
  - Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, USA
6. Han H, Lee HH, Kim MG, Shin YS, Chung JS, Kim J. Genome assembly resources of genitourinary cancers for chromosomal aberration at the single nucleotide level. Sci Data 2025; 12:550. [PMID: 40169664 PMCID: PMC11962096 DOI: 10.1038/s41597-025-04801-7] [Received: 12/06/2023] [Accepted: 03/11/2025] [Indexed: 04/03/2025]
Abstract
Traditionally, the evolutionary perspective of cancer has been understood as gradual alterations in passenger/driver genes that lead to branching phylogeny. However, in cases of prostate adenocarcinoma and kidney renal cell carcinoma, macroevolutionary landmarks like chromoplexy and chromothripsis are frequently observed. Unfortunately, short-read sequencing techniques often miss these significant macroevolutionary changes, which involve multiple translocations and deletions at the chromosomal level. To resolve such genomic dark matters, we provided high-fidelity long-read sequencing data (78-92 Gb of ~Q30 reads) of six genitourinary tumour cell lines (one benign kidney tumour and two kidney and three prostate cancers). Based on these data, we obtained 12 high-quality, partially phased genome assemblies (Contig N50 1.85-29.01 Mb; longest contig 2.02-171.62 Mb), graph-based pan-genome variant sets (11.57 M variants including 60 K structural variants), and 5-methylcytosine sites (14.68%-27.05% of the CpG sites). We also identified several severe chromosome aberration events, which would result from chromosome break and fusion events. Our cancer genome assemblies will provide unprecedented resolution to understand cancer genome instability and chromosomal aberration.
Affiliation(s)
- Hyunho Han
  - Department of Urology, Urological Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
- Hyung Ho Lee
  - Center for Urologic Cancer, National Cancer Center, Goyang, Republic of Korea
- Min Gyu Kim
  - Center for Urologic Cancer, National Cancer Center, Goyang, Republic of Korea
- Yoo Sub Shin
  - Department of Urology, Urological Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
- Jin Soo Chung
  - Center for Urologic Cancer, National Cancer Center, Goyang, Republic of Korea
- Jun Kim
  - Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Korea
7. Théberge ET, Durbano K, Demailly D, Huby S, Mitina A, Yin Y, Mohajeri A, van Karnebeek C, Horvath GA, Yuen RKC, Usdin K, Lehman A, Cif L, Richmond PA. Disco-Interacting Protein 2 Homolog B CGG Repeat Expansion in Siblings with Neurodevelopmental Disability and Progressive Movement Disorder. Mov Disord 2025; 40:567-578. [PMID: 39854091 DOI: 10.1002/mds.30101] [Received: 06/05/2024] [Revised: 11/18/2024] [Accepted: 12/13/2024] [Indexed: 01/26/2025]
Abstract
BACKGROUND: Trinucleotide repeat expansions are an emerging class of genetic variants associated with various movement disorders. Unbiased genome-wide analyses can reveal novel genotype-phenotype associations and provide a diagnosis for patients and families.
OBJECTIVE: The aim was to identify the genetic cause of a severe progressive movement disorder phenotype in 2 affected brothers.
METHODS: A family of 2 affected brothers and unaffected parents had extensive phenotyping since birth. Whole-genome and long-read sequencing methods characterized genetic variants and methylation status.
RESULTS: Two male siblings with a CGG repeat expansion in the 5'-untranslated region (UTR) of disco-interacting protein 2 homolog B (DIP2B) presented with a novel DIP2B phenotype, including neurodevelopmental disability, dysmorphic traits, and a severe progressive movement disorder (chorea, dystonia, and ataxia).
CONCLUSIONS: This is the first report of a severe progressive movement disorder phenotype associated with a CGG repeat expansion in the DIP2B 5'-UTR.
© 2025 International Parkinson and Movement Disorder Society. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
Affiliation(s)
- Emilie T Théberge
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Kate Durbano
  - Department of Neurology, CHU Montpellier, Montpellier, France
- Diane Demailly
  - Department of Neurology, Clinique Beau Soleil, Institut Mutualiste Montpelliérain, Montpellier, France
- Sophie Huby
  - Department of Neurology, CHU Montpellier, Montpellier, France
- Aleksandra Mitina
  - Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Yue Yin
  - Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Arezoo Mohajeri
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Clara van Karnebeek
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
  - Emma Center for Personalized Medicine, Departments of Pediatrics and Human Genetics, Amsterdam Gastroenterology Endocrinology Metabolism, Amsterdam UMC, Amsterdam, The Netherlands
- Gabriella A Horvath
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, British Columbia, Canada
- Ryan K C Yuen
  - Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
  - The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Karen Usdin
  - Section on Gene Structure and Disease, Laboratory of Cell and Molecular Biology, National Institute of Diabetes, Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Anna Lehman
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Laura Cif
  - Department of Neurosurgery, CHU Montpellier, Montpellier, France
  - Service of Neurology, Department of Clinical Neurosciences, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland
- Phillip A Richmond
  - British Columbia Children's Hospital Research Institute, Vancouver, British Columbia, Canada
8. Negi S, Stenton SL, Berger SI, Canigiula P, McNulty B, Violich I, Gardner J, Hillaker T, O'Rourke SM, O'Leary MC, Carbonell E, Austin-Tse C, Lemire G, Serrano J, Mangilog B, VanNoy G, Kolmogorov M, Vilain E, O'Donnell-Luria A, Délot E, Miga KH, Monlong J, Paten B. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. Am J Hum Genet 2025; 112:428-449. [PMID: 39862869 DOI: 10.1016/j.ajhg.2025.01.002] [Received: 08/21/2024] [Revised: 12/22/2024] [Accepted: 01/02/2025] [Indexed: 01/27/2025]
Abstract
More than 50% of families with suspected rare monogenic diseases remain unsolved after whole-genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing and providing haplotype-resolved methylation profiling. To evaluate LRS's additional diagnostic yield, we sequenced a rare-disease cohort of 98 samples from 41 families, using nanopore sequencing, achieving per sample ∼36× average coverage and 32-kb read N50 from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. LRS covered, on average, coding exons in ∼280 genes and ∼5 known Mendelian disease-associated genes that were not covered by SRS. In comparison to SRS, LRS detected additional rare, functionally annotated variants, including structural variants (SVs) and tandem repeats, and completely phased 87% of protein-coding genes. LRS detected additional de novo variants and could be used to distinguish postzygotic mosaic variants from prezygotic de novos. Diagnostic variants were established by LRS in 11 probands, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates LRS's potential to enhance diagnostic yield for rare monogenic diseases, implying utility in future clinical genomics workflows.
Affiliation(s)
- Shloka Negi
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Sarah L Stenton
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Seth I Berger
  - Children's National Research Institute, Washington, DC, USA
- Brandy McNulty
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Ivo Violich
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Joshua Gardner
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Todd Hillaker
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Sara M O'Rourke
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Melanie C O'Leary
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Elizabeth Carbonell
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christina Austin-Tse
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Gabrielle Lemire
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Jillian Serrano
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Brian Mangilog
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Grace VanNoy
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mikhail Kolmogorov
  - Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
- Eric Vilain
  - Institute for Clinical and Translational Science, University of California, Irvine, Irvine, CA, USA
- Anne O'Donnell-Luria
  - Center for Mendelian Genomics, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Emmanuèle Délot
  - Institute for Clinical and Translational Science, University of California, Irvine, Irvine, CA, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jean Monlong
  - Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
9. Dishuck PC, Munson KM, Lewis AP, Dougherty ML, Underwood JG, Harvey WT, Hsieh P, Pastinen T, Eichler EE. Structural variation, selection, and diversification of the NPIP gene family from the human pangenome. bioRxiv 2025:2025.02.04.636496. [PMID: 39975192 PMCID: PMC11838601 DOI: 10.1101/2025.02.04.636496] [Indexed: 02/21/2025]
Abstract
The NPIP (nuclear pore interacting protein) gene family has expanded to high copy number in humans and African apes where it has been subject to an excess of amino acid replacement consistent with positive selection (1). Due to the limitations of short-read sequencing, NPIP human genetic diversity has been poorly understood. Using highly accurate assemblies generated from long-read sequencing as part of the human pangenome, we completely characterize 169 human haplotypes (4,665 NPIP paralogs and alleles). Of the 28 NPIP paralogs, just three (NPIPB2, B11, and B14) are fixed at a single copy, and only a single locus, B2, shows no structural variation. Four NPIP paralogs map to large segmental duplication blocks that mediate polymorphic inversions (355 kbp-1.6 Mbp) corresponding to microdeletions associated with developmental delay and autism. Haplotype-based tests of positive selection and selective sweeps identify two paralogs, B9 and B15, within the top percentile for both tests. Using full-length cDNA data from 101 tissue/cell types, we construct paralog-specific gene models and show that 56% (31/55 most abundant isoforms) have not been previously described in RefSeq. We define six distinct translation start sites and other protein structural features that distinguish paralogs, including a variable number tandem repeat that encodes a beta helix of variable size that emerged ~3.1 million years ago in human evolution. Among the 28 NPIP paralogs, we identify distinct tissue and developmental patterns of expression with only a few maintaining the ancestral testis-enriched expression. A subset of paralogs (NPIPA1, A5, A6-9, B3-5, and B12/B13) show increased brain expression. Our results suggest ongoing positive selection in the human population and rapid diversification of NPIP gene models.
Affiliation(s)
- Philip C. Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M. Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P. Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Max L. Dougherty
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Present address: Tisch Cancer Institute, Division of Hematology and Medical Oncology, The Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jason G. Underwood
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Pacific Biosciences (PacBio) of California, Incorporated, Menlo Park, CA, USA
- William T. Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genetics, Cell Biology, and Development, Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Tomi Pastinen
  - Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, KS, USA
  - UMKC School of Medicine, University of Missouri, Kansas City, Kansas City, KS, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
10. Akbari V, Dada S, Shen Y, Dixon K, Hejla D, Galbraith A, Choufani S, Weksberg R, Boerkoel CF, Stewart L, Gibson WT, Jones SJM. Long-read sequencing for detection and subtyping of Prader-Willi and Angelman syndromes. J Med Genet 2024; 62:32-36. [PMID: 39537351 DOI: 10.1136/jmg-2024-110115] [Received: 05/13/2024] [Accepted: 10/23/2024] [Indexed: 11/16/2024]
Abstract
Prader-Willi syndrome (PWS) and Angelman syndrome (AS) are imprinting disorders caused by genetic or epigenetic aberrations of 15q11.2-q13. Their clinical testing is often multitiered; diagnostic testing begins with methylation-specific multiplex ligation-dependent probe amplification or methylation-sensitive PCR and then proceeds to molecular subtyping to determine the mechanism and recurrence risk. Currently, correct classification of a proband's PWS/AS subtype often requires parental samples, a costly process for families and health systems. The use of nanopore sequencing for molecular diagnosis of PWS and AS has been explored by Yamada et al; however, to confirm heterodisomy parental data were still required. Here, we investigate genome-wide nanopore sequencing in a larger cohort of PWS (18) and AS (6) as a singular test to detect the molecular subtype, without parental data. We accurately subtyped these cases including uniparental heterodisomy, mixed iso-/heterodisomy, type 1 and 2 deletions, microdeletion and UBE3A indels. One PWS case with a previously unresolved diagnosis subtyped as maternal isodisomy. This work highlights the application of long-read sequencing and other imprinted regions outside of the PWS/AS critical region to resolve the molecular diagnosis and subtyping of PWS and AS without parental data. The work also outlines an approach to generically detect heterodisomy through the interrogation of distant imprinted regions.
Affiliation(s)
- Vahid Akbari
- Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
- Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
| | - Sarah Dada
- Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
- Bioinformatics Graduate Program, The University of British Columbia, Vancouver, British Columbia, Canada
| | - Yaoqing Shen
- Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
| | - Katherine Dixon
- Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
- Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
| | - Duha Hejla
- BC Children's Hospital, Vancouver, British Columbia, Canada
- Division of Endocrinology, Department of Pediatrics, The University of British Columbia, Vancouver, British Columbia, Canada
| | - Andrew Galbraith
- Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
- Bioinformatics Graduate Program, The University of British Columbia, Vancouver, British Columbia, Canada
| | - Sanaa Choufani
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Rosanna Weksberg
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Cornelius F Boerkoel
- Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Women's Hospital, Vancouver, British Columbia, Canada
| | - Laura Stewart
- BC Children's Hospital, Vancouver, British Columbia, Canada
- Division of Endocrinology, Department of Pediatrics, The University of British Columbia, Vancouver, British Columbia, Canada
| | - William T Gibson
- Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children's Hospital, Vancouver, British Columbia, Canada
| | - Steven J M Jones
- Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
- Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
- Bioinformatics Graduate Program, The University of British Columbia, Vancouver, British Columbia, Canada
|
11
|
Tan JW, Blake EJ, Farris JD, Klee EW. Expanding Upon Genomics in Rare Diseases: Epigenomic Insights. Int J Mol Sci 2024; 26:135. [PMID: 39795993 PMCID: PMC11719497 DOI: 10.3390/ijms26010135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2024] [Revised: 12/19/2024] [Accepted: 12/24/2024] [Indexed: 01/13/2025] Open
Abstract
DNA methylation is an essential epigenetic modification that plays a crucial role in regulating gene expression and maintaining genomic stability. With advances in sequencing technology, methylation studies have provided valuable insights into the diagnosis of rare diseases through the identification of episignatures, epivariation, epioutliers, and allele-specific methylation. However, current methylation studies are not without limitations. This mini-review explores the current understanding of DNA methylation in rare diseases, highlighting the key mechanisms and diagnostic potential, and emphasizing the need for advanced methodologies and integrative approaches to enhance the understanding of disease progression and design more personalized treatment for patients, given the nature of rare diseases.
Affiliation(s)
| | | | | | - Eric W. Klee
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.W.T.); (E.J.B.); (J.D.F.)
|
12
|
Guitart X, Porubsky D, Yoo D, Dougherty ML, Dishuck PC, Munson KM, Lewis AP, Hoekzema K, Knuth J, Chang S, Pastinen T, Eichler EE. Independent expansion, selection, and hypervariability of the TBC1D3 gene family in humans. Genome Res 2024; 34:1798-1810. [PMID: 39107043 DOI: 10.1101/gr.279299.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 07/29/2024] [Indexed: 08/09/2024]
Abstract
TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because it is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing data from 34 humans and 11 nonhuman primate species. Our analysis shows that this particular gene family has independently duplicated in at least five primate lineages, and the duplicated loci are enriched at sites of large-scale chromosomal rearrangements on Chromosome 17. We find that all human copy-number variation maps to two distinct clusters located at Chromosome 17q12 and that humans are highly structurally variable at this locus, differing by as many as 20 copies and ∼1 Mbp in length depending on haplotypes. We also show evidence of positive selection, as well as a significant change in the predicted human TBC1D3 protein sequence. Last, we find that, despite multiple duplications, human TBC1D3 expression is limited to a subset of copies and, most notably, from a single paralog group: TBC1D3-CDKL. These observations may help explain why a gene potentially important in cortical development can be so variable in the human population.
Affiliation(s)
- Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Max L Dougherty
- Tisch Cancer Institute, Division of Hematology and Medical Oncology, The Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Stephen Chang
- Department of Biochemistry
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, California 94305, USA
| | - Tomi Pastinen
- Department of Pediatrics, Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, Missouri 64108, USA
- Department of Pediatrics, School of Medicine, University of Missouri Kansas City, Kansas City, Missouri 64108, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
|
13
|
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 PMCID: PMC11921810 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Affiliation(s)
| | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Harriet Dashnow
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | | | - Tom Mokveld
- Pacific Biosciences of California, Menlo Park, CA, USA
| | | | | | | | - Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Chengpeng Bi
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Emily Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron Wenger
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Khi Pin Chua
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Verónica Martínez-Cerdeño
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
| | - Trevor D Bartley
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
| | - Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
| | - David L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
|
14
|
Smail C, Ge B, Keever-Keigher MR, Schwendinger-Schreck C, Cheung WA, Johnston JJ, Barrett C, Feldman K, Cohen ASA, Farrow EG, Thiffault I, Grundberg E, Pastinen T. Complex trait associations in rare diseases and impacts on Mendelian variant interpretation. Nat Commun 2024; 15:8196. [PMID: 39294130 PMCID: PMC11411080 DOI: 10.1038/s41467-024-52407-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 09/05/2024] [Indexed: 09/20/2024] Open
Abstract
Emerging evidence implicates common genetic variation - aggregated into polygenic scores (PGS) - in the onset and phenotypic presentation of rare diseases. Here, we comprehensively map individual polygenic liability for 1102 open-source PGS in a cohort of 3059 probands enrolled in the Genomic Answers for Kids (GA4K) rare disease study, revealing widespread associations between rare disease phenotypes and PGSs for common complex diseases and traits, blood protein levels, and brain and other organ morphological measurements. Using this resource, we demonstrate increased polygenic liability in probands with an inherited candidate disease variant (VUS) compared to unaffected carrier parents. Further, we show an enrichment for large-effect rare variants in putative core PGS genes for associated complex traits. Overall, our study supports and expands on previous findings of complex trait associations in rare diseases, implicates polygenic liability as a potential mechanism underlying variable penetrance of candidate causal variants, and provides a framework for identifying novel candidate rare disease genes.
Affiliation(s)
- Craig Smail
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA.
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, USA.
| | - Bing Ge
- Department of Human Genetics, McGill University, Montreal, Canada
| | - Marissa R Keever-Keigher
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
| | | | - Warren A Cheung
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
| | - Jeffrey J Johnston
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
| | - Cassandra Barrett
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
| | - Keith Feldman
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, USA
- Health Outcomes and Health Services Research, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
| | - Ana S A Cohen
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, USA
- Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, USA
| | - Emily G Farrow
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, USA
- Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
| | - Isabelle Thiffault
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, USA
- Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, USA
| | - Elin Grundberg
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, USA
| | - Tomi Pastinen
- Genomic Medicine Center, Department of Pediatrics, Children's Mercy Kansas City, Kansas City, USA.
- UKMC School of Medicine, University of Missouri Kansas City, Kansas City, USA.
|
15
|
Engelbrecht E, Rodriguez OL, Watson CT. Addressing Technical Pitfalls in Pursuit of Molecular Factors That Mediate Immunoglobulin Gene Regulation. JOURNAL OF IMMUNOLOGY (BALTIMORE, MD. : 1950) 2024; 213:651-662. [PMID: 39007649 PMCID: PMC11333172 DOI: 10.4049/jimmunol.2400131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 06/13/2024] [Indexed: 07/16/2024]
Abstract
The expressed Ab repertoire is a critical determinant of immune-related phenotypes. Ab-encoding transcripts are distinct from other expressed genes because they are transcribed from somatically rearranged gene segments. Human Abs are composed of two identical H and L chain polypeptides derived from genes in the IGH locus and one of two L chain loci. The combinatorial diversity that results from Ab gene rearrangement and the pairing of different H and L chains contributes to the immense diversity of the baseline Ab repertoire. During rearrangement, Ab gene selection is mediated by factors that influence chromatin architecture, promoter/enhancer activity, and V(D)J recombination. Interindividual variation in the composition of the Ab repertoire associates with germline variation in IGH, implicating polymorphism in Ab gene regulation. Determining how IGH variants directly mediate gene regulation will require integration of these variants with other functional genomic datasets. In this study, we argue that standard approaches using short reads have limited utility for characterizing regulatory regions in IGH at haplotype resolution. Using simulated and chromatin immunoprecipitation sequencing reads, we define features of IGH that limit the use of short reads and a single reference genome, namely 1) the highly duplicated nature of the DNA sequence in IGH and 2) structural polymorphisms that are frequent in the population. We demonstrate that personalized diploid references enhance performance of short-read data for characterizing mappable portions of the locus, while also showing that long-read profiling tools will ultimately be needed to fully resolve functional impacts of IGH germline variation on expressed Ab repertoires.
Affiliation(s)
- Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| | - Oscar L Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| | - Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
|
16
|
Negi S, Stenton SL, Berger SI, McNulty B, Violich I, Gardner J, Hillaker T, O'Rourke SM, O'Leary MC, Carbonell E, Austin-Tse C, Lemire G, Serrano J, Mangilog B, VanNoy G, Kolmogorov M, Vilain E, O'Donnell-Luria A, Délot E, Miga KH, Monlong J, Paten B. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.08.22.24312327. [PMID: 39228712 PMCID: PMC11370519 DOI: 10.1101/2024.08.22.24312327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
More than 50% of families with suspected rare monogenic diseases remain unsolved after whole genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing, and providing haplotype-resolved methylation profiling. To evaluate LRS's additional diagnostic yield, we sequenced a rare disease cohort of 98 samples, including 41 probands and some family members, using nanopore sequencing, achieving per-sample ∼36x average coverage and 32 kilobase (kb) read N50 from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. LRS covered, on average, coding exons in ∼280 genes and ∼5 known Mendelian disease genes that were not covered by SRS. In comparison to SRS, LRS detected additional rare, functionally annotated variants, including SVs and tandem repeats, and completely phased 87% of protein-coding genes. LRS detected additional de novo variants, and could be used to distinguish postzygotic mosaic variants from prezygotic de novos. Eleven probands were solved, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates LRS's potential to enhance diagnostic yield for rare monogenic diseases, implying utility in future clinical genomics workflows.
|
17
|
Trégouët DA, Morange PE. Next-generation sequencing strategies in venous thromboembolism: in whom and for what purpose? J Thromb Haemost 2024; 22:1826-1834. [PMID: 38641321 DOI: 10.1016/j.jtha.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
This invited review follows the oral presentation "To Sequence or Not to Sequence, That Is Not the Question; But 'When, Who, Which and What For?' Is" given during the State of the Art session "Translational Genomics in Thrombosis: From OMICs to Clinics" of the International Society on Thrombosis and Haemostasis 2023 Congress. Emphasizing the power of next-generation sequencing technologies and the diverse strategies associated with DNA variant analysis, this review highlights the unresolved questions and challenges in their implementation both for the clinical diagnosis of venous thromboembolism and in translational research.
Affiliation(s)
- David-Alexandre Trégouët
- University of Bordeaux, Institut National de la Santé et de la Recherche Médicale, Bordeaux Population Health Research Center, Unité Mixte de Recherche 1219, Bordeaux, France.
| | - Pierre-Emmanuel Morange
- Cardiovascular and Nutrition Research Center (Centre de Recherche en CardioVasculaire et Nutrition), Institut National de la Santé et de la Recherche Médicale, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Aix-Marseille University, Marseille, France
|
18
|
Cheng H, Bai J, Zhou X, Chen N, Jiang Q, Ren Z, Li X, Su T, Liang L, Jiang W, Wang Y, Peng J, Shang A. Electrical stimulation with polypyrrole-coated polycaprolactone/silk fibroin scaffold promotes sacral nerve regeneration by modulating macrophage polarisation. BIOMATERIALS TRANSLATIONAL 2024; 5:157-174. [PMID: 39351163 PMCID: PMC11438605 DOI: 10.12336/biomatertransl.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 05/18/2024] [Accepted: 06/22/2024] [Indexed: 10/04/2024]
Abstract
Peripheral nerve injury poses a great threat to neurosurgery and limits the regenerative potential of sacral nerves in the neurogenic bladder. It remains unknown whether electrical stimulation can facilitate sacral nerve regeneration in addition to modulating bladder function. The objective of this study was to utilise electrical stimulation in sacral nerve crush injury with a newly constructed electroconductive scaffold and explore the role of macrophages in electrical stimulation of crushed nerves. To this end, we generated a polypyrrole-coated polycaprolactone/silk fibroin scaffold through which we applied electrical stimulation. The electrical stimulation boosted nerve regeneration and polarised the macrophages towards the M2 phenotype. An in vitro test using bone marrow-derived macrophages revealed that the pro-regenerative M2 polarisation was significantly enhanced by electrical stimulation. Bioinformatics analysis showed that the expression of signal transducers and activators of transcription (STATs) was differentially regulated in a way that promoted M2-related gene expression. Our work indicated the feasibility of electrical stimulation for sacral nerve regeneration and demonstrated the pivotal role that macrophages play in electrical stimulation.
Affiliation(s)
- Haofeng Cheng
- School of Medicine, Nankai University, Tianjin, China
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
- Co-innovation Center of Neuroregeneration; Nantong University, Nantong, Jiangsu Province, China
| | - Jun Bai
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
- Co-innovation Center of Neuroregeneration; Nantong University, Nantong, Jiangsu Province, China
| | - Xingyu Zhou
- School of Medicine, Nankai University, Tianjin, China
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
- Co-innovation Center of Neuroregeneration; Nantong University, Nantong, Jiangsu Province, China
| | - Nantian Chen
- School of Medicine, Nankai University, Tianjin, China
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
- Co-innovation Center of Neuroregeneration; Nantong University, Nantong, Jiangsu Province, China
| | - Qingyu Jiang
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
| | - Zhiqi Ren
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
| | - Xiangling Li
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
| | - Tianqi Su
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
| | - Lijing Liang
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
- Graduate School of Chinese PLA General Hospital, Beijing, China
- Department of Ultrasound, Chinese PLA General Hospital, Beijing, China
| | - Wenli Jiang
- Department of Ultrasound, Chinese PLA General Hospital, Beijing, China
| | - Yu Wang
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
- Co-innovation Center of Neuroregeneration; Nantong University, Nantong, Jiangsu Province, China
| | - Jiang Peng
- Institute of Orthopedics, Chinese PLA General Hospital; Beijing Key Lab of Regenerative Medicine in Orthopedics; Key Laboratory of Musculoskeletal Trauma & War Injuries PLA; Beijing, China
- Co-innovation Center of Neuroregeneration; Nantong University, Nantong, Jiangsu Province, China
| | - Aijia Shang
- School of Medicine, Nankai University, Tianjin, China
- Department of Neurosurgery, Chinese PLA General Hospital, Beijing, China
|
19
|
Kernohan KD, Boycott KM. The expanding diagnostic toolbox for rare genetic diseases. Nat Rev Genet 2024; 25:401-415. [PMID: 38238519 DOI: 10.1038/s41576-023-00683-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 11/22/2023] [Indexed: 05/23/2024]
Abstract
Genomic technologies, such as targeted, exome and short-read genome sequencing approaches, have revolutionized the care of patients with rare genetic diseases. However, more than half of patients remain without a diagnosis. Emerging approaches from research-based settings such as long-read genome sequencing and optical genome mapping hold promise for improving the identification of disease-causal genetic variants. In addition, new omic technologies that measure the transcriptome, epigenome, proteome or metabolome are showing great potential for variant interpretation. As genetic testing options rapidly expand, the clinical community needs to be mindful of their individual strengths and limitations, as well as remaining challenges, to select the appropriate diagnostic test, correctly interpret results and drive innovation to address insufficiencies. If used effectively - through truly integrative multi-omics approaches and data sharing - the resulting large quantities of data from these established and emerging technologies will greatly improve the interpretative power of genetic and genomic diagnostics for rare diseases.
Affiliation(s)
- Kristin D Kernohan
- CHEO Research Institute, University of Ottawa, Ottawa, ON, Canada
- Newborn Screening Ontario, CHEO, Ottawa, ON, Canada
| | - Kym M Boycott
- CHEO Research Institute, University of Ottawa, Ottawa, ON, Canada.
- Department of Genetics, CHEO, Ottawa, ON, Canada.
|
20
|
Guitart X, Porubsky D, Yoo D, Dougherty ML, Dishuck PC, Munson KM, Lewis AP, Hoekzema K, Knuth J, Chang S, Pastinen T, Eichler EE. Independent expansion, selection and hypervariability of the TBC1D3 gene family in humans. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.12.584650. [PMID: 38654825 PMCID: PMC11037872 DOI: 10.1101/2024.03.12.584650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because it is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing data from 34 humans and 11 nonhuman primate species. Our analysis shows that this particular gene family has independently duplicated in at least five primate lineages, and the duplicated loci are enriched at sites of large-scale chromosomal rearrangements on chromosome 17. We find that most humans vary along two TBC1D3 clusters where human haplotypes are highly variable in copy number, differing by as many as 20 copies, and structure (structural heterozygosity 90%). We also show evidence of positive selection, as well as a significant change in the predicted human TBC1D3 protein sequence. Lastly, we find that, despite multiple duplications, human TBC1D3 expression is limited to a subset of copies and, most notably, from a single paralog group: TBC1D3-CDKL. These observations may help explain why a gene potentially important in cortical development can be so variable in the human population.
Affiliation(s)
- Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Max L. Dougherty
- Tisch Cancer Institute, Division of Hematology and Medical Oncology, The Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Stephen Chang
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
| | - Tomi Pastinen
- Department of Pediatrics, Genomic Medicine Center, Children’s Mercy Kansas City, Kansas City, MO, USA
- Department of Pediatrics, School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
21
|
Smail C, Ge B, Keever-Keigher MR, Schwendinger-Schreck C, Cheung W, Johnston JJ, Barrett C, Feldman K, Cohen AS, Farrow EG, Thiffault I, Grundberg E, Pastinen T. Complex trait associations in rare diseases and impacts on Mendelian variant interpretation. medRxiv 2024:2024.01.10.24301111. [PMID: 38260377 PMCID: PMC10802745 DOI: 10.1101/2024.01.10.24301111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Emerging evidence implicates common genetic variation, aggregated into polygenic scores (PGS), in the onset and phenotypic presentation of rare diseases. In this study, we quantified individual polygenic liability for 1,151 previously published PGS in a cohort of 2,374 probands enrolled in the Genomic Answers for Kids (GA4K) rare disease study, revealing widespread associations between rare disease phenotypes and PGS for common complex diseases and traits, blood protein levels, and brain and other organ morphological measurements. We observed increased polygenic burden in probands with variants of unknown significance (VUS) compared to unaffected carrier parents. We further observed an enrichment in overlap between diagnostic and candidate rare disease genes and large-effect PGS genes. Overall, our study supports and expands on previous findings of complex trait associations in rare disease phenotypes and provides a framework for identifying novel candidate rare disease genes and for understanding variable penetrance of candidate Mendelian disease variants.
Affiliation(s)
- Craig Smail
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
- UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
| | - Bing Ge
- Department of Human Genetics, McGill University, Montreal, Canada
| | - Marissa R. Keever-Keigher
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Carl Schwendinger-Schreck
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Warren Cheung
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Jeffrey J. Johnston
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Cassandra Barrett
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Keith Feldman
- UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
- Health Outcomes and Health Services Research, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Ana S.A. Cohen
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
- UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
- Department of Pathology and Laboratory Medicine, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Emily G. Farrow
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
- UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
- Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Isabelle Thiffault
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
- UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
- Department of Pathology and Laboratory Medicine, Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Elin Grundberg
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
- UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
| | - Tomi Pastinen
- Genomic Medicine Center, Department of Pediatrics, Children’s Mercy Kansas City, Kansas City, MO, USA
- UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
| |
|
22
|
Damaraju N, Miller AL, Miller DE. Long-Read DNA and RNA Sequencing to Streamline Clinical Genetic Testing and Reduce Barriers to Comprehensive Genetic Testing. J Appl Lab Med 2024; 9:138-150. [PMID: 38167773 DOI: 10.1093/jalm/jfad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 10/24/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND: Obtaining a precise molecular diagnosis through clinical genetic testing provides information about disease prognosis or progression, allows accurate counseling about recurrence risk, and empowers individuals to benefit from precision therapies or take part in N-of-1 trials. Unfortunately, more than half of individuals with a suspected Mendelian condition remain undiagnosed after a comprehensive clinical evaluation, and the results of any individual clinical genetic test ordered during a typical evaluation may take weeks or months to return. Furthermore, commonly used technologies, such as short-read sequencing, are limited in the types of disease-causing variation they can identify. New technologies, such as long-read sequencing (LRS), are poised to solve these problems.
CONTENT: Recent technical advances have improved accuracy, increased throughput, and decreased the costs of commercially available LRS technologies. This has resolved many historical concerns about the use of LRS in the clinical environment and opened the door to widespread clinical adoption of LRS. Here, we review LRS technology, how it has been used in the research setting to clarify complex variants or identify disease-causing variation missed by prior clinical testing, and how it may be used clinically in the near future.
SUMMARY: LRS is unique in that, as a single data source, it has the potential to replace nearly every other clinical genetic test offered today. When analyzed in a stepwise fashion, LRS will simplify laboratory processes, reduce barriers to comprehensive genetic testing, increase the rate of genetic diagnoses, and shorten the amount of time required to make a molecular diagnosis.
Affiliation(s)
- Nikhita Damaraju
- Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, United States
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, United States
| | - Angela L Miller
- Department of Pediatrics, University of Washington, Seattle, WA 98195, United States
| | - Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, United States
- Department of Pediatrics, University of Washington, Seattle, WA 98195, United States
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, United States
| |
|
23
|
Ni P, Nie F, Zhong Z, Xu J, Huang N, Zhang J, Zhao H, Zou Y, Huang Y, Li J, Xiao CL, Luo F, Wang J. DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing. Nat Commun 2023; 14:4054. [PMID: 37422489 PMCID: PMC10329642 DOI: 10.1038/s41467-023-39784-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 11/20/2022] [Accepted: 06/22/2023] [Indexed: 07/10/2023] Open
Abstract
Long single-molecule sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
Affiliation(s)
- Peng Ni
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Fan Nie
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Zeyu Zhong
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Jinrui Xu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Neng Huang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Jun Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Haochen Zhao
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - You Zou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Yuanfeng Huang
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, 410000, China
| | - Jinchen Li
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, 410000, China
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, 410000, China
| | - Chuan-Le Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, China.
| | - Feng Luo
- School of Computing, Clemson University, Clemson, SC, 29634-0974, USA.
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
- Xiangjiang Laboratory, Changsha, 410205, China.
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China.
| |
|