51
Araki R, Suga T, Hoki Y, Imadome K, Sunayama M, Kamimura S, Fujita M, Abe M. iPS cell generation-associated point mutations include many C > T substitutions via different cytosine modification mechanisms. Nat Commun 2024; 15:4946. [PMID: 38862540 PMCID: PMC11166658 DOI: 10.1038/s41467-024-49335-5] [Received: 04/17/2023] [Accepted: 05/31/2024] [Indexed: 06/13/2024] Open
Abstract
Genomic aberrations are a critical impediment for the safe medical use of iPSCs and their origin and developmental mechanisms remain unknown. Here we find through WGS analysis of human and mouse iPSC lines that genomic mutations are de novo events and that, in addition to unmodified cytosine base prone to deamination, the DNA methylation sequence CpG represents a significant mutation-prone site. CGI and TSS regions show increased mutations in iPSCs and elevated mutations are observed in retrotransposons, especially in the AluY subfamily. Furthermore, increased cytosine to thymine mutations are observed in differentially methylated regions. These results indicate that in addition to deamination of cytosine, demethylation of methylated cytosine, which plays a central role in genome reprogramming, may act mutagenically during iPSC generation.
Affiliation(s)
- Ryoko Araki
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan.
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan.
- Tomo Suga
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Yuko Hoki
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Kaori Imadome
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Misato Sunayama
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Satoshi Kamimura
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Mayumi Fujita
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Masumi Abe
- Institute for Quantum Medical Science, National Institutes for Quantum Science and Technology, Chiba, Japan.
52
Kuroki Y, Hattori A, Matsubara K, Fukami M. Long-read next-generation sequencing for molecular diagnosis of pediatric endocrine disorders. Ann Pediatr Endocrinol Metab 2024; 29:156-160. [PMID: 38956752 PMCID: PMC11220396 DOI: 10.6065/apem.2448028.014] [Received: 02/09/2024] [Accepted: 04/23/2024] [Indexed: 07/04/2024] Open
Abstract
Recent advances in long-read next-generation sequencing (NGS) have enabled researchers to identify several pathogenic variants overlooked by short-read NGS, array-based comparative genomic hybridization, and other conventional methods. Long-read NGS is particularly useful in the detection of structural variants and repeat expansions. Furthermore, it can be used for mutation screening in difficult-to-sequence regions, as well as for DNA-methylation analyses and haplotype phasing. This mini-review introduces the usefulness of long-read NGS in the molecular diagnosis of pediatric endocrine disorders.
Affiliation(s)
- Yoko Kuroki
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Genome Medicine, National Research Institute for Child Health and Development, Tokyo, Japan
- Atsushi Hattori
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Molecular Endocrinology, National Research Institute for Child Health and Development, Tokyo, Japan
- Keiko Matsubara
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Molecular Endocrinology, National Research Institute for Child Health and Development, Tokyo, Japan
- Maki Fukami
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Molecular Endocrinology, National Research Institute for Child Health and Development, Tokyo, Japan
53
Nanda AS, Wu K, Irkliyenko I, Woo B, Ostrowski MS, Clugston AS, Sayles LC, Xu L, Satpathy AT, Nguyen HG, Alejandro Sweet-Cordero E, Goodarzi H, Kasinathan S, Ramani V. Direct transposition of native DNA for sensitive multimodal single-molecule sequencing. Nat Genet 2024; 56:1300-1309. [PMID: 38724748 PMCID: PMC11176058 DOI: 10.1038/s41588-024-01748-0] [Received: 08/09/2022] [Accepted: 04/08/2024] [Indexed: 05/23/2024]
Abstract
Concurrent readout of sequence and base modifications from long unamplified DNA templates by Pacific Biosciences of California (PacBio) single-molecule sequencing requires large amounts of input material. Here we adapt Tn5 transposition to introduce hairpin oligonucleotides and fragment (tagment) limiting quantities of DNA for generating PacBio-compatible circular molecules. We developed two methods that implement tagmentation and use 90-99% less input than current protocols: (1) single-molecule real-time sequencing by tagmentation (SMRT-Tag), which allows detection of genetic variation and CpG methylation; and (2) single-molecule adenine-methylated oligonucleosome sequencing assay by tagmentation (SAMOSA-Tag), which uses exogenous adenine methylation to add a third channel for probing chromatin accessibility. SMRT-Tag of 40 ng or more human DNA (approximately 7,000 cell equivalents) yielded data comparable to gold standard whole-genome and bisulfite sequencing. SAMOSA-Tag of 30,000-50,000 nuclei resolved single-fiber chromatin structure, CTCF binding and DNA methylation in patient-derived prostate cancer xenografts and uncovered metastasis-associated global epigenome disorganization. Tagmentation thus promises to enable sensitive, scalable and multimodal single-molecule genomics for diverse basic and clinical applications.
Affiliation(s)
- Arjun S Nanda
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Ke Wu
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Iryna Irkliyenko
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Brian Woo
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Megan S Ostrowski
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Andrew S Clugston
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
- Leanne C Sayles
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
- Lingru Xu
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Ansuman T Satpathy
- Department of Pathology, Stanford University, Stanford, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Gladstone-University of California, San Francisco Institute for Genomic Immunology, Gladstone Institutes, San Francisco, CA, USA
- Hao G Nguyen
- Helen-Diller Cancer Center, San Francisco, CA, USA
- E Alejandro Sweet-Cordero
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
- Hani Goodarzi
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, San Francisco, CA, USA
- Sivakanthan Kasinathan
- Gladstone-University of California, San Francisco Institute for Genomic Immunology, Gladstone Institutes, San Francisco, CA, USA.
- Division of Rheumatology, Department of Pediatrics, Stanford University, Stanford, CA, USA.
- Vijay Ramani
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA.
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA.
- Helen-Diller Cancer Center, San Francisco, CA, USA.
- Bakar Computational Health Sciences Institute, San Francisco, CA, USA.
54
Zhou H, Su X, Song B. ACMGA: a reference-free multiple-genome alignment pipeline for plant species. BMC Genomics 2024; 25:515. [PMID: 38796435 PMCID: PMC11127342 DOI: 10.1186/s12864-024-10430-y] [Received: 01/17/2024] [Accepted: 05/20/2024] [Indexed: 05/28/2024] Open
Abstract
BACKGROUND: The short-read whole-genome sequencing (WGS) approach has been widely applied to investigate the genomic variation in the natural populations of many plant species. With the rapid advancements in long-read sequencing and genome assembly technologies, high-quality genome sequences are available for a group of varieties for many plant species. These genome sequences are expected to help researchers comprehensively investigate any type of genomic variants that are missed by the WGS technology. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes.
RESULTS: To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improved the alignment of repeat elements and could identify long (> 50 bp) deletions or insertions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 maize (Zea mays) de novo assembled genome sequences and compared them with the previously published short-read variant calling results. MGA identified more single nucleotide variants (SNVs) and long INDELs than did previously published WGS variant callings. Additionally, ACMGA detected significantly more SNVs and long INDELs in repetitive regions and the whole genome than did Cactus. Compared with the results of Cactus, the results of ACMGA were more similar to the previously published variants called using short reads. These two MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline.
CONCLUSIONS: Aligning de novo assembled genome sequences could identify more SNVs and INDELs than mapping short reads. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece-affine-gap cost strategy, and the progressive MGA algorithm.
Affiliation(s)
- Huafeng Zhou
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China.
- Baoxing Song
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China.
- Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi, 712100, China.
55
Kumari P, Kaur M, Dindhoria K, Ashford B, Amarasinghe SL, Thind AS. Advances in long-read single-cell transcriptomics. Hum Genet 2024:10.1007/s00439-024-02678-x. [PMID: 38787419 DOI: 10.1007/s00439-024-02678-x] [Received: 11/20/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Long-read single-cell transcriptomics (scRNA-Seq) is revolutionizing the way we profile heterogeneity in disease. Traditional short-read scRNA-Seq methods are limited in their ability to provide complete transcript coverage, resolve isoforms, and identify novel transcripts. The scRNA-Seq protocols developed for long-read sequencing platforms overcome these limitations by enabling the characterization of full-length transcripts. Long-read scRNA-Seq techniques initially suffered from poor accuracy compared to short-read scRNA-Seq. However, with improvements in accuracy, accessibility, and cost efficiency, long reads are gaining popularity in the field of scRNA-Seq. This review details the advances in long-read scRNA-Seq, with an emphasis on library preparation protocols and downstream bioinformatics analysis tools.
Affiliation(s)
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Manmeet Kaur
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Bruce Ashford
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
- Shanika L Amarasinghe
- Monash Biomedical Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
- Walter and Eliza Hall Institute of Medical Research, 1G, Royal Parade, Parkville, VIC, 3025, Australia
- Amarinder Singh Thind
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia.
- The School of Chemistry and Molecular Bioscience (SCMB), University of Wollongong, Loftus St, Wollongong, NSW, 2500, Australia.
56
Chao KH, Heinz JM, Hoh C, Mao A, Shumate A, Pertea M, Salzberg SL. Combining DNA and protein alignments to improve genome annotation with LiftOn. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.16.593026. [PMID: 38798552 PMCID: PMC11118573 DOI: 10.1101/2024.05.16.593026] [Indexed: 05/29/2024]
Abstract
As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn's protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and D. erecta.
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Jakob M. Heinz
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Celine Hoh
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Alan Mao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Alaina Shumate
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Mihaela Pertea
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
57
Ji CM, Feng XY, Huang YW, Chen RA. The Applications of Nanopore Sequencing Technology in Animal and Human Virus Research. Viruses 2024; 16:798. [PMID: 38793679 PMCID: PMC11125791 DOI: 10.3390/v16050798] [Received: 03/20/2024] [Revised: 05/07/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024] Open
Abstract
In recent years, an increasing number of viruses have triggered outbreaks that pose a severe threat to both human and animal life and have caused substantial economic losses. It is crucial to understand the genomic structure and epidemiology of these viruses to guide effective clinical prevention and treatment strategies. Nanopore sequencing, a third-generation sequencing technology, has been widely used in genomic research since 2014. This technology offers several advantages over traditional methods and next-generation sequencing (NGS), such as the ability to generate ultra-long reads, high efficiency, real-time monitoring and analysis, portability, and the ability to directly sequence RNA or DNA molecules. As a result, it exhibits excellent applicability and flexibility in virus research, including viral detection and surveillance, genome assembly, the discovery of new variants and novel viruses, and the identification of chemical modifications. In this paper, we provide a comprehensive review of the development, principles, advantages, and applications of nanopore sequencing technology in animal and human virus research, aiming to offer fresh perspectives for future studies in this field.
Affiliation(s)
- Chun-Miao Ji
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China
- Xiao-Yin Feng
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China
- Yao-Wei Huang
- College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
- Department of Veterinary Medicine, Zhejiang University, Hangzhou 310058, China
- Rui-Ai Chen
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China
- College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
58
Su Y, Yu Z, Jin S, Ai Z, Yuan R, Chen X, Xue Z, Guo Y, Chen D, Liang H, Liu Z, Liu W. Comprehensive assessment of mRNA isoform detection methods for long-read sequencing data. Nat Commun 2024; 15:3972. [PMID: 38730241 PMCID: PMC11087464 DOI: 10.1038/s41467-024-48117-3] [Received: 10/18/2022] [Accepted: 04/19/2024] [Indexed: 05/12/2024] Open
Abstract
The advancement of Long-Read Sequencing (LRS) techniques has significantly increased the length of sequencing to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expressions. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a deficiency in comparative studies that systematically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performances using simulated data, which represented diverse sequencing platforms generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate IsoQuant as a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and the ongoing improvement of tools for isoform detection using LRS data.
Affiliation(s)
- Yaqi Su
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Zhejian Yu
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Siqian Jin
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Zhipeng Ai
- Division of Human Reproduction and Developmental Genetics, Women's Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310006, Zhejiang, China
- Ruihong Yuan
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Xinyi Chen
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Ziwei Xue
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Yixin Guo
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Di Chen
- Center for Reproductive Medicine of the Second Affiliated Hospital Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre for Regeneration and Cell Therapy of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Hongqing Liang
- Division of Human Reproduction and Developmental Genetics, Women's Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310006, Zhejiang, China
- Zuozhu Liu
- Zhejiang University-Angel Align Inc. R&D Center for Intelligent Healthcare, Zhejiang University-University of Illinois at Urbana-Champaign Institute (ZJU-UIUC Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Wanlu Liu
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China.
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China.
- Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314100, China.
- Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
59
Haer-Wigman L, den Ouden A, Derks R, van Genderen MM, Lugtenberg D, Verheij J, Vijzelaar R, Yntema HG, Vissers LELM, Neveling K. Reply to: Pitfalls in the genetic testing of the OPN1LW-OPN1MW gene cluster in human subjects. NPJ Genom Med 2024; 9:29. [PMID: 38704388 PMCID: PMC11069539 DOI: 10.1038/s41525-024-00409-9] [Received: 09/26/2023] [Accepted: 03/13/2024] [Indexed: 05/06/2024] Open
Affiliation(s)
- Lonneke Haer-Wigman
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands.
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands.
- Amber den Ouden
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Ronny Derks
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Maria M van Genderen
- Bartiméus Diagnostic Center for complex visual disorders, Zeist, the Netherlands
- Department of Ophthalmology, University Medical Centre Utrecht, Utrecht, the Netherlands
- Dorien Lugtenberg
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
- Joke Verheij
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Helger G Yntema
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
- Lisenka E L M Vissers
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
- Kornelia Neveling
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
60
Anantharam R, Duchen D, Cox AL, Timp W, Thomas DL, Clipman SJ, Kandathil AJ. Long-Read Nanopore-Based Sequencing of Anelloviruses. Viruses 2024; 16:723. [PMID: 38793605 PMCID: PMC11125752 DOI: 10.3390/v16050723] [Received: 04/05/2024] [Revised: 04/27/2024] [Accepted: 04/30/2024] [Indexed: 05/26/2024] Open
Abstract
Routinely used metagenomic next-generation sequencing (mNGS) techniques often fail to detect low-level viremia (<10^4 copies/mL) and appear biased towards viruses with linear genomes. These limitations hinder the capacity to comprehensively characterize viral infections, such as those attributed to the Anelloviridae family. These near-ubiquitous, non-pathogenic components of the human virome have circular single-stranded DNA genomes that vary in size from 2.0 to 3.9 kb and exhibit high genetic diversity. Hence, species identification using short reads can be challenging. Here, we introduce a rolling circle amplification (RCA)-based metagenomic sequencing protocol tailored for circular single-stranded DNA genomes, utilizing the long-read Oxford Nanopore platform. The approach was assessed by sequencing anelloviruses in plasma drawn from people who inject drugs (PWID) in two geographically distinct cohorts. We detail the methodological adjustments implemented to overcome difficulties inherent in sequencing circular genomes and describe a computational pipeline focused on anellovirus detection. We assessed our protocol across various sample dilutions and successfully differentiated anellovirus sequences in conditions simulating mixed infections. This method provides a robust framework for the comprehensive characterization of circular viruses within the human virome using the Oxford Nanopore platform.
Collapse
Affiliation(s)
- Raghavendran Anantharam
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Dylan Duchen
- Center for Biomedical Data Science, Yale University School of Medicine, New Haven, CT 06511, USA
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06519, USA
| | - Andrea L. Cox
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - David L. Thomas
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Steven J. Clipman
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Abraham J. Kandathil
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|
61
|
Su C, Chandradoss KR, Malachowski T, Boya R, Ryu HS, Brennand KJ, Phillips-Cremins JE. MASTR-seq: Multiplexed Analysis of Short Tandem Repeats with sequencing. bioRxiv 2024:2024.04.29.591790. [PMID: 38746155 PMCID: PMC11092654 DOI: 10.1101/2024.04.29.591790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
More than 60 human disorders have been linked to unstable expansion of short tandem repeat (STR) tracts. STR length and the extent of DNA methylation are linked to disease pathology and can be mosaic in a cell type-specific manner in several repeat expansion disorders. Mosaic phenomena have been difficult to study to date due to technical bias intrinsic to repeat sequences and the need for multi-modal measurements at single-allele resolution. Nanopore long-read sequencing accurately measures STR length and DNA methylation in the same single molecule but is cost prohibitive for studies assessing a target locus across multiple experimental conditions or patient samples. Here, we describe MASTR-seq, Multiplexed Analysis of Short Tandem Repeats, for cost-effective, high-throughput, accurate, multi-modal measurements of DNA methylation and STR genotype at single-allele resolution. MASTR-seq couples long-read sequencing, Cas9-mediated target enrichment, and PCR-free multiplexed barcoding to achieve a >10-fold increase in on-target read mapping for 8-12 pooled samples in a single MinION flow cell. We provide a detailed experimental protocol and computational tools and present evidence that MASTR-seq quantifies tract length and DNA methylation status for CGG and CAG STR loci in normal-length and mutation-length human cell lines. The MASTR-seq protocol takes approximately eight days for experiments and one additional day for data processing and analyses.
Key points
- We provide a protocol for MASTR-seq: Multiplexed Analysis of Short Tandem Repeats using Cas9-mediated target enrichment and PCR-free, multiplexed nanopore sequencing.
- MASTR-seq achieves a >10-fold increase in on-target read proportion for highly repetitive, technically inaccessible regions of the genome relevant for human health and disease.
- MASTR-seq allows for high-throughput, efficient, accurate, and cost-effective measurement of STR length and DNA methylation in the same single allele for up to 8-12 samples in parallel in one Nanopore MinION flow cell.
|
62
|
Nicolas G. Lessons from genetic studies in Alzheimer disease. Rev Neurol (Paris) 2024; 180:368-377. [PMID: 38429159 DOI: 10.1016/j.neurol.2023.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 12/27/2023] [Indexed: 03/03/2024]
Abstract
Research on Alzheimer disease (AD) genetics has provided critical advances to the knowledge of AD pathophysiological mechanisms. The etiology of AD can be divided into monogenic (autosomal dominant inheritance) and complex (multifactorial determinism). In monogenic AD, recent advances mainly concern mutation-associated mechanisms, presymptomatic clinical studies, and the search for modifiers of ages of onset that are still ongoing. In complex AD, genetic factors can be further categorized into three classes: (i) the APOE-ɛ4 and ɛ2 common alleles that represent a category by themselves as they are both common and with a strong impact on AD risk; (ii) common variants with a modest effect, identified in genome-wide association studies (GWAS); and (iii) rare variants with a moderate-to-strong effect, identified in case-control sequencing studies. Regarding APOE, odds ratios, available in multiple ethnicities, can now be converted into penetrance curves, although such curves remain to be performed in diverse ethnicities. In addition, advances in the understanding of mechanisms have been recently reported and rare APOE variants add to the complexity. In the GWAS category, novel loci have been discovered thanks to larger studies, doubling the number of hits as compared to the previous reference meta-analysis. However, such modest risk factors cannot be used in the clinic, neither individually, nor in genetic risk scores. In the category of rare variants, two novel genes, ABCA1 and ATP8B4 now add to the three main ones, TREM2, SORL1, and ABCA7. The study of such rare variants suggests oligogenic inheritance in some families, as also suggested by digenic penetrance curves for SORL1 loss-of-function variants with APOE-ɛ4. 
Cumulative frequencies of definite (so-called) rare risk factors are 2.3% to 3.6% (depending on thresholds on odds ratios) in control databases and many more remain to be classified and identified, showing how important these risk factors may be as part of the complex determinism of AD. A better understanding of these rare risk factors and their combined effects on each other, with common variants, and with environmental factors, should allow for a prediction of AD risk and, eventually, preventive medicine. Taken together, most genetic determinants of AD, in monogenic and in complex forms, point toward the aggregation of Aβ as a pivotal triggering factor, such that targeting it may be efficient as prevention in at-risk individuals. The roles of neuroinflammation, microglia, and Tau pathology modulation are important avenues of research for disease modification.
Affiliation(s)
- G Nicolas
- Univ Rouen Normandie, Normandie Univ, Inserm U1245 and CHU Rouen, Department of Genetics and CNRMAJ, 76000 Rouen, France.
|
63
|
Mascher M, Marone MP, Schreiber M, Stein N. Are cereal grasses a single genetic system? Nat Plants 2024; 10:719-731. [PMID: 38605239 DOI: 10.1038/s41477-024-01674-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 03/17/2024] [Indexed: 04/13/2024]
Abstract
In 1993, a passionate and provocative call to arms urged cereal researchers to consider the taxon they study as a single genetic system and collaborate with each other. Since then, that group of scientists has seen their discipline blossom. In an attempt to understand what unity of genetic systems means and how the notion was borne out by later research, we survey the progress and prospects of cereal genomics: sequence assemblies, population-scale sequencing, resistance gene cloning and domestication genetics. Gene order may not be as extraordinarily well conserved in the grasses as once thought. Still, several recurring themes have emerged. The same ancestral molecular pathways defining plant architecture have been co-opted in the evolution of different cereal crops. Such genetic convergence as much as cross-fertilization of ideas between cereal geneticists has led to a rich harvest of genes that, it is hoped, will lead to improved varieties.
Affiliation(s)
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
| | - Marina Püpke Marone
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany
| | - Mona Schreiber
- University of Marburg, Department of Biology, Marburg, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany.
- Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
|
64
|
Del Gobbo GF, Wang X, Couse M, Mackay L, Goldsmith C, Marshall AE, Liang Y, Lambert C, Zhang S, Dhillon H, Fanslow C, Rowell WJ, Marshall CR, Kernohan KD, Boycott KM. Long-read genome sequencing reveals a novel intronic retroelement insertion in NR5A1 associated with 46,XY differences of sexual development. Am J Med Genet A 2024; 194:e63522. [PMID: 38131126 DOI: 10.1002/ajmg.a.63522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
Despite significant advancements in rare genetic disease diagnostics, many patients with rare genetic disease remain without a molecular diagnosis. Novel tools and methods are needed to improve the detection of disease-associated variants and understand the genetic basis of many rare diseases. Long-read genome sequencing provides improved sequencing in highly repetitive, homologous, and low-complexity regions, and improved assessment of structural variation and complex genomic rearrangements compared to short-read genome sequencing. As such, it is a promising method to explore overlooked genetic variants in rare diseases with a high suspicion of a genetic basis. We therefore applied PacBio HiFi sequencing in a large multi-generational family presenting with autosomal dominant 46,XY differences of sexual development (DSD), for whom extensive molecular testing over multiple decades had failed to identify a molecular diagnosis. This revealed a rare SINE-VNTR-Alu retroelement insertion in intron 4 of NR5A1, a gene in which loss-of-function variants are an established cause of 46,XY DSD. The insertion segregated among affected family members and was associated with loss-of-expression of alleles in cis, demonstrating a functional impact on NR5A1. This case highlights the power of long-read genome sequencing to detect genomic variants that have previously been intractable to detection by standard short-read genomic testing.
Affiliation(s)
- Giulia F Del Gobbo
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
| | - Xueqi Wang
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
| | - Madeline Couse
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Canada
| | - Layla Mackay
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Canada
| | - Claire Goldsmith
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Canada
| | - Aren E Marshall
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
| | - Yijing Liang
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Canada
| | - Siyuan Zhang
- PacBio of California, Inc, Menlo Park, California, USA
| | - Kristin D Kernohan
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Newborn Screening Ontario, Ottawa, Canada
| | - Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Canada
|
65
|
Komoto T, Ikeo K, Yaguchi S, Yamamoto T, Sakamoto N, Awazu A. Assembly of continuous high-resolution draft genome sequence of Hemicentrotus pulcherrimus using long-read sequencing. Dev Growth Differ 2024; 66:297-304. [PMID: 38634255 PMCID: PMC11457506 DOI: 10.1111/dgd.12924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/13/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
The draft genome assembly of the sea urchin Hemicentrotus pulcherrimus, which is widely studied in East Asia as a model organism of early development, was updated using Oxford Nanopore long-read sequencing. The updated assembly provided ~600-Mb genome sequences divided into 2,163 contigs with N50 = 516 kb. BUSCO completeness score and transcriptome model mapping ratio (TMMR) of the present assembly were obtained as 96.5% and 77.8%, respectively. These results were more continuous and of higher resolution than those of the previous version of the H. pulcherrimus draft genome, HpulGenome_v1, where the number of scaffolds = 16,251 with a total of ~100 Mb, N50 = 143 kb, BUSCO completeness score = 86.1%, and TMMR = 55.4%. The obtained genome contained 36,055 gene models that were consistent with those in other echinoderms. Additionally, two tandem repeat sequences of early histone gene locus containing 47 copies and 34 copies of all histone genes, and 185 of the homologous sequences of the interspecifically conserved region of the Ars insulator, ArsInsC, were obtained. These results provide a further advance for genome-wide research of development, gene regulation, and intranuclear structural dynamics of multicellular organisms using H. pulcherrimus.
Affiliation(s)
- Tetsushi Komoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
| | - Kazuho Ikeo
- Department of Genomics and Evolutionary Biology, National Institute of Genetics, Shizuoka, Japan
| | - Takashi Yamamoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Research Center for the Mathematics on Chromatin Live Dynamics, Hiroshima University, Higashi-Hiroshima, Japan
| | - Naoaki Sakamoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Research Center for the Mathematics on Chromatin Live Dynamics, Hiroshima University, Higashi-Hiroshima, Japan
| | - Akinori Awazu
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Research Center for the Mathematics on Chromatin Live Dynamics, Hiroshima University, Higashi-Hiroshima, Japan
|
66
|
Kronzer VL, Sparks JA, Raychaudhuri S, Cerhan JR. Low-frequency and rare genetic variants associated with rheumatoid arthritis risk. Nat Rev Rheumatol 2024; 20:290-300. [PMID: 38538758 DOI: 10.1038/s41584-024-01096-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/20/2024] [Indexed: 04/28/2024]
Abstract
Rheumatoid arthritis (RA) has an estimated heritability of nearly 50%, which is particularly high in seropositive RA. HLA alleles account for a large proportion of this heritability, in addition to many common single-nucleotide polymorphisms with smaller individual effects. Low-frequency and rare variants, such as those captured by next-generation sequencing, can also have a large role in heritability in some individuals. Rare variant discovery has informed the development of drugs such as inhibitors of PCSK9 and Janus kinases. Some 34 low-frequency and rare variants are currently associated with RA risk. One variant (19:10352442G>C in TYK2) was identified in five separate studies, and might therefore represent a promising therapeutic target. Following a set of best practices in future studies, including studying diverse populations, using large sample sizes, validating RA and serostatus, replicating findings, adjusting for other variants and performing functional assessment, could help to ensure the relevance of identified variants. Exciting opportunities are now on the horizon for genetics in RA, including larger datasets and consortia, whole-genome sequencing and direct applications of findings in the management, and especially treatment, of RA.
Affiliation(s)
| | - Jeffrey A Sparks
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Soumya Raychaudhuri
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - James R Cerhan
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
|
67
|
Bjørnstad PM, Aaløkken R, Åsheim J, Sundaram AYM, Felde CN, Østby GH, Dalland M, Sjursen W, Carrizosa C, Vigeland MD, Sorte HS, Sheng Y, Ariansen SL, Grindedal EM, Gilfillan GD. A 39 kb structural variant causing Lynch Syndrome detected by optical genome mapping and nanopore sequencing. Eur J Hum Genet 2024; 32:513-520. [PMID: 38030917 PMCID: PMC11061271 DOI: 10.1038/s41431-023-01494-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 10/19/2023] [Accepted: 11/06/2023] [Indexed: 12/01/2023] Open
Abstract
Lynch Syndrome (LS) is a hereditary cancer syndrome caused by pathogenic germline variants in one of the four mismatch repair (MMR) genes MLH1, MSH2, MSH6 and PMS2. It is characterized by a significantly increased risk of multiple cancer types, particularly colorectal and endometrial cancer, with autosomal dominant inheritance. Access to precise and sensitive methods for genetic testing is important, as early detection and prevention of cancer is possible when the variant is known. We present here two unrelated Norwegian families with family histories strongly suggestive of LS, where immunohistochemical and microsatellite instability analyses indicated presence of a pathogenic variant in MSH2, but targeted exon sequencing and multiplex ligation-dependent probe amplification (MLPA) were negative. Using Bionano optical genome mapping, we detected a 39 kb insertion in the MSH2 gene. Precise mapping of the insertion breakpoints and inserted sequence was performed by low-coverage whole-genome sequencing with an Oxford Nanopore MinION. The same variant was present in both families, and later found in other families from the same region of Norway, indicative of a founder event. To our knowledge, this is the first diagnosis of LS caused by a structural variant using these technologies. We suggest that structural variant detection be performed when LS is suspected but not confirmed with first-tier standard genetic testing.
Affiliation(s)
- Pål Marius Bjørnstad
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ragnhild Aaløkken
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - June Åsheim
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Arvind Y M Sundaram
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Caroline N Felde
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - G Henriette Østby
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Marianne Dalland
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Wenche Sjursen
- Department of Clinical & Molecular Medicine, NTNU and Department of Medical Genetics, St Olavs Hospital, Trondheim, Norway
| | - Christian Carrizosa
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Magnus D Vigeland
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Forensic Sciences, Oslo University Hospital, 0372, Oslo, Norway
| | - Hanne S Sorte
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ying Sheng
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Sarah L Ariansen
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Eli Marie Grindedal
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Gregor D Gilfillan
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
|
68
|
Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics 2024; 116:110842. [PMID: 38608738 DOI: 10.1016/j.ygeno.2024.110842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/01/2024] [Accepted: 04/06/2024] [Indexed: 04/14/2024]
Abstract
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly still presents significant challenges related to the quality of the results. Pursuing de novo whole-genome assembly remains a formidable challenge, underscored by intricate considerations surrounding computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape necessitates a reasonable choice of sequencing platform, depth, and assembly tools to orchestrate high-quality genome reconstructions. This comprehensive review delves into the intricate interplay between cutting-edge long-read sequencing technologies, assembly methodologies, and the ever-evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.
Affiliation(s)
- Elena Espinosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
| | - Rocio Bautista
- Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Rafael Larrosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Oscar Plata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
|
69
|
Bilgrav Saether K, Eisfeldt J, Bengtsson J, Lun MY, Grochowski CM, Mahmoud M, Chao HT, Rosenfeld JA, Liu P, Schuy J, Ameur A, Hwang JP, Sedlazeck FJ, Bi W, Marom R, Nordgren A, Carvalho CMB, Lindstrand A. Mind the gap: the relevance of the genome reference to resolve rare and pathogenic inversions. medRxiv 2024:2024.04.22.24305780. [PMID: 38712270 PMCID: PMC11071548 DOI: 10.1101/2024.04.22.24305780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Both long-read genome sequencing (lrGS) and the recently published Telomere to Telomere (T2T) reference genome provide increased coverage and resolution across repetitive regions, promising heightened structural variant detection and improved mapping. Inversions (INV), intrachromosomal segments which are rotated 180° and inserted back into the same chromosome, are a class of structural variants particularly challenging to detect due to their copy-number neutral state and association with repetitive regions. Inversions represent about 1/20 of all balanced structural chromosome aberrations and can lead to disease by gene disruption or altering regulatory regions of dosage sensitive genes in cis. Here we remapped the genome data from six individuals carrying unsolved cytogenetically detected inversions. An INV6 and INV10 were resolved using GRCh38 and T2T-CHM13. Finally, an INV9 required optical genome mapping, de novo assembly of lrGS data and T2T-CHM13. This inversion disrupted intron 25 of EHMT1, confirming a diagnosis of Kleefstra syndrome 1 (MIM#610253). These three inversions, only mappable in specific references, prompted us to investigate the presence and population frequencies of differential reference regions (DRRs) between T2T-CHM13, GRCh37, GRCh38, the chimpanzee and bonobo, and hundreds of megabases of DRRs were identified. Our results emphasize the significance of the chosen reference genome and the added benefits of lrGS and optical genome mapping in solving rearrangements in challenging regions of the genome. This is particularly important for inversions and may impact clinical diagnostics.
|
70
|
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
|
71
|
Ten Berk de Boer E, Ameur A, Bunikis I, Ek M, Stattin EL, Feuk L, Eisfeldt J, Lindstrand A. Long-read sequencing and optical mapping generates near T2T assemblies that resolves a centromeric translocation. Sci Rep 2024; 14:9000. [PMID: 38637641 PMCID: PMC11026446 DOI: 10.1038/s41598-024-59683-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 04/13/2024] [Indexed: 04/20/2024] Open
Abstract
Long-read genome sequencing (lrGS) is a promising method in genetic diagnostics. Here we investigate the potential of lrGS to detect a disease-associated chromosomal translocation between 17p13 and the 19 centromere. We constructed two sets of phased and non-phased de novo assemblies: (i) based on lrGS only and (ii) hybrid assemblies combining lrGS with optical mapping using lrGS reads with a median coverage of 34X. Variant calling detected both structural variants (SVs) and small variants, and the accuracy of the small variant calling was compared with those called with short-read genome sequencing (srGS). The de novo and hybrid assemblies had high quality and contiguity with N50 of 62.85 Mb, enabling a near telomere to telomere assembly with fewer than 100 contigs per haplotype. Notably, we successfully identified the centromeric breakpoint of the translocation. A concordance of 92% was observed when comparing small variant calling between srGS and lrGS. In summary, our findings underscore the remarkable potential of lrGS as a comprehensive and accurate solution for the analysis of SVs and small variants. Thus, lrGS could replace a large battery of genetic tests that were used for the diagnosis of a single symptomatic translocation carrier, highlighting the potential of lrGS in the realm of digital karyotyping.
Affiliation(s)
- Esmee Ten Berk de Boer
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet Science Park, 171 65, Solna, Sweden
| | - Adam Ameur
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Marlene Ek
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
| | - Eva-Lena Stattin
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Lars Feuk
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden.
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden.
- Science for Life Laboratory, Karolinska Institutet Science Park, 171 65, Solna, Sweden.
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
|
72
|
Zhang S, Xu N, Fu L, Yang X, Li Y, Yang Z, Feng Y, Ma K, Jiang X, Han J, Hu R, Zhang L, de Gennaro L, Ryabov F, Meng D, He Y, Wu D, Yang C, Paparella A, Mao Y, Bian X, Lu Y, Antonacci F, Ventura M, Shepelev VA, Miga KH, Alexandrov IA, Logsdon GA, Phillippy AM, Su B, Zhang G, Eichler EE, Lu Q, Shi Y, Sun Q, Mao Y. Comparative genomics of macaques and integrated insights into genetic variation and population history. bioRxiv 2024:2024.04.07.588379. [PMID: 38645259 PMCID: PMC11030432 DOI: 10.1101/2024.04.07.588379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
The crab-eating macaques (Macaca fascicularis) and rhesus macaques (M. mulatta) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ∼42% lower, while centromeres are ∼3.7 times longer than those in humans. The characterization of ∼2 Mbp fixed genetic variants and ∼240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1, SEMA3F, and HOXD13). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.
|
73
|
Eisenhofer R, Nesme J, Santos-Bay L, Koziol A, Sørensen SJ, Alberdi A, Aizpurua O. A comparison of short-read, HiFi long-read, and hybrid strategies for genome-resolved metagenomics. Microbiol Spectr 2024; 12:e0359023. [PMID: 38451230 PMCID: PMC10986573 DOI: 10.1128/spectrum.03590-23] [Received: 10/07/2023] [Accepted: 02/11/2024] [Indexed: 03/08/2024] Open
Abstract
Shotgun metagenomics enables the reconstruction of complex microbial communities at a high level of detail. Such an approach can be conducted using short-read or long-read sequencing data, or a combination of both. To assess the pros and cons of these different approaches, we used 22 fecal DNA extracts collected weekly for 11 weeks from two respective lab mice to study seven performance metrics over four combinations of sequencing depth and technology: (i) 20 Gbp of Illumina short-read data, (ii) 40 Gbp of short-read data, (iii) 20 Gbp of PacBio HiFi long-read data, and (iv) 40 Gbp of hybrid (20 Gbp of short-read + 20 Gbp of long-read) data. No strategy was best for all metrics; instead, each one excelled across different metrics. The long-read approach yielded the best assembly statistics, with the highest N50 and lowest number of contigs. The 40 Gbp short-read approach yielded the highest number of refined bins. Finally, the hybrid approach yielded the longest assemblies and the highest mapping rate to the bacterial genomes. Our results suggest that while long-read sequencing significantly improves the quality of reconstructed bacterial genomes, it is more expensive and requires deeper sequencing than short-read approaches to recover a comparable number of reconstructed genomes. The optimal strategy is study-specific and depends on how researchers assess the trade-off between the quantity and quality of recovered genomes. IMPORTANCE: Mice are an important model organism for understanding the gut microbiome. When studying these gut microbiomes using DNA techniques, researchers can choose from technologies that use short or long DNA reads. In this study, we perform an extensive benchmark between short- and long-read DNA sequencing for studying mouse gut microbiomes. We find that no one approach was best for all metrics and provide information that can help guide researchers in planning their experiments.
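The abstract ranks assemblies by N50 without defining it. As a minimal sketch, assuming the common definition (the contig length at which the cumulative length of contigs, sorted longest first, reaches at least half of the total assembly length), N50 can be computed as follows. The function name `n50` and the example lengths are illustrative and do not come from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running total first reaches
    half of the summed assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths (arbitrary units), not data from the paper.
example_assembly = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_assembly))
```

A higher N50 together with fewer contigs indicates a more contiguous assembly, which is the sense in which the long-read approach is described as yielding the best assembly statistics.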
Affiliation(s)
- Raphael Eisenhofer
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Joseph Nesme
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Luisa Santos-Bay
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Adam Koziol
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Søren Johannes Sørensen
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Antton Alberdi
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Ostaizka Aizpurua
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| |
|
74
|
Krenn M, Wagner M, Zulehner G, Weng R, Jäger F, Keritam O, Sener M, Brücke C, Milenkovic I, Langer A, Buchinger D, Habersam R, Mayerhanser K, Brugger M, Brunet T, Jacob M, Graf E, Berutti R, Cetin H, Hoefele J, Winkelmann J, Zimprich F, Rath J. Next-generation sequencing and comprehensive data reassessment in 263 adult patients with neuromuscular disorders: insights into the gray zone of molecular diagnoses. J Neurol 2024; 271:1937-1946. [PMID: 38127101 PMCID: PMC10972933 DOI: 10.1007/s00415-023-12101-6] [Received: 09/20/2023] [Revised: 11/03/2023] [Accepted: 11/04/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND: Neuromuscular disorders (NMDs) are heterogeneous conditions with a considerable fraction attributed to monogenic defects. Despite the advancements in genomic medicine, many patients remain without a diagnosis. Here, we investigate whether a comprehensive reassessment strategy improves the diagnostic outcomes. METHODS: We analyzed 263 patients with NMD phenotypes that underwent diagnostic exome or genome sequencing at our tertiary referral center between 2015 and 2023. We applied a comprehensive reassessment encompassing variant reclassification, re-phenotyping and NGS data reanalysis. Multivariable logistic regression was performed to identify predictive factors associated with a molecular diagnosis. RESULTS: Initially, a molecular diagnosis was identified in 53 cases (20%), while an additional 23 (9%) had findings of uncertain significance. Following comprehensive reassessment, the diagnostic yield increased to 23%, revealing 44 distinct monogenic etiologies. Reasons for newly obtained molecular diagnoses were variant reclassifications in 7 and NGS data reanalysis in 3 cases including one recently described disease-gene association (DNAJB4). Male sex reduced the odds of receiving a molecular diagnosis (OR 0.42; 95% CI 0.21-0.82), while a positive family history (OR 5.46; 95% CI 2.60-11.76) and a myopathy phenotype (OR 2.72; 95% CI 1.11-7.14) increased the likelihood. 7% were resolved through targeted genetic testing or classified as acquired etiologies. CONCLUSION: Our findings reinforce the use of NGS in NMDs of suspected monogenic origin. We show that a comprehensive reassessment enhances diagnostic accuracy. However, one needs to be aware that genetic diagnoses are often made with uncertainty and can even be downgraded based on new evidence.
Affiliation(s)
- Martin Krenn
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Matias Wagner
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Munich, Germany
| | - Gudrun Zulehner
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Rosa Weng
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Fiona Jäger
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Omar Keritam
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Merve Sener
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Christof Brücke
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Ivan Milenkovic
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Agnes Langer
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Dominic Buchinger
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Richard Habersam
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Katharina Mayerhanser
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Melanie Brugger
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Theresa Brunet
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Department of Pediatric Neurology, Developmental Medicine and Social Pediatrics, Dr. Von Hauner's Children's Hospital, University of Munich, Munich, Germany
| | - Maureen Jacob
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Elisabeth Graf
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Riccardo Berutti
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Munich, Germany
| | - Hakan Cetin
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Julia Hoefele
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Juliane Winkelmann
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Munich, Germany
| | - Fritz Zimprich
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Jakob Rath
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria.
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria.
| |
|
75
|
Nicolas G. Recent advances in Alzheimer disease genetics. Curr Opin Neurol 2024; 37:154-165. [PMID: 38235704 DOI: 10.1097/wco.0000000000001242] [Indexed: 01/19/2024]
Abstract
PURPOSE OF REVIEW: Genetics studies provide important insights into Alzheimer disease (AD) etiology and mechanisms. Critical advances have been made recently, mainly thanks to the access to novel techniques and larger studies. RECENT FINDINGS: In monogenic AD, progress has been made with a better understanding of the mechanisms associated with pathogenic variants and the input of clinical studies in presymptomatic individuals. In complex AD, increasing sample sizes in both DNA chip-based (genome-wide association studies, GWAS) and exome/genome sequencing case-control studies unveiled novel common and rare risk factors, while the understanding of their combined effect starts to suggest the existence of rare families with oligogenic inheritance of early-onset, nonmonogenic, AD. SUMMARY: Most genetic risk factors with a known consequence designate the aggregation of the Aβ peptide as a core etiological factor in complex AD thus confirming that the research based on monogenic AD - where the amyloid cascade seems more straightforward - is relevant to complex AD as well. Novel mechanistic insights and risk factor studies unveiling novel factors and attempting to combine the effect of common and rare variants will offer promising perspectives for future AD prevention, at least regarding early-onset AD, and probably in case of later onset as well.
Affiliation(s)
- Gaël Nicolas
- Univ Rouen Normandie, Normandie Univ, Inserm U1245 and CHU Rouen, Department of Genetics and CNRMAJ, F-76000 Rouen, France
| |
|
76
|
Lee M, Ahmad SF, Xu J. Regulation and function of transposable elements in cancer genomes. Cell Mol Life Sci 2024; 81:157. [PMID: 38556602 PMCID: PMC10982106 DOI: 10.1007/s00018-024-05195-2] [Received: 12/03/2023] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 04/02/2024]
Abstract
Over half of human genomic DNA is composed of repetitive sequences generated throughout evolution by prolific mobile genetic parasites called transposable elements (TEs). Long disregarded as "junk" or "selfish" DNA, TEs are increasingly recognized as formative elements in genome evolution, wired intimately into the structure and function of the human genome. Advances in sequencing technologies and computational methods have ushered in an era of unprecedented insight into how TE activity impacts human biology in health and disease. Here we discuss the current views on how TEs have shaped the regulatory landscape of the human genome, how TE activity is implicated in human cancers, and how recent findings motivate novel strategies to leverage TE activity for improved cancer therapy. Given the crucial role of methodological advances in TE biology, we pair our conceptual discussions with an in-depth review of the inherent technical challenges in studying repeats, specifically related to structural variation, expression analyses, and chromatin regulation. Lastly, we provide a catalog of existing and emerging assays and bioinformatic software that altogether are enabling the most sophisticated and comprehensive investigations yet into the regulation and function of interspersed repeats in cancer genomes.
Affiliation(s)
- Michael Lee
- Department of Pediatrics, Children's Medical Center Research Institute, University of Texas Southwestern Medical Center, 6000 Harry Hines Blvd., Dallas, TX, 75390, USA.
| | - Syed Farhan Ahmad
- Department of Pathology, Center of Excellence for Leukemia Studies, St. Jude Children's Research Hospital, 262 Danny Thomas Place - MS 345, Memphis, TN, 38105, USA
| | - Jian Xu
- Department of Pathology, Center of Excellence for Leukemia Studies, St. Jude Children's Research Hospital, 262 Danny Thomas Place - MS 345, Memphis, TN, 38105, USA.
| |
|
77
|
Kokot M, Dehghannasiri R, Baharav T, Salzman J, Deorowicz S. SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads. bioRxiv 2024:2023.03.17.533189. [PMID: 36993432 PMCID: PMC10055302 DOI: 10.1101/2023.03.17.533189] [Indexed: 03/31/2023]
Abstract
SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific methods. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. SPLASH2 enables rapid analysis of massive datasets from a wide range of sequencing technologies and biological contexts, delivering unparalleled scale and speed. The SPLASH2 algorithm unveils new biology (without tuning) in single-cell RNA-sequencing data from human muscle cells, as well as bulk RNA-seq from the entire Cancer Cell Line Encyclopedia (CCLE), including substantial unannotated alternative splicing in cancer transcriptome. The same untuned SPLASH2 algorithm recovers the BCR-ABL gene fusion, and detects circRNA sensitively and specifically, underscoring SPLASH2's unmatched precision and scalability across diverse RNA-seq detection tasks.
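The abstract attributes SPLASH2's speed to an efficient k-mer counting approach but gives no implementation detail. Purely to illustrate what k-mer counting means, and not as a description of SPLASH2's actual algorithm or data structures, a naive in-memory counter could look like the sketch below. The function name `count_kmers` and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a collection of reads.

    Naive illustration only: production k-mer counters rely on far more
    memory- and disk-efficient strategies than a single in-memory Counter.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


# Toy reads, not real sequencing data.
toy_reads = ["ACGTAC", "CGTACG"]
for kmer, n in sorted(count_kmers(toy_reads, 3).items()):
    print(kmer, n)
```

Reference-free methods of this kind work from such k-mer tables rather than from alignments, which is why the counting step dominates their running time on large datasets.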
Affiliation(s)
- Marek Kokot
- Department of Algorithmics and Software, Silesian University of Technology, Gliwice, Poland
| | - Roozbeh Dehghannasiri
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, USA
- Department of Biochemistry, Stanford University, Stanford, 94305, USA
| | - Tavor Baharav
- Department of Electrical Engineering, Stanford University, Stanford, 94305, USA
| | - Julia Salzman
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, USA
- Department of Biochemistry, Stanford University, Stanford, 94305, USA
- Department of Statistics (by courtesy), Stanford University, Stanford, 94305, USA
| | - Sebastian Deorowicz
- Department of Algorithmics and Software, Silesian University of Technology, Gliwice, Poland
| |
|
78
|
Hiatt SM, Lawlor JM, Handley LH, Latner DR, Bonnstetter ZT, Finnila CR, Thompson ML, Boston LB, Williams M, Nunez IR, Jenkins J, Kelley WV, Bebin EM, Lopez MA, Hurst ACE, Korf BR, Schmutz J, Grimwood J, Cooper GM. Long-read genome sequencing and variant reanalysis increase diagnostic yield in neurodevelopmental disorders. medRxiv 2024:2024.03.22.24304633. [PMID: 38585854 PMCID: PMC10996728 DOI: 10.1101/2024.03.22.24304633] [Indexed: 04/09/2024]
Abstract
Variant detection from long-read genome sequencing (lrGS) has proven to be considerably more accurate and comprehensive than variant detection from short-read genome sequencing (srGS). However, the rate at which lrGS can increase molecular diagnostic yield for rare disease is not yet precisely characterized. We performed lrGS using Pacific Biosciences "HiFi" technology on 96 short-read-negative probands with rare disease that were suspected to be genetic. We generated hg38-aligned variants and de novo phased genome assemblies, and subsequently annotated, filtered, and curated variants using clinical standards. New disease-relevant or potentially relevant genetic findings were identified in 16/96 (16.7%) probands, eight of which (8/96, 8.33%) harbored pathogenic or likely pathogenic variants. Newly identified variants were visible in both srGS and lrGS in nine probands (~9.4%) and resulted from changes to interpretation mostly from recent gene-disease association discoveries. Seven cases included variants that were only interpretable in lrGS, including copy-number variants, an inversion, a mobile element insertion, two low-complexity repeat expansions, and a 1 bp deletion. While evidence for each of these variants is, in retrospect, visible in srGS, they were either: not called within srGS data, were represented by calls with incorrect sizes or structures, or failed quality-control and filtration. Thus, while reanalysis of older data clearly increases diagnostic yield, we find that lrGS allows for substantial additional yield (7/96, 7.3%) beyond srGS. We anticipate that as lrGS analysis improves, and as lrGS datasets grow allowing for better variant frequency annotation, the additional lrGS-only rare disease yield will grow over time.
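The yield figures quoted in this abstract can be checked with simple arithmetic. The short sketch below recomputes each percentage from the counts stated in the abstract (a cohort of 96 probands); the dictionary keys are descriptive labels chosen here, not terms from the paper.

```python
COHORT_SIZE = 96  # short-read-negative probands, as stated in the abstract

# Counts taken from the abstract; the labels are illustrative.
findings = {
    "any new relevant finding": 16,
    "pathogenic or likely pathogenic": 8,
    "visible in both srGS and lrGS (reinterpretation)": 9,
    "interpretable only in lrGS": 7,
}


def percent(count, total=COHORT_SIZE):
    """Return count as a percentage of total."""
    return 100 * count / total


for label, count in findings.items():
    print(f"{label}: {count}/{COHORT_SIZE} = {percent(count):.2f}%")

# The reinterpretation and lrGS-only groups together account for all
# probands with a new finding.
both = findings["visible in both srGS and lrGS (reinterpretation)"]
lr_only = findings["interpretable only in lrGS"]
print(both + lr_only == findings["any new relevant finding"])
```

The recomputed values agree with the rounded percentages in the abstract, and the two subgroups (9 and 7 probands) sum to the 16 probands with new findings.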
Affiliation(s)
- Susan M. Hiatt
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Lori H. Handley
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Donald R. Latner
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Lori Beth Boston
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Melissa Williams
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Jerry Jenkins
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - E. Martina Bebin
- Department of Neurology, University of Alabama at Birmingham, Birmingham, AL, 35924, USA
| | - Michael A. Lopez
- Department of Neurology, University of Alabama at Birmingham, Birmingham, AL, 35924, USA
- Department of Pediatrics, University of Alabama at Birmingham, Birmingham, AL, 35924, USA
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, 35924, USA
| | - Anna C. E. Hurst
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, 35924, USA
| | - Bruce R. Korf
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, 35924, USA
| | - Jeremy Schmutz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Jane Grimwood
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| |
|
79
|
Keskus A, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads. medRxiv 2024:2024.03.22.24304756. [PMID: 38585974 PMCID: PMC10996739 DOI: 10.1101/2024.03.22.24304756] [Indexed: 04/09/2024]
Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark Severus showed the highest F1 scores (harmonic mean of the precision and recall) as compared to long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
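The abstract describes the F1 score as the harmonic mean of precision and recall. As a minimal sketch of that definition, the function below computes F1 from counts of true-positive, false-positive, and false-negative calls. The function name `f1_score` and the example counts are illustrative; they are not benchmark numbers from the Severus study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Return the F1 score, the harmonic mean of precision and recall.

    precision = TP / (TP + FP)  (fraction of reported calls that are real)
    recall    = TP / (TP + FN)  (fraction of real events that were reported)
    """
    called = true_pos + false_pos
    real = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / real if real else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical caller output: 90 correct calls, 10 false calls, 30 missed events.
print(f"F1 = {f1_score(90, 10, 30):.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot reach a high F1 by maximizing precision at the expense of recall, or the reverse.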
Affiliation(s)
- Ayse Keskus
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Asher Bryant
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Tanveer Ahmad
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Byunggil Yoo
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Anton Goretsky
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Ataberk Donmez
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Lisa A. Lansdon
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Isabel Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
| | - Jimin Park
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Yuelin Liu
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Xiwen Cui
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Samuel Sacco
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Erin K. Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Irina Pushel
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Erin Guest
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Tomi Pastinen
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Kishwar Shafin
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Salem Malikic
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Chi-Ping Day
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Cenk Sahinalp
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
| | - Midhat S. Farooqi
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| |
|
80
|
Ermini L, Driguez P. The Application of Long-Read Sequencing to Cancer. Cancers (Basel) 2024; 16:1275. [PMID: 38610953 PMCID: PMC11011098 DOI: 10.3390/cancers16071275] [Received: 02/23/2024] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] Open
Abstract
Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.
Affiliation(s)
- Luca Ermini
- NORLUX Neuro-Oncology Laboratory, Department of Cancer Research, Luxembourg Institute of Health, L-1210 Luxembourg, Luxembourg
| | - Patrick Driguez
- Bioscience Core Lab, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
| |
|
81
|
Plender EG, Prodanov T, Hsieh P, Nizamis E, Harvey WT, Sulovari A, Munson KM, Kaufman EJ, O’Neal WK, Valdmanis PN, Marschall T, Bloom JD, Eichler EE. Structural and genetic diversity in the secreted mucins, MUC5AC and MUC5B. bioRxiv 2024:2024.03.18.585560. [PMID: 38562829 PMCID: PMC10983947 DOI: 10.1101/2024.03.18.585560] [Indexed: 04/04/2024]
Abstract
The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5761-5762aa); however, seven haplotypes have expanded VNTRs (6291-7019aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5249-6325aa) with cysteine-rich domain and VNTR copy number variation. We grouped MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5654aa), H2 (33%, ~5742aa), and H3 (7%, ~6325aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium (LD) and Tajima's D analyses reveal that East Asians carry exceptionally large MUC5AC LD blocks with an excess of rare variation (p<0.05). To validate this result, we used Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observed signatures of positive selection in H1 and H2 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Africans and Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium, consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein coding VNTRs for improved disease associations.
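The abstract reports that H3 alleles deviate from Hardy-Weinberg equilibrium but does not say which test was used. As a hedged illustration of what such a deviation means for a biallelic locus, the sketch below computes the genotype counts expected under Hardy-Weinberg proportions and a Pearson chi-square statistic. The function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the study.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns (expected_counts, chi_square) for a biallelic locus with
    genotypes AA, AB and BB. Expected proportions are p^2, 2pq and q^2,
    where p is the observed frequency of allele A and q = 1 - p.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi_square = sum(
        (obs - exp) ** 2 / exp for obs, exp in zip(observed, expected) if exp > 0
    )
    return expected, chi_square


# Hypothetical sample with an excess of heterozygotes.
expected, chi_square = hwe_chi_square(10, 80, 10)
print(expected, chi_square)
```

An excess of heterozygotes relative to the 2pq expectation, as in this toy sample, is the pattern the abstract describes as consistent with heterozygote advantage and balancing selection.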
Affiliation(s)
- Elizabeth G. Plender
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Evangelos Nizamis
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Eli J. Kaufman
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Wanda K. O’Neal
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, 27599, North Carolina, USA
| | - Paul N. Valdmanis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
| | - Jesse D. Bloom
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| |
|
82
|
Olbrich M, Bartels L, Wohlers I. Sequencing technologies and hardware-accelerated parallel computing transform computational genomics research. Front Bioinform 2024; 4:1384497. [PMID: 38567256 PMCID: PMC10985184 DOI: 10.3389/fbinf.2024.1384497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 03/07/2024] [Indexed: 04/04/2024] Open
Affiliation(s)
- Michael Olbrich
- Center for Biotechnology, Khalifa University for Science and Technology, Abu Dhabi, United Arab Emirates
| | - Lennart Bartels
- Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
| | - Inken Wohlers
- Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
- University of Lübeck, Lübeck, Germany
| |
|
83
|
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024] Open
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Can Luo
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Staunton G Golding
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Jacob B Ioffe
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
| |
|
84
|
Mao Y, Harvey WT, Porubsky D, Munson KM, Hoekzema K, Lewis AP, Audano PA, Rozanski A, Yang X, Zhang S, Yoo D, Gordon DS, Fair T, Wei X, Logsdon GA, Haukness M, Dishuck PC, Jeong H, Del Rosario R, Bauer VL, Fattor WT, Wilkerson GK, Mao Y, Shi Y, Sun Q, Lu Q, Paten B, Bakken TE, Pollen AA, Feng G, Sawyer SL, Warren WC, Carbone L, Eichler EE. Structurally divergent and recurrently mutated regions of primate genomes. Cell 2024; 187:1547-1562.e13. [PMID: 38428424 PMCID: PMC10947866 DOI: 10.1016/j.cell.2024.01.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 11/26/2023] [Accepted: 01/31/2024] [Indexed: 03/03/2024]
Abstract
We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.
Affiliation(s)
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xiangyu Yang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
| | - Xiaoxi Wei
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ricardo Del Rosario
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vanessa L Bauer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Will T Fattor
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Gregory K Wilkerson
- Department of Veterinary Sciences, Michale E. Keeling Center for Comparative Medicine and Research, The University of Texas MD Anderson Cancer Center, Bastrop, TX, USA; Department of Clinical Sciences, North Carolina State University, Raleigh, NC, USA
| | - Yuxiang Mao
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Yongyong Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China; Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Qiang Sun
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Qing Lu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Guoping Feng
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Sara L Sawyer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Wesley C Warren
- Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA; Department of Surgery, School of Medicine, University of Missouri, Columbia, MO, USA; Institute of Data Science and Informatics, University of Missouri, Columbia, MO, USA
| | - Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA; Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA; Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA; Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| |
|
85
|
Riess O, Sturm M, Menden B, Liebmann A, Demidov G, Witt D, Casadei N, Admard J, Schütz L, Ossowski S, Taylor S, Schaffer S, Schroeder C, Dufke A, Haack T. Genomes in clinical care. NPJ Genom Med 2024; 9:20. [PMID: 38485733 PMCID: PMC10940576 DOI: 10.1038/s41525-024-00402-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 02/07/2024] [Indexed: 03/18/2024] Open
Abstract
In the era of precision medicine, genome sequencing (GS) has become more affordable and the importance of genomics and multi-omics in clinical care is increasingly being recognized. However, how to scale and effectively implement GS on an institutional level remains a challenge for many. Here, we present Genome First and Ge-Med, two clinical implementation studies focused on identifying the key pillars and processes that are required to make routine GS and predictive genomics a reality in the clinical setting. We describe our experience and lessons learned for a variety of topics including test logistics, patient care processes, data reporting, and infrastructure. Our model of providing clinical care and comprehensive genomic analysis from a single source may be used by other centers with a similar structure to facilitate the implementation of omics-based personalized health concepts in medicine.
Affiliation(s)
- Olaf Riess
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany.
- NGS Competence Center Tübingen, University of Tübingen, Tübingen, Germany.
- Center for Rare Diseases Tübingen, University of Tübingen, Tübingen, Germany.
| | - Marc Sturm
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Benita Menden
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Alexandra Liebmann
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - German Demidov
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Dennis Witt
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Nicolas Casadei
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, University of Tübingen, Tübingen, Germany
| | - Jakob Admard
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Leon Schütz
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Stephan Ossowski
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, University of Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
| | | | | | - Christopher Schroeder
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Rare Diseases Tübingen, University of Tübingen, Tübingen, Germany
| | - Andreas Dufke
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Rare Diseases Tübingen, University of Tübingen, Tübingen, Germany
| | - Tobias Haack
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Rare Diseases Tübingen, University of Tübingen, Tübingen, Germany
| |
|
86
|
Helal AA, Saad BT, Saad MT, Mosaad GS, Aboshanab KM. Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data. Sci Rep 2024; 14:6160. [PMID: 38486064 PMCID: PMC10940726 DOI: 10.1038/s41598-024-56604-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024] Open
Abstract
Structural variants (SVs) are one of the significant types of DNA mutations and are typically defined as larger-than-50-bp genomic alterations that include insertions, deletions, duplications, inversions, and translocations. These modifications can profoundly impact the phenotypic characteristics and contribute to disorders like cancer, response to treatment, and infections. Four long-read aligners and five SV callers have been evaluated using three Oxford Nanopore NGS human genome datasets in terms of precision, recall, and F1-score statistical metrics, depth of coverage, and speed of analysis. The best SV caller regarding recall, precision, and F1-score when matched with different aligners at different coverage levels tends to vary depending on the dataset and the specific SV types being analyzed. However, based on our findings, Sniffles and CuteSV tend to perform well across different aligners and coverage levels, followed by SVIM, PBSV, and SVDSS in the last place. The CuteSV caller has the highest average F1-score (82.51%) and recall (78.50%), and Sniffles has the highest average precision value (94.33%). Minimap2 as an aligner and Sniffles as an SV caller act as a strong base for the pipeline of SV calling because of their high speed and reasonable accomplishment. PBSV has a lower average F1-score, precision, and recall and may generate more false positives and overlook some actual SVs. Our results are valuable in the comprehensive evaluation of popular SV callers and aligners as they provide insight into the performance of several long-read aligners and SV callers and serve as a reference for researchers in selecting the most suitable tools for SV detection.
Affiliation(s)
- Asmaa A Helal
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
| | - Bishoy T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt.
| | - Mina T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
| | - Gamal S Mosaad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
| | - Khaled M Aboshanab
- Department of Microbiology and Immunology, Faculty of Pharmacy, Ain Shams University, Organization of African Unity St., Abassi, Cairo, 11566, Egypt.
| |
|
87
|
Sigurpalsdottir BD, Stefansson OA, Holley G, Beyter D, Zink F, Hardarson MÞ, Sverrisson SÞ, Kristinsdottir N, Magnusdottir DN, Magnusson OÞ, Gudbjartsson DF, Halldorsson BV, Stefansson K. A comparison of methods for detecting DNA methylation from long-read sequencing of human genomes. Genome Biol 2024; 25:69. [PMID: 38468278 PMCID: PMC10929077 DOI: 10.1186/s13059-024-03207-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 02/28/2024] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND: Long-read sequencing can enable the detection of base modifications, such as CpG methylation, in single molecules of DNA. The most commonly used methods for long-read sequencing are nanopore sequencing developed by Oxford Nanopore Technologies (ONT) and single molecule real-time (SMRT) sequencing developed by Pacific Biosciences (PacBio). In this study, we systematically compare the performance of CpG methylation detection from long-read sequencing. RESULTS: We demonstrate that CpG methylation detection from 7179 nanopore-sequenced DNA samples is highly accurate and consistent with 132 oxidative bisulfite-sequenced (oxBS) samples, isolated from the same blood draws. We introduce quality filters for CpGs that further enhance the accuracy of CpG methylation detection from nanopore-sequenced DNA, while removing at most 30% of CpGs. We evaluate the per-site performance of CpG methylation detection across different genomic features and CpG methylation rates and demonstrate how the latest R10.4 flowcell chemistry and base-calling algorithms improve methylation detection from nanopore sequencing. Additionally, we show how the methylation detection of 50 SMRT-sequenced genomes compares to nanopore sequencing and oxBS. CONCLUSIONS: This study provides the first systematic comparison of CpG methylation detection tools for long-read sequencing methods. We compare two commonly used computational methods for the detection of CpG methylation in a large number of nanopore genomes, including samples sequenced using the latest R10.4 nanopore flowcell chemistry and 50 SMRT sequenced samples. We provide insights into the strengths and limitations of each sequencing method as well as recommendations for standardization and evaluation of tools designed for genome-scale modified base detection using long-read sequencing.
Affiliation(s)
- Brynja D Sigurpalsdottir
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland.
- School of Technology, Reykjavík University, Reykjavík, Iceland.
| | | | | | - Doruk Beyter
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
| | - Florian Zink
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
| | - Marteinn Þ Hardarson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
| | | | | | | | | | - Daniel F Gudbjartsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavík, Iceland
| | - Bjarni V Halldorsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland.
- School of Technology, Reykjavík University, Reykjavík, Iceland.
| | - Kari Stefansson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Faculty of Medicine, School of Health Science, University of Iceland, Reykjavík, Iceland
| |
|
88
|
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. medRxiv 2024:2024.03.05.24303792. [PMID: 38496498 PMCID: PMC10942501 DOI: 10.1101/2024.03.05.24303792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
Affiliation(s)
- Jonas A. Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Sophia B. Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
| | - Miranda PG Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
| | | | | | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Human Technopole, Milan, Italy
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Angela L. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Zachery Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sophie HR Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sydney A. Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México
| | - Wayne E. Clarke
- New York Genome Center, New York, NY, USA
- Outlier Informatics Inc., Saskatoon, SK, Canada
| | | | | | | | | | | | - Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | | | - Cate R. Paschal
- Department of Laboratories, Seattle Children’s Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Esther Robb
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | | | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Richard N. McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Pacific Northwest Research Institute, Seattle, WA, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | | | - Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham, England
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Danny E. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
| |
|
89
|
Zhang P, Zhao X, Li Q, Xu Y, Cheng Z, Yang L, Wang H, Tao Y, Huang G, Wu R, Zhou H, Zhao S. Proband-independent haplotyping based on NGS-based long-read sequencing for detecting pathogenic variant carrier status in preimplantation genetic testing for monogenic diseases. Front Mol Biosci 2024; 11:1329580. [PMID: 38516188 PMCID: PMC10955336 DOI: 10.3389/fmolb.2024.1329580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 02/12/2024] [Indexed: 03/23/2024] Open
Abstract
Preimplantation genetic testing for monogenic diseases (PGT-M) can be used to select embryos that do not develop disease phenotypes or carry disease-causing genes for implantation into the mother's uterus, to block disease transmission to the offspring, and to increase the birth rate of healthy newborns. However, the traditional PGT-M technique has some limitations, such as its time consumption, experimental procedural complexity, and the need for a complete family or reference embryo to construct the haplotype. In this study, proband-independent haplotyping based on NGS-based long-read sequencing (Phbol-seq) was used to effectively construct haplotypes. By targeting the mutation sites of single gene disease point mutations and small fragment deletion carriers, embryos carrying parental disease-causing mutations were successfully identified by linkage analysis. The efficiency of embryo resolution was then verified by classical Sanger sequencing, and it was confirmed that the construction of haplotype and SNP linkage analysis by Phbol-seq could accurately and effectively detect whether embryos carried parental pathogenic mutations. After the embryos confirmed to be nonpathogenic by Phbol-seq-based PGT-M and confirmed to have normal copy number variation by Phbol-seq-based PGT-A were transplanted into the uterus, gene detection in amniotic fluid of the implanted embryos was performed, and the results confirmed that Phbol-seq technology could accurately distinguish normal genotype embryos from genetically modified carrier embryos. Our results suggest that Phbol-seq is an effective strategy for accurately locating mutation sites and accurately distinguishing between embryos that inherit disease-causing genes and normal embryos that do not. This is critical for Phbol-seq-based PGT-M and could help more single-gene disease carriers with incomplete families, de novo mutations or suspected germline mosaicism to have healthy babies with normal phenotypes. 
It also helps to reduce the transmission of monogenic genetic diseases in the population.
Affiliation(s)
- Peiyu Zhang
- Department of Obstetrics and Gynecology, Guizhou Medical University, Guiyang, China
| | - Xiaomei Zhao
- Reproductive Medicine Center, Department of Obstetrics and Gynecology of the Affiliated Hospital of Guizhou Medical University, Guiyang, China
| | - Qinshan Li
- Department of Obstetrics and Gynecology, Affiliated Hospital of Guizhou Medical University, Guiyang, China
- Prenatal Diagnosis Center, Affiliated Hospital of Guizhou Medical University, Guiyang, China
| | - Yaqiong Xu
- Department of Obstetrics and Gynecology, Guizhou Medical University, Guiyang, China
| | - Zengmei Cheng
- Department of Obstetrics and Gynecology, Guizhou Medical University, Guiyang, China
| | - Lu Yang
- Department of Obstetrics and Gynecology, Guizhou Medical University, Guiyang, China
| | - Houmei Wang
- Department of Obstetrics and Gynecology, Affiliated Hospital of Guizhou Medical University, Guiyang, China
| | - Yang Tao
- Reproductive Medicine Center, Department of Obstetrics and Gynecology of The First People’s Hospital of Bijie, Bijie, China
| | - Guanyou Huang
- Reproductive Medicine Center, Department of Obstetrics and Gynecology of the Affiliated Hospital of Guizhou Medical University, Guiyang, China
| | - Rui Wu
- Reproductive Medicine Center, Department of Obstetrics and Gynecology of the Affiliated Hospital of Guizhou Medical University, Guiyang, China
| | - Hua Zhou
- Reproductive Medicine Center, Department of Obstetrics and Gynecology of the Affiliated Hospital of Guizhou Medical University, Guiyang, China
| | - Shuyun Zhao
- Reproductive Medicine Center, Department of Obstetrics and Gynecology of the Affiliated Hospital of Guizhou Medical University, Guiyang, China
| |
|
90
|
Olivucci G, Iovino E, Innella G, Turchetti D, Pippucci T, Magini P. Long read sequencing on its way to the routine diagnostics of genetic diseases. Front Genet 2024; 15:1374860. [PMID: 38510277 PMCID: PMC10951082 DOI: 10.3389/fgene.2024.1374860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 02/26/2024] [Indexed: 03/22/2024] Open
Abstract
The clinical application of technological progress in the identification of DNA alterations has always led to improvements of diagnostic yields in genetic medicine. At the chromosome level, from cytogenetic techniques evaluating number and gross structural defects to genomic microarrays detecting cryptic copy number variants, and at the molecular level, from the Sanger method studying the nucleotide sequence of single genes to the high-throughput next-generation sequencing (NGS) technologies, resolution and sensitivity have progressively increased, considerably expanding the range of detectable DNA anomalies and, alongside it, of Mendelian disorders with known genetic causes. However, particular genomic regions (e.g., repetitive and GC-rich sequences) are inefficiently analyzed by standard genetic tests, which still rely on laborious, time-consuming and low-sensitivity approaches (e.g., Southern blot for repeat expansion or long-PCR for genes with highly homologous pseudogenes), accounting for at least part of the patients with undiagnosed genetic disorders. Third generation sequencing, generating long reads with improved mappability, is more suitable for the detection of structural alterations and defects in hardly accessible genomic regions. Although recently implemented and not yet clinically available, long read sequencing (LRS) technologies have already shown their potential in genetic medicine research that might greatly impact on diagnostic yield and reporting times, through their translation to clinical settings.
The main investigated LRS application concerns the identification of structural variants and repeat expansions, probably because techniques for their detection have not evolved as rapidly as those dedicated to single nucleotide variants (SNV) identification: gold standard analyses are karyotyping and microarrays for balanced and unbalanced chromosome rearrangements, respectively, and southern blot and repeat-primed PCR for the amplification and sizing of expanded alleles, impaired by limited resolution and sensitivity that have not been significantly improved by the advent of NGS. Nevertheless, more recently, with the increased accuracy provided by the latest product releases, LRS has been tested also for SNV detection, especially in genes with highly homologous pseudogenes and for haplotype reconstruction to assess the parental origin of alleles with de novo pathogenic variants. We provide a review of relevant recent scientific papers exploring LRS potential in the diagnosis of genetic diseases and its potential future applications in routine genetic testing.
Affiliation(s)
- Giulia Olivucci
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Department of Surgical and Oncological Sciences, University of Palermo, Palermo, Italy
| | - Emanuela Iovino
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Giovanni Innella
- Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Daniela Turchetti
- Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Tommaso Pippucci
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Pamela Magini
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| |
|
91
|
Wang Y, Chen Y, Gao J, Xie H, Guo Y, Yang J, Liu J, Chen Z, Li Q, Li M, Ren J, Wen L, Tang F. Mapping crossover events of mouse meiotic recombination by restriction fragment ligation-based Refresh-seq. Cell Discov 2024; 10:26. [PMID: 38443370 PMCID: PMC10915157 DOI: 10.1038/s41421-023-00638-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 12/11/2023] [Indexed: 03/07/2024] Open
Abstract
Single-cell whole-genome sequencing methods have undergone great improvements over the past decade. However, allele dropout, which means the inability to detect both alleles simultaneously in an individual diploid cell, largely restricts the application of these methods particularly for medical applications. Here, we develop a new single-cell whole-genome sequencing method based on third-generation sequencing (TGS) platform named Refresh-seq (restriction fragment ligation-based genome amplification and TGS). It is based on restriction endonuclease cutting and ligation strategy in which two alleles in an individual cell can be cut into equal fragments and tend to be amplified simultaneously. As a new single-cell long-read genome sequencing method, Refresh-seq features much lower allele dropout rate compared with SMOOTH-seq. Furthermore, we apply Refresh-seq to 688 sperm cells and 272 female haploid cells (secondary polar bodies and parthenogenetic oocytes) from F1 hybrid mice. We acquire high-resolution genetic map of mouse meiosis recombination at low sequencing depth and reveal the sexual dimorphism in meiotic crossovers. We also phase the structure variations (deletions and insertions) in sperm cells and female haploid cells with high precision. Refresh-seq shows great performance in screening aneuploid sperm cells and oocytes due to the low allele dropout rate and has great potential for medical applications such as preimplantation genetic diagnosis.
Affiliation(s)
- Yan Wang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Yijun Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Junpeng Gao
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Emergency Center, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
| | - Haoling Xie
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Yuqing Guo
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Jingwei Yang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Jun'e Liu
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Zonggui Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
| | - Qingqing Li
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Mengyao Li
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Jie Ren
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Lu Wen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
| | - Fuchou Tang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China.
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China.
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
- Changping Laboratory, Beijing, China.
| |
|
92
|
Carpinteyro-Ponce J, Machado CA. The Complex Landscape of Structural Divergence Between the Drosophila pseudoobscura and D. persimilis Genomes. Genome Biol Evol 2024; 16:evae047. [PMID: 38482945 PMCID: PMC10980976 DOI: 10.1093/gbe/evae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/07/2024] [Indexed: 04/01/2024] Open
Abstract
Structural genomic variants are key drivers of phenotypic evolution. They can span hundreds to millions of base pairs and can thus affect large numbers of genetic elements. Although structural variation is quite common within and between species, its characterization depends upon the quality of genome assemblies and the proportion of repetitive elements. Using new high-quality genome assemblies, we report a complex and previously hidden landscape of structural divergence between the genomes of Drosophila persimilis and D. pseudoobscura, two classic species in speciation research, and study the relationships among structural variants, transposable elements, and gene expression divergence. The new assemblies confirm the already known fixed inversion differences between these species. Consistent with previous studies showing higher levels of nucleotide divergence between fixed inversions relative to collinear regions of the genome, we also find a significant overrepresentation of INDELs inside the inversions. We find that transposable elements accumulate in regions with low levels of recombination, and spatial correlation analyses reveal a strong association between transposable elements and structural variants. We also report a strong association between differentially expressed (DE) genes and structural variants and an overrepresentation of DE genes inside the fixed chromosomal inversions that separate this species pair. Interestingly, species-specific structural variants are overrepresented in DE genes involved in neural development, spermatogenesis, and oocyte-to-embryo transition. Overall, our results highlight the association of transposable elements with structural variants and their importance in driving evolutionary divergence.
Affiliation(s)
| | - Carlos A Machado
- Department of Biology, University of Maryland, College Park, MD, USA
| |
|
93
|
Bernatchez L, Ferchaud AL, Berger CS, Venney CJ, Xuereb A. Genomics for monitoring and understanding species responses to global climate change. Nat Rev Genet 2024; 25:165-183. [PMID: 37863940 DOI: 10.1038/s41576-023-00657-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/29/2023] [Indexed: 10/22/2023]
Abstract
All life forms across the globe are experiencing drastic changes in environmental conditions as a result of global climate change. These environmental changes are happening rapidly, incur substantial socioeconomic costs, pose threats to biodiversity and diminish a species' potential to adapt to future environments. Understanding and monitoring how organisms respond to human-driven climate change is therefore a major priority for the conservation of biodiversity in a rapidly changing environment. Recent developments in genomic, transcriptomic and epigenomic technologies are enabling unprecedented insights into the evolutionary processes and molecular bases of adaptation. This Review summarizes methods that apply and integrate omics tools to experimentally investigate, monitor and predict how species and communities in the wild cope with global climate change, which is by genetically adapting to new environmental conditions, through range shifts or through phenotypic plasticity. We identify advantages and limitations of each method and discuss future research avenues that would improve our understanding of species' evolutionary responses to global climate change, highlighting the need for holistic, multi-omics approaches to ecosystem monitoring during global climate change.
Affiliation(s)
- Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
| | - Anne-Laure Ferchaud
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada.
- Parks Canada, Office of the Chief Ecosystem Scientist, Protected Areas Establishment, Quebec City, Quebec, Canada.
| | - Chloé Suzanne Berger
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
| | - Clare J Venney
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
| | - Amanda Xuereb
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
| |
|
94
|
Dorey A, Howorka S. Nanopore DNA sequencing technologies and their applications towards single-molecule proteomics. Nat Chem 2024; 16:314-334. [PMID: 38448507 DOI: 10.1038/s41557-023-01322-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 07/14/2023] [Indexed: 03/08/2024]
Abstract
Sequencing of nucleic acids with nanopores has emerged as a powerful tool offering rapid readout, high accuracy, low cost and portability. This label-free method for sequencing at the single-molecule level is an achievement on its own. However, nanopores also show promise for the technologically even more challenging sequencing of polypeptides, something that could considerably benefit biological discovery, clinical diagnostics and homeland security, as current techniques lack portability and speed. Here we survey the biochemical innovations underpinning commercial and academic nanopore DNA/RNA sequencing techniques, and explore how these advances can fuel developments in future protein sequencing with nanopores.
Affiliation(s)
- Adam Dorey
- Department of Chemistry & Institute of Structural Molecular Biology, University College London, London, UK.
| | - Stefan Howorka
- Department of Chemistry & Institute of Structural Molecular Biology, University College London, London, UK.
| |
|
95
|
Cui X, Lin Q, Chen M, Wang Y, Wang Y, Wang Y, Tao J, Yin H, Zhao T. Long-read sequencing unveils novel somatic variants and methylation patterns in the genetic information system of early lung cancer. Comput Biol Med 2024; 171:108174. [PMID: 38442557 DOI: 10.1016/j.compbiomed.2024.108174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 01/25/2024] [Accepted: 02/18/2024] [Indexed: 03/07/2024]
Abstract
Lung cancer poses a global health challenge, necessitating advanced diagnostics for improved outcomes. Intensive efforts are ongoing to pinpoint early detection biomarkers, such as genomic variations and DNA methylation, to elevate diagnostic precision. We conducted long-read sequencing on cancerous and adjacent non-cancerous tissues from a patient with lung adenocarcinoma. We identified somatic structural variations (SVs) specific to lung cancer by integrating data from various SV calling methods and differentially methylated regions (DMRs) that were distinct between these two tissue samples, revealing a unique methylation pattern associated with lung cancer. This study discovered over 40,000 somatic SVs and over 180,000 DMRs linked to lung cancer. We identified approximately 700 genes of significant relevance through comprehensive analysis, including genes intricately associated with many lung cancers, such as NOTCH1, SMOC2, CSMD2, and others. Furthermore, we observed that somatic SVs and DMRs were substantially enriched in several pathways, such as axon guidance signaling pathways, which suggests a comprehensive multi-omics impact on lung cancer progression across various biological investigation levels. These datasets can potentially serve as biomarkers for early lung cancer detection and may hold significant value in clinical diagnosis and treatment applications.
Affiliation(s)
- Xinran Cui
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China
| | - Qingyan Lin
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China
| | - Ming Chen
- Institute of Bioinformatics, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China
| | - Yidan Wang
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China
| | - Yiwen Wang
- Tanwei College, Tsinghua University, Shuangqing Road, Beijing, 100084, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.
| | - Jiang Tao
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.
| | - Honglei Yin
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China.
| | - Tianyi Zhao
- School of Medicine, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.
| |
|
96
|
Genner R, Akeson S, Meredith M, Jerez PA, Malik L, Baker B, Miano-Burkhardt A, Paten B, Billingsley KJ, Blauwendraat C, Jain M. Assessing methylation detection for primary human tissue using Nanopore sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.29.581569. [PMID: 38464144 PMCID: PMC10925257 DOI: 10.1101/2024.02.29.581569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However, the effects on methylation detection have not yet been documented. Here we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences in methylation datasets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large scale sequencing efforts, such as the NIH CARD Long Read Initiative.
Affiliation(s)
- Rylee Genner
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Stuart Akeson
- Department of Bioengineering, Northeastern University, Boston, MA, USA
| | - Melissa Meredith
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Breeana Baker
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Benedict Paten
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kimberley J Billingsley
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Miten Jain
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| |
|
97
|
Behrens YL, Pietzsch S, Antić Ž, Zhang Y, Bergmann AK. The landscape of cytogenetic and molecular genetic methods in diagnostics for hematologic neoplasia. Best Pract Res Clin Haematol 2024; 37:101539. [PMID: 38490767 DOI: 10.1016/j.beha.2024.101539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 01/28/2024] [Indexed: 03/17/2024]
Abstract
Improvements made during the last decades in the management of patients with hematologic neoplasia have resulted in increase of overall survival. These advancements have become possible through progress in our understanding of genetic basis of different hematologic malignancies and their role in the current risk-adapted treatment protocols. In this review, we provide an overview of current cytogenetic and molecular genetic methods, commonly used in the genetic characterization of hematologic malignancies, describe the current developments in the cytogenetic and molecular diagnostics, and give an outlook into their future development. Furthermore, we give a brief overview of the most important public databases and guidelines for sequence variant interpretation.
Affiliation(s)
- Yvonne Lisa Behrens
- Department of Human Genetics, Hannover Medical School, 30625, Hannover, Germany
| | - Stefan Pietzsch
- Department of Human Genetics, Hannover Medical School, 30625, Hannover, Germany
| | - Željko Antić
- Department of Human Genetics, Hannover Medical School, 30625, Hannover, Germany
| | - Yanming Zhang
- Cytogenetics Laboratory, Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Anke K Bergmann
- Department of Human Genetics, Hannover Medical School, 30625, Hannover, Germany.
| |
|
98
|
Timmaraju VA, Finkelstein SD, Levine JA. Analytical Validation of Loss of Heterozygosity and Mutation Detection in Pancreatic Fine-Needle Aspirates by Capillary Electrophoresis and Sanger Sequencing. Diagnostics (Basel) 2024; 14:514. [PMID: 38472986 DOI: 10.3390/diagnostics14050514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/15/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024] Open
Abstract
Pancreatic cystic disease, including duct dilation, represents precursor states towards the development of pancreatic cancer, a form of malignancy with relatively low incidence but high mortality. While most of these cysts (>85%) are benign, the remainder can progress over time, leading to malignant transformation, invasion, and metastasis. Cytologic diagnosis is challenging, limited by the paucity or complete absence of cells representative of cystic lesions and fibrosis. Molecular analysis of fluids collected from endoscopic-guided fine-needle aspiration of pancreatic cysts and dilated duct lesions can be used to evaluate the risk of progression to malignancy. The basis for the enhanced diagnostic utility of molecular approaches is the ability to interrogate cell-free nucleic acid of the cyst/duct and/or extracellular fluid. The allelic imbalances at tumor suppressor loci and the selective oncogenic drivers are used clinically to help differentiate benign stable pancreatic cysts from those progressing toward high-grade dysplasia. Methods are discussed and used to determine the efficacy for diagnostic implementation. Here, we report the analytical validation of methods to detect causally associated molecular changes integral to the pathogenesis of pancreatic cancer from pancreatic cyst fluids.
|
99
|
Wang Z, Liu C, Liu W, Lv X, Hu T, Yang F, Yang W, He L, Huang X. Long-read sequencing reveals the structural complexity of genomic integration of HPV DNA in cervical cancer cell lines. BMC Genomics 2024; 25:198. [PMID: 38378450 PMCID: PMC10877919 DOI: 10.1186/s12864-024-10101-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 02/08/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND Cervical cancer (CC) causes more than 311,000 deaths annually worldwide. The integration of human papillomavirus (HPV) is a crucial genetic event that contributes to cervical carcinogenesis. Although HPV DNA integration is known to disrupt the genomic architecture of both the host and viral genomes in CC, the complexity of this process remains largely unexplored. RESULTS In this study, we conducted whole-genome sequencing (WGS) at 55-65X coverage utilizing the PacBio long-read sequencing platform in SiHa and HeLa cells, followed by comprehensive analyses of the sequence data to elucidate the complexity of HPV integration. Firstly, our results demonstrated that PacBio long-read sequencing effectively identifies HPV integration breakpoints with comparable accuracy to targeted-capture next-generation sequencing (NGS) methods. Secondly, we constructed detailed models of complex integrated genome structures that included both the HPV genome and nearby regions of the human genome by utilizing PacBio long-read WGS. Thirdly, our sequencing results revealed the occurrence of a wide variety of genome-wide structural variations (SVs) in SiHa and HeLa cells. Additionally, our analysis revealed a potential correlation between changes in gene expression levels and SVs on chromosome 13 in the genome of SiHa cells. CONCLUSIONS Using PacBio long-read sequencing, we have successfully constructed complex models illustrating HPV integrated genome structures in SiHa and HeLa cells. This accomplishment serves as a compelling demonstration of the valuable capabilities of long-read sequencing in detecting and characterizing HPV genomic integration structures within human cells. Furthermore, these findings offer critical insights into the complex process of HPV16 and HPV18 integration and their potential contribution to the development of cervical cancer.
Affiliation(s)
- Zhijie Wang
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Chen Liu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Wanxin Liu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Xinyi Lv
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Ting Hu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Fan Yang
- Wuhan Kandwise Biotechnology, Inc., Wuhan, Hubei, China
| | - Wenhui Yang
- Wuhan Kandwise Biotechnology, Inc., Wuhan, Hubei, China
| | - Liang He
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
| | - Xiaoyuan Huang
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
| |
|
100
|
Packiaraj J, Thakur J. DNA satellite and chromatin organization at mouse centromeres and pericentromeres. Genome Biol 2024; 25:52. [PMID: 38378611 PMCID: PMC10880262 DOI: 10.1186/s13059-024-03184-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 02/12/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND Centromeres are essential for faithful chromosome segregation during mitosis and meiosis. However, the organization of satellite DNA and chromatin at mouse centromeres and pericentromeres is poorly understood due to the challenges of assembling repetitive genomic regions. RESULTS Using recently available PacBio long-read sequencing data from the C57BL/6 strain, we find that contrary to the previous reports of their homogeneous nature, both centromeric minor satellites and pericentromeric major satellites exhibit a high degree of variation in sequence and organization within and between arrays. While most arrays are continuous, a significant fraction is interspersed with non-satellite sequences, including transposable elements. Using chromatin immunoprecipitation sequencing (ChIP-seq), we find that the occupancy of CENP-A and H3K9me3 chromatin at centromeric and pericentric regions, respectively, is associated with increased sequence enrichment and homogeneity at these regions. The transposable elements at centromeric regions are not part of functional centromeres as they lack significant CENP-A enrichment. Furthermore, both CENP-A and H3K9me3 nucleosomes occupy minor and major satellites spanning centromeric-pericentric junctions and a low yet significant amount of CENP-A spreads locally at centromere junctions on both pericentric and telocentric sides. Finally, while H3K9me3 nucleosomes display a well-phased organization on major satellite arrays, CENP-A nucleosomes on minor satellite arrays are poorly phased. Interestingly, the homogeneous class of major satellites also phase CENP-A and H3K27me3 nucleosomes, indicating that the nucleosome phasing is an inherent property of homogeneous major satellites. CONCLUSIONS Our findings reveal that mouse centromeres and pericentromeres display a high diversity in satellite sequence, organization, and chromatin structure.
Affiliation(s)
- Jenika Packiaraj
- Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA, 30322, USA
| | - Jitendra Thakur
- Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA, 30322, USA.
| |
|