1
Cheng Y, Xu SM, Santucci K, Lindner G, Janitz M. Machine learning and related approaches in transcriptomics. Biochem Biophys Res Commun 2024; 724:150225. [PMID: 38852503 DOI: 10.1016/j.bbrc.2024.150225] [Received: 02/25/2024] [Revised: 05/18/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]
Abstract
Data acquisition used to be the bottleneck in the transcriptomic analytical pipeline. However, recent developments in transcriptome profiling technologies have increased researchers' ability to obtain data, shifting the focus to data analysis. Incorporating machine learning (ML) into traditional analytical methods makes it possible to handle larger volumes of complex data more efficiently. Many bioinformaticians, especially those unfamiliar with ML in the study of human transcriptomics and complex biological systems, face a significant barrier stemming from their limited awareness of the current landscape of ML utilisation in this field. To address this gap, this review introduces those individuals to the general types of ML, followed by a comprehensive range of more specific techniques, demonstrated through examples of their incorporation into analytical pipelines for human transcriptome investigations. It covers important computational aspects such as data pre-processing, task formulation, results (performance of ML models), and validation methods. For better practical relevance, there is a strong focus on studies published within the last five years, almost exclusively examining human transcriptomes, with outcomes compared with standard non-ML tools.
Affiliation(s)
- Yuning Cheng
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Si-Mei Xu
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Kristina Santucci
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Grace Lindner
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Michael Janitz
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia

2
Ahmad F, Muhmood T. Clinical translation of nanomedicine with integrated digital medicine and machine learning interventions. Colloids Surf B Biointerfaces 2024; 241:114041. [PMID: 38897022 DOI: 10.1016/j.colsurfb.2024.114041] [Received: 02/01/2024] [Revised: 06/11/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]
Abstract
Nanomaterial-based therapeutics are transforming disease prevention, diagnosis and treatment as nanotechnology grows more sophisticated at a breakneck pace, but very few reach the clinic because of inconsistencies in preclinical studies followed by regulatory hindrances. To tackle this, integrating nanomedicine discovery with digital medicine provides technologies that serve as tools for measuring specific biological activity. Redundancies in nanomedicine discovery can thus be overcome through on-site data acquisition and analytics that integrate intelligent sensors with artificial intelligence (AI) or machine learning (ML). Integrated AI/ML wearable sensors gather clinically relevant biochemical information directly from the subject's body and process the data so that physicians can make the right clinical decision(s) in a time- and cost-effective way. This review summarizes insights and recommends the infusion of sensors enabled with actionable big-data computation into the burgeoning field of nanomedicine in academia, research institutes, and the pharmaceutical industry, with potential for clinical translation. Furthermore, it discusses the many blind spots in modern clinically relevant computation, one of which could prevent ML-guided, low-cost new nanomedicine development from being successfully translated into the clinic.
Affiliation(s)
- Farooq Ahmad
  - State Key Laboratory of Chemistry and Utilization of Carbon Based Energy Resources, College of Chemistry, Xinjiang University, Urumqi 830017, China
- Tahir Muhmood
  - International Iberian Nanotechnology Laboratory (INL), Avenida Mestre José Veiga, Braga 4715-330, Portugal

3
Cornejo-Corona I, Boland DJ, Devarenne TP. Method for isolation of high molecular weight genomic DNA from Botryococcus biomass. PLoS One 2024; 19:e0301680. [PMID: 39046949 PMCID: PMC11268603 DOI: 10.1371/journal.pone.0301680] [Received: 09/19/2023] [Accepted: 03/19/2024] [Indexed: 07/27/2024] Open
Abstract
The development of high molecular weight (HMW) genomic DNA (gDNA) extraction protocols for non-model species is essential to fully exploit long-read sequencing technologies in order to generate genome assemblies that can help answer complex questions about these organisms. Obtaining enough high-quality HMW gDNA can be challenging for these species, especially for tissues rich in polysaccharides such as biomass from species within the Botryococcus genus. The existing protocols based on column-based DNA extraction and biochemical lysis kits can be inefficient and may not be useful due to variations in biomass polysaccharide content. We developed an optimized protocol for the efficient extraction of HMW gDNA from Botryococcus biomass for use in long-read sequencing technologies. The protocol utilized an initial wash step with sorbitol to remove polysaccharides and yielded HMW gDNA concentrations up to 220 ng/μL with high purity. We then demonstrated the suitability of the HMW gDNA isolated from this protocol for long-read sequencing on the Oxford Nanopore PromethION platform for three Botryococcus species. Our protocol can be used as a standard for efficient HMW gDNA extraction in microalgae rich in polysaccharides and may be adapted for other challenging species.
Affiliation(s)
- Ivette Cornejo-Corona
  - Biochemistry and Biophysics, Texas A&M University, College Station, Texas, United States of America
- Devon J. Boland
  - Biochemistry and Biophysics, Texas A&M University, College Station, Texas, United States of America
- Timothy P. Devarenne
  - Biochemistry and Biophysics, Texas A&M University, College Station, Texas, United States of America

4
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China

5
Tang X, Berger MF, Solit DB. Precision oncology: current and future platforms for treatment selection. Trends Cancer 2024:S2405-8033(24)00135-3. [PMID: 39030146 DOI: 10.1016/j.trecan.2024.06.009] [Received: 04/01/2024] [Revised: 06/20/2024] [Accepted: 06/21/2024] [Indexed: 07/21/2024]
Abstract
Genomic profiling of hundreds of cancer-associated genes is now a component of routine cancer care. DNA sequencing can identify mutations, mutational signatures, and structural alterations predictive of therapy response and assess for heritable cancer risk, but it has been less useful for identifying predictive biomarkers of sensitivity to cytotoxic chemotherapies, antibody drug conjugates, and immunotherapies. The clinical adoption of molecular profiling platforms such as RNA sequencing better suited to identifying those patients most likely to respond to immunotherapies and drug combinations will be critical to expanding the benefits of precision oncology. This review discusses the potential advantages of innovative molecular and functional profiling platforms designed to replace or complement targeted DNA sequencing and the major hurdles to their clinical adoption.
Affiliation(s)
- Xinran Tang
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Graduate School of Medical Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Michael F Berger
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- David B Solit
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA

6
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking. CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
Affiliation(s)
- Zhi Liu
  - Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Miaoxin Li
  - Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
  - Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China
  - Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China

7
Zheng J, Li T, Ye H, Jiang Z, Jiang W, Yang H, Wu Z, Xie Z. Comprehensive identification of pathogenic variants in retinoblastoma by long- and short-read sequencing. Cancer Lett 2024; 598:217121. [PMID: 39009069 DOI: 10.1016/j.canlet.2024.217121] [Received: 10/18/2023] [Revised: 06/16/2024] [Accepted: 07/11/2024] [Indexed: 07/17/2024]
Abstract
Retinoblastoma (RB) is the most common intraocular malignancy in childhood. The causal variants in RB have mostly been characterized by short-read sequencing (SRS) analysis, which has technical limitations in identifying structural variants (SVs) and phasing information. Long-read sequencing (LRS) technology has advantages over SRS in detecting SVs, phased genetic variants, and methylation. In this study, we comprehensively characterized the genetic landscape of RB using combinatorial LRS and SRS of 16 RB tumors and 16 matched blood samples. We detected a total of 232 somatic SVs, with an average of 14.5 SVs per sample across the whole genome in our cohort. We identified 20 distinct pathogenic variants disrupting the RB1 gene, including three novel small variants and five somatic SVs. We found that more somatic SVs were detected by LRS than by SRS (140 vs. 122) in RB samples with WGS data, particularly insertions (18 vs. 1). Furthermore, our analysis shows that, with the exception of one sample that lacked methylation data, all samples presented biallelic inactivation of RB1 in various forms, including two cases with a biallelic hypermethylated promoter and four cases with compound heterozygous mutations that were missed in SRS analysis. By inferring the relative timing of somatic events, we reveal a genetic progression in which RB1 disruption occurs early and is followed by copy number changes, including amplifications of Chr2p and deletions of Chr16q, during RB tumorigenesis. Altogether, we characterize the comprehensive genetic landscape of RB, providing novel insights into the genetic alterations and mechanisms contributing to RB initiation and development. Our work also establishes a framework to analyze the genomic landscape of cancers based on LRS data.
Affiliation(s)
- Jingjing Zheng
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Tong Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Huijing Ye
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zehang Jiang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Wenbing Jiang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Huasheng Yang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhikun Wu
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China

8
Kolesnikov A, Cook D, Nattestad M, Brambrink L, McNulty B, Gorzynski J, Goenka S, Ashley EA, Jain M, Miga KH, Paten B, Chang PC, Carroll A, Shafin K. Local read haplotagging enables accurate long-read small variant calling. Nat Commun 2024; 15:5907. [PMID: 39003259 PMCID: PMC11246426 DOI: 10.1038/s41467-024-50079-5] [Received: 09/20/2023] [Accepted: 06/28/2024] [Indexed: 07/15/2024] Open
Abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they are introduced. In this work, we have developed a local haplotype approximation method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.
Affiliation(s)
- Daniel Cook
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Brandy McNulty
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Miten Jain
  - Northeastern University, Boston, MA, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Pi-Chuan Chang
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Andrew Carroll
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Kishwar Shafin
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA

9
Bai X, Chen Z, Chen K, Wu Z, Wang R, Liu J, Chang L, Wen L, Tang F. Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq. Cell Discov 2024; 10:74. [PMID: 38977679 PMCID: PMC11231365 DOI: 10.1038/s41421-024-00694-9] [Received: 06/09/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024] Open
Abstract
The successful accomplishment of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches only achieved localized haplotype phasing and relied on additional pedigree information for further whole-chromosome scale phasing. The short-read-based Strand-seq method is able to directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To address this issue, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and achieved a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, we provide a novel methodology that enables interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polyploid genomes.
Affiliation(s)
- Xiuzhen Bai
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Zonggui Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Changping Laboratory, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Kexuan Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Zixin Wu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Rui Wang
  - Department of Medicine, Cancer Institute, Stanford University, Stanford, CA, USA
- Jun'e Liu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Liang Chang
  - State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education Beijing, Beijing, China
  - Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Lu Wen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Fuchou Tang
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China

10
Luan T, Commichaux S, Hoffmann M, Jayeola V, Jang JH, Pop M, Rand H, Luo Y. Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates. BMC Genomics 2024; 25:679. [PMID: 38978005 PMCID: PMC11232133 DOI: 10.1186/s12864-024-10582-x] [Received: 02/29/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND: Oxford Nanopore provides high-throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain whether they achieve the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999% accuracy or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low-confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best-performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered, i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS: Short reads are still needed to correct errors in nanopore-sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
Affiliation(s)
- Tu Luan
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Seth Commichaux
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA
- Maria Hoffmann
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Victor Jayeola
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Jae Hee Jang
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Mihai Pop
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Hugh Rand
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Yan Luo
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA

11
Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 DOI: 10.1007/s00438-024-02158-x] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Affiliation(s)
- Yanfeng Ji
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Junfan Zhao
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Jiao Gong
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Shaohua Fan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China

12
Jia H, Tan S, Cai Y, Guo Y, Shen J, Zhang Y, Ma H, Zhang Q, Chen J, Qiao G, Ruan J, Zhang YE. Low-input PacBio sequencing generates high-quality individual fly genomes and characterizes mutational processes. Nat Commun 2024; 15:5644. [PMID: 38969648 PMCID: PMC11226609 DOI: 10.1038/s41467-024-49992-6] [Received: 11/01/2023] [Accepted: 06/20/2024] [Indexed: 07/07/2024] Open
Abstract
Long-read sequencing, exemplified by PacBio, revolutionizes genomics, overcoming challenges like repetitive sequences. However, the high DNA requirement (>1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises a broad PacBio sequencing adoption for not only mutational studies of flies and their symbionts but also explorations of other small organisms or precious samples.
Affiliation(s)
- Hangxing Jia
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
| | - Shengjun Tan
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
| | - Yingao Cai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yanyan Guo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jieyu Shen
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yaqiong Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Huijing Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Qingzhu Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jinfeng Chen
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Gexia Qiao
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
| | - Yong E Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
| |
|
13
|
Lewis SA, Ruttenberg A, Iyiyol T, Kong N, Jin SC, Kruer MC. Potential clinical applications of advanced genomic analysis in cerebral palsy. EBioMedicine 2024; 106:105229. [PMID: 38970919 PMCID: PMC11282942 DOI: 10.1016/j.ebiom.2024.105229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/26/2024] [Accepted: 06/20/2024] [Indexed: 07/08/2024] Open
Abstract
Cerebral palsy (CP) has historically been attributed to acquired insults, but emerging research suggests that genetic variations are also important causes of CP. While microarray and whole-exome sequencing based studies have been the primary methods for establishing new CP-gene relationships and providing a genetic etiology for individual patients, the cause of their condition remains unknown for many patients with CP. Recent advancements in genomic technologies offer additional opportunities to uncover variations in human genomes, transcriptomes, and epigenomes that have previously escaped detection. In this review, we outline the use of these state-of-the-art technologies to address the molecular diagnostic challenges experienced by individuals with CP. We also explore the importance of identifying a molecular etiology whenever possible, given the potential for genomic medicine to provide opportunities to treat patients with CP in new and more precise ways.
Affiliation(s)
- Sara A Lewis
- Pediatric Movement Disorders Program, Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ, United States; Departments of Child Health, Neurology, and Cellular & Molecular Medicine and Program in Genetics, University of Arizona College of Medicine, Phoenix, AZ, United States
| | - Andrew Ruttenberg
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
| | - Tuğçe Iyiyol
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
| | - Nahyun Kong
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
| | - Sheng Chih Jin
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States; Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, United States.
| | - Michael C Kruer
- Pediatric Movement Disorders Program, Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ, United States; Departments of Child Health, Neurology, and Cellular & Molecular Medicine and Program in Genetics, University of Arizona College of Medicine, Phoenix, AZ, United States; Programs in Neuroscience and Molecular & Cellular Biology, School of Life Sciences, Arizona State University, Tempe, AZ, United States.
| |
|
14
|
Plender EG, Prodanov T, Hsieh P, Nizamis E, Harvey WT, Sulovari A, Munson KM, Kaufman EJ, O'Neal WK, Valdmanis PN, Marschall T, Bloom JD, Eichler EE. Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B. Am J Hum Genet 2024:S0002-9297(24)00213-1. [PMID: 38991590 DOI: 10.1016/j.ajhg.2024.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 06/14/2024] [Accepted: 06/17/2024] [Indexed: 07/13/2024] Open
Abstract
The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
Affiliation(s)
- Elizabeth G Plender
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany; Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Genetics, Cell Biology, and Development, University of Minnesota Medical School, Minneapolis, MN 55455, USA
| | - Evangelos Nizamis
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Eli J Kaufman
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Wanda K O'Neal
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Paul N Valdmanis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany; Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
| | - Jesse D Bloom
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
| |
|
15
|
Jia H, Tan S, Zhang YE. Chasing Sequencing Perfection: Marching Toward Higher Accuracy and Lower Costs. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae024. [PMID: 38991976 DOI: 10.1093/gpbjnl/qzae024] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/23/2023] [Revised: 01/25/2024] [Accepted: 01/29/2024] [Indexed: 07/13/2024]
Abstract
Next-generation sequencing (NGS), represented by Illumina platforms, has been an essential cornerstone of basic and applied research. However, the sequencing error rate of 1 per 1000 bp (10⁻³) represents a serious hurdle for research areas focusing on rare mutations, such as somatic mosaicism or microbe heterogeneity. By examining the high-fidelity sequencing methods developed in the past decade, we summarized three major factors underlying errors and the corresponding 12 strategies mitigating these errors. We then proposed a novel framework to classify 11 preexisting representative methods according to the corresponding combinatory strategies and identified three trends that emerged during methodological developments. We further extended this analysis to eight long-read sequencing methods, emphasizing error reduction strategies. Finally, we suggest two promising future directions that could achieve comparable or even higher accuracy with lower costs in both NGS and long-read sequencing.
Affiliation(s)
- Hangxing Jia
- CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Shengjun Tan
- CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Yong E Zhang
- CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
| |
|
16
|
Yano N, Chong PF, Kojima KK, Miyoshi T, Luqmen-Fatah A, Kimura Y, Kora K, Kayaki T, Maizuru K, Hayashi T, Yokoyama A, Ajiro M, Hagiwara M, Kondo T, Kira R, Takita J, Yoshida T. Long-read sequencing identifies an SVA_D retrotransposon insertion deep within the intron of ATP7A as a novel cause of occipital horn syndrome. J Med Genet 2024:jmg-2024-110056. [PMID: 38960580 DOI: 10.1136/jmg-2024-110056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 06/25/2024] [Indexed: 07/05/2024]
Abstract
BACKGROUND SINE-VNTR-Alu (SVA) retrotransposons move from one genomic location to another in a 'copy-and-paste' manner. They continue to move actively and cause monogenic diseases through various mechanisms. Currently, disease-causing SVA retrotransposons are classified into human-specific young SVA_E or SVA_F subfamilies. In this study, we identified an evolutionarily old SVA_D retrotransposon as a novel cause of occipital horn syndrome (OHS). OHS is an X-linked copper metabolism disorder caused by dysfunction of the copper transporter ATP7A. METHODS We investigated a 16-year-old boy with OHS whose pathogenic variant could not be detected via routine molecular genetic analyses. RESULTS A 2.8 kb insertion was detected deep within the intron of the patient's ATP7A gene. This insertion caused aberrant mRNA splicing activated by a new donor splice site located within it. Long-read circular consensus sequencing enabled us to accurately read the entire insertion sequence, which contained highly repetitive and GC-rich segments. Consequently, the insertion was identified as an SVA_D retrotransposon. Antisense oligonucleotides (AOs) targeting the new splice site restored the expression of normal transcripts and functional ATP7A proteins. AO treatment alleviated excessive accumulation of copper in patient fibroblasts in a dose-dependent manner. Pedigree analysis revealed that the retrotransposon had moved into the OHS-causing position two generations ago. CONCLUSION This is the first report of a human monogenic disease caused by the SVA_D retrotransposon. The fact that the evolutionarily old SVA_D is still actively transposed, leading to increased copy numbers, may have a notable impact on rare genetic disease research.
Affiliation(s)
- Naoko Yano
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Pin Fee Chong
- Department of Pediatric Neurology, Fukuoka Children's Hospital, Fukuoka, Japan
- Department of Pediatrics, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
| | - Kenji K Kojima
- Genetic Information Research Institute, Cupertino, CA, USA
| | - Tomoichiro Miyoshi
- Laboratory for Retrotransposon Dynamics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Department of Gene Mechanisms, Kyoto University Graduate School of Biostudies, Kyoto, Japan
| | - Ahmad Luqmen-Fatah
- Laboratory for Retrotransposon Dynamics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yu Kimura
- Department of Energy and Hydrocarbon Chemistry, Graduate School of Engineering, Kyoto University, Kyoto, Japan
| | - Kengo Kora
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Taisei Kayaki
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Kanako Maizuru
- Department of Pediatrics, Tenri Yorozu Hospital, Tenri, Japan
| | - Takahiro Hayashi
- Department of Pediatrics, Kurashiki Central Hospital, Kurashiki, Japan
| | - Atsushi Yokoyama
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Masahiko Ajiro
- Division of Cancer RNA Research, National Cancer Center Research Institute, Tokyo, Japan
| | - Masatoshi Hagiwara
- Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Department of Anatomy and Developmental Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Teruyuki Kondo
- Department of Energy and Hydrocarbon Chemistry, Graduate School of Engineering, Kyoto University, Kyoto, Japan
| | - Ryutaro Kira
- Department of Pediatric Neurology, Fukuoka Children's Hospital, Fukuoka, Japan
| | - Junko Takita
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Takeshi Yoshida
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
| |
|
17
|
Umair M, Alharbi M, Aloyouni E, Al Abdulrahman A, Aldrees M, Al Tuwaijri A, Bilal M, Alfadhel M. Mutated neuron navigator 3 as a candidate gene for a rare neurodevelopmental disorder. Mol Genet Genomic Med 2024; 12:e2473. [PMID: 39038237 PMCID: PMC11262617 DOI: 10.1002/mgg3.2473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 05/13/2024] [Accepted: 05/16/2024] [Indexed: 07/24/2024] Open
Abstract
BACKGROUND Neuron navigator 3 (NAV3) is one of the neuron navigator family (NAV1, NAV2, NAV3) proteins predominantly expressed in the nervous system. The NAV3-encoded protein comprises conserved AAA and coiled-coil domains characteristic of ATPases, which are associated with different cellular activities. METHODS We describe a Saudi proband presenting a complex recessive neurodevelopmental disorder (NDD). Whole exome sequencing (WES) followed by Sanger sequencing, 3D protein modeling and RT-qPCR was performed. RESULTS WES revealed a bi-allelic frameshift variant (c.2604_2605delAG; p.Val870SerfsTer12) in exon 12 of the NAV3 gene. Furthermore, RT-qPCR revealed a significant decrease in the NAV3 mRNA expression in the patient sample, and 3D protein modeling revealed disruption of the overall secondary structure. CONCLUSION For the first time, we associate a bi-allelic variant in the NAV3 gene with NDD in humans.
Affiliation(s)
- Muhammad Umair
- Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU‐HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
| | - Meshael Alharbi
- Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU‐HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
| | - Essra Aloyouni
- Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU‐HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
| | - Abdulkareem Al Abdulrahman
- Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU‐HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
| | - Mohammed Aldrees
- Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU‐HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
| | - Abeer Al Tuwaijri
- Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU‐HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
- Clinical Laboratory Sciences Department, College of Applied Medical Sciences, KSAU‐HS, Riyadh, Saudi Arabia
| | - Muhammad Bilal
- Department of Pathology and Laboratory Medicine, Aga Khan University, Karachi, Pakistan
| | - Majid Alfadhel
- Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU‐HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
- Genetics and Precision Medicine Department, King Abdullah Specialized Children Hospital (KASCH), MNGHA, Riyadh, Saudi Arabia
| |
|
18
|
Brankovic M, Ivanovic V, Basta I, Khang R, Lee E, Stevic Z, Ralic B, Tubic R, Seo G, Markovic V, Bozovic I, Svetel M, Marjanovic A, Veselinovic N, Mesaros S, Jankovic M, Savic-Pavicevic D, Jovin Z, Novakovic I, Lee H, Peric S. Whole exome sequencing in Serbian patients with hereditary spastic paraplegia. Neurogenetics 2024; 25:165-177. [PMID: 38499745 DOI: 10.1007/s10048-024-00755-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/08/2024] [Indexed: 03/20/2024]
Abstract
Hereditary spastic paraplegia (HSP) is a group of neurodegenerative diseases with a high genetic and clinical heterogeneity. Numerous HSP patients remain genetically undiagnosed despite screening for known genetic causes of HSP. Therefore, identification of novel variants and genes is needed. Our previous study analyzed 74 adult Serbian HSP patients from 65 families using a panel of the 13 most common HSP genes in combination with a copy number variation analysis. Conclusive genetic findings were established in 23 patients from 19 families (29%). In the present study, nine patients from nine families previously negative on the HSP gene panel were selected for whole exome sequencing (WES). Further, 44 newly diagnosed adult HSP patients from 44 families were sent to WES directly, since many studies showed WES may be used as the first step in HSP diagnosis. WES analysis of cohort 1 revealed a likely genetic cause in five (56%) of nine HSP families, including variants in the ETHE1, ZFYVE26, RNF170, CAPN1, and WASHC5 genes. In cohort 2, possible causative variants were found in seven (16%) of 44 patients (later updated to 27% when other diagnoses were excluded), comprising six different genes: SPAST, SPG11, WASHC5, KIF1A, KIF5A, and ABCD1. These results expand the genetic spectrum of HSP patients in Serbia and the region with implications for molecular genetic diagnosis and future causative therapies. A wide HSP panel can be the first step in diagnosis, alongside copy number variation (CNV) analysis, while WES should be performed afterwards.
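The diagnostic yields quoted in this abstract can be checked with simple arithmetic. The sketch below is only an illustration: the helper name `percent` and the dictionary labels are hypothetical, and the counts are taken directly from the abstract. The 27% revised figure is not recomputed because the abstract does not state its denominator.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Counts as reported in the abstract (solved cases, total cases).
yields = {
    "gene panel, families": percent(19, 65),
    "WES cohort 1, families": percent(5, 9),
    "WES cohort 2, patients": percent(7, 44),
}

for label, value in yields.items():
    print(f"{label}: {value}%")
```

The rounded values agree with the 29%, 56% and 16% stated in the abstract.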
Affiliation(s)
- Marija Brankovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia.
| | - Vukan Ivanovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Ivana Basta
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Zorica Stevic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Radoje Tubic
- Institute for Oncology and Radiology of Serbia, Belgrade, Serbia
| | - Vladana Markovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Ivo Bozovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Marina Svetel
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Ana Marjanovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Nikola Veselinovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Sarlota Mesaros
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Milena Jankovic
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Dusanka Savic-Pavicevic
- Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Belgrade, Serbia
| | - Zita Jovin
- Neurology Clinic, University Clinical Center of Vojvodina, Novi Sad, Serbia
| | - Ivana Novakovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Hane Lee
- 3Billion, Inc., Seoul, South Korea
| | - Stojan Peric
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| |
|
19
|
Yuen ZWS, Shanmuganandam S, Stanley M, Jiang S, Hein N, Daniel R, McNevin D, Jack C, Eyras E. Profiling age and body fluid DNA methylation markers using nanopore adaptive sampling. Forensic Sci Int Genet 2024; 71:103048. [PMID: 38640705 DOI: 10.1016/j.fsigen.2024.103048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 04/07/2024] [Accepted: 04/11/2024] [Indexed: 04/21/2024]
Abstract
DNA methylation plays essential roles in regulating physiological processes, from tissue and organ development to gene expression and aging processes and has emerged as a widely used biomarker for the identification of body fluids and age prediction. Currently, methylation markers are targeted independently at specific CpG sites as part of a multiplexed assay rather than through a unified assay. Methylation detection is also dependent on divergent methodologies, ranging from enzyme digestion and affinity enrichment to bisulfite treatment, alongside various technologies for high-throughput profiling, including microarray and sequencing. In this pilot study, we test the simultaneous identification of age-associated and body fluid-specific methylation markers using a single technology, nanopore adaptive sampling. This innovative approach enables the profiling of multiple CpG marker sites across entire gene regions from a single sample without the need for specialized DNA preparation or additional biochemical treatments. Our study demonstrates that adaptive sampling achieves sufficient coverage in regions of interest to accurately determine the methylation status, shows a robust consistency with whole-genome bisulfite sequencing data, and corroborates known CpG markers of age and body fluids. Our work also resulted in the identification of new sites strongly correlated with age, suggesting new possible age methylation markers. This study lays the groundwork for the systematic development of nanopore-based methodologies in both age prediction and body fluid identification, highlighting the feasibility and potential of nanopore adaptive sampling while acknowledging the need for further validation and expansion in future research.
Affiliation(s)
- Zaka Wing-Sze Yuen
- EMBL Australia Partner Laboratory Network, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Shine-Dalgarno Centre for RNA Innovation, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Centre for Computational Biomedical Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, Australia
| | - Somasundhari Shanmuganandam
- Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia; Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
| | - Maurice Stanley
- Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia; Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
| | - Simon Jiang
- Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia; Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia; Department of Renal Medicine, The Canberra Hospital, Canberra, ACT 2605, Australia
| | - Nadine Hein
- ACRF Department of Cancer Biology and Therapeutics and Division of Genome Sciences and Cancer, John Curtin School of Medical Research, Australian National University, Acton, Canberra, Australia
| | - Runa Daniel
- Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology, Queensland, Australia
| | - Dennis McNevin
- Centre for Forensic Science, School of Mathematical & Physical Sciences, Faculty of Science, University of Technology Sydney, Sydney, Australia
| | - Cameron Jack
- ANU Bioinformatics Consultancy, John Curtin School of Medical Research, The Australian National University, Canberra, Australia
| | - Eduardo Eyras
- EMBL Australia Partner Laboratory Network, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Shine-Dalgarno Centre for RNA Innovation, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Centre for Computational Biomedical Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, Australia.
| |
|
20
|
Trégouët DA, Morange PE. Next-generation sequencing strategies in venous thromboembolism: in whom and for what purpose? J Thromb Haemost 2024; 22:1826-1834. [PMID: 38641321 DOI: 10.1016/j.jtha.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
This invited review follows the oral presentation "To Sequence or Not to Sequence, That Is Not the Question; But 'When, Who, Which and What For?' Is" given during the State of the Art session "Translational Genomics in Thrombosis: From OMICs to Clinics" of the International Society on Thrombosis and Haemostasis 2023 Congress. Emphasizing the power of next-generation sequencing technologies and the diverse strategies associated with DNA variant analysis, this review highlights the unresolved questions and challenges in their implementation both for the clinical diagnosis of venous thromboembolism and in translational research.
Affiliation(s)
- David-Alexandre Trégouët
- University of Bordeaux, Institut National de la Santé et de la Recherche Médicale, Bordeaux Population Health Research Center, Unité Mixte de Recherche 1219, Bordeaux, France.
| | - Pierre-Emmanuel Morange
- Cardiovascular and Nutrition Research Center (Centre de Recherche en CardioVasculaire et Nutrition), Institut National de la Santé et de la Recherche Médicale, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Aix-Marseille University, Marseille, France
|
21
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
| | - Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
|
22
|
Liu S, Obert C, Yu YP, Zhao J, Ren BG, Liu JJ, Wiseman K, Krajacich BJ, Wang W, Metcalfe K, Smith M, Ben-Yehezkel T, Luo JH. Utility Analyses of AVITI Sequencing Chemistry. bioRxiv 2024:2024.04.18.590136. [PMID: 38712138 PMCID: PMC11071311 DOI: 10.1101/2024.04.18.590136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Background: DNA sequencing is a critical tool in modern biology. Over the last two decades, it has been revolutionized by the advent of massively parallel sequencing, leading to significant advances in the genome and transcriptome sequencing of various organisms. Nevertheless, challenges with accuracy, lack of competitive options and prohibitive costs associated with high-throughput parallel short-read sequencing persist. Results: Here, we conduct a comparative analysis using matched DNA and RNA short-read assays between Element Biosciences' AVITI and Illumina's NextSeq 550 chemistries. Similar comparisons were evaluated for synthetic long-read sequencing for RNA and targeted single-cell transcripts between the AVITI and Illumina's NovaSeq 6000. For both DNA and RNA short-read applications, the study found that the AVITI produced significantly higher per-sequence quality scores. For PCR-free DNA libraries, we observed an average 89.7% lower experimentally determined error rate when using the AVITI chemistry, compared to the NextSeq 550. For short-read RNA quantification, the AVITI platform had an average error rate 32.5% lower than that of the NextSeq 550. With regard to synthetic long-read mRNA and targeted synthetic long-read single-cell mRNA sequencing, both platforms' respective chemistries performed comparably in quantification of genes and isoforms. The AVITI displayed a marginally lower error rate for long reads, with fewer chemistry-specific errors and a higher mutation detection rate. Conclusion: These results point to the potential of the AVITI platform as a competitive candidate in high-throughput short-read sequencing analyses when juxtaposed with the Illumina NextSeq 550.
Affiliation(s)
- Silvia Liu
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
| | - Caroline Obert
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Yan-Ping Yu
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
| | - Junhua Zhao
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Bao-Guo Ren
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
| | - Jia-Jun Liu
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
| | - Kelly Wiseman
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Benjamin J Krajacich
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Wenjia Wang
- Department of Biostatistics, University of Pittsburgh School of Public Health, United States
| | - Kyle Metcalfe
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Mat Smith
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Tuval Ben-Yehezkel
- Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
| | - Jian-Hua Luo
- Department of Pathology, University of Pittsburgh School of Medicine, United States
- High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
|
23
|
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. Bioinformatics Advances 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
Summary: Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is an urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. Availability and implementation: All analysis done in the paper was based on publicly available data from the publication of each accessed tool.
Affiliation(s)
- Jiaying Lai
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Yi Yang
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Yunzhou Liu
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Robert B Scharpf
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
| | - Rachel Karchin
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
|
24
|
Wu X, Lu M, Yun D, Gao S, Sun F. Long-read single-cell sequencing reveals the transcriptional landscape of spermatogenesis in obstructive azoospermia and Sertoli cell-only patients. QJM 2024; 117:422-435. [PMID: 38192002 DOI: 10.1093/qjmed/hcae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 12/16/2023] [Indexed: 01/10/2024] Open
Abstract
BACKGROUND: High-throughput single-cell RNA sequencing (scRNA-seq) is widely used in spermatogenesis. However, it only reveals short reads in germ and somatic cells, limiting the discovery of novel transcripts and genes. AIM: This study shows the long-read transcriptional landscape of spermatogenesis in obstructive azoospermia (OA) and Sertoli cell-only patients. DESIGN: Single cells were isolated from testicular biopsies of OA and non-obstructive azoospermia (NOA) patients. Cell culture was identified by comparing PacBio long-read single-cell sequencing (OA n = 3, NOA n = 3) with short-read scRNA-seq (OA n = 6, NOA n = 6). Ten germ cell types and eight somatic cell types were classified based on known markers. METHODS: PacBio long-read single-cell sequencing, short-read scRNA-seq, polymerase chain reaction. RESULTS: A total of 130 426 long-read transcripts (100 517 novel transcripts and 29 909 known transcripts) and 49 508 long-read transcripts (26 002 novel transcripts and 23 506 known transcripts) have been detected in OA and NOA patients, respectively. Moreover, 36 373 and 1642 new genes are identified in OA and NOA patients, respectively. Importantly, specific expressions of long-read transcripts were detected in germ and somatic cells during normal spermatogenesis. CONCLUSION: We have identified total full-length transcripts in OA and NOA, and new genes were found. Furthermore, specifically expressed full-length transcripts were detected, and the genomic structure of transcripts was mapped in different cell types. These findings may provide valuable information on human spermatogenesis and the treatment of male infertility.
Affiliation(s)
- X Wu
- Department of Urology and Andrology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - M Lu
- Department of Urology and Andrology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - D Yun
- Institute of Reproductive Medicine, Medical School of Nantong University, Nantong, Jiangsu, China
| | - S Gao
- Institute of Reproductive Medicine, Medical School of Nantong University, Nantong, Jiangsu, China
| | - F Sun
- Department of Urology and Andrology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
|
25
|
Wang ZY, Ge LP, Ouyang Y, Jin X, Jiang YZ. Targeting transposable elements in cancer: developments and opportunities. Biochim Biophys Acta Rev Cancer 2024; 1879:189143. [PMID: 38936517 DOI: 10.1016/j.bbcan.2024.189143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 05/23/2024] [Accepted: 06/19/2024] [Indexed: 06/29/2024]
Abstract
Transposable elements (TEs), comprising nearly 50% of the human genome, have transitioned from being perceived as "genomic junk" to key players in cancer progression. Contemporary research links TE regulatory disruptions with cancer development, underscoring their therapeutic potential. Advances in long-read sequencing, computational analytics, single-cell sequencing, proteomics, and CRISPR-Cas9 technologies have enriched our understanding of TEs' clinical implications, notably their impact on genome architecture, gene regulation, and evolutionary processes. In cancer, TEs, including long interspersed element-1 (LINE-1), Alus, and long terminal repeat (LTR) elements, demonstrate altered patterns, influencing both tumorigenic and tumor-suppressive mechanisms. TE-derived nucleic acids and tumor antigens play critical roles in tumor immunity, bridging innate and adaptive responses. Given their central role in oncology, TE-targeted therapies, particularly through reverse transcriptase inhibitors and epigenetic modulators, represent a novel avenue in cancer treatment. Combining these TE-focused strategies with existing chemotherapy or immunotherapy regimens could enhance efficacy and offer a new dimension in cancer treatment. This review delves into recent TE detection advancements, explores their multifaceted roles in tumorigenesis and immune regulation, discusses emerging diagnostic and therapeutic approaches centered on TEs, and anticipates future directions in cancer research.
Affiliation(s)
- Zi-Yu Wang
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Li-Ping Ge
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Yang Ouyang
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Xi Jin
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Yi-Zhou Jiang
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China.
|
26
|
Araki R, Suga T, Hoki Y, Imadome K, Sunayama M, Kamimura S, Fujita M, Abe M. iPS cell generation-associated point mutations include many C > T substitutions via different cytosine modification mechanisms. Nat Commun 2024; 15:4946. [PMID: 38862540 PMCID: PMC11166658 DOI: 10.1038/s41467-024-49335-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/31/2024] [Indexed: 06/13/2024] Open
Abstract
Genomic aberrations are a critical impediment for the safe medical use of iPSCs and their origin and developmental mechanisms remain unknown. Here we find through WGS analysis of human and mouse iPSC lines that genomic mutations are de novo events and that, in addition to unmodified cytosine base prone to deamination, the DNA methylation sequence CpG represents a significant mutation-prone site. CGI and TSS regions show increased mutations in iPSCs and elevated mutations are observed in retrotransposons, especially in the AluY subfamily. Furthermore, increased cytosine to thymine mutations are observed in differentially methylated regions. These results indicate that in addition to deamination of cytosine, demethylation of methylated cytosine, which plays a central role in genome reprogramming, may act mutagenically during iPSC generation.
Affiliation(s)
- Ryoko Araki
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan.
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan.
| | - Tomo Suga
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
| | - Yuko Hoki
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
| | - Kaori Imadome
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
| | - Misato Sunayama
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
| | - Satoshi Kamimura
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
| | - Mayumi Fujita
- Stem Cell Biology Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Department of Radiation Regulatory Science Research, Institute for Radiological Science, National Institutes for Quantum Science and Technology, Chiba, Japan
| | - Masumi Abe
- Institute for Quantum Medical Science, National Institutes for Quantum Science and Technology, Chiba, Japan.
|
27
|
Boukoura S, Larsen DH. Nucleolar organization and ribosomal DNA stability in response to DNA damage. Curr Opin Cell Biol 2024; 89:102380. [PMID: 38861757 DOI: 10.1016/j.ceb.2024.102380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 05/17/2024] [Accepted: 05/20/2024] [Indexed: 06/13/2024]
Abstract
Eukaryotic nuclei are structured into sub-compartments orchestrating various cellular functions. The nucleolus is the largest nuclear organelle: a biomolecular condensate with an architecture composed of immiscible fluids facilitating ribosome biogenesis. The nucleolus forms upon the transcription of the repetitive ribosomal RNA genes (rDNA) that cluster in this compartment. rDNA is intrinsically unstable and prone to rearrangements and copy number variation. Upon DNA damage, a specialized nucleolar-DNA Damage Response (n-DDR) is activated: nucleolar transcription is inhibited, the architecture is rearranged, and rDNA is relocated to the nucleolar periphery. Recent data have highlighted how the composition of nucleoli, together with their structure and their chemical and physical properties, contributes to rDNA stability. In this mini-review we focus on recent data that start to reveal how nucleolar composition and the n-DDR work together to ensure rDNA integrity.
Affiliation(s)
- Stavroula Boukoura
- Nucleolar Stress and Disease Group, Danish Cancer Institute, Strandboulevarden 49, 2100 Copenhagen, Denmark
| | - Dorthe Helena Larsen
- Nucleolar Stress and Disease Group, Danish Cancer Institute, Strandboulevarden 49, 2100 Copenhagen, Denmark.
|
28
|
Nanda AS, Wu K, Irkliyenko I, Woo B, Ostrowski MS, Clugston AS, Sayles LC, Xu L, Satpathy AT, Nguyen HG, Alejandro Sweet-Cordero E, Goodarzi H, Kasinathan S, Ramani V. Direct transposition of native DNA for sensitive multimodal single-molecule sequencing. Nat Genet 2024; 56:1300-1309. [PMID: 38724748 PMCID: PMC11176058 DOI: 10.1038/s41588-024-01748-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 04/08/2024] [Indexed: 05/23/2024]
Abstract
Concurrent readout of sequence and base modifications from long unamplified DNA templates by Pacific Biosciences of California (PacBio) single-molecule sequencing requires large amounts of input material. Here we adapt Tn5 transposition to introduce hairpin oligonucleotides and fragment (tagment) limiting quantities of DNA for generating PacBio-compatible circular molecules. We developed two methods that implement tagmentation and use 90-99% less input than current protocols: (1) single-molecule real-time sequencing by tagmentation (SMRT-Tag), which allows detection of genetic variation and CpG methylation; and (2) single-molecule adenine-methylated oligonucleosome sequencing assay by tagmentation (SAMOSA-Tag), which uses exogenous adenine methylation to add a third channel for probing chromatin accessibility. SMRT-Tag of 40 ng or more human DNA (approximately 7,000 cell equivalents) yielded data comparable to gold standard whole-genome and bisulfite sequencing. SAMOSA-Tag of 30,000-50,000 nuclei resolved single-fiber chromatin structure, CTCF binding and DNA methylation in patient-derived prostate cancer xenografts and uncovered metastasis-associated global epigenome disorganization. Tagmentation thus promises to enable sensitive, scalable and multimodal single-molecule genomics for diverse basic and clinical applications.
Affiliation(s)
- Arjun S Nanda
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
| | - Ke Wu
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
| | - Iryna Irkliyenko
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
| | - Brian Woo
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Helen-Diller Cancer Center, San Francisco, CA, USA
| | - Megan S Ostrowski
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
| | - Andrew S Clugston
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
| | - Leanne C Sayles
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
| | - Lingru Xu
- Helen-Diller Cancer Center, San Francisco, CA, USA
| | - Ansuman T Satpathy
- Department of Pathology, Stanford University, Stanford, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Gladstone-University of California, San Francisco Institute for Genomic Immunology, Gladstone Institutes, San Francisco, CA, USA
| | - Hao G Nguyen
- Helen-Diller Cancer Center, San Francisco, CA, USA
| | - E Alejandro Sweet-Cordero
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
| | - Hani Goodarzi
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Helen-Diller Cancer Center, San Francisco, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, San Francisco, CA, USA
| | - Sivakanthan Kasinathan
- Gladstone-University of California, San Francisco Institute for Genomic Immunology, Gladstone Institutes, San Francisco, CA, USA.
- Division of Rheumatology, Department of Pediatrics, Stanford University, Stanford, CA, USA.
| | - Vijay Ramani
- Gladstone Institute for Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA.
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA.
- Helen-Diller Cancer Center, San Francisco, CA, USA.
- Bakar Computational Health Sciences Institute, San Francisco, CA, USA.
|
29
|
Kuroki Y, Hattori A, Matsubara K, Fukami M. Long-read next-generation sequencing for molecular diagnosis of pediatric endocrine disorders. Ann Pediatr Endocrinol Metab 2024; 29:156-160. [PMID: 38956752 PMCID: PMC11220396 DOI: 10.6065/apem.2448028.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 04/23/2024] [Indexed: 07/04/2024] Open
Abstract
Recent advances in long-read next-generation sequencing (NGS) have enabled researchers to identify several pathogenic variants overlooked by short-read NGS, array-based comparative genomic hybridization, and other conventional methods. Long-read NGS is particularly useful in the detection of structural variants and repeat expansions. Furthermore, it can be used for mutation screening in difficult-to-sequence regions, as well as for DNA-methylation analyses and haplotype phasing. This mini-review introduces the usefulness of long-read NGS in the molecular diagnosis of pediatric endocrine disorders.
Affiliation(s)
- Yoko Kuroki
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Genome Medicine, National Research Institute for Child Health and Development, Tokyo, Japan
| | - Atsushi Hattori
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Molecular Endocrinology, National Research Institute for Child Health and Development, Tokyo, Japan
| | - Keiko Matsubara
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Molecular Endocrinology, National Research Institute for Child Health and Development, Tokyo, Japan
| | - Maki Fukami
- Division of Diversity Research, National Research Institute for Child Health and Development, Tokyo, Japan
- Department of Molecular Endocrinology, National Research Institute for Child Health and Development, Tokyo, Japan
|
30
|
Zhou H, Su X, Song B. ACMGA: a reference-free multiple-genome alignment pipeline for plant species. BMC Genomics 2024; 25:515. [PMID: 38796435 PMCID: PMC11127342 DOI: 10.1186/s12864-024-10430-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 05/20/2024] [Indexed: 05/28/2024] Open
Abstract
BACKGROUND: The short-read whole-genome sequencing (WGS) approach has been widely applied to investigate the genomic variation in the natural populations of many plant species. With the rapid advancements in long-read sequencing and genome assembly technologies, high-quality genome sequences are available for a group of varieties for many plant species. These genome sequences are expected to help researchers comprehensively investigate any type of genomic variant that is missed by the WGS technology. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes. RESULTS: To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improved the alignment of repeat elements and could identify long (> 50 bp) deletions or insertions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 maize (Zea mays) de novo assembled genome sequences and compared them with the previously published short-read variant calling results. MGA identified more single nucleotide variants (SNVs) and long INDELs than did previously published WGS variant callings. Additionally, ACMGA detected significantly more SNVs and long INDELs in repetitive regions and the whole genome than did Cactus. Compared with the results of Cactus, the results of ACMGA were more similar to the previously published variants called using short reads. These two MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline. CONCLUSIONS: Aligning de novo assembled genome sequences could identify more SNVs and INDELs than mapping short reads. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece-affine-gap cost strategy, and the progressive MGA algorithm.
Affiliation(s)
- Huafeng Zhou
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China
| | - Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China.
| | - Baoxing Song
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China.
- Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi, 712100, China.
| |
|
31
|
Kumari P, Kaur M, Dindhoria K, Ashford B, Amarasinghe SL, Thind AS. Advances in long-read single-cell transcriptomics. Hum Genet 2024:10.1007/s00439-024-02678-x. [PMID: 38787419 DOI: 10.1007/s00439-024-02678-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Long-read single-cell transcriptomics (scRNA-Seq) is revolutionizing the way we profile heterogeneity in disease. Traditional short-read scRNA-Seq methods are limited in their ability to provide complete transcript coverage, resolve isoforms, and identify novel transcripts. The scRNA-Seq protocols developed for long-read sequencing platforms overcome these limitations by enabling the characterization of full-length transcripts. Long-read scRNA-Seq techniques initially suffered from poor accuracy compared with short-read scRNA-Seq. However, with improvements in accuracy, accessibility, and cost efficiency, long reads are gaining popularity in the field of scRNA-Seq. This review details the advances in long-read scRNA-Seq, with an emphasis on library preparation protocols and downstream bioinformatics analysis tools.
Affiliation(s)
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Manmeet Kaur
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Bruce Ashford
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
| | - Shanika L Amarasinghe
- Monash Biomedical Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
- Walter and Eliza Hall Institute of Medical Research, 1G, Royal Parade, Parkville, VIC, 3025, Australia
| | - Amarinder Singh Thind
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia.
- The School of Chemistry and Molecular Bioscience (SCMB), University of Wollongong, Loftus St, Wollongong, NSW, 2500, Australia.
| |
|
32
|
Chao KH, Heinz JM, Hoh C, Mao A, Shumate A, Pertea M, Salzberg SL. Combining DNA and protein alignments to improve genome annotation with LiftOn. bioRxiv 2024:2024.05.16.593026. [PMID: 38798552 PMCID: PMC11118573 DOI: 10.1101/2024.05.16.593026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn's protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and D. erecta.
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jakob M. Heinz
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Celine Hoh
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Alan Mao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Alaina Shumate
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Mihaela Pertea
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
| |
|
33
|
Ji CM, Feng XY, Huang YW, Chen RA. The Applications of Nanopore Sequencing Technology in Animal and Human Virus Research. Viruses 2024; 16:798. [PMID: 38793679 PMCID: PMC11125791 DOI: 10.3390/v16050798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/07/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024] Open
Abstract
In recent years, an increasing number of viruses have triggered outbreaks that pose a severe threat to both human and animal life and have caused substantial economic losses. It is crucial to understand the genomic structure and epidemiology of these viruses to guide effective clinical prevention and treatment strategies. Nanopore sequencing, a third-generation sequencing technology, has been widely used in genomic research since 2014. This technology offers several advantages over traditional methods and next-generation sequencing (NGS), such as the ability to generate ultra-long reads, high efficiency, real-time monitoring and analysis, portability, and the ability to directly sequence RNA or DNA molecules. As a result, it exhibits excellent applicability and flexibility in virus research, including viral detection and surveillance, genome assembly, the discovery of new variants and novel viruses, and the identification of chemical modifications. In this paper, we provide a comprehensive review of the development, principles, advantages, and applications of nanopore sequencing technology in animal and human virus research, aiming to offer fresh perspectives for future studies in this field.
Affiliation(s)
- Chun-Miao Ji
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China; (C.-M.J.); (X.-Y.F.)
| | - Xiao-Yin Feng
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China; (C.-M.J.); (X.-Y.F.)
| | - Yao-Wei Huang
- College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China;
- Department of Veterinary Medicine, Zhejiang University, Hangzhou 310058, China
| | - Rui-Ai Chen
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China; (C.-M.J.); (X.-Y.F.)
- College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China;
| |
|
34
|
Su Y, Yu Z, Jin S, Ai Z, Yuan R, Chen X, Xue Z, Guo Y, Chen D, Liang H, Liu Z, Liu W. Comprehensive assessment of mRNA isoform detection methods for long-read sequencing data. Nat Commun 2024; 15:3972. [PMID: 38730241 PMCID: PMC11087464 DOI: 10.1038/s41467-024-48117-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Accepted: 04/19/2024] [Indexed: 05/12/2024] Open
Abstract
The advancement of long-read sequencing (LRS) techniques has significantly increased read lengths to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expression. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a lack of comparative studies that systematically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performance using simulated data, which represented diverse sequencing platforms generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate IsoQuant as a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and the ongoing improvement of tools for isoform detection using LRS data.
Affiliation(s)
- Yaqi Su
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
| | - Zhejian Yu
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Siqian Jin
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Zhipeng Ai
- Division of Human Reproduction and Developmental Genetics, Women's Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310006, Zhejiang, China
| | - Ruihong Yuan
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Xinyi Chen
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Ziwei Xue
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Yixin Guo
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Di Chen
- Center for Reproductive Medicine of the Second Affiliated Hospital Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre for Regeneration and Cell Therapy of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Hongqing Liang
- Division of Human Reproduction and Developmental Genetics, Women's Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310006, Zhejiang, China
| | - Zuozhu Liu
- Zhejiang University-Angel Align Inc. R&D Center for Intelligent Healthcare, Zhejiang University-University of Illinois at Urbana-Champaign Institute (ZJU-UIUC Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
| | - Wanlu Liu
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China.
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China.
- Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314100, China.
- Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
| |
|
35
|
Haer-Wigman L, den Ouden A, Derks R, van Genderen MM, Lugtenberg D, Verheij J, Vijzelaar R, Yntema HG, Vissers LELM, Neveling K. Reply to: Pitfalls in the genetic testing of the OPN1LW-OPN1MW gene cluster in human subjects. NPJ Genom Med 2024; 9:29. [PMID: 38704388 PMCID: PMC11069539 DOI: 10.1038/s41525-024-00409-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 03/13/2024] [Indexed: 05/06/2024] Open
Affiliation(s)
- Lonneke Haer-Wigman
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands.
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands.
| | - Amber den Ouden
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Ronny Derks
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Maria M van Genderen
- Bartiméus Diagnostic Center for complex visual disorders, Zeist, the Netherlands
- Department of Ophthalmology, University Medical Centre Utrecht, Utrecht, the Netherlands
| | - Dorien Lugtenberg
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Joke Verheij
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | | | - Helger G Yntema
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Lisenka E L M Vissers
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Kornelia Neveling
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
| |
|
36
|
Anantharam R, Duchen D, Cox AL, Timp W, Thomas DL, Clipman SJ, Kandathil AJ. Long-Read Nanopore-Based Sequencing of Anelloviruses. Viruses 2024; 16:723. [PMID: 38793605 PMCID: PMC11125752 DOI: 10.3390/v16050723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 04/27/2024] [Accepted: 04/30/2024] [Indexed: 05/26/2024] Open
Abstract
Routinely used metagenomic next-generation sequencing (mNGS) techniques often fail to detect low-level viremia (<10^4 copies/mL) and appear biased towards viruses with linear genomes. These limitations hinder the capacity to comprehensively characterize viral infections, such as those attributed to the Anelloviridae family. These near ubiquitous non-pathogenic components of the human virome have circular single-stranded DNA genomes that vary in size from 2.0 to 3.9 kb and exhibit high genetic diversity. Hence, species identification using short reads can be challenging. Here, we introduce a rolling circle amplification (RCA)-based metagenomic sequencing protocol tailored for circular single-stranded DNA genomes, utilizing the long-read Oxford Nanopore platform. The approach was assessed by sequencing anelloviruses in plasma drawn from people who inject drugs (PWID) in two geographically distinct cohorts. We detail the methodological adjustments implemented to overcome difficulties inherent in sequencing circular genomes and describe a computational pipeline focused on anellovirus detection. We assessed our protocol across various sample dilutions and successfully differentiated anellovirus sequences in conditions simulating mixed infections. This method provides a robust framework for the comprehensive characterization of circular viruses within the human virome using the Oxford Nanopore platform.
Affiliation(s)
- Raghavendran Anantharam
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; (R.A.)
| | - Dylan Duchen
- Center for Biomedical Data Science, Yale University School of Medicine, New Haven, CT 06511, USA;
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06519, USA
| | - Andrea L. Cox
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; (R.A.)
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - David L. Thomas
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; (R.A.)
| | - Steven J. Clipman
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; (R.A.)
| | - Abraham J. Kandathil
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; (R.A.)
| |
|
37
|
Su C, Chandradoss KR, Malachowski T, Boya R, Ryu HS, Brennand KJ, Phillips-Cremins JE. MASTR-seq: Multiplexed Analysis of Short Tandem Repeats with sequencing. bioRxiv 2024:2024.04.29.591790. [PMID: 38746155 PMCID: PMC11092654 DOI: 10.1101/2024.04.29.591790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
More than 60 human disorders have been linked to unstable expansion of short tandem repeat (STR) tracts. STR length and the extent of DNA methylation are linked to disease pathology and can be mosaic in a cell type-specific manner in several repeat expansion disorders. Mosaic phenomena have been difficult to study to date due to technical bias intrinsic to repeat sequences and the need for multi-modal measurements at single-allele resolution. Nanopore long-read sequencing accurately measures STR length and DNA methylation in the same single molecule but is cost prohibitive for studies assessing a target locus across multiple experimental conditions or patient samples. Here, we describe MASTR-seq, Multiplexed Analysis of Short Tandem Repeats, for cost-effective, high-throughput, accurate, multi-modal measurements of DNA methylation and STR genotype at single-allele resolution. MASTR-seq couples long-read sequencing, Cas9-mediated target enrichment, and PCR-free multiplexed barcoding to achieve a >10-fold increase in on-target read mapping for 8-12 pooled samples in a single MinION flow cell. We provide a detailed experimental protocol and computational tools and present evidence that MASTR-seq quantifies tract length and DNA methylation status for CGG and CAG STR loci in normal-length and mutation-length human cell lines. The MASTR-seq protocol takes approximately eight days for experiments and one additional day for data processing and analyses.
Key points
- We provide a protocol for MASTR-seq: Multiplexed Analysis of Short Tandem Repeats using Cas9-mediated target enrichment and PCR-free, multiplexed nanopore sequencing.
- MASTR-seq achieves a >10-fold increase in on-target read proportion for highly repetitive, technically inaccessible regions of the genome relevant for human health and disease.
- MASTR-seq allows for high-throughput, efficient, accurate, and cost-effective measurement of STR length and DNA methylation in the same single allele for up to 8-12 samples in parallel in one Nanopore MinION flow cell.
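To make "tract length" concrete, the following is a toy sketch that counts the longest uninterrupted run of a repeat unit such as CAG or CGG in a single read. It is not the MASTR-seq computational pipeline: the function name and the example read are hypothetical, and the sketch ignores sequencing errors, interruptions within the tract, and the reverse strand, all of which a real analysis of nanopore reads would have to handle.

```python
import re


def longest_repeat_tract(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`.

    Hypothetical helper for illustration only; exact matching, forward strand.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    longest = 0
    for match in pattern.finditer(sequence.upper()):
        # Each match is a run of whole units, so its length divides evenly.
        longest = max(longest, len(match.group(0)) // len(unit))
    return longest


if __name__ == "__main__":
    toy_read = "TTCAGCAGCAGCAGAA"  # made-up read containing four CAG units
    print(longest_repeat_tract(toy_read, "CAG"))
```

Applying such a count per read, rather than per sample, is what allows a distribution of tract lengths to be examined at single-molecule level.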
|
38
|
Nicolas G. Lessons from genetic studies in Alzheimer disease. Rev Neurol (Paris) 2024; 180:368-377. [PMID: 38429159 DOI: 10.1016/j.neurol.2023.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 12/27/2023] [Indexed: 03/03/2024]
Abstract
Research on Alzheimer disease (AD) genetics has provided critical advances to the knowledge of AD pathophysiological mechanisms. The etiology of AD can be divided into monogenic (autosomal dominant inheritance) and complex (multifactorial determinism). In monogenic AD, recent advances mainly concern mutation-associated mechanisms, presymptomatic clinical studies, and the still ongoing search for modifiers of age of onset. In complex AD, genetic factors can be further categorized into three classes: (i) the APOE-ɛ4 and ɛ2 common alleles, which represent a category by themselves as they are both common and have a strong impact on AD risk; (ii) common variants with a modest effect, identified in genome-wide association studies (GWAS); and (iii) rare variants with a moderate-to-strong effect, identified in case-control sequencing studies. Regarding APOE, odds ratios, available in multiple ethnicities, can now be converted into penetrance curves, although such curves remain to be computed in diverse ethnicities. In addition, advances in the understanding of mechanisms have been recently reported and rare APOE variants add to the complexity. In the GWAS category, novel loci have been discovered thanks to larger studies, doubling the number of hits as compared to the previous reference meta-analysis. However, such modest risk factors cannot be used in the clinic, neither individually nor in genetic risk scores. In the category of rare variants, two novel genes, ABCA1 and ATP8B4, now add to the three main ones, TREM2, SORL1, and ABCA7. The study of such rare variants suggests oligogenic inheritance in some families, as also suggested by digenic penetrance curves for SORL1 loss-of-function variants with APOE-ɛ4. Cumulative frequencies of definite (so-called) rare risk factors are 2.3% to 3.6% (depending on thresholds on odds ratios) in control databases and many more remain to be classified and identified, showing how important these risk factors may be as part of the complex determinism of AD. A better understanding of these rare risk factors and their combined effects on each other, with common variants, and with environmental factors, should allow for a prediction of AD risk and, eventually, preventive medicine. Taken together, most genetic determinants of AD, in monogenic and in complex forms, point toward the aggregation of Aβ as a pivotal triggering factor, such that targeting it may be efficient as prevention in at-risk individuals. The roles of neuroinflammation, microglia, and Tau pathology modulation are important avenues of research for disease modification.
Affiliation(s)
- G Nicolas
- Univ Rouen Normandie, Normandie Univ, Inserm U1245 and CHU Rouen, Department of Genetics and CNRMAJ, 76000 Rouen, France.
| |
|
39
|
Mascher M, Marone MP, Schreiber M, Stein N. Are cereal grasses a single genetic system? Nat Plants 2024; 10:719-731. [PMID: 38605239 DOI: 10.1038/s41477-024-01674-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 03/17/2024] [Indexed: 04/13/2024]
Abstract
In 1993, a passionate and provocative call to arms urged cereal researchers to consider the taxon they study as a single genetic system and collaborate with each other. Since then, that group of scientists has seen their discipline blossom. In an attempt to understand what unity of genetic systems means and how the notion was borne out by later research, we survey the progress and prospects of cereal genomics: sequence assemblies, population-scale sequencing, resistance gene cloning and domestication genetics. Gene order may not be as extraordinarily well conserved in the grasses as once thought. Still, several recurring themes have emerged. The same ancestral molecular pathways defining plant architecture have been co-opted in the evolution of different cereal crops. Such genetic convergence as much as cross-fertilization of ideas between cereal geneticists has led to a rich harvest of genes that, it is hoped, will lead to improved varieties.
Affiliation(s)
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
| | - Marina Püpke Marone
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany
| | - Mona Schreiber
- University of Marburg, Department of Biology, Marburg, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany.
- Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
| |
|
40
|
Del Gobbo GF, Wang X, Couse M, Mackay L, Goldsmith C, Marshall AE, Liang Y, Lambert C, Zhang S, Dhillon H, Fanslow C, Rowell WJ, Marshall CR, Kernohan KD, Boycott KM. Long-read genome sequencing reveals a novel intronic retroelement insertion in NR5A1 associated with 46,XY differences of sexual development. Am J Med Genet A 2024; 194:e63522. [PMID: 38131126 DOI: 10.1002/ajmg.a.63522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
Despite significant advancements in rare genetic disease diagnostics, many patients with rare genetic disease remain without a molecular diagnosis. Novel tools and methods are needed to improve the detection of disease-associated variants and understand the genetic basis of many rare diseases. Long-read genome sequencing provides improved sequencing in highly repetitive, homologous, and low-complexity regions, and improved assessment of structural variation and complex genomic rearrangements compared to short-read genome sequencing. As such, it is a promising method to explore overlooked genetic variants in rare diseases with a high suspicion of a genetic basis. We therefore applied PacBio HiFi sequencing in a large multi-generational family presenting with autosomal dominant 46,XY differences of sexual development (DSD), for whom extensive molecular testing over multiple decades had failed to identify a molecular diagnosis. This revealed a rare SINE-VNTR-Alu retroelement insertion in intron 4 of NR5A1, a gene in which loss-of-function variants are an established cause of 46,XY DSD. The insertion segregated among affected family members and was associated with loss-of-expression of alleles in cis, demonstrating a functional impact on NR5A1. This case highlights the power of long-read genome sequencing to detect genomic variants that have previously been intractable to detection by standard short-read genomic testing.
Affiliation(s)
- Giulia F Del Gobbo
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
| | - Xueqi Wang
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
| | - Madeline Couse
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Canada
| | - Layla Mackay
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Canada
| | - Claire Goldsmith
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Canada
| | - Aren E Marshall
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
| | - Yijing Liang
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Canada
| | | | - Siyuan Zhang
- PacBio of California, Inc, Menlo Park, California, USA
| | | | | | | | | | - Kristin D Kernohan
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Newborn Screening Ontario, Ottawa, Canada
| | - Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Canada
| |
|
41
|
Komoto T, Ikeo K, Yaguchi S, Yamamoto T, Sakamoto N, Awazu A. Assembly of continuous high-resolution draft genome sequence of Hemicentrotus pulcherrimus using long-read sequencing. Dev Growth Differ 2024; 66:297-304. [PMID: 38634255 DOI: 10.1111/dgd.12924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/13/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
The draft genome assembly of the sea urchin Hemicentrotus pulcherrimus, which is widely studied in East Asia as a model organism of early development, was updated using Oxford Nanopore long-read sequencing. The updated assembly provided ~600 Mb of genome sequence divided into 2,163 contigs with N50 = 516 kb. The BUSCO completeness score and transcriptome model mapping ratio (TMMR) of the present assembly were 96.5% and 77.8%, respectively. The assembly is more contiguous and of higher resolution than the previous version of the H. pulcherrimus draft genome, HpulGenome_v1, which had 16,251 scaffolds with a total of ~100 Mb, N50 = 143 kb, a BUSCO completeness score of 86.1%, and a TMMR of 55.4%. The genome contained 36,055 gene models that were consistent with those in other echinoderms. Additionally, two tandem repeat sequences of the early histone gene locus containing 47 copies and 34 copies of all histone genes, and 185 homologous sequences of the interspecifically conserved region of the Ars insulator, ArsInsC, were obtained. These results further advance genome-wide research on development, gene regulation, and intranuclear structural dynamics of multicellular organisms using H. pulcherrimus.
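The N50 values quoted in the abstract above follow the standard definition: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. A minimal sketch of that calculation is shown below; the function name and the toy contig lengths are made up for illustration and are unrelated to the H. pulcherrimus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths
    print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness, which is why the abstract reports it alongside BUSCO completeness and TMMR.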
Affiliation(s)
- Tetsushi Komoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
| | - Kazuho Ikeo
- Department of Genomics and Evolutionary Biology, National Institute of Genetics, Shizuoka, Japan
| | - Shunsuke Yaguchi
- Shimoda Marine Research Center, University of Tsukuba, Shimoda, Japan
| | - Takashi Yamamoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Research Center for the Mathematics on Chromatin Live Dynamics, Hiroshima University, Higashi-Hiroshima, Japan
| | - Naoaki Sakamoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Research Center for the Mathematics on Chromatin Live Dynamics, Hiroshima University, Higashi-Hiroshima, Japan
| | - Akinori Awazu
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Research Center for the Mathematics on Chromatin Live Dynamics, Hiroshima University, Higashi-Hiroshima, Japan
|
42
|
Kronzer VL, Sparks JA, Raychaudhuri S, Cerhan JR. Low-frequency and rare genetic variants associated with rheumatoid arthritis risk. Nat Rev Rheumatol 2024; 20:290-300. [PMID: 38538758 DOI: 10.1038/s41584-024-01096-7] [Accepted: 02/20/2024] [Indexed: 04/28/2024]
Abstract
Rheumatoid arthritis (RA) has an estimated heritability of nearly 50%, which is particularly high in seropositive RA. HLA alleles account for a large proportion of this heritability, in addition to many common single-nucleotide polymorphisms with smaller individual effects. Low-frequency and rare variants, such as those captured by next-generation sequencing, can also have a large role in heritability in some individuals. Rare variant discovery has informed the development of drugs such as inhibitors of PCSK9 and Janus kinases. Some 34 low-frequency and rare variants are currently associated with RA risk. One variant (19:10352442G>C in TYK2) was identified in five separate studies, and might therefore represent a promising therapeutic target. Following a set of best practices in future studies, including studying diverse populations, using large sample sizes, validating RA and serostatus, replicating findings, adjusting for other variants and performing functional assessment, could help to ensure the relevance of identified variants. Exciting opportunities are now on the horizon for genetics in RA, including larger datasets and consortia, whole-genome sequencing and direct applications of findings in the management, and especially treatment, of RA.
Affiliation(s)
| | - Jeffrey A Sparks
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Soumya Raychaudhuri
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - James R Cerhan
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
|
43
|
Bjørnstad PM, Aaløkken R, Åsheim J, Sundaram AYM, Felde CN, Østby GH, Dalland M, Sjursen W, Carrizosa C, Vigeland MD, Sorte HS, Sheng Y, Ariansen SL, Grindedal EM, Gilfillan GD. A 39 kb structural variant causing Lynch Syndrome detected by optical genome mapping and nanopore sequencing. Eur J Hum Genet 2024; 32:513-520. [PMID: 38030917 PMCID: PMC11061271 DOI: 10.1038/s41431-023-01494-7] [Received: 07/03/2023] [Revised: 10/19/2023] [Accepted: 11/06/2023] [Indexed: 12/01/2023] Open
Abstract
Lynch Syndrome (LS) is a hereditary cancer syndrome caused by pathogenic germline variants in one of the four mismatch repair (MMR) genes MLH1, MSH2, MSH6 and PMS2. It is characterized by a significantly increased risk of multiple cancer types, particularly colorectal and endometrial cancer, with autosomal dominant inheritance. Access to precise and sensitive methods for genetic testing is important, as early detection and prevention of cancer is possible when the variant is known. We present here two unrelated Norwegian families with family histories strongly suggestive of LS, where immunohistochemical and microsatellite instability analyses indicated presence of a pathogenic variant in MSH2, but targeted exon sequencing and multiplex ligation-dependent probe amplification (MLPA) were negative. Using Bionano optical genome mapping, we detected a 39 kb insertion in the MSH2 gene. Precise mapping of the insertion breakpoints and inserted sequence was performed by low-coverage whole-genome sequencing with an Oxford Nanopore MinION. The same variant was present in both families, and later found in other families from the same region of Norway, indicative of a founder event. To our knowledge, this is the first diagnosis of LS caused by a structural variant using these technologies. We suggest that structural variant detection be performed when LS is suspected but not confirmed with first-tier standard genetic testing.
Affiliation(s)
- Pål Marius Bjørnstad
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ragnhild Aaløkken
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - June Åsheim
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Arvind Y M Sundaram
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Caroline N Felde
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - G Henriette Østby
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Marianne Dalland
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Wenche Sjursen
- Department of Clinical & Molecular Medicine, NTNU and Department of Medical Genetics, St Olavs Hospital, Trondheim, Norway
| | - Christian Carrizosa
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Magnus D Vigeland
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Forensic Sciences, Oslo University Hospital, 0372, Oslo, Norway
| | - Hanne S Sorte
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ying Sheng
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Sarah L Ariansen
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Eli Marie Grindedal
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Gregor D Gilfillan
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
|
44
|
Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics 2024; 116:110842. [PMID: 38608738 DOI: 10.1016/j.ygeno.2024.110842] [Received: 10/31/2023] [Revised: 04/01/2024] [Accepted: 04/06/2024] [Indexed: 04/14/2024]
Abstract
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly remains a formidable challenge, both in computational demands and in the quality of the results. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape necessitates a reasonable choice of sequencing platform, depth, and assembly tools to orchestrate high-quality genome reconstructions. This comprehensive review delves into the intricate interplay between cutting-edge long-read sequencing technologies, assembly methodologies, and the ever-evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.
Affiliation(s)
- Elena Espinosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
| | - Rocio Bautista
- Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Rafael Larrosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Oscar Plata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
|
45
|
Bilgrav Saether K, Eisfeldt J, Bengtsson J, Lun MY, Grochowski CM, Mahmoud M, Chao HT, Rosenfeld JA, Liu P, Schuy J, Ameur A, Hwang JP, Sedlazeck FJ, Bi W, Marom R, Nordgren A, Carvalho CMB, Lindstrand A. Mind the gap: the relevance of the genome reference to resolve rare and pathogenic inversions. medRxiv 2024:2024.04.22.24305780. [PMID: 38712270 PMCID: PMC11071548 DOI: 10.1101/2024.04.22.24305780] [Indexed: 05/08/2024]
Abstract
Both long-read genome sequencing (lrGS) and the recently published Telomere to Telomere (T2T) reference genome provide increased coverage and resolution across repetitive regions, promising heightened structural variant detection and improved mapping. Inversions (INV), intrachromosomal segments which are rotated 180° and inserted back into the same chromosome, are a class of structural variants particularly challenging to detect due to their copy-number neutral state and association with repetitive regions. Inversions represent about 1/20 of all balanced structural chromosome aberrations and can lead to disease by gene disruption or altering regulatory regions of dosage sensitive genes in cis. Here we remapped the genome data from six individuals carrying unsolved cytogenetically detected inversions. An INV6 and INV10 were resolved using GRCh38 and T2T-CHM13. Finally, an INV9 required optical genome mapping, de novo assembly of lrGS data and T2T-CHM13. This inversion disrupted intron 25 of EHMT1, confirming a diagnosis of Kleefstra syndrome 1 (MIM#610253). These three inversions, only mappable in specific references, prompted us to investigate the presence and population frequencies of differential reference regions (DRRs) between T2T-CHM13, GRCh37, GRCh38, the chimpanzee and bonobo, and hundreds of megabases of DRRs were identified. Our results emphasize the significance of the chosen reference genome and the added benefits of lrGS and optical genome mapping in solving rearrangements in challenging regions of the genome. This is particularly important for inversions and may impact clinical diagnostics.
|
46
|
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
|
47
|
Ten Berk de Boer E, Ameur A, Bunikis I, Ek M, Stattin EL, Feuk L, Eisfeldt J, Lindstrand A. Long-read sequencing and optical mapping generates near T2T assemblies that resolves a centromeric translocation. Sci Rep 2024; 14:9000. [PMID: 38637641 PMCID: PMC11026446 DOI: 10.1038/s41598-024-59683-3] [Received: 11/28/2023] [Accepted: 04/13/2024] [Indexed: 04/20/2024] Open
Abstract
Long-read genome sequencing (lrGS) is a promising method in genetic diagnostics. Here we investigate the potential of lrGS to detect a disease-associated chromosomal translocation between 17p13 and the 19 centromere. We constructed two sets of phased and non-phased de novo assemblies: (i) based on lrGS only and (ii) hybrid assemblies combining lrGS with optical mapping, using lrGS reads with a median coverage of 34X. Variant calling detected both structural variants (SVs) and small variants, and the accuracy of the small variant calling was compared with that of short-read genome sequencing (srGS). The de novo and hybrid assemblies had high quality and contiguity with N50 of 62.85 Mb, enabling a near telomere to telomere assembly with fewer than 100 contigs per haplotype. Notably, we successfully identified the centromeric breakpoint of the translocation. A concordance of 92% was observed when comparing small variant calling between srGS and lrGS. In summary, our findings underscore the remarkable potential of lrGS as a comprehensive and accurate solution for the analysis of SVs and small variants. Thus, lrGS could replace a large battery of genetic tests that were used for the diagnosis of a single symptomatic translocation carrier, highlighting the potential of lrGS in the realm of digital karyotyping.
Affiliation(s)
- Esmee Ten Berk de Boer
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet Science Park, 171 65, Solna, Sweden
| | - Adam Ameur
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Marlene Ek
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
| | - Eva-Lena Stattin
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Lars Feuk
- Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
| | - Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden.
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden.
- Science for Life Laboratory, Karolinska Institutet Science Park, 171 65, Solna, Sweden.
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
|
48
|
Zhang S, Xu N, Fu L, Yang X, Li Y, Yang Z, Feng Y, Ma K, Jiang X, Han J, Hu R, Zhang L, de Gennaro L, Ryabov F, Meng D, He Y, Wu D, Yang C, Paparella A, Mao Y, Bian X, Lu Y, Antonacci F, Ventura M, Shepelev VA, Miga KH, Alexandrov IA, Logsdon GA, Phillippy AM, Su B, Zhang G, Eichler EE, Lu Q, Shi Y, Sun Q, Mao Y. Comparative genomics of macaques and integrated insights into genetic variation and population history. bioRxiv 2024:2024.04.07.588379. [PMID: 38645259 PMCID: PMC11030432 DOI: 10.1101/2024.04.07.588379] [Indexed: 04/23/2024]
Abstract
The crab-eating macaques (Macaca fascicularis) and rhesus macaques (M. mulatta) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ∼42% lower, while centromeres are ∼3.7 times longer than those in humans. The characterization of ∼2 Mbp fixed genetic variants and ∼240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1, SEMA3F, and HOXD13). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.
|
49
|
Eisenhofer R, Nesme J, Santos-Bay L, Koziol A, Sørensen SJ, Alberdi A, Aizpurua O. A comparison of short-read, HiFi long-read, and hybrid strategies for genome-resolved metagenomics. Microbiol Spectr 2024; 12:e0359023. [PMID: 38451230 PMCID: PMC10986573 DOI: 10.1128/spectrum.03590-23] [Received: 10/07/2023] [Accepted: 02/11/2024] [Indexed: 03/08/2024] Open
Abstract
Shotgun metagenomics enables the reconstruction of complex microbial communities at a high level of detail. Such an approach can be conducted using both short-read and long-read sequencing data, as well as a combination of both. To assess the pros and cons of these different approaches, we used 22 fecal DNA extracts collected weekly for 11 weeks from two respective lab mice to study seven performance metrics over four combinations of sequencing depth and technology: (i) 20 Gbp of Illumina short-read data, (ii) 40 Gbp of short-read data, (iii) 20 Gbp of PacBio HiFi long-read data, and (iv) 40 Gbp of hybrid (20 Gbp of short-read + 20 Gbp of long-read) data. No strategy was best for all metrics; instead, each one excelled across different metrics. The long-read approach yielded the best assembly statistics, with the highest N50 and lowest number of contigs. The 40 Gbp short-read approach yielded the highest number of refined bins. Finally, the hybrid approach yielded the longest assemblies and the highest mapping rate to the bacterial genomes. Our results suggest that while long-read sequencing significantly improves the quality of reconstructed bacterial genomes, it is more expensive and requires deeper sequencing than short-read approaches to recover a comparable amount of reconstructed genomes. The most optimal strategy is study-specific and depends on how researchers assess the trade-off between the quantity and quality of recovered genomes. IMPORTANCE: Mice are an important model organism for understanding the gut microbiome. When studying these gut microbiomes using DNA techniques, researchers can choose from technologies that use short or long DNA reads. In this study, we perform an extensive benchmark between short- and long-read DNA sequencing for studying mice gut microbiomes. We find that no one approach was best for all metrics and provide information that can help guide researchers in planning their experiments.
Affiliation(s)
- Raphael Eisenhofer
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Joseph Nesme
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Luisa Santos-Bay
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Adam Koziol
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Søren Johannes Sørensen
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Antton Alberdi
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Ostaizka Aizpurua
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
|
50
|
Krenn M, Wagner M, Zulehner G, Weng R, Jäger F, Keritam O, Sener M, Brücke C, Milenkovic I, Langer A, Buchinger D, Habersam R, Mayerhanser K, Brugger M, Brunet T, Jacob M, Graf E, Berutti R, Cetin H, Hoefele J, Winkelmann J, Zimprich F, Rath J. Next-generation sequencing and comprehensive data reassessment in 263 adult patients with neuromuscular disorders: insights into the gray zone of molecular diagnoses. J Neurol 2024; 271:1937-1946. [PMID: 38127101 PMCID: PMC10972933 DOI: 10.1007/s00415-023-12101-6] [Received: 09/20/2023] [Revised: 11/03/2023] [Accepted: 11/04/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND: Neuromuscular disorders (NMDs) are heterogeneous conditions with a considerable fraction attributed to monogenic defects. Despite the advancements in genomic medicine, many patients remain without a diagnosis. Here, we investigate whether a comprehensive reassessment strategy improves the diagnostic outcomes. METHODS: We analyzed 263 patients with NMD phenotypes that underwent diagnostic exome or genome sequencing at our tertiary referral center between 2015 and 2023. We applied a comprehensive reassessment encompassing variant reclassification, re-phenotyping and NGS data reanalysis. Multivariable logistic regression was performed to identify predictive factors associated with a molecular diagnosis. RESULTS: Initially, a molecular diagnosis was identified in 53 cases (20%), while an additional 23 (9%) had findings of uncertain significance. Following comprehensive reassessment, the diagnostic yield increased to 23%, revealing 44 distinct monogenic etiologies. Reasons for newly obtained molecular diagnoses were variant reclassifications in 7 and NGS data reanalysis in 3 cases including one recently described disease-gene association (DNAJB4). Male sex reduced the odds of receiving a molecular diagnosis (OR 0.42; 95% CI 0.21-0.82), while a positive family history (OR 5.46; 95% CI 2.60-11.76) and a myopathy phenotype (OR 2.72; 95% CI 1.11-7.14) increased the likelihood. 7% were resolved through targeted genetic testing or classified as acquired etiologies. CONCLUSION: Our findings reinforce the use of NGS in NMDs of suspected monogenic origin. We show that a comprehensive reassessment enhances diagnostic accuracy. However, one needs to be aware that genetic diagnoses are often made with uncertainty and can even be downgraded based on new evidence.
Affiliation(s)
- Martin Krenn
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Matias Wagner
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Munich, Germany
| | - Gudrun Zulehner
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Rosa Weng
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Fiona Jäger
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Omar Keritam
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Merve Sener
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Christof Brücke
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Ivan Milenkovic
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Agnes Langer
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Dominic Buchinger
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Richard Habersam
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Katharina Mayerhanser
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Melanie Brugger
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Theresa Brunet
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Department of Pediatric Neurology, Developmental Medicine and Social Pediatrics, Dr. Von Hauner's Children's Hospital, University of Munich, Munich, Germany
| | - Maureen Jacob
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Elisabeth Graf
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Riccardo Berutti
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Munich, Germany
| | - Hakan Cetin
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Julia Hoefele
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
| | - Juliane Winkelmann
- Institute of Human Genetics, Klinikum Rechts Der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Munich, Germany
| | - Fritz Zimprich
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria
| | - Jakob Rath
- Department of Neurology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria.
- Comprehensive Center for Clinical Neurosciences and Mental Health, Medical University of Vienna, Vienna, Austria.
|