1. Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 DOI: 10.1007/s00438-024-02158-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations.
RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution.
CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Affiliation(s)
- Yanfeng Ji
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Junfan Zhao
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Jiao Gong
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Shaohua Fan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
2. Fu Y, Aganezov S, Mahmoud M, Beaulaurier J, Juul S, Treangen TJ, Sedlazeck FJ. MethPhaser: methylation-based long-read haplotype phasing of human genomes. Nat Commun 2024; 15:5327. [PMID: 38909018 PMCID: PMC11193733 DOI: 10.1038/s41467-024-49588-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 06/11/2024] [Indexed: 06/24/2024] Open
Abstract
The assignment of variants across haplotypes, known as phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies (ONT) sequencing to extend single nucleotide variation (SNV)-based phasing. We demonstrate that haplotype-specific methylation is widespread in human genomes, and the advent of long-read technologies has enabled direct reporting of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4%-98.7%. To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser to blood samples from 4 patients, which still showed improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes. MethPhaser is available at https://github.com/treangenlab/methphaser.
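The "phase length N50" quoted in this abstract can be illustrated with a minimal sketch, assuming the conventional N50 definition (the largest length L such that blocks of length at least L cover half of the total phased length). The function name `n50` and the block lengths below are hypothetical illustrations; this is not MethPhaser's code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of phase-block lengths.

    Assumes the conventional definition: sort blocks from longest to
    shortest and return the length of the block at which the running
    total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical phase blocks (in kb) before and after two neighbouring
# blocks are joined by an additional phasing signal.
snv_only_blocks = [100, 200, 300, 400]
extended_blocks = [100, 200, 700]  # the 300 kb and 400 kb blocks merged

before = n50(snv_only_blocks)   # 300
after = n50(extended_blocks)    # 700
increase = (after - before) / before * 100
print(f"N50 {before} -> {after} kb ({increase:.0f}% increase)")
# prints "N50 300 -> 700 kb (133% increase)"
```

Joining adjacent blocks leaves the total phased length unchanged while shifting more of it into long blocks, which is why N50 is a natural summary for this kind of improvement.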
Affiliation(s)
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Sissel Juul
  - Oxford Nanopore Technologies Inc, New York, NY, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
3. Cumlin T, Karlsson I, Haars J, Rosengren M, Lennerstrand J, Pimushyna M, Feuk L, Ladenvall C, Kaden R. From SARS-CoV-2 to Global Preparedness: A Graphical Interface for Standardised High-Throughput Bioinformatics Analysis in Pandemic Scenarios and Surveillance of Drug Resistance. Int J Mol Sci 2024; 25:6645. [PMID: 38928350 PMCID: PMC11204113 DOI: 10.3390/ijms25126645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/04/2024] [Accepted: 06/15/2024] [Indexed: 06/28/2024] Open
Abstract
The COVID-19 pandemic highlighted the need for a rapid, convenient, and scalable diagnostic method for detecting a novel pathogen amidst a global pandemic. While command-line interface tools offer automation for the analysis of SARS-CoV-2 Oxford Nanopore Technologies sequencing data, they are inaccessible to users with limited programming skills. One solution is to establish such automated workflows within graphical user interface software. We developed two workflows in the software Geneious Prime 2022.1.1, adapted for data obtained from the Midnight and ARTIC nCoV-2019 sequencing protocols. Both workflows perform trimming, read mapping, consensus generation, and annotation on SARS-CoV-2 Nanopore sequencing data. Additionally, one workflow includes phylogenetic assignment using the bioinformatic tools pangolin and Nextclade as plugins. The basic workflow was validated in 2020, adhering to the requirements of the European Centre for Disease Prevention and Control for SARS-CoV-2 sequencing and analysis. The enhanced workflow, which provides phylogenetic assignment, was validated at Uppsala University Hospital by analysing 96 clinical samples. It provided accurate diagnoses matching the original results of the basic workflow while also reducing manual clicks and analysis time. These bioinformatic workflows streamline SARS-CoV-2 Nanopore data analysis in Geneious Prime, saving time and manual work for operators lacking programming knowledge.
Affiliation(s)
- Tomas Cumlin
  - Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Ida Karlsson
  - Clinical Genomics Uppsala, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
- Jonathan Haars
  - Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Maria Rosengren
  - Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Johan Lennerstrand
  - Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Maryna Pimushyna
  - Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Lars Feuk
  - National Genomics Infrastructure Uppsala, Uppsala University, 751 08 Uppsala, Sweden
  - Department of Immunology, Genetics and Pathology, Uppsala University, 751 08 Uppsala, Sweden
- Claes Ladenvall
  - Clinical Genomics Uppsala, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
  - Department of Immunology, Genetics and Pathology, Uppsala University, 751 08 Uppsala, Sweden
- Rene Kaden
  - Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
  - Clinical Genomics Uppsala, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
4. Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
5. Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 DOI: 10.1038/s41592-024-02262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are due not only to improvements in sequencing accuracy but also to rapidly evolving analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of the analytical methods available to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field, such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Affiliation(s)
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vipin K Menon
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
- Ginger A Metcalf
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
6. Sharei M, Kamal M, Afzali-Kusha A, Pedram M. GEMA: A Genome Exact Mapping Accelerator Based on Learned Indexes. IEEE Trans Biomed Circuits Syst 2024; 18:523-538. [PMID: 38157470 DOI: 10.1109/tbcas.2023.3348152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
In this article, we introduce GEMA, a genome exact mapping accelerator based on learned indexes, specifically designed for FPGA implementation. GEMA utilizes a machine learning (ML) algorithm to precisely locate the exact position of read sequences within the original sequence. To enhance the accuracy of the trained ML model, we incorporate data augmentation and data-distribution-aware partitioning techniques. Additionally, we present an efficient yet low-overhead error recovery technique. To map long reads more efficiently, we propose a speculative prefetching approach, which reduces the required memory bandwidth. Furthermore, we suggest an FPGA-based architecture for implementing the proposed mapping accelerator, optimizing the accesses to off-chip memory. Our studies demonstrate that GEMA achieves up to 1.36× higher speed for short reads compared to the corresponding results reported in recently published exact mapping accelerators. Moreover, GEMA achieves up to ~22× faster mapping of long reads compared to the available results for the longest mapped reads using these accelerators.
7. Inamo J, Suzuki A, Ueda MT, Yamaguchi K, Nishida H, Suzuki K, Kaneko Y, Takeuchi T, Hatano H, Ishigaki K, Ishihama Y, Yamamoto K, Kochi Y. Long-read sequencing for 29 immune cell subsets reveals disease-linked isoforms. Nat Commun 2024; 15:4285. [PMID: 38806455 PMCID: PMC11133395 DOI: 10.1038/s41467-024-48615-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Accepted: 05/02/2024] [Indexed: 05/30/2024] Open
Abstract
Alternative splicing events are a major causal mechanism for complex traits, but they have been understudied due to the limitations of short-read sequencing. Here, we generate a full-length isoform annotation of human immune cells from an individual by long-read sequencing of 29 cell subsets. This annotation contains a number of unannotated transcripts and isoforms, such as a read-through transcript of TOMM40-APOE in the Alzheimer's disease locus. We profile characteristics of isoforms and show that repetitive elements significantly explain the diversity of unannotated isoforms, providing insight into human genome evolution. In addition, some of the isoforms are expressed in a cell-type-specific manner, and their alternative 3'-UTR usage contributes to this specificity. Further, we identify disease-associated isoforms by isoform switch analysis and by integrating several quantitative trait loci analyses with genome-wide association study data. Our findings will promote the elucidation of the mechanisms of complex diseases via alternative splicing.
Affiliation(s)
- Jun Inamo
  - Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
  - Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Akari Suzuki
  - Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Mahoko Takahashi Ueda
  - Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Kensuke Yamaguchi
  - Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
  - Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
  - Biomedical Engineering Research Innovation Center, Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Hiroshi Nishida
  - Department of Molecular Systems Bioanalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan
- Katsuya Suzuki
  - Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Yuko Kaneko
  - Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Tsutomu Takeuchi
  - Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
  - Saitama Medical University, 38 Morohongo, Moroyama, Iruma, Saitama, 350-0495, Japan
- Hiroaki Hatano
  - Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Kazuyoshi Ishigaki
  - Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Yasushi Ishihama
  - Department of Molecular Systems Bioanalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan
  - Laboratory of Proteomics for Drug Discovery, National Institute of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, 567-0085, Japan
- Kazuhiko Yamamoto
  - Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Yuta Kochi
  - Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
  - Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
8. Schrauwen I, Rajendran Y, Acharya A, Öhman S, Arvio M, Paetau R, Siren A, Avela K, Granvik J, Leal SM, Määttä T, Kokkonen H, Järvelä I. Optical genome mapping unveils hidden structural variants in neurodevelopmental disorders. Sci Rep 2024; 14:11239. [PMID: 38755281 PMCID: PMC11099145 DOI: 10.1038/s41598-024-62009-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024] Open
Abstract
While short-read sequencing currently dominates genetic research and diagnostics, it frequently fails to capture certain structural variants (SVs), which are often implicated in the etiology of neurodevelopmental disorders (NDDs). Optical genome mapping (OGM) is an innovative technique capable of capturing SVs that are undetectable or challenging to detect via short-read methods. This study aimed to investigate NDDs using OGM, specifically focusing on cases that remained unsolved after standard exome sequencing. OGM was performed in 47 families using ultra-high molecular weight DNA. Single-molecule maps were assembled de novo, followed by SV and copy number variant calling. We identified 7 variants of interest, of which 5 (10.6%) were classified as likely pathogenic or pathogenic, located in BCL11A, OPHN1, PHF8, SON, and NFIA. We also identified an inversion disrupting NAALADL2, a gene previously found to harbor complex rearrangements in two NDD cases. Variants in known NDD genes or candidate variants of interest missed by exome sequencing mainly consisted of larger insertions (>1 kbp), inversions, and deletions/duplications of a low number of exons (1-4 exons). In conclusion, in addition to improving molecular diagnosis in NDDs, this technique may also reveal novel NDD genes that harbor complex SVs often missed by standard sequencing techniques.
Affiliation(s)
- Isabelle Schrauwen
  - Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Yasmin Rajendran
  - Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Anushree Acharya
  - Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Maria Arvio
  - Päijät-Häme Wellbeing Services, Neurology, Lahti, Finland
- Ritva Paetau
  - Department of Child Neurology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Auli Siren
  - Kanta-Häme Central Hospital, Hämeenlinna, Finland
- Kristiina Avela
  - Institute of Biomedicine, University of Turku, Turku, Finland
- Johanna Granvik
  - The Wellbeing Services County of Ostrobothnia, Kokkola, Finland
- Suzanne M Leal
  - Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
  - Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY, USA
- Tuomo Määttä
  - The Wellbeing Services County of Kainuu, Kajaani, Finland
- Hannaleena Kokkonen
  - Northern Finland Laboratory Centre NordLab and Medical Research Centre, Oulu University Hospital and University of Oulu, Oulu, Finland
- Irma Järvelä
  - Department of Medical Genetics, University of Helsinki, Helsinki, Finland
9. Tunjić-Cvitanić M, García-Souto D, Pasantes JJ, Šatović-Vukšić E. Dominance of transposable element-related satDNAs results in great complexity of "satDNA library" and invokes the extension towards "repetitive DNA library". Mar Life Sci Technol 2024; 6:236-251. [PMID: 38827134 PMCID: PMC11136912 DOI: 10.1007/s42995-024-00218-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 02/26/2024] [Indexed: 06/04/2024]
Abstract
Research on bivalves is fast-growing, including genome-wide analyses and genome sequencing. Several characteristics qualify oysters as a valuable model to explore repetitive DNA sequences and their genome organization. Here we characterize the satellitomes of five species in the family Ostreidae (Crassostrea angulata, C. virginica, C. hongkongensis, C. ariakensis, Ostrea edulis), revealing a substantial number of satellite DNAs (satDNAs) per genome (ranging between 33 and 61) and peculiarities in the composition of their satellitomes. Numerous satDNAs were either associated with or derived from transposable elements, and satDNAs unrelated to transposable elements were scarce in these genomes. Due to the non-conventional satellitome constitution and dominance of Helitron-associated satDNAs, comparative satellitomics demanded more in-depth analyses than are standardly employed. Comparative analyses (including C. gigas, the first bivalve species with a defined satellitome) revealed that 13 satDNAs occur in all six oyster genomes, with Cg170/HindIII satDNA being the most abundant in all of them. Evaluating the "satDNA library model" highlighted the necessity to adjust this term when studying tandem repeat evolution in organisms with such satellitomes. When repetitive sequences with potential variation in the organizational form and repeat-type affiliation are examined across related species, the introduction of the terms "TE library" and "repetitive DNA library" becomes essential.
Supplementary Information: The online version contains supplementary material available at 10.1007/s42995-024-00218-0.
Affiliation(s)
- Daniel García-Souto
  - Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
  - Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
- Juan J. Pasantes
  - Centro de Investigación Mariña, Dpto de Bioquímica, Xenética e Inmunoloxía, Universidade de Vigo, 36310 Vigo, Spain
- Eva Šatović-Vukšić
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
10. Miano-Burkhardt A, Alvarez Jerez P, Daida K, Bandres Ciga S, Billingsley KJ. The Role of Structural Variants in the Genetic Architecture of Parkinson's Disease. Int J Mol Sci 2024; 25:4801. [PMID: 38732020 PMCID: PMC11084710 DOI: 10.3390/ijms25094801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 04/17/2024] [Accepted: 04/22/2024] [Indexed: 05/13/2024] Open
Abstract
Parkinson's disease (PD) significantly impacts millions of individuals worldwide. Although our understanding of the genetic foundations of PD has advanced, a substantial portion of the genetic variation contributing to disease risk remains unknown. Current PD genetic studies have primarily focused on one form of genetic variation, single nucleotide variants (SNVs), while other important forms of genetic variation, such as structural variants (SVs), are mostly ignored due to the complexity of detecting these variants with traditional sequencing methods. Yet, these forms of genetic variation play crucial roles in gene expression and regulation in the human brain and are causative of numerous neurological disorders, including forms of PD. This review aims to provide a comprehensive overview of our current understanding of the involvement of coding and noncoding SVs in the genetic architecture of PD.
Affiliation(s)
- Abigail Miano-Burkhardt
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA
  - Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA
- Pilar Alvarez Jerez
  - Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA
- Kensuke Daida
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA
  - Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA
- Sara Bandres Ciga
  - Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA
- Kimberley J. Billingsley
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA
  - Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA
11. Petraccioli A, Maio N, Carotenuto R, Odierna G, Guarino FM. The Satellite DNA PcH-Sat, Isolated and Characterized in the Limpet Patella caerulea (Mollusca, Gastropoda), Suggests the Origin from a Nin-SINE Transposable Element. Genes (Basel) 2024; 15:541. [PMID: 38790169 PMCID: PMC11121367 DOI: 10.3390/genes15050541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 04/16/2024] [Accepted: 04/23/2024] [Indexed: 05/26/2024] Open
Abstract
Satellite DNA (sat-DNA) was previously described as junk and selfish DNA in the cellular economy, without a clear functional role. However, during the last two decades, evidence has accumulated for the roles of sat-DNA in different cellular functions and its probable involvement in tumorigenesis and adaptation to environmental changes. In molluscs, studies on sat-DNAs have been performed mainly on bivalve species, especially those of economic interest. Conversely, in Gastropoda (which includes about 80% of the currently described mollusc species), studies on sat-DNA have been largely neglected. In this study, we isolated and characterized a sat-DNA, here named PcH-sat, in the limpet Patella caerulea using the restriction enzyme method, specifically with HaeIII. Monomeric units of PcH-sat are 179 bp long and AT-rich (58.7%), with an identity among monomers ranging from 91.6% to 99.8%. Southern blot showed that PcH-sat is conserved in P. depressa and P. ulyssiponensis, while a smeared hybridization signal was present in the other three investigated limpets (P. ferruginea, P. rustica and P. vulgata). Dot blot showed that PcH-sat represents about 10% of the genome of P. caerulea, 5% of that of P. depressa, and 0.3% of that of P. ulyssiponensis. FISH showed that PcH-sat was mainly localized on pericentromeric regions of chromosome pairs 2 and 4-7 of P. caerulea (2n = 18). A database search showed that PcH-sat contains a large segment (of 118 bp) showing high identity with a homologous region of the Nin-SINE transposable element (TE) of the patellogastropod Lottia gigantea, supporting the hypothesis that TEs are involved in the origin and tandemization of sat-DNAs.
Affiliation(s)
- Gaetano Odierna
  - Department of Biology, University of Naples Federico II, Via Cinthia, I-80126 Naples, Italy; (A.P.); (N.M.); (R.C.); (F.M.G.)
12. Buthasane W, Shotelersuk V, Chetruengchai W, Srichomthong C, Assawapitaksakul A, Tangphatsornruang S, Pootakham W, Sonthirod C, Tongsima S, Wangkumhang P, Wilantho A, Thongphakdee A, Sanannu S, Poksawat C, Nipanunt T, Kasorndorkbua C, Koepfli KP, Pukazhenthi BS, Suriyaphol P, Wongsurawat T, Jenjaroenpun P, Suriyaphol G. Comprehensive genome assembly reveals genetic diversity and carcass consumption insights in critically endangered Asian king vultures. Sci Rep 2024; 14:9455. [PMID: 38658744 PMCID: PMC11043450 DOI: 10.1038/s41598-024-59990-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 04/17/2024] [Indexed: 04/26/2024] Open
Abstract
The Asian king vulture (AKV), a vital forest scavenger, is facing globally critical endangerment. This study aimed to construct a reference genome to unveil the mechanisms underlying its scavenger abilities and to assess the genetic relatedness of the captive population in Thailand. A reference genome of a female AKV was assembled from sequencing reads obtained from both PacBio long-read and MGI short-read sequencing platforms. Comparative genomics with New World vultures (NWVs) and other birds in the Family Accipitridae revealed unique gene families in AKV associated with retroviral genome integration and feather keratin, contrasting with NWVs' genes related to olfactory reception. Expanded gene families in AKV were linked to inflammatory response, iron regulation and spermatogenesis. Positively selected genes included those associated with anti-apoptosis, immune response and muscle cell development, shedding light on adaptations for carcass consumption and high-altitude soaring. Using restriction site-associated DNA sequencing (RADseq)-based genome-wide single nucleotide polymorphisms (SNPs), genetic relatedness and inbreeding status of five captive AKVs were determined, revealing high genomic inbreeding in two females. In conclusion, the AKV reference genome was established, providing insights into its unique characteristics. Additionally, the potential of RADseq-based genome-wide SNPs for selecting AKV breeders was demonstrated.
Affiliation(s)
- Wannapol Buthasane
  - Biochemistry Unit, Department of Physiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand
- Vorasuk Shotelersuk
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Wanna Chetruengchai
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Chalurmpon Srichomthong
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Adjima Assawapitaksakul
  - Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
  - Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Sithichoke Tangphatsornruang
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Wirulda Pootakham
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Chutima Sonthirod
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Sissades Tongsima
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Pongsakorn Wangkumhang
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Alisa Wilantho
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Ampika Thongphakdee
  - Animal Conservation and Research Institute, The Zoological Park Organization of Thailand under the Royal Patronage of H.M. The King, Bangkok, 10300, Thailand
- Saowaphang Sanannu
  - Animal Conservation and Research Institute, The Zoological Park Organization of Thailand under the Royal Patronage of H.M. The King, Bangkok, 10300, Thailand
- Chaianan Poksawat
  - Animal Conservation and Research Institute, The Zoological Park Organization of Thailand under the Royal Patronage of H.M. The King, Bangkok, 10300, Thailand
- Tarasak Nipanunt
  - Huai Kha Khaeng Wildlife Breeding Center, Department of National Parks, Wildlife and Plant Conservation, Uthai Thani, 61160, Thailand
- Chaiyan Kasorndorkbua
  - Laboratory of Raptor Research and Conservation Medicine, Department of Pathology, Faculty of Veterinary Medicine, Kasetsart University, Bangkok, 10900, Thailand
- Klaus-Peter Koepfli
  - Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA, 22630, USA
  - Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA, 22630, USA
- Budhan S Pukazhenthi
  - Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA, 22630, USA
- Prapat Suriyaphol
  - Division of Medical Bioinformatics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Thidathip Wongsurawat
  - Division of Medical Bioinformatics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Piroon Jenjaroenpun
  - Division of Medical Bioinformatics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Gunnaporn Suriyaphol
  - Biochemistry Unit, Department of Physiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand.
13
Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024; 25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Open
Abstract
Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to unearth information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by the provision of vast amounts of whole-genome sequences and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from this massive dataset has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still rely on extensive efforts for functional verification. However, early bioinformatics algorithms and software primarily employed shallow learning techniques; thus, their ability to characterize data and learn features was limited. With the widespread adoption of RNA-Seq technology, scientists from the biological community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we reviewed both conventional methods and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation, underscoring the dynamic nature of this evolving scientific landscape.
Affiliation(s)
- Zhaojia Chen
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
  - College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong 030600, China
- Noor ul Ain
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- Qian Zhao
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Xingtan Zhang
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
14
Ángeles-Argáiz RE, Aguirre-Beltrán LFL, Hernández-Oaxaca D, Quintero-Corrales C, Trujillo-Roldán MA, Castillo-Ramírez S, Garibay-Orijel R. Assembly collapsing versus heterozygosity oversizing: detection of homokaryotic and heterokaryotic Laccaria trichodermophora strains by hybrid genome assembly. Microb Genom 2024; 10:001218. [PMID: 38529901 PMCID: PMC10995626 DOI: 10.1099/mgen.0.001218] [Received: 10/05/2023] [Accepted: 03/01/2024] [Indexed: 03/27/2024] Open
Abstract
Genome assembly and annotation using short paired reads is challenging for eukaryotic organisms due to their large size, variable ploidy and large number of repetitive elements. The use of single-molecule long reads improves assembly quality (completeness and contiguity), but haplotype duplications still pose assembly challenges. To address the effect of read length on genome assembly quality, gene prediction and annotation, we compared genome assemblers and sequencing technologies with four strains of the ectomycorrhizal fungus Laccaria trichodermophora. By analysing the predicted repertoire of carbohydrate enzymes, we investigated the effects of assembly quality on functional inferences. Libraries were generated using three different sequencing platforms (Illumina Next-Seq, Mi-Seq and PacBio Sequel), and genomes were assembled using single and hybrid assemblies/libraries. Long reads or hybrid assembly resolved the collapsing of repeated regions, but the nuclear heterozygous versions remained unresolved. In dikaryotic fungi, each cell includes two nuclei and each nucleus has differences not only in allelic gene version but also in gene composition and synteny. These heterokaryotic cells produce fragmentation and size overestimation of the genome assembly of each nucleus. Hybrid assembly revealed a wider functional diversity of genomes. Here, several predicted oxidizing activities on glycosyl residues of oligosaccharides and several chitooligosaccharide acetylase activities would have passed unnoticed in short-read assemblies. Also, the size and fragmentation of the genome assembly, in combination with heterozygosity analysis, allowed us to distinguish homokaryotic and heterokaryotic strains isolated from L. trichodermophora fruit bodies.
Affiliation(s)
- Rodolfo Enrique Ángeles-Argáiz
  - Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Circuito de los Posgrados s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
  - Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
  - Red de Manejo Biotecnológico de Recursos, Instituto de Ecología A. C. Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz, México, C.P. 91612, Mexico
- Luis Fernando Lozano Aguirre-Beltrán
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
- Diana Hernández-Oaxaca
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
  - Red de Biodiversidad y Sistemática, Instituto de Ecología A. C. Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz, México, C.P. 91073, Mexico
- Christian Quintero-Corrales
  - Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Circuito de los Posgrados s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
  - Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Mauricio A. Trujillo-Roldán
  - Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
  - Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México, Km 107 carretera Tijuana-Ensenada, Ensenada, Baja California, Mexico, C.P. 22860, Mexico
- Santiago Castillo-Ramírez
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
- Roberto Garibay-Orijel
  - Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
15
Yamano K, Haseda A, Iwabuchi K, Osabe T, Sudo Y, Pachakkil B, Tanaka K, Suzuki Y, Toyoda A, Hirakawa H, Onodera Y. QTL analysis of femaleness in monoecious spinach and fine mapping of a major QTL using an updated version of chromosome-scale pseudomolecules. PLoS One 2024; 19:e0296675. [PMID: 38394294 PMCID: PMC10890751 DOI: 10.1371/journal.pone.0296675] [Received: 08/10/2023] [Accepted: 12/15/2023] [Indexed: 02/25/2024] Open
Abstract
Although spinach is predominantly dioecious, monoecious plants with varying proportions of female and male flowers are also present. Recently, monoecious inbred lines with highly female and male conditions have been preferentially used as parents for F1-hybrids, rather than dioecious lines. Accordingly, identifying the loci for monoecism is an important issue for spinach breeding. We here used long-read sequencing and Hi-C technology to construct SOL_r2.0_pseudomolecule, a set of six pseudomolecules of spinach chromosomes (total length: 879.2 Mb; BUSCO complete 97.0%) that are longer and more genetically complete than our previous version of pseudomolecules (688.0 Mb; 81.5%). Three QTLs, qFem2.1, qFem3.1, and qFem6.1, responsible for monoecism were mapped to SOL_r2.0_pseudomolecule. qFem3.1 had the highest LOD score and corresponded to the M locus, which was previously identified as a determinant of monoecious expression, by genetic analysis of progeny from female and monoecious plants. The other QTLs were shown to modulate the ratio of female to male flowers in monoecious plants harboring a dominant allele of the M gene. Our findings will enable breeders to efficiently produce highly female- and male-monoecious parental lines for F1-hybrids by pyramiding the three QTLs. Through fine-mapping, we narrowed the candidate region for the M locus to a 19.5 kb interval containing three protein-coding genes and one long non-coding RNA gene. Among them, only RADIALIS-like-2a showed a higher expression in the reproductive organs, suggesting that it might play a role in reproductive organogenesis. However, there is no evidence that it is involved in the regulation of stamen and pistil initiation, which are directly related to the floral sex differentiation system in spinach. 
Given that auxin is involved in reproductive organ formation in many plant species, genes related to auxin transport/response, in addition to floral organ formation, were identified as candidates for regulators of floral sex-differentiation from qFem2.1 and qFem6.1.
Affiliation(s)
- Kaoru Yamano
  - Graduate School of Agriculture, Hokkaido University, Sapporo, Japan
- Akane Haseda
  - Graduate School of Agriculture, Hokkaido University, Sapporo, Japan
- Keisuke Iwabuchi
  - Graduate School of Agriculture, Hokkaido University, Sapporo, Japan
- Takayuki Osabe
  - School of Agriculture, Hokkaido University, Sapporo, Japan
- Yuki Sudo
  - Graduate School of Agriculture, Hokkaido University, Sapporo, Japan
- Babil Pachakkil
  - Department of International Agricultural Development, Faculty of International Agriculture and Food Studies, Tokyo University of Agriculture, Setagaya-ku, Tokyo, Japan
- Keisuke Tanaka
  - NODAI Genome Research Center, Tokyo University of Agriculture, Setagaya-ku, Tokyo, Japan
  - Department of Informatics, Tokyo University of Information Sciences, Chiba, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Atsushi Toyoda
  - Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan
- Hideki Hirakawa
  - The Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan
- Yasuyuki Onodera
  - The Research Faculty of Agriculture, Hokkaido University, Sapporo, Japan
16
Schreiber M, Jayakodi M, Stein N, Mascher M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat Rev Genet 2024:10.1038/s41576-024-00691-4. [PMID: 38378816 DOI: 10.1038/s41576-024-00691-4] [Accepted: 12/14/2023] [Indexed: 02/22/2024]
Abstract
Plant genome sequences catalogue genes and the genetic elements that regulate their expression. Such inventories further research aims as diverse as mapping the molecular basis of trait diversity in domesticated plants or inquiries into the origin of evolutionary innovations in flowering plants millions of years ago. The transformative technological progress of DNA sequencing in the past two decades has enabled researchers to sequence ever more genomes with greater ease. Pangenomes - complete sequences of multiple individuals of a species or higher taxonomic unit - have now entered the geneticists' toolkit. The genomes of crop plants and their wild relatives are being studied with translational applications in breeding in mind. But pangenomes are applicable also in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen our understanding of how plant species diverged and show how plants adapt to changing environments or new selection pressures exerted by human beings.
Affiliation(s)
- Mona Schreiber
  - Department of Biology, University of Marburg, Marburg, Germany
- Murukarthick Jayakodi
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Nils Stein
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
17
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024:elae003. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
18
Arnqvist G, Westerberg I, Galbraith J, Sayadi A, Scofield DG, Olsen RA, Immonen E, Bonath F, Ewels P, Suh A. A chromosome-level assembly of the seed beetle Callosobruchus maculatus genome with annotation of its repetitive elements. G3 (BETHESDA, MD.) 2024; 14:jkad266. [PMID: 38092066 PMCID: PMC10849321 DOI: 10.1093/g3journal/jkad266] [Received: 06/16/2023] [Accepted: 10/30/2023] [Indexed: 02/09/2024]
Abstract
Callosobruchus maculatus is a major agricultural pest of legume crops worldwide and an established model system in ecology and evolution. Yet, current molecular biological resources for this species are limited. Here, we employ Hi-C sequencing to generate a greatly improved genome assembly and we annotate its repetitive elements in a dedicated in-depth effort where we manually curate and classify the most abundant unclassified repeat subfamilies. We present a scaffolded chromosome-level assembly, which is 1.01 Gb in total length with 86% being contained within the 9 autosomes and the X chromosome. Repetitive sequences accounted for 70% of the total assembly. DNA transposons covered 18% of the genome, with the most abundant superfamily being Tc1-Mariner (9.75% of the genome). This new chromosome-level genome assembly of C. maculatus will enable future genetic and evolutionary studies not only of this important species but of beetles more generally.
Affiliation(s)
- Göran Arnqvist
  - Animal Ecology, Department of Ecology and Genetics, Uppsala University, Uppsala SE75236, Sweden
- Ivar Westerberg
  - Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala SE75236, Sweden
  - Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm SE10691, Sweden
- James Galbraith
  - School of Biological Sciences, University of Adelaide, Adelaide 5005, Australia
  - Faculty of Environment, Science and Economy, University of Exeter, Cornwall TR10 9FE, UK
- Ahmed Sayadi
  - Rheumatology, Department of Medical Sciences, Uppsala University, Uppsala SE75236, Sweden
- Douglas G Scofield
  - Evolutionary Biology, Department of Ecology and Genetics, Uppsala University, Uppsala SE75236, Sweden
  - Uppsala Multidisciplinary Center for Advanced Computational Science, Uppsala University, Uppsala SE75236, Sweden
- Remi-André Olsen
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE10691, Sweden
- Elina Immonen
  - Evolutionary Biology, Department of Ecology and Genetics, Uppsala University, Uppsala SE75236, Sweden
- Franziska Bonath
  - Science for Life Laboratory, Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm SE10691, Sweden
- Alexander Suh
  - Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala SE75236, Sweden
19
Zhou H, Huang X, Liu J, Ding J, Xu K, Zhu W, He C, Yang L, Zhu J, Han C, Qin C, Luo H, Chen K, Jiang S, Shi Y, Zeng J, Weng Z, Xu Y, Wang Q, Zhong M, Du B, Song S, Meng H. De novo Phased Genome Assembly, Annotation and Population Genotyping of Alectoris Chukar. Sci Data 2024; 11:162. [PMID: 38307880 PMCID: PMC10837146 DOI: 10.1038/s41597-024-02991-0] [Received: 09/25/2023] [Accepted: 01/22/2024] [Indexed: 02/04/2024] Open
Abstract
The chukar (Alectoris chukar) is the most geographically widespread partridge species in the world, demonstrating exceptional adaptability to diverse ecological environments. However, the scarcity of genetic resources for chukar has hindered research into its adaptive evolution and molecular breeding. In this study, we have sequenced and assembled a high-quality, phased chukar genome that consists of 31 pairs of relatively complete diploid chromosomes. Our BUSCO analysis reported high completeness scores of 96.8% and 96.5% with respect to universal single-copy orthologs, and low duplication rates (0.3% and 0.5%), for the two assemblies. Through resequencing and population genomic analyses of six subspecies, we have curated invaluable genotype data that underscore the adaptive evolution of chukar in response to both arid and high-altitude environments. These data will significantly contribute to research on how chukars adaptively evolve to cope with desertification and alpine climates.
Affiliation(s)
- Hao Zhou
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xunhe Huang
  - Jiaying University/Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Meizhou, 514015, China
- Jiajia Liu
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jinmei Ding
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ke Xu
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Wenqi Zhu
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Chuan He
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Lingyu Yang
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jianshen Zhu
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Chengxiao Han
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Chao Qin
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Huaixi Luo
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Kangchun Chen
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shengyao Jiang
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yurou Shi
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China
- Jinyuan Zeng
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China
- Zhuoxian Weng
  - Jiaying University/Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Meizhou, 514015, China
- Yongjie Xu
  - Jiaying University/Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Meizhou, 514015, China
- Qing Wang
  - Jiaying University/Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Meizhou, 514015, China
- Ming Zhong
  - Jiaying University/Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Meizhou, 514015, China
- Bingwang Du
  - Jiaying University/Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Meizhou, 514015, China.
  - Department of Animal Science, Guangdong Ocean University, Huguangyan East, Zhanjiang, Guangdong, 524088, China.
- Sen Song
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China.
- He Meng
  - Shanghai Collaborative Innovation Center of Agri-Seeds/School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China.
20
Mahmoud M, Huang Y, Garimella K, Audano PA, Wan W, Prasad N, Handsaker RE, Hall S, Pionzio A, Schatz MC, Talkowski ME, Eichler EE, Levy SE, Sedlazeck FJ. Utility of long-read sequencing for All of Us. Nat Commun 2024; 15:837. [PMID: 38281971 PMCID: PMC10822842 DOI: 10.1038/s41467-024-44804-3] [Received: 12/23/2022] [Accepted: 01/03/2024] [Indexed: 01/30/2024] Open
Abstract
The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-reads analysis. These results lead to widespread improvements across AoU.
Affiliation(s)
- M Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Y Huang
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - K Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - P A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - W Wan
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - N Prasad
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - R E Handsaker
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - S Hall
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - A Pionzio
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - M C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - M E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - E E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - S E Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - F J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
|
21
|
Zhang Z, Jiang T, Li G, Cao S, Liu Y, Liu B, Wang Y. Kled: an ultra-fast and sensitive structural variant detection tool for long-read sequencing data. Brief Bioinform 2024; 25:bbae049. [PMID: 38385878 PMCID: PMC10883419 DOI: 10.1093/bib/bbae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 01/12/2024] [Accepted: 01/26/2024] [Indexed: 02/23/2024] Open
Abstract
Structural variants (SVs) are a crucial type of genetic variant that can significantly impact phenotypes. Therefore, the identification of SVs is an essential part of modern genomic analysis. In this article, we present kled, an ultra-fast and sensitive SV caller for long-read sequencing data that combines a novel signature-merging algorithm, custom refinement strategies and a high-performance program structure. The evaluation results demonstrate that kled can achieve optimal SV calling compared to several state-of-the-art methods on simulated and real long-read data for different platforms and sequencing depths. Furthermore, kled excels at rapid SV calling and can efficiently utilize multiple central processing unit (CPU) cores while maintaining low memory usage. The source code for kled can be obtained from https://github.com/CoREse/kled.
Affiliation(s)
- Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Gaoyang Li
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Bo Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| |
|
22
|
Behera S, Catreux S, Rossi M, Truong S, Huang Z, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, Cameron DL, English A, Mehtalia S, Han J, Mehio R, Sedlazeck FJ. Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.02.573821. [PMID: 38260545 PMCID: PMC10802302 DOI: 10.1101/2024.01.02.573821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN, which uses novel methods based on multigenomes, hardware acceleration, and machine-learning-based variant detection to provide novel insights into individual genomes with ~30 min of computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | | | | | | | | | | | | | | | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, TX, USA
- Department of Computer Science, Rice University, TX, USA
| |
|
23
|
Koch E, Pardiñas AF, O'Connell KS, Selvaggi P, Camacho Collados J, Babic A, Marshall SE, Van der Eycken E, Angulo C, Lu Y, Sullivan PF, Dale AM, Molden E, Posthuma D, White N, Schubert A, Djurovic S, Heimer H, Stefánsson H, Stefánsson K, Werge T, Sønderby I, O'Donovan MC, Walters JTR, Milani L, Andreassen OA. How Real-World Data Can Facilitate the Development of Precision Medicine Treatment in Psychiatry. Biol Psychiatry 2024:S0006-3223(24)00003-9. [PMID: 38185234 DOI: 10.1016/j.biopsych.2024.01.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/20/2023] [Accepted: 01/02/2024] [Indexed: 01/09/2024]
Abstract
Precision medicine has the ambition to improve treatment response and clinical outcomes through patient stratification and holds great potential for the treatment of mental disorders. However, several important factors are needed to transform current practice into a precision psychiatry framework. Most important are 1) the generation of accessible large real-world training and test data including genomic data integrated from multiple sources, 2) the development and validation of advanced analytical tools for stratification and prediction, and 3) the development of clinically useful management platforms for patient monitoring that can be integrated into health care systems in real-life settings. This narrative review summarizes strategies for obtaining the key elements (well-powered samples from large biobanks integrated with electronic health records and health registry data using novel artificial intelligence algorithms) to predict outcomes in severe mental disorders and translate these models into clinical management and treatment approaches. Key elements are massive mental health data and novel artificial intelligence algorithms. For the clinical translation of these strategies, we discuss a precision medicine platform for improved management of mental disorders. We use cases to illustrate how precision medicine interventions could be brought into psychiatry to improve the clinical outcomes of mental disorders.
Affiliation(s)
- Elise Koch
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway.
| | - Antonio F Pardiñas
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Kevin S O'Connell
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Pierluigi Selvaggi
- Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
| | - José Camacho Collados
- CardiffNLP, School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
| | | | | | - Erik Van der Eycken
- Global Alliance of Mental Illness Advocacy Networks-Europe, Brussels, Belgium
| | - Cecilia Angulo
- Global Alliance of Mental Illness Advocacy Networks-Europe, Brussels, Belgium
| | - Yi Lu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden
| | - Patrick F Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden; Departments of Genetics and Psychiatry, University of North Carolina, Chapel Hill, North Carolina
| | - Anders M Dale
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, California; Departments of Radiology, Psychiatry, and Neurosciences, University of California, San Diego, La Jolla, California
| | - Espen Molden
- Center for Psychopharmacology, Diakonhjemmet Hospital, Oslo, Norway
| | - Danielle Posthuma
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Nathan White
- CorTechs Laboratories, Inc., San Diego, California
| | | | - Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway; The Norwegian Centre for Mental Disorders Research Centre, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Hakon Heimer
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Nordic Society of Human Genetics and Precision Medicine, Copenhagen, Denmark
| | | | | | - Thomas Werge
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark; Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Ida Sønderby
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Department of Medical Genetics, Oslo University Hospital, Oslo, Norway; KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Michael C O'Donovan
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - James T R Walters
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Lili Milani
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia; Genetics and Personalized Medicine Clinic, Tartu University Hospital, Tartu, Estonia
| | - Ole A Andreassen
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway; KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo and Oslo University Hospital, Oslo, Norway.
| |
|
24
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024:10.1038/s41587-023-02024-y. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | | | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
|
25
|
Dylus D, Altenhoff A, Majidian S, Sedlazeck FJ, Dessimoz C. Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree. Nat Biotechnol 2024; 42:139-147. [PMID: 37081138 PMCID: PMC10791578 DOI: 10.1038/s41587-023-01753-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/18/2022] [Accepted: 03/16/2023] [Indexed: 04/22/2023]
Abstract
Current methods for inference of phylogenetic trees require running complex pipelines at substantial computational and labor costs, with additional constraints in sequencing coverage, assembly and annotation quality, especially for large datasets. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes and bypasses traditional steps in phylogeny inference, such as genome assembly, annotation and all-versus-all sequence comparisons, while retaining accuracy. In a benchmark encompassing a broad variety of datasets, Read2Tree is 10-100 times faster than assembly-based approaches and in most cases more accurate, the exception being when sequencing coverage is high and reference species very distant. Here, to illustrate the broad applicability of the tool, we reconstruct a yeast tree of life of 435 species spanning 590 million years of evolution. We also apply Read2Tree to >10,000 Coronaviridae samples, accurately classifying highly diverse animal samples and near-identical severe acute respiratory syndrome coronavirus 2 sequences on a single tree. The speed, accuracy and versatility of Read2Tree enable comparative genomics at scale.
Affiliation(s)
- David Dylus
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- F. Hoffmann-La Roche Ltd, Immunology, Infectious Disease, and Ophthalmology (I2O), Roche Pharmaceutical Research and Early Development (pRED), Basel, Switzerland
| | - Adrian Altenhoff
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computer Science, ETH, Zurich, Switzerland
| | - Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Department of Computer Science, University College London, London, UK.
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK.
| |
|
26
|
Ye R, Wang A, Bu B, Luo P, Deng W, Zhang X, Yin S. Viral oncogenes, viruses, and cancer: a third-generation sequencing perspective on viral integration into the human genome. Front Oncol 2023; 13:1333812. [PMID: 38188304 PMCID: PMC10768168 DOI: 10.3389/fonc.2023.1333812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 12/06/2023] [Indexed: 01/09/2024] Open
Abstract
The link between viruses and cancer has intrigued scientists for decades. Certain viruses have been shown to be vital in the development of various cancers by integrating viral DNA into the host genome and activating viral oncogenes. These viruses include the Human Papillomavirus (HPV), Hepatitis B and C Viruses (HBV and HCV), Epstein-Barr Virus (EBV), and Human T-Cell Leukemia Virus (HTLV-1), which are all linked to the development of a myriad of human cancers. In recent years, third-generation sequencing technologies have revolutionized our ability to study viral integration events at unprecedented resolution. They offer long-read sequencing capabilities along with the ability to map viral integration sites, assess host gene expression, and track clonal evolution in cancer cells. Recently, researchers have been exploring the application of Oxford Nanopore Technologies (ONT) nanopore sequencing and Pacific BioSciences (PacBio) single-molecule real-time (SMRT) sequencing in cancer research. As viral integration is crucial to virus-driven cancer development, third-generation sequencing would provide a novel approach to studying the relationship among viral oncogenes, viruses, and cancer. This review article explores the molecular mechanisms underlying viral oncogenesis, the role of viruses in cancer development, and the impact of third-generation sequencing on our understanding of viral integration into the human genome.
Affiliation(s)
- Ruichen Ye
- Department of Pathology, Albert Einstein College of Medicine, Bronx, NY, United States
- Einstein Pathology Single-cell & Bioinformatics Laboratory, Bronx, NY, United States
- Stony Brook University, Stony Brook, NY, United States
| | - Angelina Wang
- Tufts Friedman School of Nutrition, Boston, MA, United States
| | - Brady Bu
- Horace Mann School, Bronx, NY, United States
| | - Pengxiang Luo
- Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Wenjun Deng
- Clinical Proteomics Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
| | - Xinyi Zhang
- Department of Respiratory Diseases, The Second Affiliated Hospital of Nanchang University, Nanchang, China
| | - Shanye Yin
- Department of Pathology, Albert Einstein College of Medicine, Bronx, NY, United States
- Einstein Pathology Single-cell & Bioinformatics Laboratory, Bronx, NY, United States
| |
|
27
|
Liu S, Ebel ER, Luniewski A, Zulawinska J, Simpson ML, Kim J, Ene N, Braukmann TWA, Congdon M, Santos W, Yeh E, Guler JL. Direct long read visualization reveals metabolic interplay between two antimalarial drug targets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.13.528367. [PMID: 36824743 PMCID: PMC9948948 DOI: 10.1101/2023.02.13.528367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Increases in the copy number of large genomic regions, termed genome amplification, are an important adaptive strategy for malaria parasites. Numerous amplifications across the Plasmodium falciparum genome contribute directly to drug resistance or impact the fitness of this protozoan parasite. During the characterization of parasite lines with amplifications of the dihydroorotate dehydrogenase (DHODH) gene, we detected increased copies of an additional genomic region that encompassed 3 genes (~5 kb) including GTP cyclohydrolase I (GCH1 amplicon). While this gene is reported to increase the fitness of antifolate-resistant parasites, GCH1 amplicons had not previously been implicated in any other antimalarial resistance context. Here, we further explored the association between GCH1 and DHODH copy number. Using long-read sequencing and single-read visualization, we directly observed a higher number of tandem GCH1 amplicons in parasites with increased DHODH copies (up to 9 amplicons) compared to parental parasites (3 amplicons). While all GCH1 amplicons shared a consistent structure, expansions arose in 2-unit steps (from 3 to 5 to 7 copies, and so on). Adaptive evolution of DHODH and GCH1 loci was further bolstered when we evaluated prior selection experiments; DHODH amplification was only successful in parasite lines with pre-existing GCH1 amplicons. These observations, combined with the direct connection between metabolic pathways that contain these enzymes, lead us to propose that the GCH1 locus is beneficial for the fitness of parasites exposed to DHODH inhibitors. This finding highlights the importance of studying variation within individual parasite genomes as well as biochemical connections of drug targets as novel antimalarials move towards clinical approval.
Affiliation(s)
- Shiwei Liu
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Current affiliation: Indiana University School of Medicine, Indianapolis, IN, USA
| | - Emily R. Ebel
- Stanford, Departments of Pediatrics and Microbiology & Immunology, Stanford, CA, USA
| | | | - Julia Zulawinska
- University of Virginia, Department of Biology, Charlottesville, VA, USA
| | | | - Jane Kim
- University of Virginia, Department of Biology, Charlottesville, VA, USA
| | - Nnenna Ene
- University of Virginia, Department of Biology, Charlottesville, VA, USA
| | | | - Molly Congdon
- Virginia Tech, Department of Chemistry, Blacksburg, VA, USA
| | - Webster Santos
- Virginia Tech, Department of Chemistry, Blacksburg, VA, USA
| | - Ellen Yeh
- Stanford University, Departments of Pathology and Microbiology & Immunology, Stanford, CA, USA
| | - Jennifer L. Guler
- University of Virginia, Department of Biology, Charlottesville, VA, USA
| |
|
28
|
Choo ZN, Behr JM, Deshpande A, Hadi K, Yao X, Tian H, Takai K, Zakusilo G, Rosiene J, Da Cruz Paula A, Weigelt B, Setton J, Riaz N, Powell SN, Busam K, Shoushtari AN, Ariyan C, Reis-Filho J, de Lange T, Imieliński M. Most large structural variants in cancer genomes can be detected without long reads. Nat Genet 2023; 55:2139-2148. [PMID: 37945902 PMCID: PMC10703688 DOI: 10.1038/s41588-023-01540-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/01/2021] [Accepted: 09/19/2023] [Indexed: 11/12/2023]
Abstract
Short-read sequencing is the workhorse of cancer genomics yet is thought to miss many structural variants (SVs), particularly large chromosomal alterations. To characterize missing SVs in short-read whole genomes, we analyzed 'loose ends', that is, local violations of mass balance between adjacent DNA segments. In the landscape of loose ends across 1,330 high-purity cancer whole genomes, most large (>10-kb) clonal SVs were fully resolved by short reads in the 87% of the human genome where copy number could be reliably measured. Some loose ends represent neotelomeres, which we propose as a hallmark of the alternative lengthening of telomeres phenotype. These pan-cancer findings were confirmed by long-molecule profiles of 38 breast cancer and melanoma cases. Our results indicate that aberrant homologous recombination is unlikely to drive the majority of large cancer SVs. Furthermore, analysis of mass balance in short-read whole genome data provides a surprisingly complete picture of cancer chromosomal structure.
Affiliation(s)
- Zi-Ning Choo
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional MD PhD Program, Weill Cornell Medicine, New York, NY, USA
- Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
| | - Julie M Behr
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
| | - Aditya Deshpande
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
| | - Kevin Hadi
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
| | - Xiaotong Yao
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
| | - Huasong Tian
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA
| | - Kaori Takai
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
| | - George Zakusilo
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
| | - Joel Rosiene
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | | | - Britta Weigelt
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Jeremy Setton
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Nadeem Riaz
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Simon N Powell
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Klaus Busam
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | | | | | | | - Titia de Lange
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
| | - Marcin Imieliński
- New York Genome Center, New York, NY, USA.
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA.
- Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA.
| |
|
29
|
Steigerwald C, Borsuk J, Pappas J, Galey M, Scott A, Devaney JM, Miller DE, Abreu NJ. CLN2 disease resulting from a novel homozygous deep intronic splice variant in TPP1 discovered using long-read sequencing. Mol Genet Metab 2023; 140:107713. [PMID: 37922835 DOI: 10.1016/j.ymgme.2023.107713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 10/22/2023] [Indexed: 11/07/2023]
Abstract
Neuronal ceroid lipofuscinosis type 2 (CLN2) is an autosomal recessive neurodegenerative disorder with enzyme replacement therapy available. We present two siblings with a clinical diagnosis of CLN2 disease, but no identifiable TPP1 variants after standard clinical testing. Long-read sequencing identified a homozygous deep intronic variant predicted to affect splicing, confirmed by clinical DNA and RNA sequencing. This case demonstrates how traditional laboratory assays can complement emerging molecular technologies to provide a precise molecular diagnosis.
Affiliation(s)
- Connolly Steigerwald
- Division of Neurogenetics, Department of Neurology, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Jill Borsuk
- Division of Clinical Genetics, Department of Pediatrics, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - John Pappas
- Division of Clinical Genetics, Department of Pediatrics, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Miranda Galey
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA 98195, USA
| | - Anna Scott
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA; Department of Laboratories, Seattle Children's Hospital, Seattle, WA 98105, USA
| | | | - Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA 98195, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
| | - Nicolas J Abreu
- Division of Neurogenetics, Department of Neurology, NYU Grossman School of Medicine, New York, NY 10016, USA.
| |
|
30
|
Dallaire X, Bouchard R, Hénault P, Ulmo-Diaz G, Normandeau E, Mérot C, Bernatchez L, Moore JS. Widespread Deviant Patterns of Heterozygosity in Whole-Genome Sequencing Due to Autopolyploidy, Repeated Elements, and Duplication. Genome Biol Evol 2023; 15:evad229. [PMID: 38085037 PMCID: PMC10752349 DOI: 10.1093/gbe/evad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/30/2023] [Indexed: 12/28/2023] Open
Abstract
Most population genomic tools rely on accurate single nucleotide polymorphism (SNP) calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguous reference genomes. Consequently, short-read resequencing studies can encounter mismapping issues, leading to SNPs that deviate from Mendelian expected patterns of heterozygosity and allelic ratio. In this study, we employed the ngsParalog software to identify such deviant SNPs in whole-genome sequencing (WGS) data with low (1.5×) to intermediate (4.8×) coverage for four species: Arctic Char (Salvelinus alpinus), Lake Whitefish (Coregonus clupeaformis), Atlantic Salmon (Salmo salar), and the American Eel (Anguilla rostrata). The analyses revealed that deviant SNPs accounted for 22% to 62% of all SNPs in salmonid datasets and approximately 11% in the American Eel dataset. These deviant SNPs were particularly concentrated within repetitive elements and genomic regions that had recently undergone rediploidization in salmonids. Additionally, narrow peaks of elevated coverage were ubiquitous along all four reference genomes, encompassed most deviant SNPs, and could be partially associated with transposons and tandem repeats. Including these deviant SNPs in genomic analyses led to highly distorted site frequency spectra, underestimated pairwise FST values, and overestimated nucleotide diversity. Considering the widespread occurrence of deviant SNPs arising from a variety of sources, their important impact in estimating population parameters, and the availability of effective tools to identify them, we propose that excluding deviant SNPs from WGS datasets is required to improve genomic inferences for a wide range of taxa and sequencing depths.
Affiliation(s)
- Xavier Dallaire
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- Centre d'Études Nordiques, Université Laval, Québec, Canada
| | - Raphael Bouchard
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
| | - Philippe Hénault
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
| | - Gabriela Ulmo-Diaz
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
| | - Eric Normandeau
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Plateforme de bio-informatique de l’IBIS, Université Laval, Québec, Canada
| | - Claire Mérot
- CNRS, UMR 6553 ECOBIO, Université de Rennes, Rennes, France
| | - Louis Bernatchez
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
| | - Jean-Sébastien Moore
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- Centre d'Études Nordiques, Université Laval, Québec, Canada
- Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
| |
|
31
|
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
| | - Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
| |
|
32
|
Zhao Y, Huang F, Wang W, Gao R, Fan L, Wang A, Gao SH. Application of high-throughput sequencing technologies and analytical tools for pathogen detection in urban water systems: Progress and future perspectives. Sci Total Environ 2023; 900:165867. [PMID: 37516185 DOI: 10.1016/j.scitotenv.2023.165867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 07/31/2023]
Abstract
The ubiquitous presence of pathogenic microorganisms, such as viruses, bacteria, fungi, and protozoa, in urban water systems poses a significant risk to public health. The emergence of infectious waterborne diseases mediated by urban water systems has become one of the leading global causes of mortality. However, the detection and monitoring of these pathogenic microorganisms have been limited by the complexity and diversity of environmental samples. Conventional methods were restricted by long assay time, high benchmarks of identification, and narrow application scenarios. Novel technologies, such as high-throughput sequencing technologies, enable potentially full-spectrum detection of trace pathogenic microorganisms in complex environmental matrices. This review discusses the current state of high-throughput sequencing technologies for identifying pathogenic microorganisms in urban water systems with a concise summary. Furthermore, future perspectives in pathogen research emphasize the need for detection methods with high accuracy and sensitivity, the establishment of precise detection standards and procedures, and the significance of bioinformatics software and platforms. We have compiled a list of pathogens analysis software/platforms/databases that boast robust engines and high accuracy for preference. We highlight the significance of analyses by combining targeted and non-targeted sequencing technologies, short- and long-read technologies, sequencing technologies, and bioinformatic tools in pursuing upgraded biosafety in urban water systems.
Affiliation(s)
- Yanmei Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
| | - Fang Huang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
| | - Wenxiu Wang
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China.
| | - Rui Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
| | - Lu Fan
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China; Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
| | - Aijie Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China; State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
| | - Shu-Hong Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China.
| |
|
33
|
Bredemeyer KR, Hillier L, Harris AJ, Hughes GM, Foley NM, Lawless C, Carroll RA, Storer JM, Batzer MA, Rice ES, Davis BW, Raudsepp T, O'Brien SJ, Lyons LA, Warren WC, Murphy WJ. Single-haplotype comparative genomics provides insights into lineage-specific structural variation during cat evolution. Nat Genet 2023; 55:1953-1963. [PMID: 37919451 PMCID: PMC10845050 DOI: 10.1038/s41588-023-01548-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/23/2023] [Accepted: 09/20/2023] [Indexed: 11/04/2023]
Abstract
The role of structurally dynamic genomic regions in speciation is poorly understood due to challenges inherent in diploid genome assembly. Here we reconstructed the evolutionary dynamics of structural variation in five cat species by phasing the genomes of three interspecies F1 hybrids to generate near-gapless single-haplotype assemblies. We discerned that cat genomes have a paucity of segmental duplications relative to great apes, explaining their remarkable karyotypic stability. X chromosomes were hotspots of structural variation, including enrichment with inversions in a large recombination desert with characteristics of a supergene. The X-linked macrosatellite DXZ4 evolves more rapidly than 99.5% of the genome clarifying its role in felid hybrid incompatibility. Resolved sensory gene repertoires revealed functional copy number changes associated with ecomorphological adaptations, sociality and domestication. This study highlights the value of gapless genomes to reveal structural mechanisms underpinning karyotypic evolution, reproductive isolation and ecological niche adaptation.
Affiliation(s)
- Kevin R Bredemeyer
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - LaDeana Hillier
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Andrew J Harris
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - Graham M Hughes
- School of Biology & Environmental Sciences, University College Dublin, Dublin, Ireland
| | - Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
| | - Colleen Lawless
- School of Biology & Environmental Sciences, University College Dublin, Dublin, Ireland
| | - Rachel A Carroll
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA
| | | | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Edward S Rice
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA
| | - Brian W Davis
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - Terje Raudsepp
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - Stephen J O'Brien
- Guy Harvey Oceanographic Center, Nova Southeastern University, Fort Lauderdale, FL, USA
| | - Leslie A Lyons
- Department of Veterinary Medicine & Surgery, University of Missouri, Columbia, MO, USA
| | - Wesley C Warren
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA.
| | - William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA.
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA.
| |
|
34
|
Dwivedi SL, Quiroz LF, Reddy ASN, Spillane C, Ortiz R. Alternative Splicing Variation: Accessing and Exploiting in Crop Improvement Programs. Int J Mol Sci 2023; 24:15205. [PMID: 37894886 PMCID: PMC10607462 DOI: 10.3390/ijms242015205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 10/09/2023] [Accepted: 10/10/2023] [Indexed: 10/29/2023] Open
Abstract
Alternative splicing (AS) is a gene regulatory mechanism modulating gene expression in multiple ways. AS is prevalent in all eukaryotes including plants. AS generates two or more mRNAs from the precursor mRNA (pre-mRNA) to regulate transcriptome complexity and proteome diversity. Advances in next-generation sequencing, omics technology, bioinformatics tools, and computational methods provide new opportunities to quantify and visualize AS-based quantitative trait variation associated with plant growth, development, reproduction, and stress tolerance. Domestication, polyploidization, and environmental perturbation may evolve novel splicing variants associated with agronomically beneficial traits. To date, pre-mRNAs from many genes are spliced into multiple transcripts that cause phenotypic variation for complex traits, both in model plant Arabidopsis and field crops. Cataloguing and exploiting such variation may provide new paths to enhance climate resilience, resource-use efficiency, productivity, and nutritional quality of staple food crops. This review provides insights into AS variation alongside a gene expression analysis to select for novel phenotypic diversity for use in breeding programs. AS contributes to heterosis, enhances plant symbiosis (mycorrhiza and rhizobium), and provides a mechanistic link between the core clock genes and diverse environmental cues.
Affiliation(s)
| | - Luis Felipe Quiroz
- Agriculture and Bioeconomy Research Centre, Ryan Institute, University of Galway, University Road, H91 REW4 Galway, Ireland
| | - Anireddy S N Reddy
- Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
| | - Charles Spillane
- Agriculture and Bioeconomy Research Centre, Ryan Institute, University of Galway, University Road, H91 REW4 Galway, Ireland
| | - Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, 23053 Alnarp, SE, Sweden
| |
|
35
|
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | | | | | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| |
|
36
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Alvarez Jerez P, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. Nat Methods 2023; 20:1483-1492. [PMID: 37710018 PMCID: PMC11222905 DOI: 10.1038/s41592-023-01993-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/28/2023] [Accepted: 08/04/2023] [Indexed: 09/16/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Kimberley J Billingsley
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M Genner
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
| | | |
|
37
|
Bonnet K, Marschall T, Doerr D. Constructing founder sets under allelic and non-allelic homologous recombination. Algorithms Mol Biol 2023; 18:15. [PMID: 37775806 PMCID: PMC10543304 DOI: 10.1186/s13015-023-00241-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/31/2023] [Accepted: 08/23/2023] [Indexed: 10/01/2023] Open
Abstract
Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism can also act between different copies of the same sequence, in which case it is called non-allelic homologous recombination (NAHR). This process can result in genomic rearrangements (including deletion, duplication, and inversion) and underlies many genomic disorders. Despite its importance for genome evolution and disease, there is a lack of computational models to study genomic loci prone to NAHR. In this work, we propose such a computational model, providing a unified framework for both (allelic) homologous recombination and NAHR. Our model represents a set of genomes as a graph, where haplotypes correspond to walks through this graph. We formulate two founder set problems under our recombination model, provide flow-based algorithms for their solution, describe exact methods to characterize the number of recombinations, and demonstrate scalability to problem instances arising in practice.
Affiliation(s)
- Konstantinn Bonnet
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, and Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225, Düsseldorf, Germany
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, and Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225, Düsseldorf, Germany.
| | - Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, and Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225, Düsseldorf, Germany.
|
38
|
Mochizuki T, Sakamoto M, Tanizawa Y, Nakayama T, Tanifuji G, Kamikawa R, Nakamura Y. A practical assembly guideline for genomes with various levels of heterozygosity. Brief Bioinform 2023; 24:bbad337. [PMID: 37798248 PMCID: PMC10555665 DOI: 10.1093/bib/bbad337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/06/2023] [Accepted: 09/03/2023] [Indexed: 10/07/2023] Open
Abstract
Although current long-read sequencing technologies have a long-read length that facilitates assembly for genome reconstruction, they have high sequence errors. While various assemblers with different perspectives have been developed, no systematic evaluation of assemblers with long reads for diploid genomes with varying heterozygosity has been performed. Here, we evaluated a series of processes, including the estimation of genome characteristics such as genome size and heterozygosity, de novo assembly, polishing, and removal of allelic contigs, using six genomes with various heterozygosity levels. We evaluated five long-read-only assemblers (Canu, Flye, miniasm, NextDenovo and Redbean) and five hybrid assemblers that combine short and long reads (HASLR, MaSuRCA, Platanus-allee, SPAdes and WENGAN) and proposed a concrete guideline for the construction of haplotype representation according to the degree of heterozygosity, followed by polishing and purging haplotigs, using stable and high-performance assemblers: Redbean, Flye and MaSuRCA.
Affiliation(s)
| | - Mika Sakamoto
- Genome Informatics Laboratory, National Institute of Genetics
| | | | - Takuro Nakayama
- Division of Life Sciences Center for Computational Sciences, University of Tsukuba, Japan
| | - Goro Tanifuji
- Department of Zoology, National Museum of Nature and Science
|
39
|
de Moraes RLR, de Menezes Cavalcante Sassi F, Vidal JAD, Goes CAG, dos Santos RZ, Stornioli JHF, Porto-Foresti F, Liehr T, Utsunomia R, de Bello Cioffi M. Chromosomal Rearrangements and Satellite DNAs: Extensive Chromosome Reshuffling and the Evolution of Neo-Sex Chromosomes in the Genus Pyrrhulina (Teleostei; Characiformes). Int J Mol Sci 2023; 24:13654. [PMID: 37686460 PMCID: PMC10563077 DOI: 10.3390/ijms241713654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 08/31/2023] [Accepted: 09/02/2023] [Indexed: 09/10/2023] Open
Abstract
Chromosomal rearrangements play a significant role in the evolution of fish genomes, being important forces in the rise of multiple sex chromosomes and in speciation events. Repetitive DNAs constitute a major component of the genome and are frequently found in heterochromatic regions, where satellite DNA sequences (satDNAs) usually represent their main components. In this work, we investigated the association of satDNAs with chromosome-shuffling events, as well as their potential relevance in both sex and karyotype evolution, using the well-known Pyrrhulina fish model. Pyrrhulina species have a conserved karyotype dominated by acrocentric chromosomes present in all species examined to date. However, two species, namely P. marilynae and P. semifasciata, stand out for exhibiting unique traits that distinguish them from others in this group. The first shows a reduced diploid number (with 2n = 32), while the latter has a well-differentiated multiple X1X2Y sex chromosome system. In addition to isolating and characterizing the full collection of satDNAs (satellitomes) of both species, we also in situ mapped these sequences in the chromosomes of both species. Moreover, the satDNAs that displayed signals on the sex chromosomes of P. semifasciata were also mapped in some phylogenetically related species to estimate their potential accumulation on proto-sex chromosomes. Thus, a large collection of satDNAs for both species, with several classes being shared between them, was characterized for the first time. In addition, the possible involvement of these satellites in the karyotype evolution of P. marilynae and P. semifasciata, especially sex-chromosome formation and karyotype reduction in P. marilynae, could be shown.
Affiliation(s)
- Renata Luiza Rosa de Moraes
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos 13565-905, SP, Brazil; (R.L.R.d.M.); (F.d.M.C.S.); (J.A.D.V.)
- Institute of Human Genetics, University Hospital Jena, 07747 Jena, Germany
| | - Francisco de Menezes Cavalcante Sassi
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos 13565-905, SP, Brazil; (R.L.R.d.M.); (F.d.M.C.S.); (J.A.D.V.)
- Institute of Human Genetics, University Hospital Jena, 07747 Jena, Germany
| | - Jhon Alex Dziechciarz Vidal
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos 13565-905, SP, Brazil; (R.L.R.d.M.); (F.d.M.C.S.); (J.A.D.V.)
| | - Caio Augusto Gomes Goes
- Faculdade de Ciências, UNESP, Bauru 17033-36, SP, Brazil; (C.A.G.G.); (R.Z.d.S.); (F.P.-F.); (R.U.)
| | - Rodrigo Zeni dos Santos
- Faculdade de Ciências, UNESP, Bauru 17033-36, SP, Brazil; (C.A.G.G.); (R.Z.d.S.); (F.P.-F.); (R.U.)
| | - José Henrique Forte Stornioli
- Institute of Biological Sciences and Health, Universidade Federal Rural do Rio de Janeiro, Seropédica 23890-000, RJ, Brazil;
| | - Fábio Porto-Foresti
- Faculdade de Ciências, UNESP, Bauru 17033-36, SP, Brazil; (C.A.G.G.); (R.Z.d.S.); (F.P.-F.); (R.U.)
| | - Thomas Liehr
- Institute of Human Genetics, University Hospital Jena, 07747 Jena, Germany
| | - Ricardo Utsunomia
- Faculdade de Ciências, UNESP, Bauru 17033-36, SP, Brazil; (C.A.G.G.); (R.Z.d.S.); (F.P.-F.); (R.U.)
| | - Marcelo de Bello Cioffi
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos 13565-905, SP, Brazil; (R.L.R.d.M.); (F.d.M.C.S.); (J.A.D.V.)
- Institute of Human Genetics, University Hospital Jena, 07747 Jena, Germany
|
40
|
Smirnov D, Konstantinovskiy N, Prokisch H. Integrative omics approaches to advance rare disease diagnostics. J Inherit Metab Dis 2023; 46:824-838. [PMID: 37553850 DOI: 10.1002/jimd.12663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/10/2023]
Abstract
Over the past decade, high-throughput DNA sequencing approaches, namely whole exome and whole genome sequencing, became a standard procedure in Mendelian disease diagnostics. Implementation of these technologies greatly facilitated diagnostics and shifted the analysis paradigm from variant identification to prioritisation and evaluation. The diagnostic rates vary widely depending on the cohort size, heterogeneity and disease, and range from around 30% to 50%, leaving the majority of patients undiagnosed. Advances in omics technologies and computational analysis provide an opportunity to increase these unfavourable rates by providing evidence for disease-causing variant validation and prioritisation. This review aims to provide an overview of the current application of several omics technologies including RNA-sequencing, proteomics, metabolomics and DNA-methylation profiling for diagnostics of rare genetic diseases in general and inborn errors of metabolism in particular.
Affiliation(s)
- Dmitrii Smirnov
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
| | - Nikita Konstantinovskiy
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
| | - Holger Prokisch
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
|
41
|
Mak L, Meleshko D, Danko DC, Barakzai WN, Maharjan S, Belchikov N, Hajirasouliha I. Ariadne: synthetic long read deconvolution using assembly graphs. Genome Biol 2023; 24:197. [PMID: 37641111 PMCID: PMC10463629 DOI: 10.1186/s13059-023-03033-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Accepted: 08/07/2023] [Indexed: 08/31/2023] Open
Abstract
Synthetic long read sequencing techniques such as UST's TELL-Seq and Loop Genomics' LoopSeq combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base-pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3′ unique molecular identifier confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm, that can be used to extract single-species read-clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.
Affiliation(s)
- Lauren Mak
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
| | - Dmitry Meleshko
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
| | - David C. Danko
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
| | | | - Salil Maharjan
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
| | - Natan Belchikov
- Physiology, Biophysics & Systems Biology Program, Weill Cornell Medicine of Cornell University, New York, USA
| | - Iman Hajirasouliha
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine of Cornell University, New York, USA
|
42
|
Shiraishi Y, Koya J, Chiba K, Okada A, Arai Y, Saito Y, Shibata T, Kataoka K. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. Nucleic Acids Res 2023; 51:e74. [PMID: 37336583 PMCID: PMC10415145 DOI: 10.1093/nar/gkad526] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 05/23/2023] [Accepted: 06/07/2023] [Indexed: 06/21/2023] Open
Abstract
We present our novel software, nanomonsv, for detecting somatic structural variations (SVs) using tumor and matched control long-read sequencing data with a single-base resolution. The current version of nanomonsv includes two detection modules, Canonical SV module, and Single breakend SV module. Using tumor/control paired long-read sequencing data from three cancer and their matched lymphoblastoid lines, we demonstrate that Canonical SV module can identify somatic SVs that can be captured by short-read technologies with higher precision and recall than existing methods. In addition, we have developed a workflow to classify mobile element insertions while elucidating their in-depth properties, such as 5' truncations, internal inversions, as well as source sites for 3' transductions. Furthermore, Single breakend SV module enables the detection of complex SVs that can only be identified by long-reads, such as SVs involving highly-repetitive centromeric sequences, and LINE1- and virus-mediated rearrangements. In summary, our approaches applied to cancer long-read sequencing data can reveal various features of somatic SVs and will lead to a better understanding of mutational processes and functional consequences of somatic SVs.
Affiliation(s)
- Yuichi Shiraishi
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Junji Koya
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
| | - Kenichi Chiba
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ai Okada
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Yasuhito Arai
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Yuki Saito
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
- Department of Gastroenterology, Keio University School of Medicine, Tokyo, Japan
| | - Tatsuhiro Shibata
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Laboratory of Molecular Medicine, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Keisuke Kataoka
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
- Department of Hematology, Keio University School of Medicine, Tokyo, Japan
|
43
|
Wojcik MH, Reuter CM, Marwaha S, Mahmoud M, Duyzend MH, Barseghyan H, Yuan B, Boone PM, Groopman EE, Délot EC, Jain D, Sanchis-Juan A, Starita LM, Talkowski M, Montgomery SB, Bamshad MJ, Chong JX, Wheeler MT, Berger SI, O'Donnell-Luria A, Sedlazeck FJ, Miller DE. Beyond the exome: What's next in diagnostic testing for Mendelian conditions. Am J Hum Genet 2023; 110:1229-1248. [PMID: 37541186 PMCID: PMC10432150 DOI: 10.1016/j.ajhg.2023.06.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 04/02/2023] [Revised: 06/13/2023] [Accepted: 06/14/2023] [Indexed: 08/06/2023] Open
Abstract
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.
Affiliation(s)
- Monica H Wojcik
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Division of Newborn Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Chloe M Reuter
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Shruti Marwaha
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Michael H Duyzend
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Hayk Barseghyan
- Center for Genetics Medicine Research, Children's National Research Institute, Children's National Hospital, Washington, DC 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
| | - Bo Yuan
- Department of Molecular and Human Genetics and Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Philip M Boone
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Emily E Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Emmanuèle C Délot
- Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; Center for Genetics Medicine Research, Children's National Research and Innovation Campus, Washington, DC, USA; Department of Pediatrics, George Washington University, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
| | - Deepti Jain
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, USA
| | - Alba Sanchis-Juan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lea M Starita
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Michael Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Stephen B Montgomery
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Michael J Bamshad
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
| | - Jessica X Chong
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
| | - Matthew T Wheeler
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Seth I Berger
- Center for Genetics Medicine Research and Rare Disease Institute, Children's National Hospital, Washington, DC 20010, USA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA; Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
| | - Danny E Miller
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA.
|
44
|
Chin CS, Behera S, Khalak A, Sedlazeck FJ, Sudmant PH, Wagner J, Zook JM. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat Methods 2023; 20:1213-1221. [PMID: 37365340 PMCID: PMC10406601 DOI: 10.1038/s41592-023-01914-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/10/2022] [Accepted: 05/17/2023] [Indexed: 06/28/2023]
Abstract
Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of complex pangenome structural and haplotype variation at multiple scales. We apply the graph decomposition methods in PGR-TK to the class II major histocompatibility complex demonstrating the importance of the human pangenome for analyzing complicated regions. Moreover, we investigate the Y-chromosome genes, DAZ1/DAZ2/DAZ3/DAZ4, of which structural variants have been linked to male infertility, and X-chromosome genes OPN1LW and OPN1MW linked to eye disorders. We further showcase PGR-TK across 395 complex repetitive medically important genes. This highlights the power of PGR-TK to resolve complex variation in regions of the genome that were previously too complex to analyze.
Affiliation(s)
- Chen-Shan Chin
- GeneDX, Stamford, CT, USA.
- Foundation of Biological Data Science, Belmont, CA, USA.
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Asif Khalak
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Peter H Sudmant
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
|
45
|
Ryan NM, Corvin A. Investigating the dark-side of the genome: a barrier to human disease variant discovery? Biol Res 2023; 56:42. [PMID: 37468985 DOI: 10.1186/s40659-023-00455-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 07/11/2023] [Indexed: 07/21/2023] Open
Abstract
The human genome contains regions that cannot be adequately assembled or aligned using next generation short-read sequencing technologies. More than 2500 genes are known to contain such 'dark' regions. In this study, we investigate the negative consequences of dark regions on gene discovery across a range of disease and study types, showing that dark regions are likely preventing researchers from identifying genetic variants relevant to human disease.
Affiliation(s)
- Niamh M Ryan
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland.
| | - Aiden Corvin
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity College Dublin, Dublin, Ireland
|
46
|
Gable SM, Mendez JM, Bushroe NA, Wilson A, Byars MI, Tollis M. The State of Squamate Genomics: Past, Present, and Future of Genome Research in the Most Speciose Terrestrial Vertebrate Order. Genes (Basel) 2023; 14:1387. [PMID: 37510292 PMCID: PMC10379679 DOI: 10.3390/genes14071387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 06/28/2023] [Accepted: 06/29/2023] [Indexed: 07/30/2023] Open
Abstract
Squamates include more than 11,000 extant species of lizards, snakes, and amphisbaenians, and display a dazzling diversity of phenotypes across their over 200-million-year evolutionary history on Earth. Here, we introduce and define squamates (Order Squamata) and review the history and promise of genomic investigations into the patterns and processes governing squamate evolution, given recent technological advances in DNA sequencing, genome assembly, and evolutionary analysis. We survey the most recently available whole genome assemblies for squamates, including the taxonomic distribution of available squamate genomes, and assess their quality metrics and usefulness for research. We then focus on disagreements in squamate phylogenetic inference, how methods of high-throughput phylogenomics affect these inferences, and demonstrate the promise of whole genomes to settle or sustain persistent phylogenetic arguments for squamates. We review the role transposable elements play in vertebrate evolution, methods of transposable element annotation and analysis, and further demonstrate that through the understanding of the diversity, abundance, and activity of transposable elements in squamate genomes, squamates can be an ideal model for the evolution of genome size and structure in vertebrates. We discuss how squamate genomes can contribute to other areas of biological research such as venom systems, studies of phenotypic evolution, and sex determination. Because they represent more than 30% of the living species of amniote, squamates deserve a genome consortium on par with recent efforts for other amniotes (i.e., mammals and birds) that aim to sequence most of the extant families in a clade.
Affiliation(s)
- Simone M Gable
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Jasmine M Mendez
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Nicholas A Bushroe
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Adam Wilson
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Michael I Byars
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Marc Tollis
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
|
47
|
Kumar S, Gerstein M. Unified views on variant impact across many diseases. Trends Genet 2023; 39:442-450. [PMID: 36858880 PMCID: PMC10192142 DOI: 10.1016/j.tig.2023.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/14/2022] [Revised: 02/02/2023] [Accepted: 02/02/2023] [Indexed: 03/03/2023]
Abstract
Genomic studies of human disorders are often performed by distinct research communities (i.e., focused on rare diseases, common diseases, or cancer). Despite underlying differences in the mechanistic origin of different disease categories, these studies share the goal of identifying causal genomic events that are critical for the clinical manifestation of the disease phenotype. Moreover, these studies face common challenges, including understanding the complex genetic architecture of the disease, deciphering the impact of variants on multiple scales, and interpreting noncoding mutations. Here, we highlight these challenges in depth and argue that properly addressing them will require a more unified vocabulary and approach across disease communities. Toward this goal, we present a unified perspective on relating variant impact to various genomic disorders.
Affiliation(s)
- Sushant Kumar
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA; Department of Statistics & Data Science, Yale University, New Haven, CT 06520, USA.
|
48
|
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
| | | | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
|
49
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Jerez PA, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. bioRxiv 2023:2023.01.12.523790. [PMID: 36711673 PMCID: PMC9882142 DOI: 10.1101/2023.01.12.523790] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 01/18/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale due to a combination of being too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with an F1-score better than Illumina short-read sequencing. Small indel calling remains difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with an F1-score comparable to state-of-the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT-based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
| | - Kimberley J. Billingsley
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M. Genner
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Kishwar Shafin
- Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W. Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J. Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, Texas, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
|
50
|
Ma H, Zhong C, Chen D, He H, Yang F. cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network. BMC Bioinformatics 2023; 24:119. [PMID: 36977976 PMCID: PMC10045035 DOI: 10.1186/s12859-023-05243-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 03/21/2023] [Indexed: 03/30/2023]
Abstract
BACKGROUND Genomic structural variant detection is a significant and challenging issue in genome analysis. The existing long-read based structural variant detection methods still have room for improvement in detecting multi-type structural variants. RESULTS In this paper, we propose a method called cnnLSV to obtain detection results with higher quality by eliminating false positives in the detection results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants to represent long-read alignment information around structural variants as images, input the images into a constructed convolutional neural network to train a filter model, and load the trained model to remove the false positives to improve the detection performance. We also eliminate mislabeled training samples in the model training phase by using principal component analysis and the unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that our proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The program of cnnLSV is available at https://github.com/mhuidong/cnnLSV. CONCLUSIONS The proposed cnnLSV can detect structural variants by using long-read alignment information and a convolutional neural network to achieve overall higher performance, and effectively eliminate incorrectly labeled samples by using the principal component analysis and k-means algorithms in the model training stage.
Affiliation(s)
- Huidong Ma
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
| | - Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China.
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China.
| | - Danyang Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
| | - Haofa He
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
| | - Feng Yang
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
|